Release Notes

JAWS Release 2/2024 - 04/??/2024

✨ New Features

  • NMDC Tahoma Site: We’ve added a new JAWS site: nmdc_tahoma. This new site is located at the EMSL cluster, and it is dedicated to supporting the NMDC project.

  • Enhanced Error Messaging: Improvements have been made to the error messages for backend task failures, providing clearer guidance for troubleshooting. (Issue 118)

  • The jaws history command has new filtering options that allow searches by JSON file name (e.g., my_inputs.json), WDL file name, user, or tag, using the flags --json-file, --wdl, --user, and --tag. (Issue 88)

🆕 New behavior for JAWS

  • Dori exvivo nodes will now be dedicated solely to large memory tasks, optimizing resource allocation and enhancing performance. (Issue 122)

  • If a Cromwell task returns a code of 79 (error code related to file system issues), JAWS will automatically retry the task once more. (Issue 139)

  • The JAWS_SITE environment variable is now exported within container environments for running tasks. This means you can use $JAWS_SITE in your WDL command{} section for conditional statements (for example, if $JAWS_SITE equals "dori", then do something). (Issue 142)
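As a minimal sketch of the kind of conditional this enables, the shell snippet below picks a thread count based on the site. The function name threads_for_site and the thread counts are hypothetical and chosen only for illustration; only the JAWS_SITE variable and the "dori" site name come from the note above.

```shell
# Illustrative only: the function name and thread counts are assumptions.
# Only $JAWS_SITE and the "dori" site name come from the release notes.
threads_for_site() {
  if [ "$1" = "dori" ]; then
    echo 8   # e.g., use more threads on Dori
  else
    echo 4   # default for any other site, or when the variable is unset
  fi
}

# ${JAWS_SITE:-} expands to an empty string when the variable is not set.
threads=$(threads_for_site "${JAWS_SITE:-}")
echo "JAWS_SITE='${JAWS_SITE:-}' -> using $threads threads"
```

If you paste logic like this into a WDL, note that shell-style ${...} expansions are generally safest in the heredoc-style command <<< ... >>> form, where only ~{...} is treated as a WDL placeholder.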

🐛 Bug Fixes

  • Fixed bug where JAWS was not able to create a user subdirectory under the Teams dir. (Issue 1790)

  • Fixed a bug in parallel file copying. Parallel copying is used in lieu of Globus when a job is submitted to the same site it was submitted from. (Issue 1782)

  • Added retries for JAWS-site retrieving metadata from Cromwell. (Issue 1787)

JAWS First Winter Release - 02/07/2024

✨ New Features

  • JAWS will issue a warning if a File type is hardcoded in the WDL file. (Issue 128)

  • We have implemented a validation process to ensure the WDL file aligns with the inputs.json file, using womtool. This validation occurs automatically upon submission and can also be triggered manually by executing the jaws validate command. (Issue 175)

    jaws validate <WDL_FILE> <INPUTS_JSON>
    

🆕 New behavior for JAWS

  • We updated the fair-share policies: (Issue 136)

    • Task-Level Control:

      • Limit: There’s now a maximum limit of 200 concurrent tasks per user.

      • Queue: Tasks submitted above the 200-task limit will enter a queue.

  • Only successful runs will be copied to the team’s directory. (Issue 154)

  • Our backend system (HTCondor) will attempt to execute a task a second time under certain conditions. If your task exceeds its allocated time, if JAWS-site needs to be reassigned to a new node (Perlmutter site), or if the backend is unstable, we will automatically initiate a second attempt to run the task.

    Important to note:

    • When the backend or Cromwell retries a task, the cromwell-execution folder will not be reset. In specific cases, such as when the command creates directories or specific files, this requires adding a verification step to your WDL command stanza. Depending on the outcome of that check, you’ll need to decide whether to force the directory or file to be recreated or to skip the step. (Issue 110)
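One way to make a command stanza safe to re-run is sketched below. This is an illustration under stated assumptions, not a JAWS-provided helper: the function name prepare_workdir and the marker file names result.partial and result.done are hypothetical.

```shell
# Hypothetical retry-safe setup step; all names here are illustrative assumptions.
prepare_workdir() {
  workdir="$1"
  # mkdir -p succeeds even if a previous attempt already created the directory.
  mkdir -p "$workdir"
  # Remove a partial result left behind by an earlier attempt so it gets rebuilt.
  rm -f "$workdir/result.partial"
  # Report whether the expensive step can be skipped or must run.
  if [ -f "$workdir/result.done" ]; then
    echo "skip"
  else
    echo "run"
  fi
}

# Demo in a throwaway directory: the first call reports "run".
demo_dir=$(mktemp -d)
decision=$(prepare_workdir "$demo_dir/out")
echo "setup decision: $decision"
```

Whether to skip or recreate depends on the task: skipping saves time when the earlier attempt finished that step, while recreating is safer when a partial file could be corrupt.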

🐛 Bug Fixes

  • Fixed issue related to retrieving metadata from runs that contain over 1 million rows. (Issue 137)

  • A user-friendly error message is displayed when an invalid run_id is entered in the jaws status <RUN_ID> or jaws log <RUN_ID> commands. (Issue 177)

  • Fixed file permission issue for files transferred to the team’s directory via Globus from a compute site that is different from the input site. (Issue 168 and 1739)

  • Fixed issue related to transferring very large output datasets. (Issue 1746)

  • Return an error message to the user when Cromwell submission fails. (Issue 1568)

  • Updated the refdata path for the NMDC site (/global/cfs/cdirs/m3408/refdata). (Issue 1756)

  • workflow_root: null bug is fixed. (Issue 1747)

  • Added a tag to the summary.json supplementary file. (Issue 1758)

  • Increased the length of the JSON basename field. (Issue 1776)

  • Fixed job_id column in the jaws task output. (Issue 1749)

JAWS Second Fall Release - 11/09/2023

✨ New Features

  • JAWS has incorporated support for specifying execution time in the runtime section. In addition to memory and cpu, you can now include the time required to run each task. The mandatory key for specifying this information is runtime_minutes.

    • runtime_minutes:

      • Accepted types: Int

        • Int: minutes. Example:

    runtime {
      docker: "ubuntu@sha256:c9cf959fd83770dfdefd8fb42cfef0761432af36a764c077aed54bbc5bb25368"
      runtime_minutes: 60
      memory: "5G"
      cpu: 4
    }
    

    The benefit of specifying runtime_minutes is that it provides a guarantee that the task will be put on a node with sufficient time.

    ⚠️ If the WDL runtime section uses time as a key or doesn’t specify any time value, the workflow will still be accepted but without the assurance it will be allocated to a suitable node.

  • ✨ JAWS Client Container is Available ✨

    • How to use:

      • Dori:

        JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/clusterfs/jgi/groups/dsi/homes/svc-jaws/dori-prod/jaws-prod.conf apptainer run docker://doejgi/jaws-client:latest jaws --help
        
        • Append to the end of your ~/.bashrc on DORI:

        jaws() {
          JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/clusterfs/jgi/groups/dsi/homes/svc-jaws/dori-prod/jaws-prod.conf apptainer run docker://doejgi/jaws-client:latest jaws "$@"
        }
        
      • Perlmutter:

        JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/global/cfs/projectdirs/jaws/perlmutter-prod/jaws-prod.conf shifter --image=doejgi/jaws-client:latest jaws --help
        
        • Append to the end of your ~/.bashrc on NERSC:

        jaws() {
          JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/global/cfs/projectdirs/jaws/perlmutter-prod/jaws-prod.conf shifter --image=doejgi/jaws-client:latest jaws "$@"
        }
        

🆕 New behavior for JAWS

  • In our ongoing efforts to enhance performance and maintain a robust, up-to-date environment, we will discontinue support for WDL draft-2 on 12/1/2023.

    • We will display a ⚠️ warning message to notify you if you are still using the old version when submitting a run.

    • We strongly urge all users to update their WDL workflows to the Version 1.0 specification. If you have any questions, please schedule a pair programming session with our Team.

🐛 Bug Fixes

  • We added a new flag to jaws validate to show ShellCheck linter validation. In addition, the output now prints newlines for better readability.

  • Addressed the issue of “noise” being printed to stderr for Dori runs. (Issue 130)

  • Increased allowed length of WDL and JSON filenames. (Issue 1744)

  • jaws health command is now operational again.

  • workflow_root: null bug is fixed. (Issue 148)

JAWS First Fall Release - 10/09/2023

✨ New Features

  • Dori Exvivo nodes are now available for JAWS. JAWS is using the long queue (14 days) and can access up to 1.5 TB of memory. (Issue 83)

  • Globus endpoint is available for Perlmutter! 🎉 (Issue 1683)

    • This means that you are now able to submit from Perlmutter to Dori/Tahoma/JGI sites and vice-versa.

  • Teams output directory will include JAWS user ID and will have the following structure (Issue 137):

    /<TEAM PATH>/<JAWS_USER_ID>/<RUN_ID>/<Cromwell_ID>
    
  • New command: jaws tasks <RUN_ID>:

    jaws tasks 18279
    #TASK_DIR       STATUS     QUEUE_START          RUN_START            RUN_END              QUEUE_MIN  RUN_MIN  CACHED  TASK_NAME          REQ_CPU  REQ_GB  REQ_MIN  CPU_HRS
    call-alignment  succeeded  2023-10-02 16:55:07  2023-10-02 16:56:16  2023-10-02 16:56:22  1          0        False   bbtools.alignment  1        1                0.0
    call-samtools   succeeded  2023-10-02 16:56:36  2023-10-02 16:56:55  2023-10-02 16:56:57  0          0        False   bbtools.samtools   1        1                0.0
    
    • jaws tasks integrates two previously distinct commands, jaws task-log and jaws task-summary. Be sure to explore the newly unified and enhanced features:

      • The “cpu_hours” metric is now included in jaws tasks for each task and in jaws status as an aggregate for the entire workflow upon run completion.

      • Cached tasks are now recorded in jaws tasks after run completion.

      • jaws tasks status is now updated appropriately when a run is cancelled.

      • jaws tasks now uses timestamps derived from the task log instead of using Cromwell metadata.

  • New command: jaws download <RUN_ID>:

    • If a run fails, JAWS will skip the transfer to the team’s directory. However, if you need the cromwell-execution folder for debugging, you can use our new command to ‘force’ the download of the run:

    jaws download 18386
    {
        "download_id": 8272,
        "id": 18386,
        "status": "download queued"
    }

    • Please note that the run output will be transferred to the team’s directory.

  • New command: jaws get-user (Issue 107)

    • This command gets the current user’s settings.

    jaws get-user
    {
        "email": "dcassol@lbl.gov",
        "name": "Daniela Cassol",
        "slack_id": "<Member_ID>",
        "teams": [
            "dsi-aa",
            "nmdc"
        ],
        "uid": "dcassol"
    }

  • Bash commands that use curl now work on Tahoma. (Issue 121)

  • Output directory will include a copy of the original WDL, input.json, and subworkflows-ZIP files, for reproducibility. (Issue 1710)

  • jaws status is now verbose by default. If you prefer the short version, please use jaws status --brief. To ensure backward compatibility, when you use the command jaws status --verbose, it will issue a warning and additionally display the output of jaws status.

  • JAWS now uses the latest version of Cromwell (85).

🐛 Bug Fixes

  • JAWS will not transfer workflow outputs to the team’s output location for runs that were canceled by the user. (Issue 156)

  • Fixed teams output directory permissions when the submission and compute site are the same. (Issue 1712)

  • Fixed issue when workflow name and site are the same. (Issue 23)

  • jaws resubmit command is restricted to the members of the team who own the run. (Issue 148)

  • We have improved the error messaging for jaws resubmit, especially when the original submission failed. (Issue 116)

  • jaws status is now displaying local time zone. (Issue 151)

  • jaws status will report when a run was canceled by the user in the result field. (Issue 1714)

  • JAWS now ignores extra attributes in the runtime{} stanza, for example "runtime_minutes": "20". (Issue 118)

  • When a run is canceled, it will be recorded to jaws tasks <RUN_ID>. (Issue 120)

  • Fixed issues reporting the wrong task status in jaws tasks <RUN_ID> command.

  • Fixed an error in transfers when a file was a named pipe instead of a regular file. (Issue 1725)

  • When a run is resubmitted (jaws resubmit <RUN_ID>), JAWS will ensure that all the required input files are still available (e.g., haven’t been purged). (Issue 110)

  • When a run is resubmitted (jaws resubmit <RUN_ID>), JAWS will update access timestamp (atime) for input files, in order to avoid purging files prematurely. (Issue 1689)

  • Fixed bug when Cromwell submission fails during input processing and was not recognized by JAWS. (Issue 1711)

  • The outputs.json supplementary file now contains relative paths instead of absolute paths. (Issue 1652)

🆕 New behavior for JAWS

  • In our ongoing efforts to enhance performance and maintain a robust, up-to-date environment, we will discontinue support for WDL draft-2 on 11/1/2023.

    • We will display a ⚠️ warning message to notify you if you are still using the old version when submitting a run.

    • We strongly urge all users to update their WDL workflows to the Version 1.0 specification. If you have any questions, please schedule a pair programming session with our Team.

  • JAWS Cromwell configuration was updated, and now Container tags can be used for call caching (Issue 122, Issue 156).

    • We recommend referencing containers by their SHA256 digest instead of a tag (e.g., doejgi/bbtools@sha256:64088.. instead of doejgi/bbtools:latest). While using mutable or “floating” tags in tasks can be convenient in certain scenarios, it adversely impacts the reproducibility of a workflow. For instance, executing the same workflow with doejgi/bbtools:latest now, and then rerunning it in a month or a year, could result in the use of different container images.

  • If the command stanza uses $TMPDIR, it will have access to /tmp. Previously, $TMPDIR was set to the execution/ directory (e.g., on NFS). (Issue 110)
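A minimal sketch of how a command stanza might use $TMPDIR for intermediate files is shown below. The directory template jaws_task.XXXXXX and the file name intermediate.txt are illustrative assumptions, and the /tmp fallback is only there so the snippet also runs outside of JAWS.

```shell
# Create a private scratch directory under $TMPDIR (falls back to /tmp if unset).
# The template name and file name are illustrative assumptions.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/jaws_task.XXXXXX")

# Remove the scratch directory when the shell exits, even on failure.
trap 'rm -rf "$scratch"' EXIT

# Write intermediate data to local scratch instead of the execution directory.
echo "intermediate data" > "$scratch/intermediate.txt"
echo "scratch directory: $scratch"
```

Files that must survive the task still need to be written to (or copied into) the execution directory, because anything left under the scratch directory is deleted on exit in this sketch.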

❌ Deprecated Commands

  • jaws task-log and jaws task-summary have been deprecated (Issue 161, Issue 1719).

    • These two commands have been merged into a new command, jaws tasks.

    • For backward compatibility, jaws task-log will be reporting the output of jaws tasks.

⚠️ Known issues

  • If the submission and compute sites are different (for example, from Dori to Tahoma), there could still be permission issues in the team’s output directory. In this case, the transfer to the team’s directory happens via Globus; we are looking into solutions for this problem.

JAWS Summer Release - 09/05/2023

We are releasing a new version of JAWS that includes breaking changes.

JAWS Teams

We’re introducing a new feature in JAWS called “JAWS Teams.” This feature allows for easier management of users and offers a centralized location for sharing and delivering output files for each team.

  • List all the teams available:

jaws teams list
[
    "gt-ga",
    "nmdc",
    "dsi-ii",
    "sc-mcr",
    "gt-seqtech",
    "dsi-aa",
    "gt-syn",
    "phytzm"
]
  • List the teams to which you belong:

jaws teams my-teams
[
    "dsi-aa"
]
  • List the users associated with a team:

jaws teams members dsi-aa
[
    "dcassol",
...
]
  • Get a team’s site config - Outputs path:

jaws teams get-site dsi-aa dori
"/clusterfs/jgi/scratch/dsi/aa/jaws/dori-staging/dsi-aa"
  • The team’s owner has additional privileges and responsibilities, such as setting the path for each site and adding or deleting users from the team:

jaws teams set-site <TEAM_ID> <SITE_ID> <PATH>
jaws teams add-user <TEAM_ID> <USER_ID>
jaws teams del-user <TEAM_ID> <USER_ID>

How to use JAWS Teams?

When submitting a new run, you can use the --team flag. The outputs for this run will be transferred to the team’s path.

jaws submit align_final.wdl inputs.json dori --team=dsi-aa

As an easy alternative, you can set default_team in your jaws.conf file:

vi ~/jaws.conf

[USER]
token = <TOKEN>
default_team = dsi-aa

Important: Do not use quotes for the team’s name.

The new JAWS Teams feature will transfer the final workflow outputs from cromwell-execution to the path defined for your team.

How do I find the output data?

jaws status --verbose <ID> will provide the final output path. Please check output_dir.

Then, you can find the expected file tree structure:

/<TEAM PATH>/<RUN_ID>/<Cromwell_ID>

JAWS will copy the following in case of a successful run:

  • Workflow outputs;

  • Supplementary files:
    • errors.json;

    • metadata.json;

    • output_manifest.json;

    • outputs.json;

    • task_summary.json.

JAWS will copy the following in case of a FAILED run:

  • Workflow outputs;

  • Supplementary files;

  • Failed tasks’ cromwell-execution folder only.

As we transition to copying the output files, we are deprecating the jaws get command. To ensure backward compatibility, this command will remain functional for a few more months, serving solely to copy the final workflow outputs. However, please note that we plan to discontinue this command entirely in upcoming cycles.

We have implemented parallel copying when the submission and compute sites are the same, for example, from Dori to Dori. This effectively resolves the delay issues associated with the ‘download queue’.

JAWS Fair-share policies

We have implemented JAWS execution throttling, which allows us to have fair-share policies in place. Specifically, there are two layers of control:

Run-Level Control
  • Limit: A maximum of 10 runs can be executed concurrently per site per user.

  • Queue: Any additional run submissions beyond this limit will be automatically placed in the JAWS Queue.

  • Processing: These queued runs will be initiated as soon as one of the currently active runs is completed.

Task-Level Control
  • Limit: A cap of 600 concurrent tasks is set per user.

  • Queue: Any tasks submitted beyond this 600-task limit will be placed in a queue.

Slack Notifications

We replaced email notifications with Slack notifications, which are sent when a run is completed.

How to set up Slack Notification?

Please set up your slack_webhook and update your JAWS account. Instructions on how to get your Slack webhook are available here.

jaws update-user --email=dcassol@lbl.gov --slack_id <Member_ID> --slack_webhook <WebHook_URL>

Call-caching Strategy

The call-caching strategy we previously used was “xxh64”, which required a lot of I/O to calculate the hash of the entire file content. We have now replaced it with “fingerprint”. The fingerprint strategy combines the last modified time, the size, and the xxh64 hash of the first 10 MB of the file to create a file fingerprint. Please be aware of this change, and let us know if you think it could cause file collisions when a task uses call-caching. If you want to read more about call-caching and all the available strategies, please check here.
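To illustrate the idea behind a fingerprint, here is a rough shell sketch that combines the same three ingredients. It is not Cromwell's implementation: sha256sum stands in for xxh64, the function name file_fingerprint and the output layout are hypothetical, 10 MB is taken as 10485760 bytes, and GNU coreutils (stat -c, head -c, sha256sum) is assumed.

```shell
# Sketch of a fingerprint-style file identity: mtime + size + hash of the first 10 MB.
# Assumptions: GNU coreutils; sha256sum is a stand-in for xxh64; the output
# layout is illustrative and does not match Cromwell's internal format.
file_fingerprint() {
  f="$1"
  mtime=$(stat -c %Y "$f")   # last modified time, seconds since the epoch
  size=$(stat -c %s "$f")    # file size in bytes
  # Hash only the first 10485760 bytes, so large files are not read in full.
  head_hash=$(head -c 10485760 "$f" | sha256sum | cut -d' ' -f1)
  echo "${mtime}-${size}-${head_hash}"
}

# Demo on a small temporary file.
demo=$(mktemp)
printf 'example content\n' > "$demo"
file_fingerprint "$demo"
rm -f "$demo"
```

The sketch also shows where a collision could come from: two files with the same size and modification time that differ only after the first 10 MB would get the same fingerprint.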

Additional minor features

  • Added a --forcequeue flag to the jaws submit command. Users can force run submission when the site has been disabled;

  • Resubmitting a run will change the “result” field to “resubmitted” in the jaws status <run id> output;

  • We now parse the runtime{} parameter values correctly when there is a space (e.g., memory: “5 G”), ticket #111;

  • /refdata is mounted correctly and accessible to the WDLs.

Special note about the DORI TEAMS Folders: For this release, Teams’ folders on Dori must be located at /clusterfs/jgi/scratch/dsi/aa/jaws/teams/. However, once the Globus endpoint changes have been completed, teams’ folders may be edited under the team’s scratch.

JAWS Summer Release Slides

Link to JAWS Calendar

JAWS Sprint Release - 04/12/2023

We released a new version of JAWS.

Here are the changes that are now on PROD:

  • The Dori JAWS site is available for testing;

  • Create a config file in your HOME directory:

touch ~/jaws.conf
chmod 600 ~/jaws.conf
  • Copy your token from CORI:

[USER]
token = <copy your token from CORI and paste it here>
  • Load the jaws module:

module use /clusterfs/jgi/groups/dsi/homes/svc-jaws/modulefiles
module load jaws/dori-prod
jaws submit <WDL_FILE> <INPUTS> dori

NOTE:

  • Dori and Perlmutter have temporary Globus limitations (IT has not configured JAWS application Globus endpoints yet).

  • Temporary workaround:
    • Because data transfer isn’t available, you must log in to the cluster (Dori or Perlmutter) and submit the run from there.

    • You can use Globus endpoint to transfer your data from Cori to Dori, for example.

Deprecated commands

jaws outfiles
jaws outputs
jaws metadata
jaws errors

This is part of the effort to refactor the jaws metadata command (which uses Cromwell metadata). As users submit large workflows (> 10k tasks), Cromwell metadata became too expensive to query and would sometimes time out. Instead, we now wait for the run to finish and then write the associated reports (e.g., errors, outfiles, and outputs) to disk. These JSON files are written to the run’s execution directory and returned to users via jaws get.

  • To find the report files:

jaws get <RUN_ID> <DEST>
ls <DEST>
<workflow>.wdl
<inputs>.json
errors.json
metadata.json
outfiles.json
outputs.json
task_summary.json
  • jaws task-log changed:

    • task-log is now much faster and more robust than before and will support runs with more than 10k tasks.

    • Even more than before, it is a real-time reflection of a run’s current status, since it gets its information directly from the backend instead of going through Cromwell’s metadata, which was a bottleneck.

    • However, because it no longer uses cromwell’s metadata, there are no longer records for cached tasks since they did not actually execute. Therefore, the “cached” column was deleted.

    • Limitation for Tahoma site: jaws task-log isn’t working properly for Tahoma due to the firewall between the compute and workflow node. We have a ticket open for this issue. However, this will not affect your run!

  • jaws task-summary

    It will only be available after the run has finished.

  • jaws status --verbose

    Two more fields: workflow_name and workflow_root

  • jaws validate

    miniwdl is used instead of womtool for WDL validation.

  • jaws resubmit

    New command available! Resubmit a run (at the same compute site).

Fixed bugs reported by users

  • runtime parameters can be on separate lines;

  • Fixed jaws get --flatten;

  • Symlinks can be used for the path to the WDL (jaws submit <symlink>/my.wdl my.json dori);

  • Restarting the Cromwell service (e.g., upon release) will no longer interrupt active runs;

  • Added job purging policies to the HTCondor backend so that it can automatically cancel tasks that stay in HOLD status for too long;

  • Added --time-min SLURM option for requesting SLURM compute nodes with a shorter wall-clock-time on Perlmutter when maintenance is scheduled.