Release Notes
JAWS Release 2/2024 - 04/??/2024
✨ New Features
NMDC Tahoma Site: We’ve added a new JAWS site: nmdc_tahoma. This new site is located at the EMSL cluster, and it is dedicated to supporting the NMDC project.
Enhanced Error Messaging: Improvements have been made to the error messages for backend task failures, providing clearer guidance for troubleshooting. (Issue 118)
The jaws history command has new filtering options, allowing searches by JSON file name (e.g., my_inputs.json), WDL file name, user, or tag, using flags such as --json-file, --wdl, --user, and --tag. (Issue 88)
🆕 New behavior for JAWS
Dori exvivo nodes will now be dedicated solely to large memory tasks, optimizing resource allocation and enhancing performance. (Issue 122)
If a Cromwell task returns a code of 79 (error code related to file system issues), JAWS will automatically retry the task once more. (Issue 139)
The JAWS_SITE environment variable is now exported within container environments for running tasks. This means you can use $JAWS_SITE in your WDL command{} section for conditional statements (e.g., if $JAWS_SITE == "dori", then do something). (Issue 142)
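As a minimal sketch of such a conditional, the snippet below shows shell code that could sit inside a WDL command section. Only the JAWS_SITE variable comes from JAWS; the function name pick_threads and the thread counts are hypothetical and chosen purely for illustration. The plain $VAR form is used because ${...} is treated as a WDL placeholder inside command { } blocks.

```shell
# Sketch of a site-dependent branch for a WDL command section.
# pick_threads and the thread counts are hypothetical; only JAWS_SITE is set by JAWS.
pick_threads() {
    # An unset JAWS_SITE (e.g., outside JAWS) expands to "" and takes the default branch.
    if [ "$JAWS_SITE" = "dori" ]; then
        echo 8
    else
        echo 4
    fi
}

echo "Running on site '$JAWS_SITE' with $(pick_threads) threads"
```

Comparing against a quoted string keeps the test safe when the variable is empty or unset.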
🐛 Bug Fixes
Fixed a bug where JAWS was not able to create a user subdirectory under the Teams directory. (Issue 1790)
Fixed a bug in parallel file copying. Parallel copy is used instead of Globus when a job is submitted to the same site you are submitting from. (Issue 1782)
Added retries for when JAWS-site retrieves metadata from Cromwell. (Issue 1787)
JAWS First Winter Release - 02/07/2024
✨ New Features
JAWS will issue a warning if a File type is hardcoded in the WDL file. (Issue 128)
We have implemented a validation process, using womtool, to ensure the WDL file aligns with the inputs.json file. This validation occurs automatically upon submission and can also be triggered manually by executing the jaws validate command. (Issue 175)
jaws validate <WDL_FILE> <INPUTS_JSON>
🆕 New behavior for JAWS
We updated the Fair-Share policies: (Issue 136)
Task-Level Control:
Limit: There’s now a maximum limit of 200 concurrent tasks per user.
Queue: Tasks submitted above the 200-task limit will enter a queue.
Only successful runs will be copied to the team’s directory. (Issue 154)
Our backend system (HTCondor) will attempt to execute a task twice under certain conditions. If your task exceeds its allocated time, or if JAWS-site needs to be reassigned to a new node (Perlmutter site), or due to some instability in the backend, we will automatically initiate a second attempt to run the task.
Important to note:
When the backend or Cromwell retries a task, the cromwell-execution folder will not be reset. In specific cases, this requires adding a check to your WDL command stanza, for example when the command creates directories or specific files. Depending on what the check finds, you’ll need to decide whether to force the directory or file to be recreated or to skip this step. (Issue 110)
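One possible way to make such a step safe to rerun is sketched below. The function name prepare_outdir and the .done marker file are hypothetical conventions, not something JAWS or Cromwell provides; the point is only that the command checks for leftovers from a previous attempt before creating anything.

```shell
# Sketch of a retry-safe step: the execution folder may still hold
# files from a previous attempt, so check before creating anything.
# prepare_outdir and the .done marker are hypothetical names.
prepare_outdir() {
    outdir="$1"
    if [ -f "$outdir/.done" ]; then
        # A previous attempt finished this step: skip it.
        echo "skip"
    else
        # Missing or partial output: recreate from scratch.
        rm -rf "$outdir"
        mkdir -p "$outdir"
        echo "recreated"
    fi
}

workdir=$(mktemp -d)
prepare_outdir "$workdir/results"      # first attempt: prints "recreated"
touch "$workdir/results/.done"         # mark the step as complete
prepare_outdir "$workdir/results"      # retry: prints "skip"
```

Writing the marker only after the step succeeds means a task killed halfway through is treated as partial and redone on the second attempt.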
🐛 Bug Fixes
Fixed issue related to retrieving metadata from runs that contain over 1 million rows. (Issue 137)
A user-friendly error message is displayed when an invalid run_id is entered in the jaws status <RUN_ID> or jaws log <RUN_ID> commands. (Issue 177)
Fixed file permission issue for files transferred to the team’s directory via Globus from a compute site that is different from the input site. (Issue 168 and 1739)
Fixed issue related to transferring very large output datasets. (Issue 1746)
Return an error message to the user when Cromwell submission fails. (Issue 1568)
Updated the refdata path for the NMDC site (/global/cfs/cdirs/m3408/refdata). (Issue 1756)
The workflow_root: null bug is fixed. (Issue 1747)
Added a tag to the summary.json supplementary file. (Issue 1758)
Increased the length of the JSON basename field. (Issue 1776)
Fixed job_id column in the jaws task output. (Issue 1749)
JAWS Second Fall Release - 11/09/2023
✨ New Features
JAWS has incorporated support for specifying execution time in the runtime section. In addition to memory and cpu, you can now include the time required to run each task. The mandatory key for specifying this information is runtime_minutes.
runtime_minutes:
Accepted types: Int
Int: minutes. Example:
runtime {
    docker: "ubuntu@sha256:c9cf959fd83770dfdefd8fb42cfef0761432af36a764c077aed54bbc5bb25368"
    runtime_minutes: 60
    memory: "5G"
    cpu: 4
}
The benefit of specifying runtime_minutes is that it guarantees the task will be placed on a node with sufficient time.
⚠️ If the WDL runtime section uses time as a key or doesn’t specify any time value, the workflow will still be accepted but without the assurance it will be allocated to a suitable node.
✨ JAWS Client Container is Available ✨
How to use:
Dori:
JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/clusterfs/jgi/groups/dsi/homes/svc-jaws/dori-prod/jaws-prod.conf apptainer run docker://doejgi/jaws-client:latest jaws --help
Append to the end of your ~/.bashrc on Dori:
jaws() {
    JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/clusterfs/jgi/groups/dsi/homes/svc-jaws/dori-prod/jaws-prod.conf apptainer run docker://doejgi/jaws-client:latest jaws "$@"
}
Perlmutter:
JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/global/cfs/projectdirs/jaws/perlmutter-prod/jaws-prod.conf shifter --image=doejgi/jaws-client:latest jaws --help
Append to the end of your ~/.bashrc on NERSC:
jaws() {
    JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/global/cfs/projectdirs/jaws/perlmutter-prod/jaws-prod.conf shifter --image=doejgi/jaws-client:latest jaws "$@"
}
🆕 New behavior for JAWS
In our ongoing efforts to enhance performance and maintain a robust, up-to-date environment, we will be discontinuing support for the WDL Draft-2 version on 12/1/2023.
We will display a ⚠️ warning message to notify you if you are still using the old version when submitting a run.
We strongly urge all users to update their WDL workflows to the Version 1.0 specification. If you have any questions, please schedule a pair programming session with our Team.
🐛 Bug Fixes
We added a new flag to jaws validate to show ShellCheck linter validation. In addition, the output now also prints newlines for better readability.
Addressed the issue of “noise” being printed to stderr for Dori runs. (Issue 130)
Increased allowed length of WDL and JSON filenames. (Issue 1744)
The jaws health command is now operational again.
The workflow_root: null bug is fixed. (Issue 148)
JAWS First Fall Release - 10/09/2023
✨ New Features
Dori Exvivo nodes are now available for JAWS. JAWS is using the long queue (14 days) and can access up to 1.5 TB of memory. (Issue 83)
Globus endpoint is available for Perlmutter! 🎉 (Issue 1683)
This means that you are now able to submit from Perlmutter to Dori/Tahoma/JGI sites and vice-versa.
Teams output directory will include the JAWS user ID and will have the following structure (Issue 137):
/<TEAM PATH>/<JAWS_USER_ID>/<RUN_ID>/<Cromwell_ID>
New command: jaws tasks <RUN_ID>:
jaws tasks 18279
#TASK_DIR STATUS QUEUE_START RUN_START RUN_END QUEUE_MIN RUN_MIN CACHED TASK_NAME REQ_CPU REQ_GB REQ_MIN CPU_HRS
call-alignment succeeded 2023-10-02 16:55:07 2023-10-02 16:56:16 2023-10-02 16:56:22 1 0 False bbtools.alignment 1 1 0.0
call-samtools succeeded 2023-10-02 16:56:36 2023-10-02 16:56:55 2023-10-02 16:56:57 0 0 False bbtools.samtools 1 1 0.0
jaws tasks integrates two previously distinct commands, jaws task-log and jaws task-summary. Be sure to explore the newly unified and enhanced features:
The “cpu_hours” metric is now included in the jaws tasks command for each task and in jaws status as an aggregate for the entire workflow upon run completion.
Cached tasks will now be recorded in jaws tasks after run completion.
jaws tasks status is now updated appropriately when a run is cancelled.
jaws tasks now uses timestamps derived from the task log instead of Cromwell metadata.
New command: jaws download <RUN_ID>:
If a run fails, JAWS will skip the transfer to the team’s directory. However, if you need the cromwell-execution folder for debugging, you can use this new command to force the download of the run:
jaws download 18386
{
    "download_id": 8272,
    "id": 18386,
    "status": "download queued"
}
Please note that the run output will be transferred to the team’s directory.
New command: jaws get-user (Issue 107)
This command gets the current user’s settings.
jaws get-user
{
    "email": "dcassol@lbl.gov",
    "name": "Daniela Cassol",
    "slack_id": "<Member_ID>",
    "teams": [
        "dsi-aa",
        "nmdc"
    ],
    "uid": "dcassol"
}
Bash commands employing curl are now working on Tahoma. (Issue 121)
Output directory will include a copy of the original WDL, input.json, and subworkflows-ZIP files, for reproducibility. (Issue 1710)
jaws status is now verbose by default. If you prefer the short version, please use jaws status --brief. To ensure backward compatibility, when you use the command jaws status --verbose, it will issue a warning and additionally display the output of jaws status.
JAWS is now using the latest version of Cromwell-85.
🐛 Bug Fixes
JAWS will not transfer workflow outputs to the team’s output location for runs that were canceled by the user. (Issue 156)
Fixed teams output directory permissions when the submission and compute site are the same. (Issue 1712)
Fixed issue when workflow name and site are the same. (Issue 23)
The jaws resubmit command is restricted to the members of the team who own the run. (Issue 148)
We have improved the error messaging for jaws resubmit, especially when the original submission failed. (Issue 116)
jaws status now displays the local time zone. (Issue 151)
jaws status will report in the result field when a run was canceled by the user. (Issue 1714)
Extra attributes in the runtime{} stanza, for example "runtime_minutes": "20", are now ignored. (Issue 118)
When a run is canceled, it will be recorded in jaws tasks <RUN_ID>. (Issue 120)
Fixed issues with the wrong task status being reported by the jaws tasks <RUN_ID> command.
Fixed an error in transfers when a file was a named pipe instead of a regular file. (Issue 1725)
When a run is resubmitted (jaws resubmit <RUN_ID>), JAWS will ensure that all the required input files are still available (e.g., haven’t been purged). (Issue 110)
When a run is resubmitted (jaws resubmit <RUN_ID>), JAWS will update the access timestamp (atime) of input files, in order to avoid purging files prematurely. (Issue 1689)
Fixed a bug where a Cromwell submission failure during input processing was not recognized by JAWS. (Issue 1711)
The outputs.json supplementary file now contains relative paths instead of absolute paths. (Issue 1652)
🆕 New behavior for JAWS
In our ongoing efforts to enhance performance and maintain a robust, up-to-date environment, we will be discontinuing support for the WDL Draft-2 version on 11/1/2023.
We will display a ⚠️ warning message to notify you if you are still using the old version when submitting a run.
We strongly urge all users to update their WDL workflows to the Version 1.0 specification. If you have any questions, please schedule a pair programming session with our Team.
JAWS Cromwell configuration was updated, and now Container tags can be used for call caching (Issue 122, Issue 156).
We recommend referencing containers by their SHA256 digest instead of a tag (e.g., doejgi/bbtools@sha256:64088.. instead of doejgi/bbtools:latest). While using mutable or “floating” tags in tasks can be convenient in certain scenarios, it adversely impacts the reproducibility of a workflow. For instance, executing the same workflow with doejgi/bbtools:latest now, and then rerunning it in a month or a year, could result in the use of different container images.
If the command stanza uses $TMPDIR, it will have access to /tmp. Previously, $TMPDIR was set to the execution/ directory (e.g., NFS). (Issue 110)
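To illustrate the $TMPDIR behavior, here is a minimal sketch of a task step that keeps intermediate files in $TMPDIR and copies only the final result back. The function name sort_in_scratch, its arguments, and the sort step itself are hypothetical placeholders for whatever the task really does.

```shell
# Sketch: do intermediate I/O under $TMPDIR and keep only the final result.
# sort_in_scratch and its arguments are hypothetical placeholders.
sort_in_scratch() {
    input="$1"
    output="$2"
    [ -n "$TMPDIR" ] || TMPDIR=/tmp              # fall back to /tmp when unset
    scratch=$(mktemp -d "$TMPDIR/task.XXXXXX")   # private scratch directory
    sort "$input" > "$scratch/sorted.txt"        # intermediate file stays off the shared filesystem
    cp "$scratch/sorted.txt" "$output"           # copy only the result back
    rm -rf "$scratch"                            # clean up the scratch space
}

demo=$(mktemp -d)
printf 'b\na\nc\n' > "$demo/in.txt"
sort_in_scratch "$demo/in.txt" "$demo/out.txt"
cat "$demo/out.txt"
```

Anything that must survive the task still has to be written to the execution directory, since the scratch directory is removed at the end.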
❌ Deprecated Commands
jaws task-log and jaws task-summary have been deprecated (Issue 161, Issue 1719).
These two commands have been merged into a new command, jaws tasks.
For backward compatibility, jaws task-log will report the output of jaws tasks.
⚠️ Known issues
If the submission and compute sites are different (for example, from Dori to Tahoma), there could still be permission issues in the team’s output directory. In this case, the transfer to the team’s directory happens via Globus; we are looking into solutions for this problem.
JAWS Summer Release - 09/05/2023
We are releasing a new version of JAWS that includes breaking changes.
JAWS Teams
We’re introducing a new feature in JAWS called “JAWS Teams.” This feature allows for easier management of users and offers a centralized location for sharing and delivering output files for each team.
List all the teams available:
jaws teams list
[
"gt-ga",
"nmdc",
"dsi-ii",
"sc-mcr",
"gt-seqtech",
"dsi-aa",
"gt-syn",
"phytzm"
]
List the teams to which you belong:
jaws teams my-teams
[
"dsi-aa"
]
List the users associated with a team:
jaws teams members dsi-aa
[
"dcassol",
...
]
Get a team’s site config - Outputs path:
jaws teams get-site dsi-aa dori
"/clusterfs/jgi/scratch/dsi/aa/jaws/dori-staging/dsi-aa"
A team’s owner has additional permissions and responsibilities, such as setting the path for each site and adding or deleting users from the team:
jaws teams set-site <TEAM_ID> <SITE_ID> <PATH>
jaws teams add-user <TEAM_ID> <USER_ID>
jaws teams del-user <TEAM_ID> <USER_ID>
How to use JAWS Teams?
When submitting a new run, you can use the --team flag. The outputs for this run will be transferred to the team’s path.
jaws submit align_final.wdl inputs.json dori --team=dsi-aa
As an easy alternative, you can set the default_team in your jaws.conf file:
vi ~/jaws.conf
[USER]
token = <TOKEN>
default_team = dsi-aa
Important: Do not use quotes for the team’s name.
The new JAWS Teams feature will transfer the final workflow outputs from cromwell-execution to the path defined for your team.
How do I find the output data?
jaws status --verbose <ID> will provide the final output path. Please check output_dir.
Then, you can find the expected file tree structure:
/<TEAM PATH>/<RUN_ID>/<Cromwell_ID>
JAWS will copy the following in case of a successful run:
Workflow outputs;
- Supplementary files:
errors.json;
metadata.json;
output_manifest.json;
outputs.json;
task_summary.json.
JAWS will copy the following in case of a FAILED run:
Workflow outputs;
Supplementary files;
Failed tasks’ cromwell-execution folder only.
As we transition to copying the output files, we are deprecating the jaws get command.
To ensure backward compatibility, this command will remain functional for a few more months, serving solely to copy the final workflow outputs.
However, please note that we plan to discontinue this command entirely in upcoming cycles.
We have implemented parallel copying capabilities when both the submission and compute site are the same, for example, from Dori to Dori. This effectively resolves the delay issues associated with the ‘download queue.’
Slack Notifications
We replaced Email with Slack notifications when the run is completed.
How to set up Slack Notification?
Please set up your slack_webhook and update your JAWS account.
Instructions on how to get your Slack webhook are available here.
jaws update-user --email=dcassol@lbl.gov --slack_id <Member_ID> --slack_webhook <WebHook_URL>
Call-caching Strategy
The call-caching strategy we previously used was “xxh64”, which required a lot of I/O to calculate the hash of the entire file content. We have now replaced it with “fingerprint”, which combines the last modified time, the size, and an xxh64 hash of the first 10 MB to create a file fingerprint. Please be aware of this change, and let us know if you think it could cause file collisions when a task uses call-caching. If you want to read more about call-caching and all the strategies, please check here.
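To make the idea concrete, the following is a rough sketch of how a fingerprint-style key could be built from the three ingredients listed above. It is not Cromwell’s implementation: file_fingerprint is a hypothetical name, sha256sum stands in for xxh64 (which is not part of coreutils), “10 MB” is taken as 10485760 bytes, and GNU coreutils (stat -c, head -c) are assumed.

```shell
# Sketch of a "fingerprint"-style cache key: mtime + size + hash of the first 10 MB.
# Assumptions: GNU coreutils (stat -c, head -c, sha256sum); sha256sum stands in
# for xxh64. file_fingerprint is a hypothetical name, not a Cromwell function.
file_fingerprint() {
    file="$1"
    mtime=$(stat -c '%Y' "$file")      # last modified time (epoch seconds)
    size=$(stat -c '%s' "$file")       # size in bytes
    # Hash only the first 10485760 bytes instead of the whole file.
    head_hash=$(head -c 10485760 "$file" | sha256sum | cut -d' ' -f1)
    echo "$mtime:$size:$head_hash"
}

demo=$(mktemp)
printf 'example data\n' > "$demo"
file_fingerprint "$demo"
```

Under this scheme, two files that share modification time, size, and their first 10 MB would get the same key even if later bytes differ, which is the collision case to watch for.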
Additional minor features
Added the --forcequeue flag option to the jaws submit command. Users can force run submission when the site has been disabled;
Resubmitting a run will change the “result” field to “resubmitted” for the jaws status <run id> command;
We now parse the runtime{} parameter values correctly when there is a space (i.e., memory: “5 G”) ticket: #111;
/refdata is mounted correctly and accessible to the WDLs.
Special note about the DORI TEAMS Folders:
For this release, Teams’ folders on Dori must be located at /clusterfs/jgi/scratch/dsi/aa/jaws/teams/.
However, once the Globus endpoint changes have been completed, teams’ folders may be edited under the team’s scratch.
JAWS Sprint Release - 04/12/2023
We released a new version of JAWS.
Here are the changes that are now on PROD:
Dori JAWS site is available for testing;
Create a config file in your HOME directory:
touch ~/jaws.conf
chmod 600 ~/jaws.conf
Copy your token from CORI:
[USER]
token = <copy your token from CORI and paste it here>
Module load jaws
module use /clusterfs/jgi/groups/dsi/homes/svc-jaws/modulefiles
module load jaws/dori-prod
jaws submit <WDL_FILE> <INPUTS> dor
NOTE:
Dori and Perlmutter have temporary Globus limitations (IT has not configured JAWS application Globus endpoints yet).
- Temporary workaround:
Because data transfer isn’t available, you must log in to the cluster (Dori or Perlmutter) and submit the run from there.
You can use Globus endpoint to transfer your data from Cori to Dori, for example.
Deprecated commands
jaws outfiles
jaws outputs
jaws metadata
jaws errors
This is part of the effort to refactor the jaws metadata command (which uses Cromwell metadata).
As users are submitting large workflows (> 10k tasks), Cromwell metadata became too expensive to query and would sometimes time out. Instead, we now wait for the run to finish and then write some associated reports (e.g., errors, outfiles, and outputs) to disk.
These JSON files are written to the run’s execution directory and returned to users via jaws get.
To find the report files:
jaws get <RUN_ID> <DEST>
ls <DEST>
<workflow>.wdl <inputs>.json errors.json metadata.json outfiles.json outputs.json task_summary.json
jaws task-log changed:
task-log is now much faster and more robust than before and will support runs with more than 10k tasks.
Even more than before, it is a real-time reflection of a run’s current status, since it gets its information directly from the backend instead of going through Cromwell metadata, which was a bottleneck.
However, because it no longer uses Cromwell’s metadata, there are no longer records for cached tasks, since they did not actually execute. Therefore, the “cached” column was removed.
Limitation for Tahoma site:
jaws task-log isn’t working properly for Tahoma due to the firewall between the compute and workflow nodes. We have a ticket open for this issue. However, this will not affect your run!
jaws task-summary
It will only be available after the run is finished.
jaws status --verbose
Two more fields: workflow_name and workflow_root
jaws validate
miniwdl is used instead of womtool for WDL validation.
jaws resubmit
New command available! Resubmit a run (at the same compute site).
Fixed bugs reported by users
Runtime parameters can be on separate lines;
Fixed jaws get --flatten;
Symlinks can be used for the path to the WDL (jaws submit <symlink>/my.wdl my.json dori);
Restarting the Cromwell service (i.e., upon release) will not interrupt active runs;
Added job purging policies to the HTCondor backend so that it can automatically cancel tasks that stay in HOLD status for too long.
Added the --time-min SLURM option for requesting SLURM compute nodes with a shorter wall-clock time on Perlmutter when maintenance is scheduled.