JAWS Quickstart
Summary
To start running a pipeline in JAWS, please follow the setup instructions provided here.
Run an Example WDL in JAWS
Load the JAWS environment
First, ensure that your environment is set up as described in the setup guide. One option is to load the JAWS environment with module load; if you prefer to use the JAWS client container or the Python library, please follow the instructions here.
# module use <SITE_SPECIFIC_PATH> # please consult the instructions above to find the specific path for each site
module load jaws
Clone the example code
git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
cd jaws-tutorial-examples/quickstart
List all the sites available to JAWS
jaws list-sites
Currently, JAWS supports the following compute resources:
DORI (at Dori-JGI)
PERLMUTTER (at NERSC)
JGI (at Lawrencium Cluster - LBNL)
TAHOMA (at PNNL)
DEFIANT (at ORNL)
NMDC (at NERSC)
NMDC_TAHOMA (at PNNL)
Submit a workflow using JAWS
When submitting a JAWS job, specify the compute site (e.g., dori, perlmutter, etc.):
jaws submit align.wdl inputs.json dori
# you should see something like this
100%|███████████████████████████████████| 2929/2929 [00:00<00:00, 1081055.65it/s]
Copied 2929 bytes in 0.0 seconds.
100%|███████████████████████████████████| 792/792 [00:00<00:00, 349231.37it/s]
Copied 792 bytes in 0.0 seconds.
{
"run_id": 35970
}
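Since jaws submit prints the run ID as JSON, you can capture it in a variable for the monitoring commands that follow. This is a minimal sketch: the RUN_JSON variable below stands in for the real submission (e.g. RUN_JSON=$(jaws submit align.wdl inputs.json dori)) so the snippet is self-contained.

```shell
# Stand-in for the real submission output shown above; replace with:
#   RUN_JSON=$(jaws submit align.wdl inputs.json dori)
RUN_JSON='{ "run_id": 35970 }'

# Pull the run_id out of the JSON with sed (jq would also work, if installed).
RUN_ID=$(echo "$RUN_JSON" | sed -n 's/.*"run_id": *\([0-9]*\).*/\1/p')
echo "$RUN_ID"   # 35970
```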
Monitoring the Job
Once you’ve submitted the workflow, you can monitor it using the run_id. In this example, the run ID is 35970.
If you forget the run_id, you can retrieve it using the following commands:
jaws queue
# or
jaws history
To check the status of the run:
jaws log 35970
# and
jaws status 35970
To check the status of individual tasks within the run:
jaws tasks 35970
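If you want to wait for a run to finish from a script, you can poll jaws status in a loop. This is a hedged sketch, not official JAWS tooling: the jaws() shell function below is a stub standing in for the real client so the example is self-contained (delete it to run against JAWS), and the exact JSON field name and terminal status strings are assumptions based on the "download complete" status described later in this guide; check the output of jaws status on your site.

```shell
# Stub standing in for the real `jaws` client; remove this function to use JAWS.
jaws() { echo '{ "status": "download complete" }'; }

RUN_ID=35970
while :; do
  # Extract the status field from the (assumed) JSON output of `jaws status`.
  STATUS=$(jaws status "$RUN_ID" | sed -n 's/.*"status": *"\([^"]*\)".*/\1/p')
  case "$STATUS" in
    "download complete") break ;;   # terminal state per this guide
  esac
  sleep 30                          # poll every 30 seconds
done
echo "$STATUS"
```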
Get the results
Once the run status has changed to download complete, the files listed in the workflow's output{} section will be moved to your team's directory. You can use the command jaws status <RUN_ID> to display the output_dir for a specific run.
You can expect the directory structure to look like this:
/<JAWS TEAM PATH>/<USER_ID>/<RUN_ID>/<Cromwell_ID>
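Since jaws status displays the output_dir, you can extract it in a script to inspect the results. This sketch uses a stub jaws() function that simulates the status output so the example is self-contained (delete it to use the real client); the JSON field name and the /tmp path are illustrative assumptions.

```shell
# Stub simulating `jaws status` output; remove this function to use the real client.
jaws() { echo '{ "status": "download complete", "output_dir": "/tmp/jaws_demo/35970" }'; }

# Extract the (assumed) output_dir field and list the run's results.
OUTPUT_DIR=$(jaws status 35970 | sed -n 's/.*"output_dir": *"\([^"]*\)".*/\1/p')
echo "$OUTPUT_DIR"
```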
Additionally, if a run fails, JAWS does not automatically transfer the outputs. However, you can manually download the entire Cromwell execution folder for each failed task to your team’s directory. To do this, use the command:
jaws download <RUN_ID>
To learn more about the JAWS Teams directory, please refer to the JAWS Teams documentation.
Understanding the Output Directory
Cromwell creates a directory structure in which each task of your workflow runs inside its own execution directory. This is where you'll find the output files for each task, including the stderr, stdout, and script files.
So, for our example submission:
jaws submit align.wdl inputs.json dori
We should see an output folder containing the files described in the next section.
Description of Cromwell- and Backend-Generated Files
These are the files you might find in the execution directory:
script.submit: The script that Cromwell passes to HTCondor. This file contains the instructions for submitting the task to HTCondor.
stdout.submit: The standard output from script.submit, showing details about the task’s submission process.
stderr.submit: The standard error from script.submit, useful for debugging any errors during task submission.
submitFile: Contains resource specifications (e.g., memory, CPU requirements) for the task and tells HTCondor how to handle the job.
execution.log: A log file produced by HTCondor that contains details on the running resources and job status.
dockerScript: Defines the Shifter or Singularity command that runs script.
script: Represents the code defined in your workflow’s command{} section.
stdout: Standard output from the task being executed on the compute node.
stderr: Standard error output from the task, useful for identifying issues that occurred during execution.
rc: The return code from the task, indicating success or failure (typically 0 for success and non-zero for failure).
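Because each task records its return code in an rc file, you can quickly locate failed tasks in a downloaded Cromwell folder by scanning for non-zero codes. This is a sketch: the demo directory created below simulates a downloaded run so the example is self-contained; point BASE at your real <JAWS TEAM PATH>/<USER_ID>/<RUN_ID> folder instead.

```shell
# Simulate a downloaded run folder; replace BASE with your real run directory.
BASE=$(mktemp -d)
mkdir -p "$BASE/call-align/execution"
echo 1 > "$BASE/call-align/execution/rc"   # simulate a task that exited with code 1

# List every task whose rc file holds a non-zero return code.
FAILED=$(find "$BASE" -name rc | while read -r f; do
  [ "$(cat "$f")" -ne 0 ] && echo "$f"
done)
echo "$FAILED"
```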