JAWS Quickstart

Summary

To start running a pipeline in JAWS, please follow the setup instructions provided here.

Run an Example WDL in JAWS

  1. Load the JAWS environment

First, ensure that your environment is set up as described in the setup guide. One option is to load the JAWS environment with module load, as shown below. If you prefer to use the JAWS client container or the Python library, please follow the instructions here.

# module use <SITE_SPECIFIC_PATH>  # please consult the instructions above to find the specific path for each site
module load jaws
  2. Clone the example code

git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
cd jaws-tutorial-examples/quickstart
  3. List all the sites available to JAWS

jaws list-sites

JAWS currently supports the following compute sites:

  • DORI (at Dori-JGI)

  • PERLMUTTER (at NERSC)

  • JGI (at Lawrencium Cluster - LBNL)

  • TAHOMA (at PNNL)

  • DEFIANT (at ORNL)

  • NMDC (at NERSC)

  • NMDC_TAHOMA (at PNNL)

  4. Submit a workflow using JAWS

When submitting a JAWS job, specify the compute site (e.g., dori or perlmutter):

jaws submit align.wdl inputs.json dori

# you should see something like this
100%|███████████████████████████████████| 2929/2929 [00:00<00:00, 1081055.65it/s]
Copied 2929 bytes in 0.0 seconds.
100%|███████████████████████████████████| 792/792 [00:00<00:00, 349231.37it/s]
Copied 792 bytes in 0.0 seconds.
{
"run_id": 35970
}
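The positional arguments to jaws submit are the WDL file, the inputs JSON, and the compute site. The real align.wdl is part of the cloned quickstart directory; the sketch below is only a hypothetical illustration of the general shape of such a workflow (the task name, commands, and container image are made up). Each workflow input would be supplied in inputs.json under a key of the form <workflow name>.<input name>.

version 1.0

# Hypothetical sketch for illustration only; see jaws-tutorial-examples/quickstart
# for the real align.wdl and inputs.json.
workflow align {
  input {
    File fastq       # supplied in inputs.json as "align.fastq"
    File reference   # supplied in inputs.json as "align.reference"
  }

  call map_reads { input: fastq = fastq, reference = reference }

  output {
    File bam = map_reads.bam   # files declared here are what JAWS delivers back to you
  }
}

task map_reads {
  input {
    File fastq
    File reference
  }
  command <<<
    # placeholder alignment command
    bwa mem ~{reference} ~{fastq} | samtools sort -o out.bam
  >>>
  output {
    File bam = "out.bam"
  }
  runtime {
    docker: "example/aligner:1.0"   # hypothetical container image
  }
}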

Monitoring the Job

Once you’ve submitted the workflow, you can monitor it using the run_id. In this example, the run ID is 35970.

  • If you forget the run_id, you can retrieve it using the following commands:

jaws queue
# or
jaws history
  • To check the status of the run:

jaws log 35970
# and
jaws status 35970
  • To check the status of individual tasks within the run:

jaws tasks 35970

Get the results

Once the run status has changed to download complete, the files listed in the workflow’s output{} section will be moved to your team’s directory.
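For instance, in the hypothetical workflow sketched under the submission step above, only the file declared in the workflow-level output{} block would be delivered:

output {
  File bam = map_reads.bam   # moved to your team's directory after the run completes
}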

You can use the command jaws status <RUN_ID> to display the output_dir for a specific run.

You can expect the directory structure to look like this:

/<JAWS TEAM PATH>/<USER_ID>/<RUN_ID>/<Cromwell_ID>

Additionally, if a run fails, JAWS does not automatically transfer the outputs. However, you can manually download the entire Cromwell execution folder for each failed task to your team’s directory. To do this, use the command:

jaws download <RUN_ID>

To learn more about the JAWS Teams directory, please refer to the JAWS Teams documentation.

Understanding the Output Directory

Cromwell creates a directory structure in which each task in your workflow runs inside its own execution directory:

../_images/jaws_cromwell.svg

The execution directory is where you can find each task’s output files, including the stderr, stdout, and script files.

So, for our example submission:

jaws submit align.wdl inputs.json dori

We should see an output folder that looks like this:

../_images/jaws_cromwell_1.svg

Description of Cromwell and Backend-Generated Files

These are the files you might find in the execution directory:

  • script.submit: The script that Cromwell passes to HTCondor. This file contains the instructions for submitting the task to HTCondor.

  • stdout.submit: The standard output from script.submit, showing details about the task’s submission process.

  • stderr.submit: The standard error from script.submit, useful for debugging any errors during task submission.

  • submitFile: Contains resource specifications (e.g., memory, CPU requirements) for the task and tells HTCondor how to handle the job.

  • execution.log: A log file produced by HTCondor that contains details on the running resources and job status.

  • dockerScript: Defines the Shifter or Singularity command that runs script.

  • script: Represents the code defined in your workflow’s command{} section (see the sketch after this list).

  • stdout: Standard output from the task being executed on the compute node.

  • stderr: Standard error output from the task, useful for identifying issues that occurred during execution.

  • rc: The return code from the task, indicating success or failure (typically 0 for success and non-zero for failure).
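As a hypothetical illustration (reusing the made-up task from the sketch earlier on this page), the rough correspondence between a task definition and these generated files is: the command{} body becomes the script file, its console output is captured in stdout and stderr, its exit code is written to rc, and the runtime section drives the container and resource settings described above.

task map_reads {
  input {
    File fastq
    File reference
  }
  command <<<
    # the body of this block is what ends up in the generated "script" file
    bwa mem ~{reference} ~{fastq} | samtools sort -o out.bam
    # anything the commands print is captured in "stdout"/"stderr",
    # and the block's exit code is recorded in "rc"
  >>>
  output {
    File bam = "out.bam"
  }
  runtime {
    docker: "example/aligner:1.0"   # the image launched via the Shifter/Singularity line in "dockerScript"
    memory: "8G"                    # resource requests like these end up in the HTCondor "submitFile"
    cpu: 4
  }
}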