What is JAWS

JAWS is a multi-site workflow manager that uses the Cromwell workflow engine. Its main goals are to make bioinformatics workflows easier to run, to foster collaboration between users of the system, and to make it possible to move workloads across different DOE resources.

JAWS is composed of four main parts:

  1. a command line interface: Jaws Client;

  2. a centralized orchestration service: Jaws Central, administering runs to multiple sites;

  3. a site service that wraps the workflow engine, like Cromwell, and is installed on a compute site;

  4. a job submission manager, like HTCondor, which submits jobs to worker pools using SLURM.

JAWS Components and Architecture

Below is a diagram of the JAWS architecture. Some processes appear more than once to show that the site service can be installed at multiple sites.

The main takeaways here are:

  • All commands are issued from the command line and handled by Jaws Client;

  • Jaws Central is a server that coordinates which compute site (e.g. LabIT or NERSC) the pipeline runs on;

  • Globus transfers all your files from your data source to the compute site where Cromwell will actually run;

  • Cromwell is the workflow engine that runs the pipeline at the compute site;

  • HTCondor serves as the backend to Cromwell and handles running the jobs on an HPC cluster.

../_images/jaws_architecture-Architecture.svg


JAWS Overall Workflow Processing

The user interacts only with jaws-client. jaws-client communicates with jaws-central to move data to the target site and to hand the workflow execution over to the respective jaws-site service, which runs the workflow to completion and relays the status back to jaws-central. Globus is the transfer mechanism between a central data storage location and the target sites. Within jaws-site, workflow execution is orchestrated by Cromwell.
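The hand-off described above can be pictured with a small Python sketch. It is a toy model only, not the JAWS implementation: the class names, the status values, the site names, and the workflow file name are all hypothetical, and the Globus transfer and Cromwell execution are reduced to placeholder steps.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunStatus(Enum):
    # Hypothetical status values; the real service may use different ones.
    SUBMITTED = "submitted"
    DATA_TRANSFERRED = "data transferred"
    RUNNING = "running"
    SUCCEEDED = "succeeded"


@dataclass
class Run:
    """A single workflow run as tracked by the stand-in central service."""
    run_id: int
    workflow: str
    site: str
    history: list = field(default_factory=list)

    def update(self, status: RunStatus) -> None:
        self.history.append(status)

    @property
    def status(self) -> RunStatus:
        return self.history[-1]


class Site:
    """Stand-in for jaws-site: runs the workflow and reports status back."""

    def __init__(self, name: str):
        self.name = name

    def execute(self, run: Run, report) -> None:
        # Placeholder for the step that Cromwell orchestrates on the site.
        report(run, RunStatus.RUNNING)
        report(run, RunStatus.SUCCEEDED)


class Central:
    """Stand-in for jaws-central: moves data, hands off runs, records status."""

    def __init__(self, sites):
        self.sites = {site.name: site for site in sites}
        self.runs = {}

    def submit(self, workflow: str, site_name: str) -> Run:
        run = Run(run_id=len(self.runs) + 1, workflow=workflow, site=site_name)
        self.runs[run.run_id] = run
        run.update(RunStatus.SUBMITTED)
        # Placeholder for the Globus transfer to the target site.
        run.update(RunStatus.DATA_TRANSFERRED)
        # Hand the run to the target site; the site relays status via record().
        self.sites[site_name].execute(run, self.record)
        return run

    def record(self, run: Run, status: RunStatus) -> None:
        run.update(status)


central = Central([Site("site_a"), Site("site_b")])
run = central.submit("example.wdl", "site_b")
print(run.run_id, run.status.value)
print([status.value for status in run.history])
```

The point of the sketch is the direction of the calls: the caller only talks to the central object, and the site reports back through a callback instead of being polled by the user.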

jaws-client

jaws-client is the user's command-line interface. It interacts with the central service through defined APIs and offers commands to submit and monitor workflows. jaws-central saves metadata about runs, such as which version of the pipeline was run, runtime statistics, and which datasets were processed.

Cromwell

Cromwell is responsible for executing the commands in a workflow. It takes a workflow, written in WDL, and creates instructions on how and when each task should be executed. In our case, the tasks are executed on a user-defined backend, HTCondor.
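To illustrate what deciding "how and when each task should be executed" means, here is a minimal Python sketch of dependency-based scheduling. It shows the general idea only and is not how Cromwell is implemented: the task names and the dependency table are made up, and the ordering is done with the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical tasks, each mapped to the tasks whose outputs it consumes.
dependencies = {
    "trim_reads": set(),
    "align": {"trim_reads"},
    "call_variants": {"align"},
    "qc_report": {"trim_reads"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave have no pending inputs."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose inputs are all done
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so dependents become ready
    return waves


waves = execution_waves(dependencies)
for number, tasks in enumerate(waves, start=1):
    print(number, tasks)
```

With this table, trim_reads runs first, align and qc_report become ready together, and call_variants runs last. In a real setup each ready task would be handed to the backend instead of being marked done immediately.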

HTCondor

The main purpose of HTCondor is to receive tasks from Cromwell and execute them on a compute resource (e.g. an HPC cluster). It acts as an abstraction layer between jaws-site and the different resources (different clusters and cloud resources).
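The abstraction-layer role can be sketched in Python as a common interface with interchangeable resources behind it. This is an illustration of the design idea under assumed names; ComputeResource, ClusterResource, CloudResource, and JobManager are hypothetical and do not correspond to HTCondor or JAWS APIs.

```python
from abc import ABC, abstractmethod


class ComputeResource(ABC):
    """Hypothetical interface for anything that can run a task command."""

    @abstractmethod
    def run(self, command: str) -> str:
        ...


class ClusterResource(ComputeResource):
    def run(self, command: str) -> str:
        # Placeholder: a real resource would submit to the cluster scheduler.
        return f"cluster ran: {command}"


class CloudResource(ComputeResource):
    def run(self, command: str) -> str:
        # Placeholder: a real resource would start a cloud job.
        return f"cloud ran: {command}"


class JobManager:
    """Stand-in for the job submission layer; callers never see resource details."""

    def __init__(self, resources: dict):
        self.resources = resources

    def submit(self, command: str, target: str) -> str:
        return self.resources[target].run(command)


manager = JobManager({"cluster": ClusterResource(), "cloud": CloudResource()})
print(manager.submit("echo hello", "cluster"))
```

Because the caller only uses submit, a new kind of resource can be added without changing the code that hands tasks over.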

Globus

Globus transfers all your files from your data source to the compute site where Cromwell runs, and back again.