What is JAWS?
JAWS is a multi-site workflow manager that uses the Cromwell workflow engine. Its main goals are to make running bioinformatics workflows easier, to foster collaboration between users of the system, and to make it possible to move workloads across different DOE resources.
JAWS is composed of four main parts:
a command line interface: Jaws Client;
a centralized orchestration service: Jaws Central, administering runs to multiple sites;
a site service that wraps the workflow engine, like Cromwell, and is installed on a compute site;
a job submission manager, like HTCondor, which submits jobs to worker pools using SLURM.
JAWS Components and Architecture
Below is a diagram of the JAWS architecture. Note that there is some duplication of processes that is meant to demonstrate that “site” can be installed at multiple sites.
The main takeaways here are:
All commands are issued from the command line and handled by Jaws Client;
Jaws Central is a server that coordinates which compute site (e.g. LabIT or NERSC) the pipeline is run on;
Globus transfers all your files from your data source to the compute site where Cromwell will actually run;
Cromwell is the workflow engine that will run the pipeline at the compute site;
HTCondor serves as the backend to Cromwell and handles the running of the jobs on an HPC cluster.
JAWS Overall Workflow Processing
The user interfaces only with the jaws-client. The jaws-client communicates with jaws-central to move data to the target site and hands over the workflow execution to the respective jaws-site service, which in turn runs the workflow to completion and relays the status back to jaws-central. Globus is used as the transfer mechanism between a central data storage location and the target sites. The execution of workflows by jaws-site is orchestrated by Cromwell.
jaws-client
jaws-client is a command-line interface for the user and interacts with the central service using defined APIs. jaws-client offers commands to submit and monitor workflows. jaws-central saves metadata about runs, for example, which version of the pipeline was run, runtime statistics, which datasets were processed, etc.
Cromwell
Cromwell is responsible for executing the commands in a workflow. It takes a workflow, written in WDL (Workflow Description Language), and creates instructions on how and when each task should be executed. In our case, the tasks are executed on a user-defined backend, HTCondor.
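For illustration, a minimal WDL workflow (a hypothetical hello-world, not part of JAWS itself) shows the structure Cromwell works from: each task declares a command, its outputs, and runtime requirements.

```wdl
version 1.0

workflow hello {
  call say_hello
}

task say_hello {
  # The shell command Cromwell will dispatch to the backend
  command <<<
    echo "Hello from Cromwell"
  >>>
  # Outputs are collected after the task completes
  output {
    String greeting = read_string(stdout())
  }
  # Runtime requirements the backend must satisfy
  runtime {
    docker: "ubuntu:22.04"
  }
}
```

Cromwell parses the command, output, and runtime sections and turns each task call into a job for the configured backend, in our case HTCondor.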
HTCondor
The main purpose of HTCondor is to receive tasks from Cromwell and execute them on a compute resource (e.g. an HPC cluster). It acts as an abstraction layer between jaws-site and different resources (different clusters and cloud resources).
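Conceptually, each task that Cromwell hands off corresponds to an HTCondor submit description. The sketch below is hypothetical (file names and resource values are illustrative, not what JAWS generates):

```
# Hypothetical HTCondor submit description for a single Cromwell task
executable     = run_task.sh
output         = task.out
error          = task.err
log            = task.log
request_cpus   = 4
request_memory = 8GB
queue
```

HTCondor matches these resource requests against available execution slots; in JAWS, those slots come from worker pools provisioned via SLURM.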
Globus
Globus transfers all your files from your data source to the compute site where Cromwell runs, and back again.