JAWS Backend

Summary

This section explains how tasks are processed and managed in JAWS through the integration of Cromwell, HTCondor, the JAWS Pool Manager, and SLURM, and gives an overview of the workflow submission process and the role each component plays.

Understanding JAWS Task Submission Workflow

[Figure: JAWS task submission workflow with HTCondor and SLURM (HTCondor_Slurm.svg)]

JAWS uses a distributed computing approach, leveraging multiple systems to ensure efficient task scheduling, management, and execution. Here’s a breakdown of the components and their roles in the workflow submission process:

1. Cromwell: The Workflow Engine

Cromwell acts as the workflow engine in JAWS. It manages workflows written in the Workflow Description Language (WDL) and submits their tasks to the job scheduler (HTCondor) for execution.

2. HTCondor: Job Scheduler

HTCondor is responsible for managing and queuing the tasks submitted by Cromwell. It plays a crucial role in distributing the tasks to the appropriate resources based on availability and capacity.

3. JAWS Pool Manager

The JAWS Pool Manager monitors the HTCondor queue to determine how many SLURM nodes are required to process the queued tasks. Once determined, it requests SLURM nodes using the --exclusive flag via the sbatch command to reserve dedicated resources for task execution.
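The exact scaling policy lives inside the Pool Manager itself, but the idea can be illustrated with a short sketch. The Python snippet below is a hypothetical illustration, not the actual Pool Manager code: it assumes condor_q and sbatch are available on the PATH, and the TASKS_PER_NODE packing factor and the pool_worker.sh batch script are made-up placeholders.

  import subprocess

  TASKS_PER_NODE = 8                      # assumed packing factor, not a real JAWS setting
  WORKER_BATCH_SCRIPT = "pool_worker.sh"  # placeholder sbatch script that starts an HTCondor execute node

  def idle_task_count() -> int:
      """Count idle (queued) jobs in the HTCondor queue using condor_q."""
      out = subprocess.run(["condor_q", "-totals"],
                           capture_output=True, text=True, check=True).stdout
      # The totals line looks like "... 5 idle, 2 running ..."; parse it naively.
      for line in out.splitlines():
          words = line.replace(",", "").split()
          if "idle" in words:
              return int(words[words.index("idle") - 1])
      return 0

  def request_nodes(count: int) -> None:
      """Ask SLURM for whole nodes, one exclusive allocation per node."""
      for _ in range(count):
          subprocess.run(["sbatch", "--exclusive", WORKER_BATCH_SCRIPT], check=True)

  if __name__ == "__main__":
      queued = idle_task_count()
      nodes_needed = -(-queued // TASKS_PER_NODE)   # ceiling division
      if nodes_needed > 0:
          request_nodes(nodes_needed)

In an actual deployment, values such as the packing factor and the worker startup script would come from site configuration rather than hard-coded constants.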

4. Compute Pool: Execution Resources

Once SLURM allocates the necessary resources, HTCondor submits tasks to the available compute nodes in the pool. After a task completes, the next task in the HTCondor queue is dispatched to the freed SLURM node. If no more tasks are available, the JAWS Pool Manager releases the SLURM nodes using the scancel command.
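Releasing the pool works in the opposite direction. The sketch below is again only an illustration of the step described above, not JAWS code; it assumes the pool allocations were submitted under a hypothetical job name (pool_worker) so they can be found with squeue and cancelled with scancel once nothing is left in the HTCondor queue.

  import subprocess

  POOL_JOB_NAME = "pool_worker"   # hypothetical job name given to the pool's sbatch jobs

  def pool_job_ids() -> list[str]:
      """List the SLURM job IDs that belong to the compute pool, selected by job name."""
      out = subprocess.run(["squeue", "--name", POOL_JOB_NAME, "--noheader", "--format=%i"],
                           capture_output=True, text=True, check=True).stdout
      return [line.strip() for line in out.splitlines() if line.strip()]

  def release_pool() -> None:
      """Cancel every pool allocation, returning its node to SLURM."""
      for job_id in pool_job_ids():
          subprocess.run(["scancel", job_id], check=True)

  if __name__ == "__main__":
      # In practice this would be triggered when the HTCondor queue is empty.
      release_pool()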

Files Generated During Execution

During the task submission and execution process, various files are generated in the execution directory. These files can be helpful for monitoring task progress and troubleshooting errors. Below is a list of common files and their purposes:

  • script.submit: The script that Cromwell passes to HTCondor; it contains the instructions for submitting the task.

  • stdout.submit: The standard output from script.submit, showing details about the task’s submission process.

  • stderr.submit: The standard error from script.submit, useful for debugging any errors during task submission.

  • submitFile: Contains resource specifications (e.g., memory, CPU requirements) for the task and tells HTCondor how to handle the job.

  • execution.log: A log file produced by HTCondor that records job status events and resource usage for the task.

  • dockerScript: Defines the Shifter or Singularity command used to run the script file inside the container.

  • script: Represents the code defined in your workflow’s command{} section.

  • stdout: Standard output from the task being executed on the compute node.

  • stderr: Standard error output from the task, useful for identifying issues that occurred during execution.

  • rc: The return code from the task, indicating success or failure (typically 0 for success and non-zero for failure).
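When a task fails, these files are usually the quickest route to a diagnosis: check rc first, then stderr. The small Python helper below is a generic example rather than part of JAWS, and the execution-directory path is a placeholder to be replaced with the task's actual directory.

  from pathlib import Path

  # Placeholder: substitute the task's actual execution directory.
  exec_dir = Path("/path/to/cromwell-executions/workflow/task/execution")

  rc = int((exec_dir / "rc").read_text().strip())
  if rc != 0:
      print(f"Task failed with return code {rc}; last lines of stderr:")
      print("\n".join((exec_dir / "stderr").read_text().splitlines()[-20:]))
  else:
      print("Task completed successfully (rc = 0).")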