JAWS Guidelines
This section covers JAWS-specific behavior and best practices for using JAWS.
Temporary Directories
If your workflow requires the use of the /tmp directory, JAWS is configured to handle it in the following way:
If the WDL command stanza uses $TMPDIR, the task will automatically have access to /tmp.
This ensures that temporary directories are properly cleaned up after use, maintaining system stability and performance.
Example
version 1.0
workflow HelloWorld {
call Hello
}
task Hello {
command <<<
echo ${TMPDIR}
>>>
runtime {
docker: "ubuntu:latest"
}
}
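If the command stanza calls a Python script instead of plain shell, the script can follow the same convention. The following is a minimal sketch, assuming the script should place scratch files under $TMPDIR and fall back to /tmp when the variable is unset. The helper name make_scratch_file is hypothetical and not part of JAWS.

```python
import os
import tempfile


def make_scratch_file(suffix: str = ".tmp") -> str:
    """Create an empty scratch file under $TMPDIR (or /tmp if unset) and return its path."""
    # Assumption: the task environment exports TMPDIR, as in the WDL example above
    base = os.environ.get("TMPDIR", "/tmp")
    fd, path = tempfile.mkstemp(suffix=suffix, dir=base)
    # mkstemp returns an open file descriptor; close it so callers can reopen by path
    os.close(fd)
    return path
```

The caller is responsible for deleting the file when it is no longer needed.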
JAWS Staging Area for Input Files
When you submit a run, JAWS copies the input files to the “JAWS staging area.” This ensures that Globus has access to the files, so they can be transferred to the compute sites.
A key advantage of this approach is that it allows input files to be cached, reducing the time it takes to submit a new run that reuses the same files.
The JAWS client follows a specific pattern for copying files to the staging area:
<SITE_SCRATCH>/inputs/<SITE_ORIGINAL_SITE>/<USER_INPUT_ORIGINAL_PATH>
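To make the pattern concrete, here is a small sketch that builds a staging path from its three parts. It assumes the original absolute path is nested under the site directory with its leading slash dropped, which the pattern suggests but does not state. The function name staged_path and the example values are hypothetical.

```python
from pathlib import PurePosixPath


def staged_path(site_scratch: str, origin_site: str, original_path: str) -> str:
    """Map an input file's original absolute path to a staging-area location."""
    original = PurePosixPath(original_path)
    if not original.is_absolute():
        raise ValueError(f"expected an absolute path, got {original_path!r}")
    # Assumption: drop the leading "/" so the original path nests under the site directory
    relative = original.relative_to("/")
    return str(PurePosixPath(site_scratch) / "inputs" / origin_site / relative)


# Hypothetical values, for illustration only
print(staged_path("/scratch/jaws", "dori", "/home/user/data/sample.fasta"))
# /scratch/jaws/inputs/dori/home/user/data/sample.fasta
```

Because the staged location is derived from the original path, two submissions that reference the same file map to the same staged copy.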
The JAWS staging area is mounted into the container that executes each Cromwell task, giving the task access to all of its input files. JAWS avoids duplicating these files because Cromwell is configured to use hard links, which reference the same data without copying it. Compared with copying, hard-linking saves storage space, speeds up workflow setup, keeps data consistent, preserves file metadata, and reduces I/O overhead.
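The following sketch illustrates what a hard link is, independent of JAWS or Cromwell: two directory entries that point at the same data on disk. The file names are hypothetical, and the code runs in a throwaway temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for a file in the staging area
    staged = os.path.join(workdir, "sample.fasta")
    with open(staged, "w") as handle:
        handle.write(">seq1\nACGT\n")

    # Stand-in for the path a task would read from
    linked = os.path.join(workdir, "task_input.fasta")
    os.link(staged, linked)  # creates a hard link, not a copy

    staged_stat = os.stat(staged)
    linked_stat = os.stat(linked)
    # Both names resolve to the same inode, so no data was duplicated
    same_inode = staged_stat.st_ino == linked_stat.st_ino
    # The link count reports how many names refer to that inode
    link_count = staged_stat.st_nlink
    print(same_inode, link_count)
```

On a typical Linux filesystem this reports the same inode for both names and a link count of 2. Hard links only work within a single filesystem, so the two paths must live on the same volume.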
File Caching
When you submit a new run using the same input files, JAWS will not recopy them. Instead, it references the existing files in the staging area (based on the path above). You will see a message like this:
jaws submit --no-cache align_final.wdl inputs.json dori
Using cached copy of sample.fastq.bz2
Using cached copy of sample.fasta
Note that the --no-cache flag is used by Cromwell and does not relate to input files. It determines whether the run outputs should be cached.
Handling File Changes
If you modify the content of a file but keep the same filename, the JAWS client will detect the change and report an error:
jaws submit --no-cache align_final.wdl inputs.json dori
Error initializing Run: Unable to copy input files:
/clusterfs/jgi/groups/dsi/homes/dcassol/jaws/jaws-tutorial-examples/data/sample.fasta is different from its cached version.
Submitting with this input file can affect previous runs.
Use --overwrite-inputs to force update the cached input files.
If you choose to proceed with the updated input file, use the --overwrite-inputs flag to force the update of the cached input files. However, be aware that this can affect previous runs that use the same input filename.
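The caching and overwrite behavior described above can be sketched as a small decision function. This is an illustration only: the text does not say how the JAWS client detects a changed file, so the sketch assumes a byte-by-byte comparison. The function name stage_input and its overwrite_inputs parameter are hypothetical and merely mirror the --overwrite-inputs flag.

```python
import filecmp
import os
import shutil


def stage_input(source: str, cached: str, overwrite_inputs: bool = False) -> str:
    """Stage one input file and return 'copied', 'cached', or 'overwritten'."""
    if not os.path.exists(cached):
        # First submission: create the staging directory and copy the file in
        os.makedirs(os.path.dirname(cached) or ".", exist_ok=True)
        shutil.copy2(source, cached)
        return "copied"
    # Assumption: compare contents; shallow=False avoids trusting size and mtime alone
    if filecmp.cmp(source, cached, shallow=False):
        return "cached"
    if not overwrite_inputs:
        raise RuntimeError(f"{source} is different from its cached version.")
    # Forced update: replace the cached copy with the new content
    shutil.copy2(source, cached)
    return "overwritten"
```

Raising by default rather than silently replacing the cached copy matches the behavior shown in the error message: the user has to opt in before the staged file changes.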
Use of Relative Paths for the inputs.json
Relative paths in inputs.json are resolved against the location of the inputs.json file, not the directory from which the run is submitted.
This differs from Cromwell, where relative paths are resolved against the submission directory. The JAWS community chose this behavior to ensure consistency and to keep the inputs.json file in the same location as the WDL file.
In this example, the full path to the inputs.json file is:
cat $HOME/jaws-tutorial-examples/5min_example/inputs.json
{
"bbtools.reads": "../data/sample.fastq.bz2",
"bbtools.ref": "../data/sample.fasta"
}
Here, the paths are relative to the 5min_example folder that contains inputs.json, so they refer to input files in the sibling data directory (../data/).
In the data directory, the referenced files are as follows:
ls -la $HOME/jaws-tutorial-examples/data
-rwxrwxr-x 2 dcassol grp-dcassol 2929 Oct 3 2023 sample.fasta
-rwxrwxr-x 10 dcassol grp-dcassol 792 Mar 20 2023 sample.fastq.bz2
Because the paths are resolved against the inputs.json file’s location, the input files are referenced correctly regardless of the directory from which the run is submitted.
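The resolution rule described in this section can be sketched in a few lines. This is not the JAWS client's code; the function name resolve_input_path is hypothetical, and /home/user stands in for $HOME.

```python
import posixpath


def resolve_input_path(inputs_json: str, value: str) -> str:
    """Resolve a path from inputs.json against the directory containing inputs.json."""
    if posixpath.isabs(value):
        # Absolute paths are already unambiguous
        return value
    base_dir = posixpath.dirname(inputs_json)
    # normpath collapses the ".." segments; it does not follow symlinks
    return posixpath.normpath(posixpath.join(base_dir, value))


inputs_json = "/home/user/jaws-tutorial-examples/5min_example/inputs.json"
print(resolve_input_path(inputs_json, "../data/sample.fasta"))
# /home/user/jaws-tutorial-examples/data/sample.fasta
```

The current working directory never enters the calculation, which is why the same inputs.json resolves to the same files no matter where the run is submitted from.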
If you have any questions, please contact the JAWS team.