FAQs
Code Snippets to Answer Common WDL Design Problems
OpenWDL documents the WDL standard library functions in the version 1.0 specification.
Building WDLs
How can I use bash commands that require curly braces? 🔗
If you need to use curly braces in bash, for example to strip a '.txt' suffix or set a default value, use the WDL Version 1.0 Specification. The first line of your WDL document should declare version 1.0, and the command section should be written as command <<< ... >>> instead of command { ... }, so that bash's ${ } no longer collides with WDL placeholders. Some other syntax also changes in version 1.0; refer to the Version 1.0 specification link for more details.
command <<<
  # set a default value in bash
  VAR=${VAR:=25}
  # strip a '.txt' suffix in bash
  myvar=${somefile%.txt}
>>>
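In a version 1.0 heredoc command, WDL expressions use the ~{ } placeholder, so bash's ${ } passes through to the shell untouched. A minimal sketch of a complete task (the task and variable names are illustrative):

version 1.0
task curly_braces {
  input {
    File somefile
  }
  command <<<
    f=~{somefile}          # ~{ } is expanded by the WDL engine
    myvar=${f%.txt}        # ${ } is left for bash: strip a '.txt' suffix
    VAR=${VAR:=25}         # ${ } is left for bash: set a default value
    echo "$myvar $VAR"
  >>>
}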
How can I output a file that has been named dynamically as a bash variable? 🔗
Bash variables created in the command { } block cannot be seen outside that block, for example in the output { } section. Instead, write the name(s) of any output files to another file, and read that file back inside the output { } block.
This is the official WDL way, using glob:
output {
  Array[File] output_bams = glob("*.bam")
}
This is another method:
command {
  # $lib is a bash variable; it is written unbraced so WDL does not interpolate it
  echo $lib.bam > list_of_files
}
output {
  Array[File] output_bams = read_lines("list_of_files")
}
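Putting the pieces together, a complete task using this pattern might look like the following sketch (the input name and the .fastq suffix are illustrative):

version 1.0
task dynamic_name {
  input {
    File reads
  }
  command <<<
    lib=$(basename ~{reads} .fastq)    # derive a name at runtime in bash
    touch "${lib}.bam"                 # the dynamically named output
    echo "${lib}.bam" > list_of_files  # record the name for the output block
  >>>
  output {
    Array[File] output_bams = read_lines("list_of_files")
  }
}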
To see more about read_lines() and other WDL functions, see openwdl/wdl.
Using conditionals 🔗
workflow conditional_example {
  File infile
  call wc as wc_before { input: infile = infile }
  Int num_lines = wc_before.num_lines
  if (num_lines > 10) {
    call truncate { input: infile = infile }
  }
  # This function will return false if the defined() argument is an
  # unset optional value. It will return true in all other cases.
  Boolean has_head_file = defined(truncate.outfile)
  if (has_head_file) {
    # select_first() unwraps the File? into a File; safe here because
    # defined() has already been checked
    call wc as wc_after { input: infile = select_first([truncate.outfile]) }
  }
  # notice the '?' after File. These are required since these files may not exist.
  output {
    File wc_before_file = wc_before.outfile
    File? head_file = truncate.outfile
    File? wc_after_file = wc_after.outfile
  }
}
task wc {
  File infile
  command { wc -l < ${infile} | tee wc.txt }
  output {
    Int num_lines = read_int(stdout())
    File outfile = "wc.txt"
  }
}
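The example above calls a truncate task that is not shown; a minimal version consistent with the rest of the example might look like this (the head -10 command is an assumption):

task truncate {
  File infile
  command { head -10 ${infile} > out.txt }
  output { File outfile = "out.txt" }
}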
How to scatter over arrays and maps 🔗
Although you can scatter over both arrays and maps, the syntax differs for each. To scatter over an array:
Array[String] some_array
scatter (e in some_array) {
  # e is the array element itself, not an index
  call some_task {input: value = e}
}
To iterate over a map, scatter over its elements; each element is a Pair, whose key and value are accessed with .left and .right:
Map[String,String] some_map
scatter (pair in some_map) {
  String key = pair.left
  String value = pair.right  # or: String value = some_map[key]
  call some_task {input: value = value}
}
You can see working examples for scattering an array and scattering a map.
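If your execution engine supports WDL 1.1 or later, the as_pairs() standard library function makes the Map-to-Array[Pair] conversion explicit. A sketch, assuming 1.1 support:

version 1.1
workflow scatter_map {
  input {
    Map[String,String] some_map
  }
  scatter (pair in as_pairs(some_map)) {
    String key = pair.left
    String value = pair.right
  }
}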
Custom data structures 🔗
Besides Map, Array, and Pair, you can create a custom data structure using a struct. A struct is similar to a hash, but each member can have its own declared type, in any combination.
Documentation for Custom Type “Struct”.
Example main.wdl && inputs.json.
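A minimal sketch of defining and using a struct (all names are illustrative):

version 1.0
struct Sample {
  String name
  File reads
  Int lanes
}

workflow use_struct {
  input {
    Sample sample
  }
  # struct members are accessed with dot notation
  String label = sample.name
}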
Get Keys from a Map 🔗
As of version 1.0 of the WDL spec, there is no direct way to get an array of a Map's keys; the keys() function becomes available in version 1.1. As a workaround for now, you can use the Pair data type instead of Map, as follows.
version 1.0
workflow test {
  input {
    Array[Pair[Float,String]] my_map = [(0.1, "mouse"), (3, "cat"), (15, "dog")]
  }
  scatter (pairs in my_map) {
    String keys = pairs.left
  }
  output {
    Array[String] allouts = keys
  }
}
Note that the format for my_map in the WDL above is different from its format in an inputs.json:
{
  "test.my_map": [{"Left": 0.1, "Right": "mouse"}, {"Left": 3, "Right": "cat"}, {"Left": 15, "Right": "dog"}]
}
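Once your engine supports WDL 1.1, the keys() standard library function returns the keys directly; a sketch, assuming 1.1 support:

version 1.1
workflow get_keys {
  input {
    Map[String,Int] counts = {"mouse": 1, "cat": 2}
  }
  output {
    Array[String] allkeys = keys(counts)  # ["mouse", "cat"]
  }
}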
An example of a scatter/gather model when a scattered task is optional 🔗
You can use this example to see where you need to declare optional variables (i.e. Array[Array[String]?]) and how you can use two WDL functions, flatten and select_all, to convert an optional Array[Array[String]?] to an Array[String].
See this line:
Array[String] flat_array = flatten(select_all(num_array))
# note that this line is not within any stanza, but between input{} and command<<<>>>
Here select_all converts Array[Array[String]?] to Array[Array[String]] by dropping undefined elements, and flatten then converts that array of arrays into a single array.
version 1.0
workflow flatten_it {
  input {
    Boolean try_it = false
    Array[Int] numbers = [1,2,3]
  }
  scatter (num in numbers) {
    if (try_it) {
      call do_if_true {
        input: num = num
      }
    }
    # call something_else {}
  }
  call gather {
    input: num_array = do_if_true.out
  }
  output {
    Array[String] out_array = gather.final_array
  }
}
task do_if_true {
  input {
    Int num
  }
  command <<<
    echo "~{num}.one"
    echo "~{num}.two"
    echo "~{num}.three"
  >>>
  output {
    Array[String] out = read_lines(stdout())
  }
}
task gather {
  input {
    Array[Array[String]?] num_array
  }
  Array[String] flat_array = flatten(select_all(num_array))
  command <<<
    echo ~{sep=', ' flat_array}
  >>>
  output {
    Array[String] final_array = read_lines(stdout())
  }
}
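To exercise the conditional branch, set try_it in your inputs.json; a hypothetical example:

{
  "flatten_it.try_it": true
}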
Cromwell
Does Cromwell offer checkpointing? 🔗
Cromwell has call caching instead, which accomplishes much the same thing: when a task completes successfully, its results can be reused if the same task is run again with the same inputs. Use jaws submit --no-cache to turn caching off.
Why didn’t call caching work for me? 🔗
Changes to the WDL, the contents of the inputs.json, or the name of the inputs.json will prevent call caching.
For example, if you set your task's runtime attributes using input variables, changes to the values of those variables count as changes to the inputs, resulting in a different hash for the task (the WDL and inputs.json are hashed).
Call caching may also have failed if your files were passed as String rather than File inputs. The hashes of two identical Files stored in different locations are the same, but the hashes of the String values of those two locations differ, even though the contents of the file are identical.
Call caching also requires consistency in the outputs of the task, both the count (number of outputs) and the output expressions. If you publish a new version of your WDL that has one extra or one fewer output, it will not be able to benefit from a previously successful run of the same task, even if the inputs are the same.
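As an illustration of the runtime-attributes point above, a hypothetical task like the following gets a different hash, and misses the cache, whenever memory_gb changes, even if reads is the same file:

version 1.0
task align {
  input {
    File reads
    Int memory_gb = 16  # changing this value changes the task's input hash
  }
  command <<<
    echo "aligning ~{reads}"
  >>>
  runtime {
    memory: "~{memory_gb}G"
  }
}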
Why do JAWS jobs fail if filenames contain special characters like ` or ;? 🔗
JAWS jobs fail when input file names contain special characters such as ` (backtick) or ; (semicolon). Cromwell, the workflow execution engine used by JAWS, does not handle these characters properly. To avoid failures, do not use special characters in your filenames.
For example, the following error might occur:
cat ~/cromwell-executions/test_weird_chars/d277a390-4552-490a-8bcf-af02a80c7718/call-file/execution/stderr.submit
~/script: line 52: syntax error near unexpected token `&&'
~/script: line 52: `find . -type d -exec sh -c '[ -z "$(ls -A '"'"'{}'"'"')" ] && touch '"'"'{}'"'"'/.file' \;'
Compute Systems
What flavor of Linux do the compute nodes run? 🔗
JAWS makes multiple computing resources available, running various Linux distributions. We therefore recommend specifying a Docker container for every task; if none is specified, the default container is Ubuntu.
JAWS
Will my container’s entrypoint script be executed by JAWS? 🔗
JAWS does not execute entrypoint scripts, and users cannot alter this behavior.
The ENTRYPOINT instruction sets the default executable for the container, and any arguments passed to the docker run command are appended to it.
However, Cromwell generates a script file that the container runs instead. This script includes the command specified in the command stanza, with all variables expanded, as well as additional Cromwell-specific instructions. As a result, the container’s entrypoint script is ignored by both Cromwell and the JAWS backend, even if specified.
What should I do if I encounter a timezone offset warning when using JAWS Container? 🔗
If you’re using the JAWS Client Container, you might see a warning similar to the following when running the JAWS commands:
JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/clusterfs/jgi/groups/dsi/homes/svc-jaws/dori-prod/jaws-prod.conf apptainer run docker://doejgi/jaws-client:latest jaws queue
INFO: Using cached SIF image
/usr/local/lib/python3.11/site-packages/local/utils.py:43: UserWarning: Timezone offset does not match system offset: 0 != -25200. Please check your config files.
warnings.warn(msg)
[]
This warning occurs because the timezone offset read from your configuration does not match the system's offset. It does not affect the functionality of JAWS commands, but you can silence it by setting the TZ environment variable.
Add the following line to your ~/.bashrc file:
export TZ="America/Los_Angeles"
Reload your ~/.bashrc by running:
source ~/.bashrc
After doing this, the warning should no longer appear, and your system’s timezone will be correctly aligned.
Known Limitations
Outputs outside the cromwell-execution/execution/ directory 🔗
When using JAWS, there are some limitations in output file handling that users should be aware of. Example workflow:
version 1.0
workflow example {
  call dump { }
  output {
    File log = write_lines(['foo'])
  }
}
task dump {
  command <<<
    echo
  >>>
  runtime { docker: "debian:bullseye-slim" }
}
Issue Description
JAWS is unable to identify outputs outside the cromwell-execution/execution/ directory and transfer them to the JAWS Teams directory.
For example, if a file is created in the /tmp directory, JAWS will not be able to recognize and transfer it; the same is true of the write_lines() output in the example above.
Similarly, if a user declares a file from the inputs folder as a final output, JAWS will be unable to copy it.
Workaround
To ensure that JAWS correctly identifies and transfers the output files, you should:
Save the string to a file, and reference that file in the outputs stanza of your workflow.
Explicitly copy the final output files into the cromwell-execution/execution/ directory.
Here’s how you can adjust the example workflow:
version 1.0
workflow example {
  call dump
  output {
    File log = dump.log
  }
}
task dump {
  command <<<
    echo foo > out.log
  >>>
  output {
    File log = "out.log"
  }
  runtime {
    docker: "debian:bullseye-slim"
  }
}
By following this approach, you ensure that all output files are saved within the cromwell-execution/execution/ directory and are explicitly defined, allowing JAWS to identify and transfer them back to the user without issues.