FAQs
Code Snippets to Answer Common WDL Design Problems
OpenWDL documents the WDL functions in the version 1.0 specification.
Building WDLs
How can I use bash commands that require curly braces?
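If you need bash syntax that uses curly braces, for example to strip a ‘.txt’ suffix or set a default value, use the WDL Version 1.0 Specification.
The first line of the WDL document should declare version 1.0, and the command section should be written as command <<< >>> instead of with curly braces. Inside command <<< >>>, WDL placeholders are written as ~{...}, so bash expressions of the form ${...} are passed to the shell unchanged.
Version 1.0 also requires some other syntax changes, such as declaring inputs inside an input { } block. Refer to the Version 1.0 specification link for more details.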
command <<<
# setting a default value in bash
VAR=${VAR:=25}
# strip a suffix
myvar=${somefile%.txt}
>>>
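The two expansions above can be tried directly in a shell, outside of any WDL. This is a minimal sketch; the value of somefile is a hypothetical file name chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical starting state, for illustration only.
unset VAR
somefile="sample.txt"

# ${VAR:=25} assigns 25 to VAR only when VAR is unset or empty.
VAR=${VAR:=25}

# ${somefile%.txt} removes the shortest trailing match of ".txt".
myvar=${somefile%.txt}

echo "VAR=$VAR myvar=$myvar"   # prints "VAR=25 myvar=sample"
```

If VAR had already been set to a non-empty value before this ran, the default would be ignored and the existing value kept.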
How can I output a file that has been named dynamically as a bash variable?
Bash variables created in the command { } block cannot be seen outside the block, for example, in the output { } section. Instead, you can write the name(s) of any output files to another file, which is then read inside the output { } block.
This is the official WDL way, using glob:
output {
Array[File] output_bams = glob("*.bam")
}
This is another method:
command{
echo $lib.bam > list_of_files
}
output {
Array[File] output_files = read_lines("list_of_files")
}
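The shell side of this second method can be sketched on its own. This is an illustration under assumptions: the value of lib and the empty placeholder file are hypothetical, and a temporary directory stands in for the task's working directory.

```shell
#!/usr/bin/env bash
# A temporary directory stands in for the task's working directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical: in a real task this name would be derived at run time.
lib="libA"

# Placeholder for the real output file produced by the task.
touch "${lib}.bam"

# Record the dynamically chosen name so it can be read back later,
# which is what read_lines("list_of_files") does in the output section.
echo "${lib}.bam" > list_of_files

listed=$(cat list_of_files)
echo "listed: $listed"
```

Each line of list_of_files should name a file that actually exists in the working directory, since the lines are interpreted as File paths.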
To see more about read_lines() and other WDL functions, see openwdl/wdl.
Using conditionals
workflow conditional_example {
File infile
call wc as wc_before { input: infile = infile }
Int num_lines = wc_before.num_lines
if (num_lines > 10) {
call truncate { input: infile = infile }
}
# This function will return false if the defined() argument is an
# unset optional value. It will return true in all other cases.
Boolean has_head_file = defined(truncate.outfile)
if (has_head_file) {
call wc as wc_after { input: infile = truncate.outfile }
}
# notice the '?' after File. These are required since these files may not exist.
output {
File wc_before_file = wc_before.outfile
File? head_file = truncate.outfile
File? wc_after_file = wc_after.outfile
}
}
task wc {
File infile
command { wc -l < ${infile} | tee wc.txt }
output {
Int num_lines = read_int(stdout())
File outfile = "wc.txt"
}
}
How to scatter over arrays and maps
Although you can scatter over both arrays and maps, the syntax differs for each. To scatter over an array, use this syntax:
Array[String] some_array
scatter (e in some_array) {
String value = e # e is the current element of some_array, not an index
call some_task {input: value = value}
}
When you iterate over a map, each element is a Pair whose key and value are accessed with ‘.left’ and ‘.right’ (the name ‘pair’ below is an ordinary variable name):
Map[String,String] some_map
scatter (pair in some_map) {
String key = pair.left
String value = pair.right # or String val = some_map[key]
call some_task {input: value = value}
}
You can see working examples for scattering an array and scattering a map.
Custom data structures
In addition to Map, Array, and Pair, you can define a custom data structure using “struct”. A struct is similar to a hash but can contain any combination of data types.
Documentation for Custom Type “Struct”.
Example main.wdl && inputs.json.
Get Keys from a Map
As of version 1.0 of the WDL spec, there is no direct way to get an array of Map keys. This will become available in version 1.1. As a workaround for now, you can use the Pair data type instead of Map as follows.
version 1.0
workflow test {
input {
Array[Pair[Float,String]] my_map = [(0.1, "mouse"), (3, "cat"), (15, "dog")]
}
scatter (pairs in my_map) {
String keys = pairs.left
}
output {Array[String] allouts = keys}
}
Note that the default format for my_map in the WDL is different than in an input.json:
{
"test.my_map": [{"Left": 0.1, "Right": "mouse"}, {"Left": 3, "Right": "cat"}, {"Left": 15, "Right": "dog"}]
}
An example of a scatter/gather model when a scattered task is optional
You can use this example to see where you need to declare optional variables (i.e. Array[Array[String]?]) and how you can use two WDL functions, select_all and flatten, to convert an optional variable (Array[Array[String]?]) to an Array[String].
See this line:
Array[String] flat_array = flatten(select_all(num_array))
# note that this line is not within any stanza, but between input{} and command<<<>>>
Here select_all converts Array[Array[String]?] to Array[Array[String]] by dropping the unset entries, and flatten then converts the resulting array of arrays to a single Array[String].
version 1.0
workflow flatten_it {
input {
Boolean try_it = false
Array[Int] numbers = [1,2,3]
}
scatter (num in numbers) {
if (try_it == true) {
call do_if_true {
input: num = num
}
}
# call something_else {}
}
call gather {
input: num_array = do_if_true.out
}
output {
Array[String] out_array = gather.final_array
}
}
task do_if_true {
input {
Int num
}
command <<<
echo "~{num}.one"
echo "~{num}.two"
echo "~{num}.three"
>>>
output {
Array[String] out = read_lines(stdout())
}
}
task gather {
input {
Array[Array[String]?] num_array
}
Array[String] flat_array = flatten(select_all(num_array))
command <<<
echo ~{sep=', ' flat_array}
>>>
output {
Array[String] final_array = read_lines(stdout())
}
}
Cromwell
Does Cromwell offer checkpointing?
Cromwell has call caching instead, which serves the same purpose. When a task completes successfully, its results can be reused if the same task is run again with the same inputs. Use jaws submit --no-cache to turn caching off.
Why didn’t call caching work for me?
Changes to the WDL, the contents of the inputs.json, or the name of the inputs.json will prevent call caching.
For example, if you set your task’s runtime attributes using input variables, changes to the values of these variables count as changes to the inputs, resulting in a different hash for the task (the wdl and inputs.json are hashed).
Call caching may have failed if your files are being fed in as String rather than File inputs. The hashes of two identical Files stored in different locations would be the same. The hashes of the String values for the different locations would be different, even though the contents of the file are the same.
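The difference between hashing a File and hashing a String can be illustrated in a shell. This is only a sketch: cksum is used as a portable stand-in for whatever hashing strategy Cromwell is configured with, and the directory and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Two identical files stored in different (temporary, hypothetical) locations.
workdir=$(mktemp -d)
mkdir -p "$workdir/a" "$workdir/b"
printf 'ACGT\n' > "$workdir/a/reads.txt"
cp "$workdir/a/reads.txt" "$workdir/b/reads.txt"

# File-style comparison: checksum the file contents.
content_a=$(cksum < "$workdir/a/reads.txt" | cut -d' ' -f1)
content_b=$(cksum < "$workdir/b/reads.txt" | cut -d' ' -f1)

# String-style comparison: checksum the path text itself.
path_a=$(printf '%s' "$workdir/a/reads.txt" | cksum | cut -d' ' -f1)
path_b=$(printf '%s' "$workdir/b/reads.txt" | cksum | cut -d' ' -f1)

echo "content checksums: $content_a vs $content_b"
echo "path checksums:    $path_a vs $path_b"

rm -rf "$workdir"
```

The content checksums match while the path checksums differ, which mirrors why the same data passed as a String from a new location looks like a new input.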
Call caching also requires consistency in the outputs of the task, both the count (number of outputs) and the output expressions. If you publish a new version of your WDL that has one extra or one fewer output, it will not be able to benefit from a previously successful run of the same task, even if the inputs are the same.
Compute Systems
What flavor of Linux do the compute nodes run?
JAWS makes multiple computing resources available, and they run various Linux distributions. We therefore recommend specifying a docker container for every task; if none is specified, the default container is Ubuntu.
JAWS
Will my container’s entrypoint script be executed by JAWS?
JAWS does not execute entrypoint scripts and users cannot modify this behavior.