FAQs

Code Snippets to Answer Common WDL Design Problems

OpenWDL documents the available WDL functions in the version 1.0 specification.

Building WDLs

How can I use bash commands that require curly braces?

If you need bash constructs that use curly braces, for example to strip a ‘.txt’ suffix or to set a default value, write your WDL against the Version 1.0 specification. The first line of the WDL document should read version 1.0, and the command section should be delimited with command <<< >>> instead of command { }. Inside command <<< >>>, WDL placeholders are written as ~{...}, so ${...} is passed through to bash untouched. Other parts of the syntax also change in version 1.0 (for example, inputs are declared in an input { } block). Refer to the Version 1.0 specification link for more details.

command <<<
    # setting a default value in bash
    VAR=${VAR:=25}

    # strip a suffix
    myvar=${somefile%.txt}
>>>

How can I output a file that has been named dynamically as a bash variable?

Bash variables created in the command { } block are not visible outside the block, for example in the output { } section. To return files whose names are only known at run time, you can either match them with glob() or write the name(s) of the output files to another file that is read inside the output { } block.

This is the official WDL way, using glob:

output {
    Array[File] output_bams = glob("*.bam")
}

This is another method:

command {
    echo $lib.bam > list_of_files
}
output {
    Array[File] output_bams = read_lines("list_of_files")
}

To see more about read_lines() and other WDL functions, see openwdl/wdl.

Using conditionals

workflow conditional_example {
  File infile

  call wc as wc_before { input: infile = infile }

  Int num_lines = wc_before.num_lines

  if (num_lines > 10) {
    call truncate { input: infile = infile }
  }

  # This function will return false if the defined() argument is an
  # unset optional value. It will return true in all other cases.
  Boolean has_head_file = defined(truncate.outfile)

  if (has_head_file) {
    call wc as wc_after { input: infile = truncate.outfile }
  }

  # notice the '?' after File. These are required since these files may not exist.
  output {
    File wc_before_file = wc_before.outfile
    File? head_file = truncate.outfile
    File? wc_after_file = wc_after.outfile
  }
}

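task wc {
  File infile
  command { wc -l < ${infile} | tee wc.txt }
  output {
    Int num_lines = read_int(stdout())
    File outfile = "wc.txt"
  }
}

# Assumed implementation of the truncate task called above: keep the first 10 lines.
task truncate {
  File infile
  command { head -n 10 ${infile} > head.txt }
  output {
    File outfile = "head.txt"
  }
}

How to scatter over arrays and maps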

Although you can scatter over both arrays and maps, the syntax differs. When you scatter over an array, the scatter variable is the array element itself:

Array[String] some_array
scatter (e in some_array) {
  String value = e
  call some_task {input: value = value}
}

When you scatter over a map, each element is a Pair (named ‘pair’ in this example), and you read the key and value with ‘.left’ and ‘.right’:

Map[String,String] some_map
scatter (pair in some_map) {
  String key = pair.left
  String value = pair.right # or String val = some_map[key]
  call some_task {input: value = value}
}

You can see working examples for scattering an array and scattering a map.

Custom data structures

Besides Map, Array, and Pair, you can create a custom data structure using “struct”. A struct is similar to a hash, but its members can be any combination of data types.
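As a minimal sketch of the struct syntax, assuming WDL version 1.0, the example below defines a struct and passes it to a task. The struct name Sample, its members, and the task show_sample are hypothetical and chosen only for illustration.

```wdl
version 1.0

# Hypothetical struct: the name and members are illustrative only.
struct Sample {
    String name
    Int threads
    Array[String] tags
}

workflow struct_example {
    input {
        Sample sample
    }

    call show_sample { input: sample = sample }

    output {
        String summary = show_sample.summary
    }
}

task show_sample {
    input {
        Sample sample
    }

    # Struct members are accessed with dot notation.
    command <<<
        echo "~{sample.name} uses ~{sample.threads} threads; tags: ~{sep=',' sample.tags}"
    >>>

    output {
        String summary = read_string(stdout())
    }
}
```

In the inputs JSON, a struct value would typically be written as an object keyed by member name, for example "struct_example.sample": {"name": "s1", "threads": 4, "tags": ["a", "b"]}.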

Get Keys from a Map

As of version 1.0 of the WDL spec, there is no direct way to get an array of Map keys. This becomes available in version 1.1. As a workaround for now, you can use an array of the Pair data type instead of a Map, as follows.

version 1.0
workflow test {
  input {
    Array[Pair[Float,String]] my_map = [(0.1, "mouse"), (3, "cat"), (15, "dog")]
  }

  scatter (pairs in my_map) {
    String keys = pairs.left
  }

  output {Array[String] allouts = keys}
}

Note that the format for my_map in the WDL is different from the format used in an inputs.json:

{
  "test.my_map": [{"Left": 0.1, "Right": "mouse"}, {"Left": 3, "Right": "cat"}, {"Left": 15, "Right": "dog"}]
}

An example of a scatter/gather model when a scattered task is optional

You can use this example to see where you need to declare optional variables (i.e. Array[Array[String]?]) and how you can use two WDL functions, select_all and flatten, to convert an optional variable (Array[Array[String]?]) to an Array[String].

See this line:

Array[String] flat_array = flatten(select_all(num_array))
# note that this line is not within any stanza, but between input{} and command<<<>>>

and note that

select_all, used in this example, converts Array[Array[String]?] to Array[Array[String]] by dropping the undefined elements.
flatten then converts the array of arrays to a single array.
If you don’t use select_all here, you get the error:
Expected ‘Array[Array[_]]’ but got ‘Array[Array[String]?]’

version 1.0

workflow flatten_it {
    input {
        Boolean try_it = false
        Array[Int] numbers = [1,2,3]
    }

    scatter (num in numbers) {
      if (try_it == true) {
        call do_if_true {
          input: num = num
        }
      }

      # call something_else {}
    }

    call gather  {
      input: num_array = do_if_true.out
    }

   output {
     Array[String] out_array = gather.final_array
   }
}

task do_if_true {
     input {
        Int num
     }

     command <<<
        echo "~{num}.one"
        echo "~{num}.two"
        echo "~{num}.three"
     >>>

     output {
         Array[String] out = read_lines(stdout())
    }
}

task gather {
    input {
        Array[Array[String]?] num_array
    }

    Array[String] flat_array = flatten(select_all(num_array))

    command <<<
        echo ~{sep=', ' flat_array}
    >>>

    output {
       Array[String] final_array = read_lines(stdout())
    }
}

Cromwell

Does Cromwell offer checkpointing?

Cromwell does not offer checkpointing, but its call caching serves the same purpose. When a task completes successfully, its results can be reused if the same task is run again with the same inputs. Use jaws submit --no-cache to turn caching off.

Why didn’t call caching work for me?

Changes to the WDL, the contents of the inputs.json, or the name of the inputs.json will prevent call caching.

For example, if you set your task’s runtime attributes using input variables, changes to the values of these variables count as changes to the inputs, resulting in a different hash for the task (the wdl and inputs.json are hashed).
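A minimal sketch of this situation, assuming WDL version 1.0: the task name align and the inputs threads and mem are hypothetical. Because the runtime attributes are taken from task inputs, rerunning with threads = 8 instead of 4 changes the task's inputs and, as described above, would not reuse the cached result.

```wdl
version 1.0

task align {
    input {
        File reads
        # Hypothetical inputs used only to set runtime attributes.
        # Changing either value changes the task's inputs and so its hash.
        Int threads = 4
        String mem = "8G"
    }

    command <<<
        echo "aligning ~{reads} with ~{threads} threads"
    >>>

    runtime {
        cpu: threads
        memory: mem
    }

    output {
        String log = read_string(stdout())
    }
}
```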

Call caching may have failed if your files are being fed in as String rather than File inputs. The hashes of two identical Files stored in different locations would be the same. The hashes of the String values for the different locations would be different, even though the contents of the file are the same.
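The sketch below, assuming WDL version 1.0, shows the recommended declaration. The task name count_lines and the input name infile are hypothetical. The only point is the type of the input: File rather than String.

```wdl
version 1.0

task count_lines {
    input {
        # Declared as File, so (per the explanation above) two identical
        # files stored in different locations hash the same.
        File infile

        # Declaring this as "String infile" instead would hash the path
        # text, so the same file in a new location would miss the cache.
    }

    command <<<
        wc -l < ~{infile}
    >>>

    output {
        Int num_lines = read_int(stdout())
    }
}
```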

Call caching also requires consistency in the outputs of the task, both the count (number of outputs) and the output expressions. If you publish a new version of your WDL that has one extra or one fewer output, it will not be able to benefit from a previously successful run of the same task, even if the inputs are the same.

Compute Systems

What flavor of Linux do the compute nodes run?

JAWS makes multiple computing resources available, and they run various Linux distributions. We therefore recommend specifying a docker container for every task; if none is specified, the default container is Ubuntu.
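A minimal sketch of a task that names its container, assuming WDL version 1.0: the task name show_os and the image tag ubuntu:22.04 are illustrative choices, not JAWS defaults.

```wdl
version 1.0

task show_os {
    command <<<
        # Print the distribution the task is actually running in.
        cat /etc/os-release
    >>>

    runtime {
        # Illustrative image; replace with the container your task needs.
        docker: "ubuntu:22.04"
    }

    output {
        String os_release = read_string(stdout())
    }
}
```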

JAWS

Will my container’s entrypoint script be executed by JAWS?

JAWS does not execute entrypoint scripts and users cannot modify this behavior.