FAQs

Code Snippets to Answer Common WDL Design Problems

The WDL standard library functions used below are defined in the OpenWDL version 1.0 specification.

Building WDLs

How can I use bash commands that require curly braces? 🔗

If you need to use bash curly-brace expansions in a command, for example to strip a ‘.txt’ suffix or set a default value, use the WDL version 1.0 specification. In a 1.0 document, the first line must declare version 1.0, and the command section is written with heredoc delimiters, command <<< ... >>>, instead of curly braces. Some other syntax changes as well; refer to the version 1.0 specification for details.

command <<<
    # setting a default value in bash
    VAR=${VAR:=25}

    # strip a suffix
    myvar=${somefile%.txt}
>>>
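For context, here is a minimal sketch of a complete version 1.0 task wrapping the snippet above (the task and variable names are illustrative). Inside command <<< >>>, WDL interpolation uses ~{ }, so bash ${ } expansions pass through untouched:

```wdl
version 1.0

task bash_braces_demo {
  input {
    File infile
  }
  command <<<
    # WDL interpolation uses ~{ } here, so ${ } is left for bash itself
    f=~{infile}
    name=${f%.txt}         # strip a '.txt' suffix
    THREADS=${THREADS:=4}  # set a default value
    echo "${name} ${THREADS}"
  >>>
  output {
    String result = read_string(stdout())
  }
}
```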
How can I output a file that has been named dynamically as a bash variable? 🔗

Bash variables created in the command { } block cannot be seen outside the block, for example, in the output { } section. Therefore, you can write the name(s) of any output files to another file which will be read inside the output { } block.

This is the official WDL way, using glob:

output {
    Array[File] output_bams = glob("*.bam")
}

This is another method:

command {
    echo $lib.bam > list_of_files
}
output {
    Array[File] output_files = read_lines("list_of_files")
}

To see more about read_lines() and other WDL functions, see openwdl/wdl.

Using conditionals 🔗
workflow conditional_example {
  File infile

  call wc as wc_before { input: infile = infile }

  Int num_lines = wc_before.num_lines

  if (num_lines > 10) {
    call truncate { input: infile = infile }
  }

  # This function will return false if the defined() argument is an
  # unset optional value. It will return true in all other cases.
  Boolean has_head_file = defined(truncate.outfile)

  if (has_head_file) {
    call wc as wc_after { input: infile = truncate.outfile }
  }

  # notice the '?' after File. These are required since these files may not exist.
  output {
    File wc_before_file = wc_before.outfile
    File? head_file = truncate.outfile
    File? wc_after_file = wc_after.outfile
  }
}

task wc {
  File infile
  command { wc -l < ${infile} | tee wc.txt }
  output {
    Int num_lines = read_int(stdout())
    File outfile = "wc.txt"
  }
}
How to scatter over arrays and maps 🔗

Although you can scatter over both arrays and maps, the syntax differs for each. To scatter over an array:

Array[String] some_array
scatter (e in some_array) {
  call some_task {input: value = e}
}

But you can iterate over a map by accessing each entry’s ‘.left’ (key) and ‘.right’ (value) members (‘pair’ below is just the iteration variable name, not a keyword):

Map[String,String] some_map
scatter (pair in some_map) {
  String key = pair.left
  String value = pair.right # or String value = some_map[key]
  call some_task {input: value = value}
}

You can see working examples for scattering an array and scattering a map.

Custom data structures 🔗

Besides Map, Array, and Pair, you can create a custom data structure using a “struct”. A struct is similar to a hash (dictionary), but each member can be a different declared type.
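A minimal sketch of a struct definition and its use (the struct and member names here are illustrative):

```wdl
version 1.0

struct SampleInfo {
  String name
  Int read_count
  Array[File] fastqs
}

workflow struct_demo {
  input {
    SampleInfo sample
  }
  # struct members are accessed with dot notation
  output {
    String sample_name = sample.name
    Int reads = sample.read_count
  }
}
```

In an inputs.json, a struct value is supplied as a JSON object whose keys match the member names.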

Get Keys from a Map 🔗

As of version 1.0 of the WDL spec, there is no direct way to get an array of a Map’s keys; this becomes available in version 1.1. As a workaround for now, you can use the Pair data type instead of Map, as follows.

version 1.0
workflow test {
  input {
    Array[Pair[Float,String]] my_map = [(0.1, "mouse"), (3, "cat"), (15, "dog")]
  }

  scatter (pairs in my_map) {
    Float keys = pairs.left
  }

  output {Array[Float] allouts = keys}
}

Note that the Pair literal format for my_map in the WDL differs from the format required in an inputs.json:

{
  "test.my_map": [{"Left": 0.1, "Right": "mouse"}, {"Left": 3, "Right": "cat"}, {"Left": 15, "Right": "dog"}]
}
An example of a scatter/gather model when a scattered task is optional 🔗

You can use this example to see where you need to declare optional variables (i.e. Array[Array[String]?]) and how you can use two WDL functions, select_all and flatten, to convert an optional type (Array[Array[String]?]) into an Array[String].

See this line:

Array[String] flat_array = flatten(select_all(num_array))
# note that this line is not within any stanza, but between input{} and command<<<>>>

and note that

select_all converts the Array[Array[String]?] to Array[Array[String]] by removing any unset elements.
flatten then converts that array-of-arrays into a single flat array.
If you don’t use select_all here, you get the error:
Expected ‘Array[Array[_]]’ but got ‘Array[Array[String]?]’
version 1.0

workflow flatten_it {
    input {
        Boolean try_it = false
        Array[Int] numbers = [1,2,3]
    }

    scatter (num in numbers) {
      if (try_it == true) {
        call do_if_true {
          input: num = num
        }
      }

      # call something_else {}
    }

    call gather  {
      input: num_array = do_if_true.out
    }

   output {
     Array[String] out_array = gather.final_array
   }
}

task do_if_true {
     input {
        Int num
     }

     command <<<
        echo "~{num}.one"
        echo "~{num}.two"
        echo "~{num}.three"
     >>>

     output {
         Array[String] out = read_lines(stdout())
    }
}

task gather {
    input {
        Array[Array[String]?] num_array
    }

    Array[String] flat_array = flatten(select_all(num_array))

    command <<<
        echo ~{sep=', ' flat_array}
    >>>

    output {
       Array[String] final_array = read_lines(stdout())
    }
}

Cromwell

Does Cromwell offer checkpointing? 🔗

Cromwell offers call caching instead, which accomplishes the same thing. When a task completes successfully, its results can be reused if the same task is run again with the same inputs. Use jaws submit --no-cache to turn caching off.

Why didn’t call caching work for me? 🔗

Changes to the WDL, the contents of the inputs.json, or the name of the inputs.json will prevent call caching.

For example, if you set your task’s runtime attributes using input variables, changes to the values of these variables count as changes to the inputs, resulting in a different hash for the task (the wdl and inputs.json are hashed).

Call caching may have failed if your files are being fed in as String rather than File inputs. The hashes of two identical Files stored in different locations would be the same. The hashes of the String values for the different locations would be different, even though the contents of the file are the same.
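As a sketch of the difference (a hypothetical task, for illustration only): declaring an input as File lets Cromwell hash the file’s content, while a String is hashed as plain text.

```wdl
version 1.0

task count_lines {
  input {
    File infile        # hashed by content: identical copies at different
                       # paths still produce a cache hit
    # String infile    # if declared as a String instead, the path text is
                       #   hashed, so a moved copy breaks call caching
  }
  command <<<
    wc -l < ~{infile}
  >>>
  output {
    Int n = read_int(stdout())
  }
}
```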

Call caching also requires consistency in the outputs of the task, both the count (number of outputs) and the output expressions. If you publish a new version of your WDL that has one extra or one fewer output, it will not be able to benefit from a previously successful run of the same task, even if the inputs are the same.

Why do JAWS jobs fail if filenames contain special characters like ` or ;? 🔗

JAWS jobs fail when input file names contain special characters such as ` (backtick) or ; (semicolon). Cromwell, the workflow execution engine used by JAWS, does not handle such characters properly. To avoid failures, please don’t use special characters in your filenames.

For example, the following error might occur:

cat ~/cromwell-executions/test_weird_chars/d277a390-4552-490a-8bcf-af02a80c7718/call-file/execution/stderr.submit
~/script: line 52: syntax error near unexpected token `&&'
~/script: line 52: `find . -type d -exec sh -c '[ -z "$(ls -A '"'"'{}'"'"')" ] && touch '"'"'{}'"'"'/.file' \;'

Compute Systems

What flavor of linux do the compute nodes run? 🔗

JAWS makes multiple computing resources available, running various Linux distributions. We therefore recommend specifying a Docker container for every task; if none is specified, the default container is Ubuntu.

JAWS

Will my container’s entrypoint script be executed by JAWS? 🔗

JAWS does not execute entrypoint scripts, and users cannot alter this behavior.

The ENTRYPOINT instruction sets the default executable for the container, and any arguments passed to the docker run command are appended to it.

However, Cromwell generates a script file that the container runs instead. This script includes the command specified in the command stanza, with all variables expanded, as well as additional Cromwell-specific instructions. As a result, the container’s entrypoint script is ignored by both Cromwell and the JAWS backend, even if specified.

What should I do if I encounter a timezone offset warning when using JAWS Container? 🔗

If you’re using the JAWS Client Container, you might see a warning similar to the following when running the JAWS commands:

JAWS_USER_CONFIG=~/jaws.conf JAWS_CLIENT_CONFIG=/clusterfs/jgi/groups/dsi/homes/svc-jaws/dori-prod/jaws-prod.conf apptainer run docker://doejgi/jaws-client:latest jaws queue
INFO:   Using cached SIF image
/usr/local/lib/python3.11/site-packages/local/utils.py:43: UserWarning: Timezone offset does not match system offset: 0 != -25200. Please check your config files.
warnings.warn(msg)
[]

This warning occurs because of a mismatch between the detected timezone offset and the system’s offset. While this doesn’t affect the functionality of JAWS commands, you can remove the warning by setting the TZ environment variable.

  1. Add the following line to your ~/.bashrc file:

export TZ="America/Los_Angeles"
  2. Reload your ~/.bashrc by running:

source ~/.bashrc

After doing this, the warning should no longer appear, and your system’s timezone will be correctly aligned.

Known Limitations

Outputs outside the cromwell-execution/execution/ directory 🔗

When using JAWS, there are some limitations related to output file handling that users should be aware of. Example Workflow:

version 1.0

workflow example {
  call dump { }
  output {
    File log = write_lines(['foo'])
  }
}

task dump {
  command <<<
    echo
  >>>
  runtime { docker: "debian:bullseye-slim" }
}

Issue Description

JAWS is unable to identify any outputs outside the cromwell-execution/execution/ directory to transfer to the JAWS Teams directory. For example, if a file is created in the /tmp directory, JAWS will not be able to recognize and transfer it, as shown in the example above.

Similarly, if a user identifies a file from the inputs folder as a final output, JAWS will also be unable to copy it.

Workaround

To ensure that JAWS correctly identifies and transfers the output files, you should:

  • Save the string to a file within the task, and reference that file in the outputs stanza of your workflow.

  • Explicitly copy the final output files into the cromwell-execution/execution/ directory.

Here’s how you can adjust the example workflow:

version 1.0

workflow example {
  call dump
  output {
    File log = dump.log
  }
}

task dump {
  command <<<
    echo foo > out.log
  >>>
  output {
    File log = "out.log"
  }
  runtime {
    docker: "debian:bullseye-slim"
  }
}

By following this approach, you ensure that all output files are correctly saved within the cromwell-execution/execution/ directory and are explicitly defined, allowing JAWS to identify and transfer them back to the user without issues.