Best Practices for Creating WDLs

There are opportunities to participate in code reviews with other WDL developers; contact us if you are interested.


set -euo pipefail

The set -euo pipefail command is actually a combination of three shell options.

For example:

  • use set -e to make your script exit when a command fails.

  • use set -u to make your script exit when it tries to use undeclared variables.

  • use set -o pipefail to catch failures anywhere in a pipeline, e.g. a failure of cat myfile in cat myfile | grep id. Instead of the successful exit code from grep id getting returned, we get the non-zero exit code from cat myfile.

  • use set -x to trace what gets executed. Useful for debugging.

set -euo pipefail can be useful as the first line within the command <<< ... >>> section of a WDL task. It helps capture errors at the point where they occur in your Unix code, rather than letting the commands run on past the error, which makes debugging more difficult. Another way of saying it is that, without set -e, the WDL task will use the exit code from the last command even if an earlier command failed. However, set -euo pipefail can cause the task to exit without any error printed to stderr, so it is not always appropriate to use.
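A minimal sketch of how these options interact (myfile and the filenames are hypothetical):

#!/bin/bash
set -euo pipefail

# set -u: this line would abort the script because $OUTDIR was never declared
# echo "writing to $OUTDIR"

# set -o pipefail: if myfile is missing, the pipeline's exit code reflects
# the failing cat, not just the exit code of grep
cat myfile | grep id > ids.txt

# set -e: this line is never reached if the pipeline above failed
echo "done"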

Use Docker containers with SHA256 instead of tags
  • The running environment and required scripts should be encapsulated in a docker image.

  • The image should be pushed to hub.docker.com and have a versioned Dockerfile. JAWS will pull images from there by default.

  • We recommend that a docker container be specified for every task; if not, the default container is Ubuntu.

  • It is recommended to reference containers by their SHA256 instead of a tag (e.g. doejgi/bbtools@sha256:64088.. instead of doejgi/bbtools:latest) for reproducibility (a container can change and keep the same tag).

SHA Example
# call-caching will not work
runtime { "docker: ubuntu:20.04" }

# call-caching will work
runtime { "docker: ubuntu@sha256:47f14534bda344d9fe6ffd6effb95eefe579f4be0d508b7445cf77f61a0e5724" }

# find the sha
docker pull ubuntu:20.04
Digest: sha256:47f14534bda344d9fe6ffd6effb95eefe579f4be0d508b7445cf77f61a0e5724

# or
docker inspect --format='{{.RepoDigests}}' ubuntu:20.04
ubuntu@sha256:47f14534bda344d9fe6ffd6effb95eefe579f4be0d508b7445cf77f61a0e5724
Avoid hard-coding paths in the WDL

Paths to files or directories should be put into the inputs.json file, not the WDL. The exception to this rule is docker images, which should be hard-coded so the WDL contains information about the version of the docker container.
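A minimal sketch of the matching inputs.json, using the input names from the WDL template below (the paths are placeholders):

# inputs.json -- paths live here, not in the WDL
{
    "bbtools.reads": "/path/to/reads.fastq",
    "bbtools.ref": "/path/to/reference.fasta"
}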

WDL tasks should be self-sufficient
  1. Imagine the WDL task as a wrapper script; it should be able to run independently of the pipeline. This means that a script should explicitly list all required input files as arguments and not assume some input files already exist in the current working directory.

  2. Scripts should also specify output files as arguments and shouldn’t write them somewhere other than the current working directory if they will be needed for the next task. These rules make writing the WDL trivial.

  3. The WDL itself should contain minimal logic. Use wrapper scripts to deal with logic if need be.

  4. Also, scripts should return an exit code of 0 on success, and shouldn’t write anything but errors to stderr. Cromwell depends on seeing a return code of 0 on success, and JAWS depends on seeing errors written to stderr. Sometimes scripts write errors to stdout, and these will be missed if you try to see the errors via the errors.json supplementary files created by JAWS. A wrapper sketch illustrating these conventions follows the example below.

Example
# This explicitly lists all input files and the output file.
filterFastq.py in=${fastq} ref=${refdata} huseq=${hu_fasta} out=myout


# This script expects the files to exist implicitly
filterFastq.py ref=${refdata}
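A minimal wrapper-script sketch for the exit-code and stderr conventions above (the wrapper itself is hypothetical; filterFastq.py is from the example):

#!/bin/bash
# run the tool; on failure, report to stderr and return a non-zero exit code
if ! filterFastq.py in="$1" ref="$2" out="$3"; then
    echo "ERROR: filterFastq.py failed" >&2
    exit 1
fi

# success is signaled by an exit code of 0, with nothing written to stderr
exit 0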
Use subworkflows

Consider using subworkflows if organizing tasks that way makes the main workflow more understandable, reusable, and maintainable. Even a single task can be its own workflow. Subworkflows are imported and used as if they were normal tasks; see the example below, copied from the Cromwell documentation.

Example
# main.wdl

import "sub_wdl.wdl" as sub

workflow main_workflow {

    call sub.hello_and_goodbye { input: hello_and_goodbye_input = "sub world" }

    # call myTask { input: hello_and_goodbye.hello_output }

    output {
        String main_output = hello_and_goodbye.hello_output
    }
}
# sub_wdl.wdl

workflow hello_and_goodbye {
    String hello_and_goodbye_input

    call hello { input: addressee = hello_and_goodbye_input }
    call goodbye { input: addressee = hello_and_goodbye_input }

    output {
        String hello_output = hello.salutation
        String goodbye_output = goodbye.salutation
    }
}

task hello {
    String addressee
    command <<<
        echo "Hello ${addressee}!"
    >>>
    output {
        String salutation = read_string(stdout())
    }
}

task goodbye {
    String addressee
    command <<<
        echo "Goodbye ${addressee}!"
    >>>
    output {
        String salutation = read_string(stdout())
    }
}
Documenting your WDLs

The best way to document your WDLs is with a README.md in the same repository as the WDL. However, adding "meta" sections in the WDL is also best practice, since this way you hard-code some relevant information, like author and contact info, into the workflow itself. See the WDL template as an example.

Build Docker Images Through CI/CD

Do you want docker images to get rebuilt every time you push to a GitLab repository? Here is an example of how to set up a pipeline so docker images are automatically built and pushed to hub.docker.com every time you make a change to the repo code. Details can be found in the readme.
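A minimal .gitlab-ci.yml sketch, assuming Docker-in-Docker is available on your runners and that DOCKERHUB_USER and DOCKERHUB_TOKEN are set as CI/CD variables (myorg/mytool is a placeholder image name):

build-image:
  image: docker:24
  services:
    - docker:24-dind
  script:
    # log in to hub.docker.com using CI/CD variables, never hard-coded credentials
    - echo "$DOCKERHUB_TOKEN" | docker login -u "$DOCKERHUB_USER" --password-stdin
    # tag with the commit SHA so every push produces a traceable image
    - docker build --tag myorg/mytool:$CI_COMMIT_SHORT_SHA .
    - docker push myorg/mytool:$CI_COMMIT_SHORT_SHA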

Install bash in your Docker Image

Bash is required in your Docker image for running JAWS. For example, if your Docker image is based on Ubuntu, bash is already available. However, the Alpine Docker image does not have bash installed by default. You will need to add the following to get bash:

RUN apk update && apk add bash
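In context, a minimal Alpine-based Dockerfile would look like this (the base tag 3.18 is just an illustration):

FROM alpine:3.18

# JAWS runs task commands under bash, which Alpine does not ship by default
RUN apk update && apk add bash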
Requirements for call-caching

Here are some reasons why call-caching may not have worked…

  • Call caching requires consistency in the inputs of the task. Make sure there were not any changes to a task’s input{} section, including any variable values.

  • Call caching may have failed if your files are being fed in as String rather than File inputs. The hashes of two identical files stored in different locations would be the same, but the hashes of the String values for the different locations would be different, even though the contents of the files are the same.

  • Call caching also requires consistency in the output{} section of the task. If there was a change in the content of an output file but the name stayed the same, call-caching will still happen (i.e. non-deterministic outputs do not prevent caching).

  • Call caching also requires consistency in the command{} section of the task.

  • Changing runtime{} values that are hard-coded will not prevent call-caching, except for docker (and ContinueOnReturnCode and FailOnStderr, but these are not accepted in JAWS runtimes). Remember that changing runtime variables such as memory or cpu using task inputs will break call-caching, since this is registered as a change to the inputs of the task (see the sketch after the conditional example below).

  • Don’t use conditional statements in a task except within the command{} section. It’s not the conditional itself, but the Cromwell variables used within the conditional statement, that prevent call-caching. Interestingly, miniwdl will cache task_two fine.

Example of a bad conditional (see task_two)
version 1.0

workflow test_call_cache {
    call task_one {
    }

    call task_two {
        input:  single = task_one.single
    }
}

task task_one {
    command <<<
      echo single reads > single.txt
    >>>

    output {
        File single = "single.txt"
    }
}

task task_two {
    input {
        File single
        Boolean isSingleEnd = true
    }

    # this is the offending line. The solution is to put any conditionals in the command{} section.
    String reads_input_flag = if(isSingleEnd) then "-U ~{single}" else "-1 ~{single}"

    command <<<
        # this commented code is the fix; it would replace the String declaration
        # above, and the echo below would then use "$reads_input_flag"
        # if [[ "~{isSingleEnd}" == "true" ]]; then
        #     reads_input_flag="-U ~{single}"
        # else
        #     reads_input_flag="-1 ~{single}"
        # fi
        echo ~{reads_input_flag}
    >>>
}
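A minimal sketch of the runtime pitfall described above: passing memory through the input{} section means a new value is registered as a new input, which breaks call-caching (the task and its values are hypothetical):

version 1.0

task align {
    input {
        File fastq
        Int mem_gb = 5    # changing this value breaks call-caching because it is an input
    }

    command <<<
        echo "aligning ~{fastq}"
    >>>

    runtime {
        docker: "ubuntu@sha256:47f14534bda344d9fe6ffd6effb95eefe579f4be0d508b7445cf77f61a0e5724"
        memory: "~{mem_gb}GB"    # hard-coding "5GB" here instead would not prevent call-caching
    }
}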

Templates

WDL Best Practices Template
# By versioning your WDL, you specify which specification Cromwell uses to decipher the WDL.
# New features come with new versions.
version 1.0

# import any subworkflows
import "subworkflow.wdl" as firstStep

workflow bbtools {
    meta {
        authors: [
            {
                name: "Jackson Brown",
                email: "jbrown@my-inst",
                organization: "JGI"
            }
        ],
        version: "2222.2.0",
        notes: "this is the official release version"
    }

    # you must have this input section within the "workflow" stanza if you are using version 1
    input {
        File reads
        File ref
        String docker_image = "jfroula/bbtools@sha256:cf560d21149237feff9210b0cd831dcc532ebdccaaa3f5ede52369f45a23e768"
    }

    call firstStep {
      input: fastq=reads,
             docker_image=docker_image
    }

    call alignment {
       input: fastq=reads,
              fasta=ref,
              docker_image=docker_image
    }

    call samtools {
       input: sam=alignment.sam
   }
}

#
# below are task definitions
#
task alignment {
    # Metadata is good for helping the next guy understand your code.
    # This meta section can also be used for documentation generated by wdl-aid.
    # You can run "wdl-aid <workflow.wdl>" if it is installed, see https://wdl-aid.readthedocs.io/en/latest/usage.html)
    meta {
        metaParameter1: "Some meta Data I"
        metaParameter2: "Some meta Data II"
        description: "Add a brief description of what this task does in this optional block. One can add as much text as one wants in this section to inform an outsider to understand the mechanics of this task."
    }

    input {
        File fastq
        File fasta
        String docker_image
    }

    command <<<
        # Use this command to help debug your bash code (i.e. prevents hidden bugs).
        # For a description, see https://gist.github.com/mohanpedala/1e2ff5661761d3abd0385e8223e16425
        set -euo pipefail

        # Note that ~{} is preferred over the old ${} syntax
        bbmap.sh in=~{fastq} ref=~{fasta} out=test.sam
    >>>

    runtime {
        docker: docker_image
        cpu: 8
        memory: "5GB"
        runtime_minutes: 120
    }

    output {
       File sam = "test.sam"
    }

    # This section is optional and used to create documentation using the wdl-aid tool.
    # see https://wdl-aid.readthedocs.io/en/latest/usage.html
    # You can run "wdl-aid <workflow.wdl>" if it is installed.
    parameter_meta {
        fastq: {description: "file containing reads", category: "required"}
        fasta: {description: "file containing reference sequences", category: "required"}
        docker_image: {description: "docker image containing BBTools", category: "required"}
    }

}
Dockerfile template
FROM ubuntu:22.04

# install stuff with apt-get
RUN apt-get update && apt-get install -y wget bzip2 \
&& rm -rf /var/lib/apt/lists/*

# install miniconda
# There is a good reason to install miniconda in a path other than its default.
# The default installation directory is /root/miniconda3 but this path will not be
# accessible by shifter or singularity so we'll install under /usr/local/bin/miniconda3.

ENV CONDAPATH=/usr/local/bin/miniconda3
RUN wget https://repo.continuum.io/miniconda/Miniconda3-4.5.11-Linux-x86_64.sh \
&& bash ./Miniconda3*.sh -b -p $CONDAPATH \
&& rm Miniconda3*.sh

# point to all the future conda installations you are going to do
ENV PATH=$CONDAPATH/bin:$PATH

# Install stuff with conda
# Remember to use versions of everything you install with conda as shown in example.
RUN conda install -c bioconda bbmap==38.84 samtools==1.11 && conda clean -afy


# copy bash/python scripts specific to your pipeline
COPY scripts/* /usr/local/bin/

Additional helpful notes when building Docker images:

  • The Dockerfile template uses the strategy of installing miniconda, so you can use conda install for most of your tools. However, pip install and apt-get install work in addition to, or instead of, miniconda.

  • Also, remember to pin versions of everything you install with conda, as shown in the Dockerfile template above.

  • There is a good reason to install miniconda in a path other than its default. The default installation directory is /root/miniconda3 but this path will not be accessible by shifter or singularity.

  • When you build your docker image (i.e. docker build --tag <somename> -f ./Dockerfile3 .), all files in the current directory (and sub-directories) are transferred to the local docker daemon. This transfer step can be time consuming, so docker builds should be performed in a directory without extraneous files (a .dockerignore file helps here; see the sketch after this list).

  • One helpful thing you can do when developing docker images is to create a bare-essentials image with your favorite editor installed (i.e. vim). Then you can go into the container interactively with docker run -it <image>, see if you can install things manually, and then copy those same commands into the final Dockerfile.
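A minimal .dockerignore sketch for keeping the build context small (the entries are examples; adjust them to your repo):

# .dockerignore -- files matched here are never sent to the docker daemon
.git
*.fastq
*.bam
test_data/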

For more see the docker official docs on best practices.