Creating Docker Containers

Summary

This tutorial describes one way Docker images can be created and used in your WDL. If you are unfamiliar with Docker, please see the Docker tutorial or one of the many YouTube tutorials.

Prerequisites

This tutorial page relies on completing the previous tutorial, Lesson 1: Development Environment.

Note

As a prerequisite, you will need a computer with Docker installed (Docker Engine - Community). Installation instructions can be found at docs.docker.com/install, or, if you have conda installed, you can run conda install -c conda-forge docker-py.

Here are the steps we’re going to take for this tutorial:
  1. make a Docker image from the same commands you used for the conda environment (Lesson 1: Development Environment);

  2. run a WDL that uses your Docker container.

Clone the Example Repository

For this tutorial, I will be using the example code from jaws-tutorial-examples. To follow along, do:

git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
cd jaws-tutorial-examples/5min_example

Create docker image

Next we’ll describe how to create a Dockerfile and register the resulting image with hub.docker.com. But first, create an account and click on “Create a Repository”. In the space provided, enter a name for your container, such as aligner-bbmap; the image doesn’t have to exist yet. You will push a docker image to this name after you build it in the next steps.

To make the Dockerfile, you can use the same commands you used for the conda environment. Notice that it is good practice to pin software versions when installing, as I have done in the example Dockerfile. Of course, you can drop the versions altogether to get the latest releases, but then the Dockerfile may not work out-of-the-box in the future due to version conflicts.

Note

When creating the Dockerfile, it is helpful to test each command (e.g. apt-get, wget, conda install, etc.) manually, inside an empty docker container. Once everything is working, you can copy the commands into a Dockerfile.

This docker command will create an interactive container with an ubuntu base image. You can start installing stuff as root.

docker run -it ubuntu:latest /bin/bash
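
Inside that container, you can walk through the same steps that appear in the example Dockerfile below and confirm each one works before copying it into the Dockerfile, for example:

# these commands mirror the example Dockerfile shown below
apt-get update && apt-get install -y wget bzip2
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh
bash ./Miniconda3-py39_4.9.2-Linux-x86_64.sh -b -p /usr/local/bin/miniconda3
export PATH=/usr/local/bin/miniconda3/bin:$PATH
conda install -c bioconda bbmap==38.84 samtools==1.11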

Here is an example Dockerfile (provided in 5min_example). We will create a container from it.

FROM ubuntu:22.04

# Install stuff with apt-get
RUN apt-get update && apt-get install -y wget bzip2 \
    && rm -rf /var/lib/apt/lists/*

# Point to all the future conda installations you are going to do
ENV CONDAPATH=/usr/local/bin/miniconda3
ENV PATH=$CONDAPATH/bin:$PATH

# Install miniconda
# There is a good reason to install miniconda in a path other than its default.
# The default installation directory is /root/miniconda3 but this path will not be
# accessible by shifter or singularity so we'll install under /usr/local/bin/miniconda3.
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh \
    && bash ./Miniconda3*.sh -b -p $CONDAPATH \
    && rm Miniconda3*.sh

# Install software with conda
RUN conda install -c bioconda bbmap==38.84 samtools==1.11 \
    && conda clean -afy

# This will give us a workingdir within the container (e.g. a place we can mount data to)
WORKDIR /bbmap

# Move script into container.
# Note that it is copied to a location in your $PATH
COPY script.sh /usr/local/bin/script.sh

Build the image and upload to hub.docker.com

You need to use your docker hub user name to tag the image when you are building it (see below).

# Create a "build" directory and build the docker image from there so it stays small. It is good practice to always
# build an image from a directory containing only the required files; otherwise everything else in the directory
# becomes part of the build context and can end up bloating the image.
mkdir build
cp script.sh Dockerfile build/
cd build
docker build --tag <your_docker_hub_user_name>/aligner-bbmap:1.0.0 .
cd ../

Test that the example script runs in the docker container

# use your image name
docker run <your_docker_hub_user_name>/aligner-bbmap:1.0.0 script.sh

# if you are in the root of the 5min_example directory, then try re-running the script with data.
docker run --volume="$(pwd)/../data:/bbmap" <your_docker_hub_user_name>/aligner-bbmap:1.0.0 script.sh sample.fastq.bz2 sample.fasta

# Notice script.sh is found because it was copied to a location in PATH in the Dockerfile, and
# the two inputs are found because the data directory is mounted to /bbmap (the WORKDIR inside the container) where the script runs.

When you are convinced the docker image is good, you can push it to hub.docker.com (remember to make an account first). When you run a WDL in JAWS, the docker images will be pulled from hub.docker.com.

docker login
docker push <your_docker_hub_user_name>/aligner-bbmap:1.0.0

Now your image is available on any site (e.g. dori, jgi, tahoma, perlmutter, aws, etc.). Although you can manually pull your image (see below), JAWS will do this for you; you only need to pull images manually if you are testing Cromwell locally.
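
For reference, a manual pull looks like this:

# on a docker-machine
docker pull <your_docker_hub_user_name>/aligner-bbmap:1.0.0

# on a shifter-machine (e.g. Perlmutter)
shifterimg pull <your_docker_hub_user_name>/aligner-bbmap:1.0.0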

Test your image on Perlmutter

Besides your docker machine, it is useful to test your image on Perlmutter since you will likely be running your WDL there at some point. Certain aspects of the docker container will work on your docker machine but won’t on another site, like dori, because shifter and singularity behave differently than docker.

To test the docker container on perlmutter-p1.nersc.gov, you’ll need to use the shifter command instead of docker to run your workflow, but the image is the same. More about shifter at NERSC.

Example:

# pull image from hub.docker.com
shifterimg pull <your_docker_hub_user_name>/aligner-bbmap:1.0.0

# clone the repo on Perlmutter
git clone https://code.jgi.doe.gov/official-jgi-workflows/wdl-specific-repositories/jaws-tutorial-examples.git
cd jaws-tutorial-examples/5min_example

# run your wrapper script through shifter. Notice we invoke script.sh, which is identical to the copy saved inside the image
shifter --image=<your_docker_hub_user_name>/aligner-bbmap:1.0.0 ./script.sh ../data/sample.fastq.bz2 ../data/sample.fasta

The WDL

The script.sh that is supplied with the repo has two essential commands:

# align reads to reference contigs
bbmap.sh Xmx12g in=$READS ref=$REF out=test.sam

# create a bam file from alignment
samtools view -b -F0x4 test.sam | samtools sort - > test.sorted.bam

It would make sense to have both commands inside one task of the WDL because they logically should be run together. However, as an exercise, we will split the two commands into two tasks. The output from the first command is used in the second command, so in our WDL example, we can see how tasks pass information.

See an example of the finished WDL, align_final.wdl, and its inputs.json file below.

align_final.wdl
version 1.0

workflow bbtools {
    input {
        File reads
        File ref
    }

    call alignment {
       input: fastq=reads,
              fasta=ref
    }
    call samtools {
       input: sam=alignment.sam
    }
}

task alignment {
    input {
        File fastq
        File fasta
    }

    command {
        bbmap.sh Xmx12g in=~{fastq} ref=~{fasta} out=test.sam
    }

    runtime {
        docker: "jfroula/aligner-bbmap:2.0.2"
        runtime_minutes: 10
        memory: "5G"
        cpu: 1
    }

    output {
       File sam = "test.sam"
    }
}

task samtools {
    input {
        File sam
    }

    command {
       samtools view -b -F0x4 ~{sam} | samtools sort - > test.sorted.bam
    }

    runtime {
        docker: "jfroula/aligner-bbmap:2.0.2"
        runtime_minutes: 10
        memory: "5G"
        cpu: 1
    }

    output {
       File bam = "test.sorted.bam"
    }
}
inputs.json
{
    "bbtools.reads": "../data/sample.fastq.bz2",
    "bbtools.ref": "../data/sample.fasta"
}

Note

Singularity, docker, or shifter can be prepended to each command for testing (see align_with_shifter.sh); however, this wouldn’t be appropriate for a finished “JAWSified” WDL because you lose portability. The final WDL should have the docker image name inside the runtime {} section.

This may be helpful when testing & debugging so I’ve included an example where shifter is prepended to each command.

align_with_shifter.wdl
version 1.0

workflow bbtools {
    input {
        File reads
        File ref
    }

    call alignment {
       input: fastq=reads,
              fasta=ref
    }
    call samtools {
       input: sam=alignment.sam
    }
}

task alignment {
    input {
        File fastq
        File fasta
    }

    command {
        shifter --image=jfroula/aligner-bbmap:2.0.2 bbmap.sh Xmx12g in=~{fastq} ref=~{fasta} out=test.sam
    }

    output {
       File sam = "test.sam"
    }
}

task samtools {
    input {
        File sam
    }

    command {
       shifter --image=jfroula/aligner-bbmap:2.0.2 samtools view -b -F0x4 ~{sam} | shifter --image=jfroula/aligner-bbmap:2.0.2 samtools sort - > test.sorted.bam
    }

    output {
       File bam = "test.sorted.bam"
    }
}

You would run this WDL on Perlmutter with the following command.

java -jar /global/cfs/cdirs/jaws/jaws-install/perlmutter-prod/lib/cromwell-84.jar run align_with_shifter.wdl -i inputs.json

The Docker Image Should be in the runtime{} Section

Everything in the command{} section of the WDL will run inside a docker container if you’ve added docker to the runtime{} section. Your WDL then has the potential to run on a machine with shifter, singularity, or docker: JAWS will take your docker image and run it appropriately with singularity, docker, or shifter. If you run the WDL with the cromwell command on a shifter or singularity machine, you need to supply a cromwell.conf file, explained shortly.

See align_final.wdl:

runtime {
    docker: "jfroula/aligner-bbmap:2.0.2"
}

Run the Final WDL with Cromwell

On a Docker machine

You can now run the final WDL:

conda activate bbtools  # you need this for the cromwell command only
cromwell run align_final.wdl -i inputs.json

On Perlmutter

You’ll have to include a cromwell.conf file in the command because the config file determines whether the image supplied in the runtime{} section is run with docker, singularity, or shifter. You didn’t need to supply a cromwell.conf file in the earlier cromwell command because docker is the default.

The cromwell.conf file is used to:

  1. override cromwell’s default settings

  2. tell cromwell how to interpret the WDL (i.e. whether to use shifter, singularity, etc.)

  3. specify the backend to use (i.e. local, slurm, aws, condor, etc.)

Note

JAWS takes care of the cromwell.conf for you.

Here you can find the config files: jaws-tutorials-examples/config_files.
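
To give a sense of what these files contain, here is a rough sketch (not the actual JAWS config) of a Local backend whose submit-docker block wraps each task’s script in shifter whenever the task declares a docker image. The key names follow Cromwell’s documented configuration format; the real files in config_files will differ in detail, so use those when actually running on Perlmutter or dori.

include required(classpath("application"))

backend {
  default = Local
  providers {
    Local {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        # where cromwell places each task's working directory
        dockerRoot = /path/to/cromwell-executions
        run-in-background = true
        # runtime attributes a task is allowed to set
        runtime-attributes = """
          String? docker
        """
        # how to run a task with no container
        submit = "/bin/bash ${script}"
        # how to run a task that declares runtime { docker: ... } -- here via shifter
        submit-docker = """
          shifter --image=${docker} /bin/bash ${script}
        """
      }
    }
  }
}

With a config file in hand, you invoke cromwell like this: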

java -Dconfig.file=<repository-root>/config_files/<cromwell_*.conf> \
     -Dbackend.providers.Local.config.dockerRoot=$(pwd)/cromwell-executions \
     -Dbackend.default=Local \
     -jar <path/to/cromwell.jar> run <wdl> -i <inputs.json>

where

-Dconfig.file
points to a cromwell conf file that is used to overwrite the default configurations. There are versions for perlmutter, dori, etc.

-Dbackend.providers.Local.config.dockerRoot
this overwrites a variable ‘dockerRoot’ that is in cromwell_perlmutter.conf so that cromwell will use your own current working directory to place its output.

-Dbackend.default=[Local|Slurm]
this will allow you to choose between the Local and Slurm backends. With Slurm, each task will have its own sbatch command (and thus wait in the queue).

cromwell.jar can be what you installed or you can use these paths:
dori: /clusterfs/jgi/groups/dsi/homes/svc-jaws/jaws-install/dori-prod/lib/cromwell-84.jar
perlmutter: /global/cfs/cdirs/jaws/jaws-install/perlmutter-prod/lib/cromwell-84.jar
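
Putting it all together, a concrete invocation on Perlmutter with the Local backend would look something like this (the config file name follows the one mentioned above; adjust the repository path to match your checkout):

java -Dconfig.file=<repository-root>/config_files/cromwell_perlmutter.conf \
     -Dbackend.providers.Local.config.dockerRoot=$(pwd)/cromwell-executions \
     -Dbackend.default=Local \
     -jar /global/cfs/cdirs/jaws/jaws-install/perlmutter-prod/lib/cromwell-84.jar \
     run align_final.wdl -i inputs.json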

Understanding the Cromwell Output

Cromwell output is:

  1. files created by the workflow

  2. the stdout/stderr printed to screen

1. Where to find the output files

Cromwell saves the results under a directory called cromwell-executions. Under this directory, each WDL run gets its own uniquely named folder.

[Figure: layout of the cromwell-executions directory]

Each task of your workflow runs inside an execution directory, so it is here that you can find any output files, including the stderr, stdout, and script files.
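
As an illustration, the layout for the bbtools workflow looks roughly like this (the workflow and task names come from the example WDL; the run id is a placeholder for the unique uuid cromwell generates):

cromwell-executions/
└── bbtools/                          # workflow name
    └── <workflow-run-uuid>/          # one WDL run
        ├── call-alignment/
        │   └── execution/            # script, stderr, stdout, rc, test.sam
        └── call-samtools/
            └── execution/            # script, stderr, stdout, rc, test.sorted.bam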

Explanation of cromwell generated files

stderr

The stderr from any of the commands/scripts in your task should be in this file.

stdout

The stdout from all the commands/scripts in your task should be in this file. Not all scripts send errors to stderr as they should, so you may find errors here instead.

script

The script file is run by the script.submit file. It contains all the commands that you supplied in the command{} section of the WDL, as well as cromwell-generated code that creates the stderr, stdout, and rc files.

script.submit

This file contains the actual command that cromwell ran. If the file was created by JAWS, there is one more step before “script” gets run.

script.submit -> dockerScript -> script

rc

This file contains the return code for the command{} section of the WDL. One thing to remember is that the return code written to the rc file comes from the last command run. So if an earlier command fails but the last command succeeds, the return code will be 0, unless you used set -e, which forces an exit upon the first error.
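
To illustrate with a hypothetical command{} block (bad_command stands in for any failing command, not something from this tutorial):

# without set -e: rc gets the exit code of the last command, which is 0
bad_command          # fails with a non-zero exit code
echo "finished"      # succeeds, so rc will contain 0

# with set -e: the script stops at the first failure, so rc is non-zero
set -e
bad_command          # script exits here
echo "finished"      # never runs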

These files are only seen in JAWS

stdout.submit

This file is created by script.submit (not by the script file); its content is not useful for debugging your task.

stderr.submit

This file is created by script.submit (not by the script file), which means it may contain useful error messages. If there was a problem upstream that prevented the task from even starting, the error should be in this file.

dockerScript

This file is created by script.submit and runs the script file.

script.submit -> dockerScript -> script

2. Cromwell’s stdout

When you ran align_with_shifter.wdl with cromwell above, look for the following in the output:

  1. the bbmap.sh and samtools commands that were run

  2. paths to the output files from the workflow

  3. a WorkflowSucceededState message

  4. a path to one of the execution directories; copy it and list its contents to see the cromwell generated files alongside your .sam or .bam output

  5. Call-to-Backend lines showing that we are running on the Local backend (the default)

Note

You won’t have access to this same cromwell standard output when you run through JAWS. The same information can be found in different ways.

Limitations when using docker

  1. One docker image per task - this is a general constraint that Cromwell has.

  2. The docker image must be registered with docker hub - this is how we have set up the docker backend configuration.

  3. A sha256 tag must be used instead of a custom tag (e.g. v1.0.1) for call-caching to work.

    To find the sha256 tag, you can use:

    # on a docker-machine
    docker images --digests | grep <your_docker_hub_user_name>
    
    # on a shifter-machine
    shifterimg lookup ubuntu:16.04
    

    The version tag (16.04) can be replaced by the sha256 tag.

    runtime {
        docker: "ubuntu@sha256:20858ebbc96215d6c3c574f781133ebffdc7c18d98af4f294cc4c04871a6fe61"
    }
    

    You can interactively go into a container from shifter with either of:

    shifter --image=id:20858ebbc96215d6c3c574f781133ebffdc7c18d98af4f294cc4c04871a6fe61
    shifter --image=ubuntu:16.04