HTCondor Backend Configuration Options When Creating WDL

Use the following table to help configure your runtime{} section.

How to Allocate Resources in your Runtime Section

HTCondor is the backend to Cromwell and is responsible for requesting an appropriately sized resource from Slurm for each WDL task. HTCondor determines what resources your task needs solely from the memory and cpu values set in the runtime{} section. Both have defaults ("2G" of memory and 1 thread, respectively), so you may omit them, but setting them explicitly is advised for reproducibility.

Note

Inside the runtime{} section of your WDL, the cpu key should be set to a thread count rather than a CPU count: despite its name, HTCondor interprets that value as threads.
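For example, a task that needs 16 threads and 100 GB of memory might declare the following runtime{} section (the task name and command are illustrative):

```wdl
task count_lines {
  input {
    File reads
  }
  command <<<
    wc -l ~{reads}
  >>>
  runtime {
    memory: "100G"
    cpu: 16   # interpreted by HTCondor as a thread count, not CPUs
  }
  output {
    String result = read_string(stdout())
  }
}
```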

Table of available resources

Site                 Type     #Nodes   Mem (GB)*   Minutes   #Threads
Perlmutter (NERSC)   Large      3072         492      2865        256
JGI (Lab-IT)         Small       316          46      4305         32
                     Medium       72         236      4305         32
                     Large         8         492      4305         32
Dori (Lab-IT)        Large       100         492      4305         64
                     Xlarge       18        1480      4305         72
Tahoma (EMSL)        Medium      184         364      2865         36
                     Xlarge       24        1480      2865         36
Defiant              Medium       36         256      1425        128
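For example, a task intended to fit on a JGI Medium node (236 GB, 32 threads per the table above) should request no more than those limits in its runtime{} section. A minimal sketch, with illustrative values:

```wdl
runtime {
  memory: "236G"   # at or below the node's usable memory from the table
  cpu: 32          # at or below the node's thread count
}
```

Requesting more memory or threads than any available node provides will leave the task stuck in the queue, since HTCondor cannot match it to a resource.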

Memory Overhead

The Mem (GB) column lists the gigabytes you can actually use after accounting for overhead. For example, a "large" node on Dori is advertised at 512G, but because of overhead we reserve 20G and instead ask for 492G in our WDL.

Time Overhead

When Cromwell submits a task, HTCondor manages job scheduling by checking the queue for available resources. The JAWS Pool Manager monitors HTCondor and, when needed, requests new Slurm nodes. Once a compute node is available, HTCondor submits the task.

Due to a slight delay (a few seconds) in resource allocation, we build in a time buffer to ensure jobs get the full requested time. For example, instead of requesting the maximum 48 hours (2880 minutes) on Perlmutter, we request 47 hours and 45 minutes (2865 minutes) to account for the delay.