Using Reference Data in Your WDLs

JAWS provides a place to store large, reusable files (e.g. the NR database) so that they are not copied every time a WDL is submitted. On Perlmutter, you can create a folder under /global/dna/shared/databases/jaws/refdata/<your-group-name>.

When you add files there, they will automatically be synced to all the JAWS sites (except JAWS-AWS). The files will also be accessible within your WDL by using the /refdata/<your-group-name> path.

Note

Syncing the refdata folder from Perlmutter to all the other compute sites happens every night.

Adding Data to the refdata Directory

  1. You can create your own folders under /global/dna/shared/databases/jaws/refdata, but you must do so from a DTN node (e.g. ssh dtn04), since this location is read-only on the other nodes.

  2. Your new folders and files must be readable by JAWS, so set global read permissions (e.g. drwxrwsr-x+ on directories).

  3. You can copy data from these Perlmutter filesystems: 1) global home, 2) global common, and 3) the Community File System (CFS). You cannot copy from /pscratch unless you use Globus.

  4. Globus is the fastest way to copy data, and it can read from /pscratch. Transfer from the NERSC Perlmutter endpoint to the NERSC DTN endpoint.

  5. Do not use symlinks (e.g. latest -> v10.4). Symlinks are not preserved when the data files are synced between sites.

How to use refdata in your WDLs

Use /refdata as the root path in your WDLs, and append the path you created for your data. For example, a blast command would point to the database like this: blastn -db /refdata/nt_test/nt, where nt_test is the directory that holds all the index files and nt is the prefix of those index files (e.g. nt.nhr, nt.nin).

Hint

In your WDL, the input type for refdata files should be specified as String and not File. Variables specified with File are copied into Cromwell’s working directory, and since /refdata doesn’t exist outside the container, JAWS will fail to validate the path and you’ll get an error.
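The hint above can be illustrated with a minimal sketch in which the database location is declared as a String input while the query file stays a regular File input. The workflow name, task name, input names, the ncbi/blast image tag, and the hits.tsv output name are assumptions made for illustration; only the /refdata/nt_test/nt path comes from the text above.

```wdl
version 1.0

workflow refdata_blast_wf {
    input {
        # Declared as String so the path is passed through untouched.
        # A File input here would be localized and fail, because
        # /refdata only exists inside the container.
        String blast_db = "/refdata/nt_test/nt"

        # Ordinary input files are still declared as File.
        File query_fasta
    }

    call blast_task {
        input:
            blast_db = blast_db,
            query_fasta = query_fasta
    }

    output {
        File hits = blast_task.hits
    }
}

task blast_task {
    input {
        String blast_db
        File query_fasta
    }

    command <<<
        # The String is interpolated as a plain path inside the container.
        blastn -db ~{blast_db} -query ~{query_fasta} -outfmt 6 -out hits.tsv
    >>>

    runtime {
        # Assumed image; any container that provides blastn would do.
        docker: "ncbi/blast:latest"
        cpu: 1
        memory: "1G"
    }

    output {
        File hits = "hits.tsv"
    }
}
```

Because blast_db has a default value, an inputs file for this sketch would only need to supply query_fasta, and the default can be overridden to point at a different directory under /refdata.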

Example

WDL Example

version 1.0
workflow refdata_wf {
    call task1 { }
}

task task1 {

    command {
      # Accessing reference data: this command runs in a Docker
      # container, and the refdata directory on the host is mounted
      # as "/refdata" inside the container. The mount is configured
      # in the Cromwell config file.

      ls /refdata/nt_test
    }

    runtime {
      docker: "ubuntu:latest"
      cpu: 1
      memory: "1G"
    }

    output { File outfile = stdout() }
}