Speaker
Description
This presentation introduces the two basic building blocks of workflows: job arrays and job dependencies. Job arrays create parametrised jobs that are identical except for one parameter that varies across the workflow, while job dependencies enforce a fixed ordering of jobs and ensure that each step of the workflow is carried out only when its requirements (input data, software, output directory, etc.) are available. It also discusses the concepts of micro-scheduling (running multiple small job steps inside a single job allocation) and macro-scheduling (submitting multiple jobs at once with a single command), and introduces basic GNU/Linux commands that make both easier: xargs, seq, GNU Parallel, GNU Make, and envsubst. The concepts are illustrated with Slurm but should apply to any other scheduler. Finally, the session presents Maestro, a small workflow manager developed at the same lab where Slurm originated. Maestro focuses on documentation and organisation, makes it easy to build small workflows without submitting jobs manually, and complements the Linux tools mentioned above.
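As a rough illustration of the two building blocks, the sketch below assembles the Slurm command lines for a job array and for a job that depends on it. It is a dry run that only prints the commands and never calls sbatch. The script names `process.sh` and `merge.sh`, the index range, and the job ID `12345` are hypothetical placeholders, not material from the talk.

```python
import shlex


def array_command(script, first, last):
    """Build an sbatch command for a job array with indices first..last."""
    return ["sbatch", "--parsable", f"--array={first}-{last}", script]


def dependent_command(script, parent_job_id):
    """Build an sbatch command that starts only after the parent job succeeded."""
    return ["sbatch", "--parsable", f"--dependency=afterok:{parent_job_id}", script]


if __name__ == "__main__":
    # Dry run: print the commands instead of submitting them.
    # "process.sh", "merge.sh" and job ID "12345" are hypothetical placeholders.
    print(shlex.join(array_command("process.sh", 1, 10)))
    print(shlex.join(dependent_command("merge.sh", "12345")))
```

Inside each array task, Slurm exposes the varying parameter as the `SLURM_ARRAY_TASK_ID` environment variable. In a real submission, the job ID printed by `sbatch --parsable` for the array would take the place of the placeholder ID in the dependency.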