Nextflow
Nextflow is a reactive workflow framework and a programming DSL that eases the writing of computational pipelines with complex data (see the Nextflow Homepage).
Available Modules
module load Nextflow/25.10.0
Methods for running Nextflow with REANNZ
There are three suggested methods for running Nextflow pipelines on our system:
- Running Nextflow in an interactive Slurm session
- This method is best for setting up or debugging pipeline executions; note that the pipeline will end as soon as the interactive session ends, so it is not suited to long production runs.
- Submitting a Nextflow workflow as a batch job
- This method will run all sub-processes in the same Slurm job. This is best if your workflow would spawn a large number of short jobs.
- Submitting a Nextflow workflow via a head job
- This method requires submitting a low-resource but long-running batch job which controls the Nextflow workflow; all processes are then submitted by Nextflow to Slurm as separate jobs. This method is useful for workflows with a lot of variation in their computational needs and which comprise mostly long-running processes.
The differences between these methods are largely controlled by configuration/profile settings. The examples below use the REANNZ configuration file provided in the Configurations section.
Running Nextflow in an interactive Slurm session
Running Nextflow in an interactive session can be especially helpful when setting up or debugging a pipeline. To do so, request an interactive session with appropriate resources for the pipeline:
srun --account nesi12345 --job-name "InteractiveJob" --cpus-per-task 16 --mem-per-cpu 24000 --time 24:00:00 --pty bash
Once your interactive session has launched,
load the Nextflow module with module load Nextflow/25.10.0 (or your required version) and proceed. Parallelization of Nextflow processes will occur within this job.
There are several environment variables that can be set to control where Nextflow stores downloaded containers and plugins, keeping them out of your home directory and in more suitable project storage:
export NXF_APPTAINER_CACHEDIR=/nesi/nobackup/nesi12345/apptainer_cache
export NXF_PLUGINS_DIR=/nesi/project/nesi12345/.nextflow/plugins
You can confirm that Nextflow is loaded and ready by using the nextflow run hello command to test Nextflow's Hello World pipeline.
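Putting these steps together, a minimal interactive session might look like the following sketch (using the placeholder project code nesi12345 from the examples on this page):

# inside the interactive session: load Nextflow, set cache locations, run the test pipeline
module load Nextflow/25.10.0
export NXF_APPTAINER_CACHEDIR=/nesi/nobackup/nesi12345/apptainer_cache
export NXF_PLUGINS_DIR=/nesi/project/nesi12345/.nextflow/plugins
nextflow run hello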
Submitting a Nextflow workflow as a batch job
The following sbatch script will submit a Nextflow workflow as a single Slurm job.
All of the Nextflow processes will run on the same compute node,
so request enough resources to run the most intensive process in the workflow and enough time for the entire workflow to complete.
#!/bin/bash -e
#SBATCH --job-name nextflow-workflow
#SBATCH --account nesi12345
#SBATCH --time 12:00:00
#SBATCH --mem 24G
#SBATCH --cpus-per-task 16
# load Nextflow and set environmental variables
module load Nextflow/25.10.0
export NXF_APPTAINER_CACHEDIR=/nesi/nobackup/nesi12345/apptainer_cache
export NXF_OFFLINE='true'
export NXF_PLUGINS_DIR=/nesi/project/nesi12345/.nextflow/plugins
# run nextflow
nextflow run NEXTFLOW_WORKFLOW \
-profile local,apptainer \
--outdir /nesi/project/nesi12345/NEXTFLOW_WORKFLOW/out \
-w /nesi/nobackup/nesi12345/NEXTFLOW_WORKFLOW/work
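Save the script (for example as nextflow-workflow.sl, with NEXTFLOW_WORKFLOW replaced by your pipeline) and submit it in the usual way:

sbatch nextflow-workflow.sl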
Submitting a Nextflow workflow via a head job
The following sbatch script will request the resources to run a Nextflow head job which will then submit processes to Slurm as separate tasks.
Beyond the resources requested for this job, the only difference between this script and the previous one is the change in the -profile option from local,apptainer to slurm,apptainer.
#!/bin/bash -e
#SBATCH --job-name nextflow-head
#SBATCH --account nesi12345
#SBATCH --time 12:00:00
#SBATCH --mem 4G
#SBATCH --cpus-per-task 4
# load Nextflow and set environmental variables
module load Nextflow/25.10.0
export NXF_APPTAINER_CACHEDIR=/nesi/nobackup/nesi12345/apptainer_cache
export NXF_OFFLINE='true'
export NXF_PLUGINS_DIR=/nesi/project/nesi12345/.nextflow/plugins
# run nextflow
nextflow run NEXTFLOW_WORKFLOW \
-profile slurm,apptainer \
--outdir /nesi/project/nesi12345/NEXTFLOW_WORKFLOW/out \
-w /nesi/nobackup/nesi12345/NEXTFLOW_WORKFLOW/work
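Once the head job is running, the jobs Nextflow submits will appear in the Slurm queue alongside it, named according to the jobName setting in the configuration below. You can watch their progress with, for example:

squeue --me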
Avoid many short jobs
This will put a major burden on the Slurm scheduler for no improvement in your computational speed. Do not use the Nextflow slurm executor for jobs which take less than 30 minutes to complete.
Configurations
For reproducibility and clarity, we recommend taking advantage of Nextflow's ability to stack configurations, using three distinct .config files:
- Pipeline-level config: This configuration is system and data agnostic, and should be left untouched for any runs of the given pipeline.
- System-level config: This configuration is pipeline agnostic but provides settings for running on a given computer system. We provide a REANNZ-specific config below.
- Run-level config: This configuration is where changes are made to fine-tune the specifics of a given run/system/pipeline combination. For clarity, we will refer to this file as custom.config.
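In practice the layers are combined at run time: Nextflow automatically picks up nextflow.config from the pipeline directory and the launch directory, and the run-level file can be stacked on top with the -c option. A sketch, using the placeholder workflow name from the examples below:

nextflow run NEXTFLOW_WORKFLOW \
    -profile slurm,apptainer \
    -c custom.config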
Below is an example nextflow.config file with some configuration settings that will help when running Nextflow via REANNZ systems.
// REANNZ nf-core configuration profile

// Global default params, used in configs
params {
    config_profile_description = 'REANNZ HPC profile provided by nf-core/configs'
    config_profile_contact = 'Jennifer Reeve (@jen-reeve)'
    config_profile_url = 'https://docs.nesi.org.nz'
    max_cpus = 64
    max_memory = 1024.GB
}

// Default settings for all profiles
process {
    stageInMode = 'symlink'
    cache = 'lenient'
}

// Specific profiles to use in different contexts
profiles {
    debug {
        // This profile will not remove intermediate files from the work directory
        cleanup = false
    }
    local {
        // Use for workflows run from interactive Slurm sessions, workflows submitted
        // as a single batch job, and workflows generating many short (<30 minute) tasks
        process.executor = 'local'
    }
    slurm {
        // Use for workflows which need to submit processes as separate Slurm jobs
        process.executor = 'slurm'
        process.queue = 'genoa,milan' // partitions to submit jobs to
        process.array = 100
    }
}

// Settings for the Slurm executor
executor {
    '$slurm' {
        queueSize = 500
        submitRateLimit = '20 min' // i.e. 20 jobs per minute
        pollInterval = '30 sec'
        queueStatInterval = '30 sec'
        jobName = { "${task.process}-${task.hash}" }
    }
}

// Apptainer-specific settings
apptainer {
    pullTimeout = '2h'
}

// Remove intermediate files from the work directory once a run completes successfully
cleanup = true
Run-level configuration options
As mentioned above, the pipeline- and system-level configuration files should generally be left untouched. This means any adjustments for your workflow will occur in the final, run-level configuration file.
There are many options you may wish to use to fine-tune your Nextflow runs. For more information, we recommend starting with the overview on configuration files and if needed digging into the configuration reference available in the Nextflow documentation.
One option of particular note is the ability to flag certain processes to use a different executor from the main workflow. This allows you either to submit selected processes as separate Slurm jobs while running the rest of the workflow in a single job, or to run selected processes in the head job while most processes are submitted as separate jobs.
For example, if you submit a workflow as a batch job,
but know that several of your individual processes regularly take longer than 30 minutes,
you can flag them to be submitted as additional Slurm jobs separate from the head job.
To do this, give these processes a label such as 'slurm_array' and add the following to your custom.config file:
process {
    withLabel: slurm_array {
        executor = 'slurm'
    }
}
You could additionally provide details about the specific resources required, although these may already be provided by the pipeline-level configuration via additional process labels or explicit definitions using withName.
Please note that process configuration settings have their own order of precedence in addition to that of the layered configuration files: withName selectors override withLabel selectors, which in turn override the generic process settings.
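As an illustration of this precedence, in the following run-level sketch (BIG_ASSEMBLY is a hypothetical process name) the withName block overrides the withLabel block, which in turn overrides the bare process defaults:

process {
    cpus = 2 // lowest priority: applies to all processes
    withLabel: slurm_array {
        executor = 'slurm'
        cpus = 8 // overrides the generic setting for labelled processes
    }
    withName: BIG_ASSEMBLY {
        cpus = 16 // highest priority: overrides both of the above
        memory = 64.GB
    }
}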
Monitoring and reporting
Nextflow provides tools that can assist you in making efficient use of the HPC resources. We strongly recommend testing your pipeline with a small subset of data to determine optimal settings before running full datasets.
The most human-readable, but least configurable, option is the execution report: an HTML report containing CPU and memory utilization information for each individual process as well as each process type. This information can be used to ensure processes only request the resources they need.
For a more configurable option, the trace file provides many potential fields of interest which can be requested. A full list of fields is available at the previous documentation link, but several are of note for optimization and debugging:
- native_id will provide the job ID for any jobs submitted to Slurm
- duration will show the time from submission of the process to completion of the process
- realtime will show the time from the start of the process to completion of the process (job run time)
- %cpu will show the percentage of CPU used by the process
- %mem will show the percentage of memory used by the process
- peak_rss will show the peak of real memory used
- workdir will provide the path to the working directory of the process
Adding the following to your custom.config will produce both an execution report and trace file for each run, named with the timestamp, and put these files in a separate runInfo directory rather than within the Nextflow output directory.
// Name the reports according to when they were run
params.timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')

// Generate report-timestamp.html execution report
report {
    enabled = true
    overwrite = false
    file = "./runInfo/report-${params.timestamp}.html"
}

// Generate trace-timestamp.txt trace file
trace {
    enabled = true
    overwrite = false
    file = "./runInfo/trace-${params.timestamp}.txt"
    fields = 'name,status,exit,duration,realtime,cpus,%cpu,memory,%mem,rss,peak_rss,workdir,native_id'
}
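The trace file is tab-separated plain text, so it can be inspected directly from the command line; for example, assuming the runInfo directory configured above:

column -t -s $'\t' runInfo/trace-*.txt | less -S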
Community pipelines - nf-core
nf-core is a global community collaborating to build open-source Nextflow components and pipelines. Many standard analyses and tools have existing nf-core pipelines or modules, allowing you to skip straight to running an established pipeline.
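For example, a first run of an nf-core pipeline against its built-in small test dataset might look like the following sketch (nf-core/demo is an illustrative pipeline name; most nf-core pipelines ship a test profile):

nextflow run nf-core/demo \
    -profile test,apptainer \
    --outdir /nesi/nobackup/nesi12345/demo_out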
Nextflow plugins
nf-core pipelines expect to use Nextflow plugins in their base configuration. If you want to use these plugins, you will need to download them manually and store them in a plugin cache directory, which you can specify with the NXF_PLUGINS_DIR environment variable (as in the example Slurm scripts above).
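For example, on a node with internet access you could pre-download a plugin into that directory with the nextflow plugin install command (the plugin name and version here are illustrative):

export NXF_PLUGINS_DIR=/nesi/project/nesi12345/.nextflow/plugins
nextflow plugin install nf-schema@2.2.0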