Usage
This page describes how to run the benchmarking pipeline.
You can run the pipeline using the nextflow run command; however, for the full evaluation to run successfully, several arguments must be set.
Here is an example of a full command for running the pipeline. See the sections below for a description of the arguments.
```shell
nextflow run main.nf \
    -entry WF_ALL \
    -profile hmgu_slurm \
    -params-file conf/full-analysis.yml \
    -resume \
    -with-tower
```
Entry points
The pipeline is constructed of several sub-workflows which can be selected using the -entry argument.
This allows different steps to be run separately which can be useful during development.
If you attempt to run the pipeline without setting this argument you will see a message listing the possible entry points.
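For example, to run only one sub-workflow during development you can select its entry point directly. The entry name WF_DATASETS below is a hypothetical example; use one of the entry points the pipeline actually lists:

```shell
# Run a single sub-workflow (WF_DATASETS is a placeholder entry name)
nextflow run main.nf -entry WF_DATASETS -profile hmgu_slurm
```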
Profiles
The -profile option tells the pipeline which configuration profile to use. You can either supply a configuration file directly using the -c argument or select a profile defined in nextflow.config with -profile.
Parts of the pipeline are computationally intensive and, for real datasets, need to be performed in a High Performance Computing (HPC) environment.
Thankfully, when provided with a suitable configuration file, Nextflow can take care of scheduling and submitting jobs for us.
An example configuration for the HMGU Slurm cluster is provided in conf/, but you should also refer to the Nextflow configuration documentation when creating a file for your own system.
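As a rough illustration, a minimal Slurm profile in nextflow.config might look like the sketch below. The profile name, partition, and sbatch options are placeholders, not values from this pipeline's actual configuration:

```groovy
// Hypothetical Slurm profile sketch for nextflow.config
profiles {
    example_slurm {
        process {
            executor = 'slurm'               // submit jobs via Slurm
            queue = 'cpu_p'                  // placeholder partition name
            clusterOptions = '--qos=normal'  // placeholder extra sbatch options
        }
        singularity.enabled = true           // assumes containers run via Singularity
    }
}
```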
NOTE
The example profile configuration contains labels for defining resource levels; however, because Nextflow does not allow these to be combined with adaptive resources based on input size, they are not used for most processes. Instead, we define adaptive memory requirements in the process definitions. To modify these you will need to edit the workflow files. We recognise this setup is less transferable, but it was the best compromise for our computing environment.
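To illustrate what such an adaptive memory directive looks like, here is a generic sketch (not code from this pipeline's workflow files) where memory scales with input size and is increased on retry:

```groovy
// Hypothetical process with memory based on input size and retry attempt
process EXAMPLE_METHOD {
    errorStrategy 'retry'  // resubmit with more memory if the job is killed
    maxRetries 2
    // larger inputs request more memory; each retry multiplies the request
    memory { input_file.size() > 1e9 ? 16.GB * task.attempt : 4.GB * task.attempt }

    input:
    path input_file

    script:
    """
    run_method.sh ${input_file}
    """
}
```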
Parameter files
For use during development, the default pipeline contains only a small subset of datasets, methods and metrics. To run a more complete analysis and set other configuration options, a parameters file is required. A parameter configuration file for running the full analysis is included in conf/.
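A parameters file is plain YAML mapping parameter names to values. The keys below are hypothetical examples only, not this pipeline's actual parameter names; see the file in conf/ for the real ones:

```yaml
# Hypothetical -params-file sketch; the real parameter names are in conf/
datasets: ["dataset_A", "dataset_B"]  # which datasets to include
methods: ["method_1", "method_2"]     # which methods to benchmark
metrics: ["metric_x"]                 # which metrics to compute
outdir: "results/full-analysis"       # where to write results
```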
Resuming
The -resume option tells Nextflow to reuse cached results from previous runs where appropriate. This avoids recomputing results that haven't changed, which is useful during development or if a process fails for whatever reason. However, Nextflow cannot always track every dependency, so it is safest to rerun everything before finalising results to make sure they are complete and up to date.
Nextflow Tower
If you have configured Nextflow Tower, you can set the -with-tower option to activate a web interface for monitoring the progress of the pipeline. Nextflow Tower is also a paid computing environment for running workflows remotely, but this monitoring functionality is free and can be set up using these instructions.
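Monitoring requires an access token created in the Tower web interface and exported before launching the run; the token value below is a placeholder:

```shell
# Export the Tower token (placeholder value), then launch with monitoring enabled
export TOWER_ACCESS_TOKEN=<your-token>
nextflow run main.nf -entry WF_ALL -profile hmgu_slurm -with-tower
```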