Write a pipeline
Introduction
The Mantle Pipelines feature allows you to run Nextflow pipelines within Mantle, without using the command line. When you upload a pipeline to Mantle, you specify the input configuration, which is turned into a graphical user interface within Mantle. In addition to the typical input types (strings, integers, floats, and Booleans), you can also use files and Mantle datasets as inputs. Within the scripts in your pipeline, you can use the Mantle SDK to access Mantle dataset inputs. Furthermore, you can use the SDK to create output datasets, and to link output files and datasets to the pipeline to ensure that you always know your data’s lineage.
Existing Nextflow pipelines can be adapted to be compatible with Mantle by writing two new Nextflow processes and adding them to your pipeline. The first is for pre-processing, in which you will use the Mantle SDK to get the Mantle dataset inputs of the pipeline and download the associated files. The second is for post-processing, in which you will create any dataset outputs, as well as register output files and datasets to the pipeline. We provide a Docker image for the Mantle SDK that can be used with these two processes.
Nextflow
If you need help getting started with Nextflow, you may find the Jupyter to Nextflow guide to be useful.
For more information on writing a Nextflow pipeline, see the Nextflow documentation.
Nextflow also has a large community building open source pipelines that can be found on nf-core.
Mantle Default Inputs
When Mantle kicks off a Nextflow pipeline, it adds the following inputs:
pipeline_run_id
: The ID of the pipeline that is being run.

outdir
: The directory where the outputs of the pipeline should be written.
We provide examples below for how to use these parameters in your pipeline.
Nextflow Configuration
Mantle provides a base configuration file to run your pipeline that you can augment with your own nextflow.config file.
If you installed the AWS CLI on your AMI when setting up your Batch environment for Mantle, you will need to add the following block to your configuration file:
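A minimal sketch of such a block is shown below; the cliPath value assumes the CLI was installed under a Miniconda prefix on the AMI, and should be adjusted to match your instance.

```groovy
aws {
    batch {
        // Path to the AWS CLI on the custom AMI (assumed install location; adjust to your AMI)
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
}
```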
If you followed the instructions in the Set Up guide, you can use the above block as is.
If you are unsure of the cliPath, run which aws within the EC2 instance that generated the AMI.
We run your pipeline with the following configuration.
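The exact configuration Mantle applies is managed for you; as a rough illustration only, a base configuration for AWS Batch execution typically includes settings along these lines (all values below are placeholders, not Mantle's actual settings):

```groovy
// Illustrative placeholders only -- Mantle supplies its own values at run time
process {
    executor = 'awsbatch'
    queue    = 'your-batch-queue'
}

aws {
    region = 'us-east-1'
}

workDir = 's3://your-bucket/work'
```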
Using Docker
Mantle recommends using Docker so that your pipeline runs in a consistent environment and its results are reproducible.
To use Docker in your pipeline, you need to specify the Docker image to use either in your nextflow.config file or in each process block of your Nextflow module.
Make sure that the Docker image is built for the architecture of your AMI.
Here is an example of how to specify a Docker image in your nextflow.config file:
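A minimal sketch, assuming a single image for the whole pipeline (the image name is a placeholder):

```groovy
// Placeholder image; replace with your own image and tag
process.container = 'ghcr.io/your-org/your-pipeline:1.0.0'
docker.enabled    = true
```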
Here is an example of how to specify a Docker image in an individual Nextflow process:
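A minimal sketch; the process name and image are placeholders:

```groovy
process EXAMPLE_TASK {
    // Placeholder image; replace with the image this step needs
    container 'your-registry/your-image:1.0.0'

    script:
    """
    echo "running inside the specified container"
    """
}
```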
Pushing Docker containers to AWS ECR
If you are using AWS, you can push your Docker containers to ECR and use them in your pipeline.
Prerequisites:
- The AWS CLI installed on your machine.
- An ECR repository set up.
- The AWS CLI configured with the correct credentials.
Here is an example of how to push a Docker container to ECR:
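A typical sequence looks like the following; the account ID, region, repository, and image names are placeholders:

```bash
# Authenticate Docker with your ECR registry (placeholder account ID and region)
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Build the image for the architecture of your AMI (e.g. linux/amd64)
docker build --platform linux/amd64 -t my-pipeline:latest .

# Tag the image and push it to the ECR repository
docker tag my-pipeline:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-pipeline:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-pipeline:latest
```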
Accessing Inputs Within a Pipeline
When you upload a pipeline to Mantle, you specify the inputs and their types using the input configuration.
To access Mantle dataset inputs in a script within your pipeline, use the Mantle SDK (see below for more information).
All other inputs are stored as Nextflow parameters. For example, if your pipeline takes an input named classification_model of type string, you can access the value using params.classification_model.
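For example, a process could pass that parameter to its script like this (the process and the classify.py script are hypothetical):

```groovy
process CLASSIFY {
    script:
    // classify.py is a hypothetical script shown for illustration
    """
    classify.py --model ${params.classification_model}
    """
}
```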
Using the Mantle SDK in a Nextflow Process
Passing variables to the SDK
To use the Mantle SDK within a Nextflow module, you need to add the following to your workflow step:
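A minimal sketch of such a process; the process name and the script it calls are illustrative placeholders:

```groovy
process EXAMPLE_MANTLE_STEP {
    // Nextflow secrets supplied by Mantle; available as environment variables in the script
    secret 'USER'
    secret 'PASSWORD'

    input:
    val pipelineId

    script:
    """
    my_mantle_script.py --pipeline-id ${pipelineId}
    """
}
```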
The USER and PASSWORD secrets are used for authentication and are passed to your pipeline by Mantle via Nextflow secrets. The pipelineId is used to identify the pipeline that is being run and is passed in via the input variable.
Within your script, you can use the SDK functions to interact with the pipeline run.
SDK functions
For more information on functions available in the Mantle SDK, see the Mantle SDK documentation.
Updating an Existing Pipeline
As described in the introduction, an existing Nextflow pipeline can be made compatible with Mantle by adding two new processes: a pre-processing process that uses the Mantle SDK to get the pipeline's Mantle dataset inputs and download the associated files, and a post-processing process that creates any dataset outputs and registers output files and datasets to the pipeline.
The recommended workflow looks like this:
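A structural sketch of that workflow is shown below; the module paths and process names are illustrative placeholders:

```groovy
// Structural sketch only; module paths and process names are placeholders
include { MANTLE_PREPROCESS }  from './modules/mantle_preprocess'
include { RUN_ANALYSIS }       from './modules/run_analysis'
include { MANTLE_POSTPROCESS } from './modules/mantle_postprocess'

workflow {
    // 1. Pre-processing: fetch Mantle dataset inputs and stage their files
    MANTLE_PREPROCESS(params.pipeline_run_id)

    // 2. The existing analysis steps, consuming the staged files
    RUN_ANALYSIS(MANTLE_PREPROCESS.out)

    // 3. Post-processing: register output files and datasets with the pipeline run
    MANTLE_POSTPROCESS(params.pipeline_run_id, RUN_ANALYSIS.out)
}
```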
Pre-processing
Adding this process ensures that the pipeline has access to the necessary input data and that it is organized appropriately for processing. We will use the Mantle SDK to get the input Mantle datasets and stage the associated files in the current working directory.
Here is an example for a pipeline that takes FASTQ files as input:
Nextflow process
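A sketch of the pre-processing process, assuming the staged FASTQ files are written to a fastq/ directory; the process name and the script it calls are placeholders:

```groovy
process MANTLE_PREPROCESS {
    // Mantle-provided secrets for SDK authentication
    secret 'USER'
    secret 'PASSWORD'

    input:
    val pipelineId

    output:
    path "fastq/*.fastq.gz", emit: fastqs

    script:
    """
    stage_fastqs.py --pipeline-id ${pipelineId} --outdir fastq
    """
}
```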
Python script
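A sketch of the corresponding script. The Mantle SDK calls shown here (client construction, pipeline-run lookup, dataset and file accessors) are assumptions for illustration; consult the Mantle SDK documentation for the exact API.

```python
#!/usr/bin/env python
# Illustrative sketch only: the mantle_sdk names below are assumptions,
# not the verified API -- see the Mantle SDK documentation for exact calls.
import argparse
import os

import mantle_sdk  # hypothetical import name


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--pipeline-id", required=True)
    parser.add_argument("--outdir", default="fastq")
    args = parser.parse_args()

    os.makedirs(args.outdir, exist_ok=True)

    # Authenticate with the USER/PASSWORD secrets Mantle passes to the process
    client = mantle_sdk.Client(
        user=os.environ["USER"], password=os.environ["PASSWORD"]
    )

    # Look up the current pipeline run and its FASTQ dataset input
    run = client.get_pipeline_run(args.pipeline_id)
    dataset = run.get_input("fastq_dataset")  # hypothetical input name

    # Download each FASTQ file in the dataset into the staging directory
    for f in dataset.files:
        f.download(os.path.join(args.outdir, f.name))


if __name__ == "__main__":
    main()
```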
Post-processing
This process is responsible for registering the outputs of the pipeline with Mantle. It is recommended to use the Mantle SDK for this step, which allows Mantle to track the pipeline's outputs and lets users view them in the Mantle interface.
Here is an example of registering output files to a pipeline:
Nextflow process
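A sketch of the post-processing process; the process name and the script it calls are placeholders:

```groovy
process MANTLE_POSTPROCESS {
    // Mantle-provided secrets for SDK authentication
    secret 'USER'
    secret 'PASSWORD'

    input:
    val pipelineId
    path results

    script:
    """
    register_outputs.py --pipeline-id ${pipelineId} --results ${results}
    """
}
```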
Python script
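A sketch of the corresponding script; as above, the Mantle SDK calls are assumptions for illustration rather than the verified API.

```python
#!/usr/bin/env python
# Illustrative sketch only: the mantle_sdk names below are assumptions,
# not the verified API -- see the Mantle SDK documentation for exact calls.
import argparse
import os

import mantle_sdk  # hypothetical import name


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--pipeline-id", required=True)
    parser.add_argument("--results", nargs="+", required=True)
    args = parser.parse_args()

    client = mantle_sdk.Client(
        user=os.environ["USER"], password=os.environ["PASSWORD"]
    )
    run = client.get_pipeline_run(args.pipeline_id)

    # Register each output file with the pipeline run so Mantle can track its lineage
    for path in args.results:
        run.add_output_file(path)


if __name__ == "__main__":
    main()
```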
Using a Monorepo
Mantle runs Nextflow pipelines via a GitHub integration, which requires a single main.nf file to run.
To have multiple pipelines within a single repository, you need to specify a single main.nf file that imports the other pipelines and defines their entrypoints.
Here is an example of a main.nf file that imports two pipelines and defines their entrypoints:
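A minimal sketch, assuming two pipelines (rnaseq and variant_calling) defined elsewhere in the repository; the paths and names are placeholders:

```groovy
// main.nf -- single entrypoint file; paths and workflow names are placeholders
include { RNASEQ }          from './workflows/rnaseq'
include { VARIANT_CALLING } from './workflows/variant_calling'

workflow rnaseq {
    RNASEQ()
}

workflow variant_calling {
    VARIANT_CALLING()
}
```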
Each named workflow can be run in Mantle as a separate pipeline.