AWS

In order for Mantle to organize your data and run pipelines within your AWS account, you will need to set up a few things.

Prerequisites

  • An AWS account
  • S3 buckets storing your raw data

1. S3 Buckets

Determine which S3 buckets you want to use for your Mantle environment. You will need to specify these buckets within the IAM policy you create for Mantle.

Mantle uploads all output files to a single S3 bucket. We recommend creating a new bucket for Mantle to use. This bucket should be in the same region as the Batch environment you create. Grant Mantle both read and write permissions to this bucket (see below).
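
If you prefer to create this bucket from the command line, a minimal sketch looks like the following; the bucket name and region are placeholders for your own values:

aws s3api create-bucket \
    --bucket my-mantle-output-bucket \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2   # required for regions other than us-east-1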

If you want to use an existing bucket, you will need to specify a prefix for Mantle to use. This helps keep Mantle’s files organized within your bucket.

Provide the bucket name and prefix to Mantle to set up your account.

Bucket CORS Policy

For the Mantle write bucket, add the following CORS policy to allow Mantle to upload files to the bucket:

[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": [
            "GET",
            "PUT",
            "POST"
        ],
        "AllowedOrigins": [
            "{your_mantle_domain}.app.mantlebio.com"
        ],
        "ExposeHeaders": []
    }
]

For any read buckets you want to use with Mantle, add the following CORS policy to allow Mantle to read files from the bucket:

[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": [
            "GET"
        ],
        "AllowedOrigins": [
            "{your_mantle_domain}.app.mantlebio.com"
        ],
        "ExposeHeaders": []
    }
]
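
These rules can be added in the S3 console or with the AWS CLI. A sketch of the CLI route, assuming the rules above are saved to cors.json; note that the CLI expects them wrapped in a top-level CORSRules key:

# cors.json must look like: {"CORSRules": [ ...rules from above... ]}
aws s3api put-bucket-cors \
    --bucket my-mantle-output-bucket \
    --cors-configuration file://cors.json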

2. Create IAM Users and Roles

Mantle User

  1. Go to the IAM Console and navigate to “Policies” in the left-hand menu.
  2. Click “Create Policy” and select the “JSON” tab.
  3. Use the following JSON:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListAllMyBuckets",
                "s3:ListBucket"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
            ],
            "Resource": [
                "arn:aws:s3:::{read_bucket_name}",
                "arn:aws:s3:::{read_bucket_name}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:Put*",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::{read_write_bucket_name}",
                "arn:aws:s3:::{read_write_bucket_name}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "batch:DescribeJobQueues",
                "batch:CancelJob",
                "batch:SubmitJob",
                "batch:ListJobs",
                "batch:DescribeComputeEnvironments",
                "batch:TerminateJob",
                "batch:DescribeJobs",
                "batch:DescribeJobDefinitions",
                "batch:RegisterJobDefinition",
                "batch:TagResource"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:GetLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:{region}:{account_id}:log-group:/aws/batch/job:*"
            ]
        }
    ]
}

This policy allows the IAM user to read from the S3 buckets you specify and write to the Mantle output bucket. It also allows the user to submit AWS Batch jobs; Nextflow requires the Batch actions to be granted on "Resource": [ "*" ]. If your pipeline relies on Docker images stored in ECR, you will also need to grant Mantle access to those images by adding the following snippet to your IAM policy. Further instructions on how to use containers in your Nextflow pipeline can be found here.

Note: We are testing different configurations to reduce the number of permissions we need. It may be sufficient in the future to ensure your AWS Batch role has these permissions rather than giving them to the Mantle user — we will update the documentation.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:DescribeImages",
                "ecr:BatchGetImage",
                "ecr:GetLifecyclePolicy",
                "ecr:GetLifecyclePolicyPreview",
                "ecr:ListTagsForResource",
                "ecr:DescribeImageScanFindings"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
  4. Click “Review Policy” and give it a name, such as “MantlePolicy”.
  5. Navigate to “Users” in the left-hand menu and click “Add user”.
  6. Give the user a name, such as “MantleUser”, and select “Programmatic access”.
  7. Click “Next: Permissions” and attach the policy you just created.
  8. Click “Next: Tags” and “Next: Review”.
  9. Click “Create user” and save the access key and secret key in a secure location.
    If you navigate away before saving you will need to create a new access key.

Provide the access key and secret key to Mantle to set up your account.
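
If you prefer the CLI, a sketch of the same user setup follows; it assumes the policy JSON above is saved to mantle-policy.json, and the names are examples only:

aws iam create-policy \
    --policy-name MantlePolicy \
    --policy-document file://mantle-policy.json
aws iam create-user --user-name MantleUser
aws iam attach-user-policy \
    --user-name MantleUser \
    --policy-arn arn:aws:iam::{account_id}:policy/MantlePolicy
# Prints the access key and secret key to share with Mantle
aws iam create-access-key --user-name MantleUser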

Batch Role

To use EBS autoscaling, you will need to create a policy granting the appropriate access and attach it to the Batch instance role you create below.

  1. Navigate to IAM Console and click Policies in the left side bar
  2. Click “Create Policy”
  3. Select JSON
  4. Paste the following in the “Policy editor”:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:DescribeVolumeStatus",
                "ec2:DescribeVolumes",
                "ec2:DescribeTags",
                "ec2:ModifyInstanceAttribute",
                "ec2:DescribeVolumeAttribute",
                "ec2:CreateVolume",
                "ec2:DeleteVolume",
                "ec2:CreateTags",
                "ssm:GetRoleCredentials",
                "ssm:UpdateInstanceInformation",
                "ssm:ListInstanceAssociations",
                "ssmmessages:*",
                "ec2messages:*"
            ],
            "Resource": "*"
        }
    ]
}
  5. Name the policy amazon-ebs-autoscale-policy-nextflow
  6. Click “Create Policy”
  7. Attach this new policy in the next step (a CLI sketch of creating the policy follows below).
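
If you are working from the CLI instead of the console, a minimal sketch of the same policy creation, assuming the JSON above is saved to ebs-autoscale-policy.json:

aws iam create-policy \
    --policy-name amazon-ebs-autoscale-policy-nextflow \
    --policy-document file://ebs-autoscale-policy.json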

Each job within the Batch queue runs with access scoped to an IAM role. To set up a role with the access the queue requires, follow the steps below:

  1. Go to the IAM Console and navigate to “Roles” in the left-hand menu.
  2. Click “Create role”.
  3. Select “AWS service” and “EC2” as the service that will use this role.
  4. Click “Next”.
  5. Attach the following policies:
  • AmazonEC2ContainerServiceforEC2Role
  • AmazonS3FullAccess (see below for a more restrictive policy, if desired)
  • amazon-ebs-autoscale-policy-nextflow
  6. Click “Next” and give the role a name, such as “MantleBatchRole”.
  7. Click “Create role”. (A CLI sketch of these steps follows below.)
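
A CLI sketch of the same role setup is below; the trust policy file and role name are illustrative, and the instance profile step is included because Batch EC2 compute environments reference the role through an instance profile:

# Standard EC2 trust relationship so instances can assume the role
cat > ec2-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
aws iam create-role \
    --role-name MantleBatchRole \
    --assume-role-policy-document file://ec2-trust-policy.json
aws iam attach-role-policy \
    --role-name MantleBatchRole \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
# Either attach AmazonS3FullAccess, or use the more restrictive inline policy shown below
aws iam attach-role-policy \
    --role-name MantleBatchRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy \
    --role-name MantleBatchRole \
    --policy-arn arn:aws:iam::{account_id}:policy/amazon-ebs-autoscale-policy-nextflow
# Wrap the role in an instance profile for the Batch compute environment to use
aws iam create-instance-profile --instance-profile-name MantleBatchRole
aws iam add-role-to-instance-profile \
    --instance-profile-name MantleBatchRole \
    --role-name MantleBatchRole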

For more restrictive S3 access:

  8. Click on the role you just created.
  9. Click “Add inline policy”.
  10. Click “JSON” and use the following policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{read_bucket_name}",
                "arn:aws:s3:::{read_bucket_name}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:Put*",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{read_write_bucket_name}",
                "arn:aws:s3:::{read_write_bucket_name}/*"
            ]
        }
    ]
}
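
From the CLI, this can be attached as an inline policy; a sketch, assuming the JSON above is saved to mantle-batch-s3.json (the policy name is an example):

aws iam put-role-policy \
    --role-name MantleBatchRole \
    --policy-name MantleBatchS3Access \
    --policy-document file://mantle-batch-s3.json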

3. Launch Template

In order for Mantle to run Nextflow pipelines on AWS Batch, the instances in your compute environment will need the following software installed:

  • aws cli (installed through miniconda)
  • docker

We recommend installing only these two packages on the instance to keep it lightweight and reduce the time it takes to launch. All other dependencies should be installed within the Docker container that runs the pipeline.

Bioinformatics pipelines can process large amounts of data, so we recommend using “EBS autoscaling”. This monitors a mount point on an instance and dynamically increases the available storage based on predefined capacity thresholds. Setting this up involves installing a few lightweight dependencies and a simple daemon on the host instance. More information on ebs-autoscale can be found here.

Create a launch template with autoscaling EBS

Here are the instructions for configuring an EC2 instance with EBS autoscaling:

  1. Go to the EC2 Console
  2. Click on “Launch Templates” (under “Instances”)
  3. Click on “Create launch template”
  4. Under “Launch template name and description”, give the template a meaningful name.
  5. For “Application and OS Images (Amazon Machine Image)”, type “Amazon ECS-Optimized Amazon Linux 2 AMI” in the search bar and select the latest version.
  6. Leave “Instance Type” as “Don’t include in launch template”
  7. Leave “Key pair (login)” as “Don’t include in launch template”
  8. Leave “Network settings” as “Don’t include in launch template”
  9. Under “Storage volumes”, click “Add new volume” - this will add an entry called “Volume 3 (custom)”. Configure it as follows:
    a. Set Size to 100 GiB
    b. Set Delete on termination to Yes
    c. Set Device name to /dev/xvdba
    d. Set Volume type to General purpose SSD (gp2)
    e. Set Encrypted to Yes

The different volumes are described below:

  • Volume 1 is used for the root filesystem. The default size of 8GB is typically sufficient.
  • The custom volume you created above will be used for job scratch space. It will be mapped to /var/lib/docker, which is used for container storage - i.e. what each running container will use to create its internal filesystem.
  10. Expand the “Advanced details” section, add the following script to “User data”, and leave the rest of the fields as default:
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="==BOUNDARY=="
    
    --==BOUNDARY==
    Content-Type: text/cloud-config; charset="us-ascii"
    
    packages:
    - jq
    - btrfs-progs
    - sed
    - wget
    - bzip2
    - unzip
    - lvm2
    - git
    # add more package names here if you need them
    
    runcmd:
    - cd $HOME
    - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    - bash Miniconda3-latest-Linux-x86_64.sh -b -f -p /miniconda
    - /miniconda/bin/conda install -c conda-forge -y awscli
    - rm Miniconda3-latest-Linux-x86_64.sh
    - echo "export PATH=$PATH:/miniconda/bin" >> ~/.bashrc
    - source ~/.bashrc
    - ln -sf /miniconda/bin/* /usr/local/bin/
    - EBS_AUTOSCALE_VERSION=$(curl --silent "https://api.github.com/repos/awslabs/amazon-ebs-autoscale/releases/latest" | jq -r .tag_name)
    - echo $EBS_AUTOSCALE_VERSION > /var/log/ebs-autoscale-version.log
    - cd /opt && git clone https://github.com/awslabs/amazon-ebs-autoscale.git
    - cd /opt/amazon-ebs-autoscale && git checkout $EBS_AUTOSCALE_VERSION
    - cd /opt/amazon-ebs-autoscale && sh install.sh -d /dev/xvdba -m /var/lib/docker 2>&1 > /var/log/ebs-autoscale-install.log
    - systemctl restart docker
    
    --==BOUNDARY==--
    

This script installs the AWS CLI and ebs-autoscale on the instance. The ebs-autoscale daemon monitors the /var/lib/docker mount point and dynamically increases the available storage based on predefined capacity thresholds. The AWS CLI is installed to /miniconda/bin/aws (see instructions on how to incorporate this into your pipeline here).
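
The launch template can also be created from the CLI. A sketch, assuming the user data above is saved to user-data.txt; the template name is an example and the block device mapping mirrors the volume configured earlier:

# Launch template user data must be base64-encoded (base64 -w0 is GNU coreutils; adjust on macOS)
USER_DATA=$(base64 -w0 user-data.txt)
aws ec2 create-launch-template \
    --launch-template-name mantle-nextflow-lt \
    --launch-template-data "{
      \"UserData\": \"${USER_DATA}\",
      \"BlockDeviceMappings\": [{
        \"DeviceName\": \"/dev/xvdba\",
        \"Ebs\": {\"VolumeSize\": 100, \"VolumeType\": \"gp2\", \"Encrypted\": true, \"DeleteOnTermination\": true}
      }]
    }"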

4. Batch Queue

Mantle runs Nextflow pipelines on AWS Batch. You will need to create a job queue for Mantle to use.

Compute Environment

  1. Navigate to the AWS Batch Console.
  2. Click “Create compute environment”.
  3. Select “Amazon Elastic Compute Cloud (EC2)” for the compute environment type.
  4. Give the environment a name and select “Managed” for the environment type.
  5. Select an “Instance role” (the role you created earlier).
  6. Click “Next”.
  7. Provide a min, desired, and max vCPUs for the environment.
    a. We recommend starting with a min of 0 to save costs when the environment is not in use.
    b. Desired vCPUs can be set to 0 since the environment will be managed by AWS Batch.
    c. Set the max vCPUs to the maximum number of vCPUs you want to use at any given time. (If this is higher than your account’s limits, you may run into errors when trying to scale up.)
  8. Select the allowed instance types (we recommend using the optimal option).
  9. Under Instance Configuration -> Additional Configuration, select the launch template you created.
  10. Click “Next”, review your VPC and security group settings, and click “Next”.
  11. Review your settings and click “Create compute environment”.
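
For reference, a CLI sketch of an equivalent compute environment; the subnet, security group, and max vCPU values are placeholders for your own network settings and limits:

aws batch create-compute-environment \
    --compute-environment-name mantle-compute-env \
    --type MANAGED \
    --compute-resources '{
        "type": "EC2",
        "minvCpus": 0,
        "desiredvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["optimal"],
        "subnets": ["subnet-xxxxxxxx"],
        "securityGroupIds": ["sg-xxxxxxxx"],
        "instanceRole": "arn:aws:iam::{account_id}:instance-profile/MantleBatchRole",
        "launchTemplate": {"launchTemplateName": "mantle-nextflow-lt"}
    }'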

Job Queue

  1. Navigate to the AWS Batch Console.
  2. Click “Job queues” in the left-hand menu.
  3. Click “Create”.
  4. Select “Amazon Elastic Compute Cloud (EC2)” for the orchestration type.
  5. Give the queue a name and select the compute environment you just created.
  6. Set the priority and click “Create job queue”.
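
A CLI sketch of an equivalent job queue; the queue name and priority are examples:

aws batch create-job-queue \
    --job-queue-name mantle-job-queue \
    --priority 1 \
    --compute-environment-order order=1,computeEnvironment=mantle-compute-env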