Introduction

Spatial transcriptomics is the visualization and quantification of gene expression in tissue sections, maintaining the spatial context of the tissue architecture. One technology for spatial transcriptomics is 10x Genomics Visium. It uses a slide with an array of spots, each containing barcoded oligonucleotides, to capture mRNA from tissue sections placed on the slide. The mRNA then undergoes high-throughput RNA sequencing. The raw results from Visium experiments include a microscopy image of the tissue slice on the Visium slide and sequencing reads in FASTQ format. In order to use this data to reveal insights into cellular microenvironments, cellular heterogeneity, and spatial relationships within tissue, the data require processing, including the typical steps taken for processing RNA sequencing reads, as well as tissue detection and determination of which barcode spots overlap with the tissue.

To import an example Visium spatial transcriptomics workflow, including data types, example datasets, pipeline and runs, analysis environments, and analysis notebooks, click on the Import a template button on your Mantle dashboard and select Spatial-Transcriptomics.

Raw data

Neither images nor FASTQ data can be easily stored in typical databases. An individual FASTQ file is semi-structured, and images are unstructured. Storing these files within a database as binary large objects (BLOBs) can be challenging due to their large size (typically tens of gigabytes per FASTQ file).
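To illustrate what "semi-structured" means for FASTQ, here is a minimal sketch of a reader. It assumes the common four-line record layout (header starting with @, sequence, separator starting with +, quality string) and does not handle wrapped sequences. The names FastqRecord and read_fastq are illustrative and are not part of Mantle or Spaceranger.

```python
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ text stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between or after records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].strip(), sequence, quality)
```

Each record follows a predictable layout, but the file as a whole has no schema, indexes, or typed columns, which is why it fits poorly into a relational table.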

Within the Mantle data lake, you can store data files and associated metadata together as datasets. In our spatial transcriptomics demonstration workflow, experimental data files are stored as datasets of the spatial-experiment data type, which has the following properties:

fastq_directory
file
required

Directory containing RNA sequencing FASTQ files

image
file
required

Microscopy image file

image_type
string
required

image, darkimage, colorizedimage, or cytaimage.

Used as an input to Spaceranger Count as --<image_type> <image>.

See the Spaceranger Count documentation for more information.

slide_id
string
required

The Visium slide serial number. Corresponds to the slide input to Spaceranger Count. See the Spaceranger Count documentation for more information.

slidefile
file

Optional. The slide design file for your slide. See the Spaceranger Count documentation for more information.

manual_alignment
file

Optional. The manual alignment file for your image. Corresponds to the loupe-alignment input to Spaceranger Count. See the Spaceranger Count documentation for more information.

capture area
string
required

Visium capture area identifier. Corresponds to the area input to Spaceranger Count. See the Spaceranger Count documentation for more information.

data_source_publication
string

Optional. The publication that the data originated from.

fastq_sra_accession_number
string

Optional. The SRA Accession Number for the FASTQ files.
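The properties listed above can be summarized as a small record type. The following is only a sketch of the schema in plain Python, not how Mantle represents data types. The class name SpatialExperiment and the underscore in capture_area are assumptions made for illustration, and file properties are modeled as simple path strings.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values for image_type, as listed in the property description
IMAGE_TYPES = {"image", "darkimage", "colorizedimage", "cytaimage"}


@dataclass
class SpatialExperiment:
    # Required properties
    fastq_directory: str
    image: str
    image_type: str
    slide_id: str
    capture_area: str  # assumed spelling; the property list shows "capture area"
    # Optional properties
    slidefile: Optional[str] = None
    manual_alignment: Optional[str] = None
    data_source_publication: Optional[str] = None
    fastq_sra_accession_number: Optional[str] = None

    def __post_init__(self):
        # Reject empty values for required properties
        for name in ("fastq_directory", "image", "slide_id", "capture_area"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")
        if self.image_type not in IMAGE_TYPES:
            raise ValueError(f"image_type must be one of {sorted(IMAGE_TYPES)}")
```

Validating image_type up front is useful because its value is later used directly as a command line flag name.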

In this example workflow, we’ve included a subset of the data from this manuscript:

Sudmeier, et al. (2022) “Distinct phenotypic states and spatial distribution of CD8+ T cell clonotypes in human brain metastases.” Cell Reports Medicine, 3(5), 100620

Additional properties can be added to datasets besides those specified in the data type. We added a string property describing the primary tumor for each tissue sample, since these are tumor metastasis samples.

Data processing

In our demonstration workflow, spatial transcriptomics experiment data is processed using a Nextflow pipeline that runs 10x Spaceranger Count. Spaceranger Count is a command line tool. Its inputs include FASTQ files, a microscopy image, the slide ID and capture area, and a reference transcriptome. It is computationally intensive to run, requiring at least an 8-core processor (32 cores recommended), 64 GB RAM (128 GB recommended), and 1 TB free disk space.
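If you run Spaceranger Count outside of Mantle, a quick preflight comparison against these minimums can save a failed run. The sketch below is a hypothetical helper, not part of Spaceranger or Mantle. It treats 1 TB as 1000 GB, which is an assumption, and it expects you to supply the available resources yourself.

```python
# Minimum requirements quoted above (1 TB taken as 1000 GB, an assumption)
MINIMUM = {"cores": 8, "memory_gb": 64, "disk_gb": 1000}
# Recommended values quoted above; no recommended disk size is given
RECOMMENDED = {"cores": 32, "memory_gb": 128}


def resource_shortfalls(available, required=MINIMUM):
    """Return {resource: (available, needed)} for every unmet requirement.

    Resources missing from `available` are treated as 0 and reported.
    """
    return {
        name: (available.get(name, 0), needed)
        for name, needed in required.items()
        if available.get(name, 0) < needed
    }
```

For example, calling resource_shortfalls({"cores": 16, "memory_gb": 64, "disk_gb": 2000}, RECOMMENDED) reports that both cores and memory fall below the recommended values, while the same machine passes the minimum check.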

Our spaceranger-count pipeline takes as input a dataset of the spatial-experiment type and a dataset of the spatial-reference type. The spatial-experiment dataset contains most of the Spaceranger Count required inputs as properties. The reference transcriptome is stored in the spatial-reference dataset.
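To make the mapping from dataset properties to Spaceranger Count inputs concrete, here is a rough sketch of how a command line could be assembled. It is not the actual Mantle pipeline code. The function name, the dictionary layout, and the capture_area key are assumptions, and depending on your Spaceranger version additional flags may be required, so check the Spaceranger Count documentation before running anything built this way.

```python
IMAGE_TYPES = ("image", "darkimage", "colorizedimage", "cytaimage")


def build_spaceranger_count_args(run_id, experiment, transcriptome_dir):
    """Build an argument list for `spaceranger count` from dataset properties.

    `experiment` is a plain dict keyed by the spatial-experiment property
    names (hypothetical layout, for illustration only).
    """
    image_type = experiment["image_type"]
    if image_type not in IMAGE_TYPES:
        raise ValueError(f"unsupported image_type: {image_type!r}")

    args = [
        "spaceranger", "count",
        "--id", run_id,
        "--transcriptome", transcriptome_dir,
        "--fastqs", experiment["fastq_directory"],
        f"--{image_type}", experiment["image"],  # e.g. --image or --cytaimage
        "--slide", experiment["slide_id"],
        "--area", experiment["capture_area"],
    ]
    # Optional inputs are passed only when the dataset provides them
    if experiment.get("slidefile"):
        args += ["--slidefile", experiment["slidefile"]]
    if experiment.get("manual_alignment"):
        args += ["--loupe-alignment", experiment["manual_alignment"]]
    return args
```

Returning a list rather than a single string makes it straightforward to hand the result to subprocess.run without shell quoting issues.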

The spatial-reference data type has the following properties:

reference_transcriptome_directory
file
required

The directory containing the pre-compiled reference transcriptome files.

genome_name
string
required

source_url
string
required

The download URL this directory was sourced from.

Our example workflow uses the 10x Genomics pre-built human reference. We have also provided the 10x Genomics pre-built mouse reference.

The Mantle spaceranger-count pipeline outputs all the files generated by Spaceranger Count, and a spaceranger-outputs dataset. The spaceranger-outputs data type has path_to_output_dir as a property, which contains the path to the directory containing the Spaceranger Count outputs.

The output dataset is used as an input to a downstream analysis notebook, and serves to link the Spaceranger Count outputs to the input datasets.

Analyzing processed data

We analyzed the Spaceranger Count outputs in the pt16_spatial_gene_expression notebook, which ran in the spatial-transcriptomics-analysis analysis environment.

Within the notebook, we used the Mantle SDK to stage the Spaceranger Count outputs directory and mark the output dataset from the pipeline as an input to the analysis notebook. We plotted the graph-based clustering calculated by Spaceranger Count. Additionally, we used the Scanpy package to preprocess the transcriptomics data and plot the spatial distribution of several genes of interest.
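The Scanpy portion of such a notebook might look roughly like the sketch below. This is not the code from the pt16_spatial_gene_expression notebook. The function name, the filtering threshold, and the normalization target are assumptions, and the Mantle SDK calls for staging inputs and registering outputs are intentionally omitted. Note that recent Scanpy releases steer users toward other packages for Visium loading and spatial plotting, so the calls used here may emit deprecation warnings.

```python
from pathlib import Path


def preprocess_and_plot(outs_dir, genes, min_cells=3, target_sum=1e4):
    """Load Spaceranger Count outputs, normalize counts, and plot genes in space."""
    # Imported lazily so this module can be loaded where Scanpy is not installed
    import scanpy as sc

    # read_visium expects the Spaceranger Count output directory layout
    adata = sc.read_visium(Path(outs_dir))
    adata.var_names_make_unique()

    # Basic preprocessing: drop rarely detected genes, normalize, log-transform
    sc.pp.filter_genes(adata, min_cells=min_cells)
    sc.pp.normalize_total(adata, target_sum=target_sum)
    sc.pp.log1p(adata)

    # Plot only the requested genes that are still present after filtering
    present = [gene for gene in genes if gene in adata.var_names]
    if present:
        sc.pl.spatial(adata, color=present, show=False)
    return adata, present
```

The function returns the list of genes that were actually plotted, which makes it easy to notice when a gene of interest was removed by filtering or is missing from the reference.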

Finally, we used the Mantle SDK to register the plots as outputs of the notebook.

Wrapping up

Spatial transcriptomics data is complex and requires extensive processing and analysis to derive insights from it. Storing image and FASTQ files in a dataset with experimental metadata enables you to keep your data organized. Running Spaceranger Count on Mantle means you do not have to provision its substantial computational resources or remember all of its command line arguments. Additional analysis, which may change depending on your experimental goals, can be done in flexible notebooks. Try applying this workflow to your own Visium spatial transcriptomics data!