Introduction

A Mantle dataset links raw data files, values, and metadata together.

Datasets can be created, updated, accessed, and queried through the Mantle SDK.

Creating a dataset

A dataset must have at minimum a name.

Create a new dataset
import mantlebio

mantle = mantlebio.MantleClient()

dataset = mantle.dataset.create(
    name="example_minimal_dataset",
    local=False
)
The local keyword controls whether creating the dataset sends an API request that creates it in the Mantle database. In this version of the SDK, local defaults to True for consistency with earlier versions. For most purposes, False is the appropriate setting: the dataset is then automatically pushed to Mantle.
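Because local defaults to True, it is easy to forget to pass local=False. One way to make the choice explicit is a thin wrapper around the create call shown above. This is a minimal sketch: create_pushed_dataset is a hypothetical helper, not part of the SDK, and it assumes client is a mantlebio.MantleClient() whose dataset.create accepts the keywords used in this guide.

```python
def create_pushed_dataset(client, name, **kwargs):
    """Create a dataset with local=False so it is pushed to Mantle.

    Hypothetical convenience wrapper, not part of the Mantle SDK.
    `client` is expected to be a mantlebio.MantleClient().
    """
    # Refuse a conflicting value instead of silently overriding it.
    if "local" in kwargs:
        raise TypeError("local is fixed to False by this helper")
    return client.dataset.create(name=name, local=False, **kwargs)
```

With a real client, create_pushed_dataset(mantle, "example_minimal_dataset") would issue the same call as the example above.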

Creating a dataset with properties

Properties can be added to a dataset upon creation using a dictionary.

The valid types for properties are:

  • string
  • integer
  • double (float)
  • boolean
  • file (S3 files)

Create a new dataset with properties
import mantlebio

mantle = mantlebio.MantleClient()

dataset = mantle.dataset.create(
    name="4XP1",
    local=False,
    properties={
        "description": "X-ray structure of Drosophila dopamine transporter bound to neurotransmitter dopamine",
        "resolution": 2.5,
        "r_value": 0.2,
        "pdb": {"file_upload": {"filename": "4xp1.pdb"}}
    }
)

To test this out yourself, download the PDB file here.
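To check a properties dictionary against the valid types listed above before sending it, a small client-side classifier can help. The sketch below is an assumption about how Python values map onto those type names. property_type is a hypothetical helper, not an SDK function, and the SDK may apply its own rules.

```python
def property_type(value):
    """Return the property type name (from the list above) for a Python value.

    The Python-to-Mantle mapping used here is an assumption.
    """
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    # File properties are written as {"file_upload": {"filename": ...}} in this guide.
    if isinstance(value, dict) and "file_upload" in value:
        return "file"
    raise TypeError(f"unsupported property value: {value!r}")


# The properties from the 4XP1 example above.
properties = {
    "description": "X-ray structure of Drosophila dopamine transporter bound to neurotransmitter dopamine",
    "resolution": 2.5,
    "r_value": 0.2,
    "pdb": {"file_upload": {"filename": "4xp1.pdb"}},
}
types = {key: property_type(value) for key, value in properties.items()}
print(types)
```

Running the check before the create call surfaces unsupported values, such as lists or None, as a local TypeError.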

Creating a dataset with a specified data type

The data type of a dataset can be specified on creation. The data type must already exist, and the dataset must include all of the data type's required properties when it is created.

Create a new dataset and specify the data type
import mantlebio

mantle = mantlebio.MantleClient()

dataset = mantle.dataset.create(
    name="palmer_penguins_create_dataset_example",
    dataset_type="mantle_tabular_penguins",  # Set the data type
    local=False,
    properties={
        "penguin_data_csv": {"file_upload": {"filename": "palmer_penguins.csv"}}  # Add the required file property
    }
)

To test this out yourself, download the CSV file here.
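Since a typed dataset needs every required property at creation, it can be useful to compare the properties dictionary against the required keys first. The sketch below assumes you already know the required property names for the data type. This guide does not show how to look them up through the SDK, so they are hard-coded here, and missing_properties is a hypothetical helper.

```python
def missing_properties(properties, required_keys):
    """Return the required keys absent from `properties`, sorted by name."""
    return sorted(set(required_keys) - set(properties))


# Assumed: the only required property of the data type in the example above.
required = ["penguin_data_csv"]
properties = {
    "penguin_data_csv": {"file_upload": {"filename": "palmer_penguins.csv"}},
}

missing = missing_properties(properties, required)
if missing:
    raise ValueError(f"missing required properties: {missing}")
```

If the check passes, the same properties dictionary can be passed to mantle.dataset.create as in the example above.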

Interacting with a single dataset

Getting a dataset by unique ID

Get a dataset by its unique ID
import mantlebio

mantle = mantlebio.MantleClient()

dataset = mantle.dataset.get("E000001")

Getting dataset properties

Get dataset properties
import mantlebio

mantle = mantlebio.MantleClient()

dataset = mantle.dataset.get("E000001")

dataset_properties = dataset.properties

Download S3 file properties

Download files from S3 file properties using the download_s3 method, which takes as arguments the key of the property and the local path to which the file will be downloaded.

Download files from properties
import mantlebio

mantle = mantlebio.MantleClient()

dataset = mantle.dataset.get("E000001")

dataset.download_s3("penguin_data_csv", "local_penguins.csv")
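When a dataset carries several file properties, the download_s3 call above can be repeated in a loop. The following is a minimal sketch. download_file_properties is a hypothetical helper, and naming each local file after its property key is a choice made here, not SDK behavior.

```python
from pathlib import Path


def download_file_properties(dataset, keys, target_dir):
    """Download each file property in `keys` into `target_dir`.

    Returns a dict mapping each property key to the local path used.
    `dataset` is expected to provide download_s3(key, local_path).
    """
    target = Path(target_dir)
    # Create the target directory (and any parents) if it does not exist.
    target.mkdir(parents=True, exist_ok=True)
    paths = {}
    for key in keys:
        local_path = target / key
        dataset.download_s3(key, str(local_path))
        paths[key] = local_path
    return paths
```

With a dataset obtained from mantle.dataset.get("E000001"), calling download_file_properties(dataset, ["penguin_data_csv"], "downloads") would request the file at downloads/penguin_data_csv.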

Updating a dataset with additional properties

To add a file to an existing dataset, you can use the upload_s3 method, which takes as arguments the key of the property and the path to the file to be uploaded. This method uploads your file into AWS S3 and attaches the S3 path as a file property on the dataset.

Non-file properties, such as strings and Booleans, can be added using the set_property method, which takes as arguments the key of the property and the value to be set.

Create a new dataset and update it with properties
import mantlebio

mantle = mantlebio.MantleClient()

# Get dataset to update. In this example we create it from scratch.
dataset = mantle.dataset.create(
    name="palmer_penguins_add_properties_example",
    local=False
)

# Upload a file to the dataset
dataset.upload_s3("mantle_example_file", "palmer_penguins.csv")

# Set a string property of the dataset
dataset.set_property("mantle_example_continent", "Antarctica")
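Several updates can be applied in one pass by routing files to upload_s3 and all other values to set_property, as described above. This is a sketch under the assumption that both methods behave as shown in this guide. apply_updates is a hypothetical helper, not an SDK function.

```python
def apply_updates(dataset, values=None, files=None):
    """Apply several property updates to an existing dataset.

    `files` maps property keys to local file paths (sent via upload_s3).
    `values` maps property keys to non-file values (sent via set_property).
    """
    for key, path in (files or {}).items():
        dataset.upload_s3(key, path)
    for key, value in (values or {}).items():
        dataset.set_property(key, value)
```

For the example above, the equivalent call would be apply_updates(dataset, values={"mantle_example_continent": "Antarctica"}, files={"mantle_example_file": "palmer_penguins.csv"}).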

Querying for datasets and returning a DataFrame

To get a Pandas DataFrame where datasets are represented as rows, you can query for a group of datasets and turn them into a DataFrame.

Querying by data type

Query for datasets by data type
import mantlebio

mantle = mantlebio.MantleClient()

dataset_list = mantle.dataset.build_query().where(
    "data_type_unique_id=mantle_penguin_records"
).execute()

Querying by property

Query for datasets by property
import mantlebio

mantle = mantlebio.MantleClient()

dataset_list = mantle.dataset.build_query().where(
    "props.{species}.string.eq=Adelie"
).execute()
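The filter strings passed to where() in the two query examples above follow a fixed pattern, so they can be built with small helpers instead of typed by hand. The sketch below covers only the two forms shown in this guide: a data type filter and a string-equality property filter. Both function names are hypothetical, and forms for other property types or operators are not shown here, so they are not attempted.

```python
def data_type_filter(data_type_unique_id):
    """Build the filter string for querying by data type."""
    return f"data_type_unique_id={data_type_unique_id}"


def string_property_filter(key, value):
    """Build the filter string for a string property equal to `value`.

    Doubled braces in the f-string emit the literal braces around the key.
    """
    return f"props.{{{key}}}.string.eq={value}"


print(data_type_filter("mantle_penguin_records"))
print(string_property_filter("species", "Adelie"))
```

The returned strings are identical to the literals used in the examples above and can be passed to mantle.dataset.build_query().where(...) in the same way.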

Creating a Pandas DataFrame of datasets

Convert list of datasets to Pandas DataFrame
import mantlebio

mantle = mantlebio.MantleClient()

dataset_list = mantle.dataset.build_query().where(
    "data_type_unique_id=mantle_penguin_records"
).execute()

df = dataset_list.to_dataframe()