Datasets
Introduction
A Mantle dataset allows you to link raw data files, values, and/or metadata together.
Datasets can be created, updated, accessed, and queried via the Mantle SDK.
Creating a dataset
A dataset must have at minimum a name.
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.create(
    name="example_minimal_dataset",
    local=False
)
The local keyword determines whether the SDK asks Mantle to create the new dataset in the database via an API request.
In this version of the SDK, local defaults to True for consistency with earlier versions.
For most purposes, however, False is the appropriate setting: when local is False, the dataset is automatically pushed to Mantle.
Creating a dataset with properties
Properties can be added to a dataset upon creation using a dictionary.
The valid types for properties are:
- string
- integer
- double (float)
- boolean
- file (S3 files)
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.create(
    name="4XP1",
    local=False,
    properties={
        "description": "X-ray structure of Drosophila dopamine transporter bound to neurotransmitter dopamine",
        "resolution": 2.5,
        "r_value": 0.2,
        "pdb": {"file_upload": {"filename": "4xp1.pdb"}}
    }
)
To test this out yourself, download the PDB file here.
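Because only five property types are listed as valid, it can help to check a properties dictionary locally before sending it to Mantle. The sketch below is not part of the Mantle SDK: classify_property and invalid_properties are hypothetical helpers, the mapping from Python types to the listed types is an assumption, and the "tags" entry is made up purely to show a rejected value.

```python
# Hypothetical pre-flight check; not part of the Mantle SDK.
def classify_property(value):
    """Return the listed property type this value would map to, or None."""
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    # File properties follow the {"file_upload": {"filename": ...}} shape used above.
    if isinstance(value, dict) and isinstance(value.get("file_upload"), dict):
        if "filename" in value["file_upload"]:
            return "file"
    return None


def invalid_properties(properties):
    """Return the sorted keys whose values match none of the listed types."""
    return sorted(k for k, v in properties.items() if classify_property(v) is None)


properties = {
    "description": "X-ray structure of Drosophila dopamine transporter",
    "resolution": 2.5,
    "pdb": {"file_upload": {"filename": "4xp1.pdb"}},
    "tags": ["example", "illustration"],  # made-up entry: a list is not a listed type
}
print(invalid_properties(properties))  # → ['tags']
```

If the returned list is empty, the dictionary uses only the listed types and can be passed as the properties argument of mantle.dataset.create.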
Creating a dataset with a specified data type
The data type of a dataset can be specified on creation. In this case, the data type must already exist and the dataset must have all the required properties of the data type on creation.
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.create(
    name="palmer_penguins_create_dataset_example",
    dataset_type="mantle_tabular_penguins",  # Set the data type
    local=False,
    properties={
        "penguin_data_csv": {"file_upload": {"filename": "palmer_penguins.csv"}}  # Add the required file property
    }
)
To test this out yourself, download the CSV file here.
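Since a typed dataset must carry every required property at creation, a quick local check can catch omissions before the request is made. This is a minimal sketch, assuming you already know the required property keys for your data type. The missing_required helper is hypothetical, the required_keys list is an assumption for illustration, and the "notes" property is made up.

```python
# Hypothetical helper; not part of the Mantle SDK.
def missing_required(properties, required_keys):
    """Return the required property keys absent from properties, sorted."""
    return sorted(set(required_keys) - set(properties))


# Assumed for illustration: the data type requires only this one file property.
required_keys = ["penguin_data_csv"]

properties = {"notes": "first upload"}  # made-up property; the required file is missing
missing = missing_required(properties, required_keys)
if missing:
    print("Add these properties before calling create:", missing)
```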
Interacting with a single dataset
Getting dataset by unique ID
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.get("E000001")
Getting dataset properties
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.get("E000001")
dataset_properties = dataset.properties
Downloading S3 file properties
Download files from S3 file properties using the download_s3 method, which takes as arguments the key of the property and the local path to which the file will be downloaded.
import mantlebio
mantle = mantlebio.MantleClient()
dataset = mantle.dataset.get("E000001")
dataset.download_s3("penguin_data_csv", "local_penguins.csv")
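When a dataset has several file properties, the same call can be repeated for each one. The sketch below wraps download_s3 in a hypothetical helper, download_file_properties, which is not part of the SDK. It assumes download_s3 accepts the local path as a string and creates the target directory up front in case the SDK does not.

```python
from pathlib import Path


# Hypothetical helper; only dataset.download_s3(key, path) comes from the SDK.
def download_file_properties(dataset, keys_to_filenames, directory):
    """Download each file property into directory and return the local paths."""
    target_dir = Path(directory)
    # Create the directory first, in case download_s3 does not create parents.
    target_dir.mkdir(parents=True, exist_ok=True)
    local_paths = {}
    for key, filename in keys_to_filenames.items():
        local_path = target_dir / filename
        dataset.download_s3(key, str(local_path))
        local_paths[key] = local_path
    return local_paths
```

For the dataset above, download_file_properties(dataset, {"penguin_data_csv": "local_penguins.csv"}, "downloads") would place the file at downloads/local_penguins.csv.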
Updating a dataset with additional properties
To add a file to an existing dataset, use the upload_s3 method, which takes as arguments the key of the property and the path to the file to be uploaded.
This method uploads your file to AWS S3 and attaches the S3 path as a file property on the dataset.
Non-file properties, such as strings and Booleans, can be added using the set_property method, which takes as arguments the key of the property and the value to be set.
import mantlebio
mantle = mantlebio.MantleClient()
# Get dataset to update. In this example we create it from scratch.
dataset = mantle.dataset.create(
    name="palmer_penguins_add_properties_example",
    local=False
)
# Upload a file to the dataset
dataset.upload_s3("mantle_example_file", "palmer_penguins.csv")
# Set a string property of the dataset
dataset.set_property("mantle_example_continent", "Antarctica")
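When several properties need to be added at once, the two methods can be driven from dictionaries. The following is a sketch only: add_properties is a hypothetical helper, not an SDK function, and it simply forwards each entry to set_property or upload_s3 with the key and value arguments described above.

```python
# Hypothetical helper; only set_property and upload_s3 come from the SDK.
def add_properties(dataset, values=None, files=None):
    """Set non-file properties, then upload file properties, one call per key."""
    for key, value in (values or {}).items():
        dataset.set_property(key, value)
    for key, path in (files or {}).items():
        dataset.upload_s3(key, path)
    return dataset
```

The example above could then be written as add_properties(dataset, values={"mantle_example_continent": "Antarctica"}, files={"mantle_example_file": "palmer_penguins.csv"}).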
Querying for datasets and returning a DataFrame
To get a pandas DataFrame in which each dataset is a row, query for a group of datasets and convert the result to a DataFrame.
Querying by data type
import mantlebio
mantle = mantlebio.MantleClient()
dataset_list = mantle.dataset.build_query().where(
    "data_type_unique_id=mantle_penguin_records"
).execute()
Querying by property
import mantlebio
mantle = mantlebio.MantleClient()
dataset_list = mantle.dataset.build_query().where(
    "props.{species}.string.eq=Adelie"
).execute()
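The filter strings in the two queries above follow a visible pattern, so they can be assembled from variables instead of being typed by hand. This sketch assumes the pattern generalizes from those two examples. The helpers data_type_filter and property_filter are hypothetical, and only the string value type with the eq operator is confirmed by this page, so other combinations are untested guesses.

```python
# Hypothetical helpers; the filter format is inferred from the two examples above.
def data_type_filter(data_type_unique_id):
    """Build a filter string that selects datasets of one data type."""
    return f"data_type_unique_id={data_type_unique_id}"


def property_filter(key, value, value_type="string", operator="eq"):
    """Build a filter string that matches a property value."""
    # Doubled braces emit the literal "{" and "}" around the property key.
    return f"props.{{{key}}}.{value_type}.{operator}={value}"


print(data_type_filter("mantle_penguin_records"))
# → data_type_unique_id=mantle_penguin_records
print(property_filter("species", "Adelie"))
# → props.{species}.string.eq=Adelie
```

Either string can be passed to where() in place of the literal, for example mantle.dataset.build_query().where(property_filter("species", "Adelie")).execute().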
Creating a Pandas DataFrame of datasets
import mantlebio
mantle = mantlebio.MantleClient()
dataset_list = mantle.dataset.build_query().where(
    "data_type_unique_id=mantle_penguin_records"
).execute()
df = dataset_list.to_dataframe()