Mantle Penguins Data Demonstration
Introduction
Welcome to Mantle!
When you first log into your account, you will notice that it contains some example data, pipelines, and analysis notebooks about penguin measurements. The data originate from the Palmer penguins dataset:
Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.
In this guide, we’ll go over each of the components that are included. For more information on Mantle’s features, make sure to check out the rest of the docs!
Data lake
The datasets section of Mantle is the home of your data lake.
Two penguin data types are available.
The mantle_tabular_penguins data type has a single property: penguin_data_csv. When you create a dataset with this type, you upload a file to that property. The three example datasets each have a CSV file containing penguin measurement data from a single year.
Clicking on the name of the data type on the left allows you to see all the datasets of that type.
In this view, you can see the names of the files stored in the penguin_data_csv property for each dataset.
Clicking into a single dataset via its unique ID shows a graph of the data lineage, a preview of the CSV file stored in the penguin_data_csv property, and the history of the dataset.
The graph shows that these datasets were used as inputs to the mantle-split-penguin-records pipeline, which we’ll discuss below.
The mantle_penguin_records data type has 8 properties: species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex, and year.
These are the same measurements and metadata that are contained in the CSVs attached to the mantle_tabular_penguins datasets, but the data from all years are combined into a single table, in which each row is a single Mantle dataset.
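To make the shape of one of these records concrete, here is a minimal Python sketch of the eight properties as a typed record. The field names come from the data type described above. The Python types, the PenguinRecord class, the record_from_row helper, and the handling of "NA" as a missing value are assumptions made for illustration and are not part of Mantle.

```python
from dataclasses import dataclass, fields
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class PenguinRecord:
    """One penguin observation. Field names match the eight properties;
    the types are assumptions, not Mantle's property definitions."""
    species: str
    island: str
    bill_length_mm: Optional[float]
    bill_depth_mm: Optional[float]
    flipper_length_mm: Optional[int]
    body_mass_g: Optional[int]
    sex: Optional[str]
    year: int


def parse_cell(value: str, cast: Callable):
    """Convert one CSV cell, treating an empty cell or 'NA' (assumed) as missing."""
    return None if value in ("", "NA") else cast(value)


def record_from_row(row: Dict[str, str]) -> PenguinRecord:
    """Build a typed record from one CSV row given as a dict of strings."""
    return PenguinRecord(
        species=row["species"],
        island=row["island"],
        bill_length_mm=parse_cell(row["bill_length_mm"], float),
        bill_depth_mm=parse_cell(row["bill_depth_mm"], float),
        flipper_length_mm=parse_cell(row["flipper_length_mm"], int),
        body_mass_g=parse_cell(row["body_mass_g"], int),
        sex=parse_cell(row["sex"], str),
        year=int(row["year"]),
    )


# Illustrative row in the shape of the penguin CSVs.
example_row = {
    "species": "Adelie", "island": "Torgersen",
    "bill_length_mm": "39.1", "bill_depth_mm": "18.7",
    "flipper_length_mm": "181", "body_mass_g": "3750",
    "sex": "male", "year": "2007",
}
record = record_from_row(example_row)
property_names = [f.name for f in fields(PenguinRecord)]
```

Here property_names lists the same eight names as the data type, and record holds one row's values with numeric fields converted from text.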
How were the CSVs converted into these datasets? By using the mantle-split-penguin-records pipeline.
Pipelines
Pipelines are one of the ways you can transform and process data in Mantle.
The mantle-split-penguin-records pipeline was used to transform data stored as rows in CSVs into individual Mantle datasets. The pipeline’s page includes details about the current version, runs of each version of the pipeline, the computational queue, the GitHub repository containing the source code for the pipeline, and the input configuration for the pipeline.
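The idea behind the split can be sketched in plain Python: read each per-year CSV and emit one record per row. This is only an illustration of the transformation, not the pipeline's actual source code (which lives in the linked GitHub repository). The function names, the file names, the source_file field, and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Illustrative CSV text in the shape of the per-year penguin files.
CSV_2007 = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Adelie,Torgersen,39.1,18.7,181,3750,male,2007
Adelie,Torgersen,39.5,17.4,186,3800,female,2007
"""

CSV_2008 = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
Gentoo,Biscoe,46.1,13.2,211,4500,female,2008
"""


def split_rows(csv_text: str) -> List[Dict[str, str]]:
    """Return one dict per data row, keyed by the CSV header names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def split_all(csv_files: Dict[str, str]) -> List[Dict[str, str]]:
    """Split several CSVs and combine the rows into one list of records.

    The 'source_file' key is a hypothetical way to remember which file
    each record came from; it is not a Mantle property.
    """
    records = []
    for file_name, text in csv_files.items():
        for row in split_rows(text):
            row["source_file"] = file_name
            records.append(row)
    return records


records = split_all({"penguins_2007.csv": CSV_2007, "penguins_2008.csv": CSV_2008})
```

Each element of records corresponds to what would become one mantle_penguin_records dataset, with rows from every year gathered into a single collection.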
Analyses
Analysis notebooks, powered by Jupyter, are another way to transform and process data in Mantle.
We explored the Palmer penguin dataset in the penguins_classification analysis, which we launched in the penguins-analysis analysis environment.
Inside the notebook, we loaded the penguin data by querying for mantle_penguin_records datasets using the Mantle SDK. After making an exploratory pairplot, we evaluated how well a gradient-boosting classifier predicts a penguin’s species from its measurements and sex.
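For readers who want a feel for the modeling step, here is a rough sketch of evaluating a gradient-boosting classifier with scikit-learn. It is not the code from the penguins_classification notebook. It skips the Mantle SDK query and the pairplot entirely, and it uses randomly generated placeholder values instead of real penguin measurements. The cluster centers, spreads, class labels, and the choice of 5-fold cross-validation are all assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder data: three arbitrary clusters standing in for three species.
# Columns mimic bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g.
# These values are randomly generated, NOT real penguin measurements.
n_per_class = 40
centers = np.array([
    [40.0, 18.0, 190.0, 3700.0],
    [47.0, 15.0, 215.0, 5000.0],
    [49.0, 18.5, 196.0, 3750.0],
])
spreads = np.array([2.5, 1.0, 6.0, 400.0])

measurements = np.vstack(
    [rng.normal(center, spreads, size=(n_per_class, 4)) for center in centers]
)
# Sex encoded as 0/1 (an assumed encoding), drawn at random.
sex = rng.integers(0, 2, size=(3 * n_per_class, 1))

X = np.hstack([measurements, sex])
y = np.repeat(["species_a", "species_b", "species_c"], n_per_class)

# With a classifier and an integer cv, scikit-learn uses stratified folds,
# and the default score is classification accuracy.
clf = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
mean_accuracy = scores.mean()
```

In a real analysis, X and y would be built from the queried mantle_penguin_records datasets, after handling any missing values.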
Wrapping up
Thanks for following along with our quick demonstration of Mantle using the Palmer penguins dataset. We hope you enjoy using Mantle to organize, transform, and draw insight from your data!