How can I use the MD Format to upload my metabolomics data?

MD Format — Metabolite is the Mass Dynamics format for uploading metabolite-level quantitative data.

This format is intended for metabolomics datasets where each metabolite or feature has a quantitative intensity value for each sample. It is typically used for discovery metabolomics and relative quantification workflows

File structure

MD Format — Metabolite uses a long-format table.

This means each row represents one metabolite, or feature, measured in one sample.

A wide matrix, where metabolites are rows and samples are columns, must be converted into long format before upload.

Required files

A metabolite upload requires:

File	Required	Purpose
Metabolite intensity file	Yes	Contains metabolite or feature intensities for each sample
`experiment_design.csv`	Yes	Maps samples to the experimental design
`sample_metadata.csv`	Yes	Contains sample-level annotations

The SampleName values in the metabolite intensity file must match the sample_name values in experiment_design.csv and sample_metadata.csv exactly.

Sample names are case-sensitive.

Required columns in the metabolite intensity file

Your metabolite intensity file must contain the following columns:

Column	Type	Description
`MetaboliteId`	Text	Stable identifier for the metabolite or feature
`MetaboliteIntensity`	Numeric	Non-logged quantitative intensity value
`SampleName`	Text	Sample identifier
`Imputed`	Integer	`0` for measured values, `1` for missing or imputed values

Column details

`MetaboliteId`

MetaboliteId is the identifier used to group rows belonging to the same metabolite or feature.

You can use another stable identifier such as:

Identifier type	Example
InChIKey	Structure-based identifier
ChEBI	Metabolite identifier
HMDB	Metabolomics database identifier
KEGG Compound	Compound and pathway identifier
PubChem CID	Cross-reference identifier
Internal feature ID	Stable identifier from your workflow

The identifier does not need to be the common metabolite name. In many cases, it is better to use a stable database ID or feature ID as MetaboliteId, then add a separate MetaboliteName column for readability.

The most important rule is consistency. If you want to compare or map metabolites across datasets, use the same identifier system across those datasets wherever possible.

`MetaboliteIntensity`

MetaboliteIntensity is the quantitative value for a metabolite or feature in a sample.

Upload non-logged intensity values.

You may upload raw, normalised, or analysis-ready intensity values, depending on your workflow. Do not upload log-transformed values. Normalisation and imputation can also be performed in Mass Dynamics after upload.

`SampleName`

SampleName identifies the sample associated with each intensity value.

Each SampleName must match the corresponding sample_name in:

Companion file	Matching column
`experiment_design.csv`	`sample_name`
`sample_metadata.csv`	`sample_name`

The values must match exactly, including capitalisation, spacing, and punctuation.

`Imputed`

Imputed tells Mass Dynamics whether the value is a measured value or a missing/imputed value.

Value	Meaning
`0`	Measured value
`1`	Missing or imputed value

The Imputed column must only contain 0 or 1.

Missing values

Missing measurements should be included in the file as:

Field	Value
`MetaboliteIntensity`	`0.0`
`Imputed`	`1`

Do not leave missing measurements blank.

Do not omit rows for missing measurements.

Do not use Imputed = 0 for a zero value unless you are certain the zero represents a real measured value.

In most metabolomics workflows, a value of 0.0 should be treated as missing and marked with Imputed = 1.

Full matrix requirement

The metabolite intensity file must contain a complete matrix.

This means every MetaboliteId must have exactly one row for every SampleName.

For example, if you have:

Number of metabolites or features	Number of samples	Required rows
3	4	12
100	20	2,000
1,000	50	50,000

A missing measurement is still included as a row. It should be represented as MetaboliteIntensity = 0.0 and Imputed = 1.

Visual example of the long format

If your wide matrix looks like this:

MetaboliteId	MetaboliteName	Sample_A	Sample_B	Sample_C
HMDB0000122	Glucose	15320.5	0.0	14220.7
HMDB0000094	Citrate	8421.3	9105.9	8770.2

It should be converted into long format:

MetaboliteId	MetaboliteName	SampleName	MetaboliteIntensity	Imputed
HMDB0000122	Glucose	Sample_A	15320.5	0
HMDB0000122	Glucose	Sample_B	0.0	1
HMDB0000122	Glucose	Sample_C	14220.7	0
HMDB0000094	Citrate	Sample_A	8421.3	0
HMDB0000094	Citrate	Sample_B	9105.9	0
HMDB0000094	Citrate	Sample_C	8770.2	0

In this example, the Glucose measurement for Sample_B is missing, so it is uploaded as 0.0 with Imputed = 1.

Extra metabolite metadata columns

You can include extra metabolite annotation columns in the intensity file.

Examples include:

Column	Purpose
`MetaboliteName`	Human-readable metabolite or feature name
`Description`	Short description
`HMDB`	HMDB identifier
`KEGG`	KEGG Compound identifier
`ChEBI`	ChEBI identifier
`PubChemCID`	PubChem Compound ID
`mz`	Mass-to-charge value
`RetentionTime`	Retention time
`Formula`	Molecular formula
`SMILES`	Structure string
`Pathway`	Pathway annotation
`AnnotationStatus`	Identification or annotation confidence

Extra columns are imported as metabolite-level metadata.

The value in each metadata column must stay the same for every row with the same MetaboliteId.

For example, if MetaboliteId = HMDB0000122 has MetaboliteName = Glucose, then every row for HMDB0000122 must also have MetaboliteName = Glucose.

Only MetaboliteIntensity and Imputed should vary between samples for the same MetaboliteId.

Example experiment_design.csv

sample_name,condition
Sample_A,Control
Sample_B,Control
Sample_C,Treatment

Example sample_metadata.csv

sample_name,condition,batch,timepoint
Sample_A,Control,1,Baseline
Sample_B,Control,1,Baseline
Sample_C,Treatment,2,Week_4

You can include additional sample-level annotations in sample_metadata.csv, such as batch, dose, timepoint, sex, tissue, treatment, or disease state.

Anything that varies by sample belongs in sample_metadata.csv, not in the metabolite intensity file.

Converting a wide matrix into MD Format — Metabolite

If your data is currently a wide matrix, you need to convert it into long format before upload.

Example Python conversion:

________________________________________________________________________________________________

import pandas as pd

wide = pd.read_csv("my_metabolomics_wide.tsv", sep="\t")

id_col = "MetaboliteId"
meta_cols = ["MetaboliteName", "Description"]

sample_cols = [
col for col in wide.columns
if col not in [id_col, *meta_cols]
]

long_df = wide.melt(
id_vars=[id_col, *meta_cols],
value_vars=sample_cols,
var_name="SampleName",
value_name="MetaboliteIntensity",
)

long_df["MetaboliteIntensity"] = long_df["MetaboliteIntensity"].fillna(0.0)
long_df["Imputed"] = (long_df["MetaboliteIntensity"] == 0).astype(int)

long_df.to_csv(
"md_format_metabolite.tsv",
sep="\t",
index=False
)

________________________________________________________________________________________________

This produces one row per metabolite or feature per sample.

Upload checklist

Before uploading, check that:

The file contains MetaboliteId, MetaboliteIntensity, SampleName, and Imputed
MetaboliteIntensity values are non-logged
Imputed only contains 0 or 1
Missing values are represented as 0.0 with Imputed = 1
Every metabolite or feature appears once for every sample
Extra metabolite metadata columns are consistent within each MetaboliteId
SampleName matches sample_name in experiment_design.csv and sample_metadata.csv
experiment_design.csv is included
sample_metadata.csv is included

Known limitations

Pathway enrichment and entity mapping depend on the identifiers provided in your upload.

Mass Dynamics links entities using the IDs and metadata supplied with the data. For best results across multiple uploads, use stable and consistent identifiers across datasets.

Steps to upload your metabolomics data using the MD Format

Go to the Dataset Upload page
Choose "Choose Files" or "Choose Folder" depending on how your files are stored
There is 1 file that is needed. Do not be concerned with the file name, as we will detect the data in the files based on the column names. Please ensure that your file contains your data in the following structure and that the files are uploaded with the "tsv" or “csv” file extensions.
After choosing the relevant files to upload for your file system, click "Next"
Mass Dynamics should now detect that it is the MD Format that you are using, and will detect the number of files found.
You can now upload your own populated sample metadata csv file or simply click next and upload sample metadata later on.
Go ahead and enter your Experiment Name and Experiment Description
Click "Create Experiment" and wait until your files upload
Get started with analyzing your data!