How can I use the MD Format to upload my metabolomics data?

MD Format — Metabolite is the Mass Dynamics format for uploading metabolite-level quantitative data.

This format is intended for metabolomics datasets where each metabolite or feature has a quantitative intensity value for each sample. It is typically used for discovery metabolomics and relative quantification workflows.

File structure

MD Format — Metabolite uses a long-format table.

This means each row represents one metabolite, or feature, measured in one sample.

A wide matrix, where metabolites are rows and samples are columns, must be converted into long format before upload.

Required files

A metabolite upload requires:

File                       Required  Purpose
Metabolite intensity file  Yes       Contains metabolite or feature intensities for each sample
experiment_design.csv      Yes       Maps samples to the experimental design
sample_metadata.csv        Yes       Contains sample-level annotations

The SampleName values in the metabolite intensity file must match the sample_name values in experiment_design.csv and sample_metadata.csv exactly.

Sample names are case-sensitive.

Required columns in the metabolite intensity file

Your metabolite intensity file must contain the following columns:

Column               Type     Description
MetaboliteId         Text     Stable identifier for the metabolite or feature
MetaboliteIntensity  Numeric  Non-logged quantitative intensity value
SampleName           Text     Sample identifier
Imputed              Integer  0 for measured values, 1 for missing or imputed values

Column details

MetaboliteId

MetaboliteId is the identifier used to group rows belonging to the same metabolite or feature.

You can use any stable identifier, such as:

Identifier type      Description
InChIKey             Structure-based identifier
ChEBI                Metabolite identifier
HMDB                 Metabolomics database identifier
KEGG Compound        Compound and pathway identifier
PubChem CID          Cross-reference identifier
Internal feature ID  Stable identifier from your workflow

The identifier does not need to be the common metabolite name. In many cases, it is better to use a stable database ID or feature ID as MetaboliteId, then add a separate MetaboliteName column for readability.

The most important rule is consistency. If you want to compare or map metabolites across datasets, use the same identifier system across those datasets wherever possible.
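
One way to check that a dataset sticks to a single identifier system is to classify each MetaboliteId by its shape. The sketch below is illustrative only: the function names and the regular expressions are assumptions based on the commonly seen forms of these identifiers, not rules enforced by Mass Dynamics, and internal feature IDs simply fall through to "Other".

```python
import re

# Assumed shapes of common identifiers. Adjust these to your own data,
# for example if you use the older 5-digit HMDB form.
ID_PATTERNS = {
    "HMDB": re.compile(r"^HMDB\d{7}$"),
    "InChIKey": re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$"),
    "ChEBI": re.compile(r"^CHEBI:\d+$"),
    "KEGG Compound": re.compile(r"^C\d{5}$"),
    "PubChem CID": re.compile(r"^\d+$"),
}


def guess_id_system(metabolite_id: str) -> str:
    """Return the first identifier system whose pattern matches, else 'Other'."""
    for name, pattern in ID_PATTERNS.items():
        if pattern.match(metabolite_id):
            return name
    return "Other"


def id_systems_used(metabolite_ids) -> set:
    """Return the set of identifier systems found in a collection of IDs."""
    return {guess_id_system(str(value)) for value in metabolite_ids}


# Two HMDB IDs and one KEGG Compound ID: a mixed set worth reviewing.
systems = id_systems_used(["HMDB0000122", "HMDB0000094", "C00031"])
if len(systems) > 1:
    print("More than one identifier system found:", sorted(systems))
```

If more than one system is reported, consider mapping everything to a single system before upload and keeping the others as extra annotation columns.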

MetaboliteIntensity

MetaboliteIntensity is the quantitative value for a metabolite or feature in a sample.

Upload non-logged intensity values.

You may upload raw, normalised, or analysis-ready intensity values, depending on your workflow. Do not upload log-transformed values. Normalisation and imputation can also be performed in Mass Dynamics after upload.
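
If you are unsure whether an export has already been log-transformed, a rough heuristic can help before upload. The sketch below assumes that non-logged intensities are non-negative and usually well above 50, while log2 or log10 values are small or negative. Both function names and the threshold are hypothetical, and heavily normalised non-logged data can trigger a false alarm, so treat the result as a prompt to check your workflow rather than a verdict.

```python
import pandas as pd


def looks_log_transformed(intensities: pd.Series, max_plausible_log: float = 50.0) -> bool:
    """Heuristic: negative values, or a very small maximum, suggest logged data."""
    values = pd.to_numeric(intensities, errors="coerce").dropna()
    values = values[values != 0]  # ignore 0.0 placeholders for missing values
    if values.empty:
        return False
    return bool((values < 0).any() or values.max() < max_plausible_log)


def undo_log2(intensities: pd.Series) -> pd.Series:
    """Back-transform log2 values to the linear scale (assumes log base 2)."""
    return 2.0 ** intensities


raw_values = pd.Series([15320.5, 0.0, 14220.7])
logged_values = pd.Series([13.9, 13.8, 13.1])

print(looks_log_transformed(raw_values))
print(looks_log_transformed(logged_values))
```

Apply a back-transform only to measured values, and only when you know which log base was used.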

SampleName

SampleName identifies the sample associated with each intensity value.

Each SampleName must match the corresponding sample_name in:

Companion file         Matching column
experiment_design.csv  sample_name
sample_metadata.csv    sample_name

The values must match exactly, including capitalisation, spacing, and punctuation.
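
Because the match is exact, it can help to compare the names programmatically before upload. The following is a minimal sketch: the function name and the returned keys are hypothetical, and the "near miss" detection (ignoring case and surrounding whitespace) is just one convenient way to spot likely typos.

```python
def sample_name_mismatches(intensity_names, companion_names) -> dict:
    """Compare SampleName values with sample_name values from a companion file."""
    intensity_set = set(intensity_names)
    companion_set = set(companion_names)

    missing = sorted(intensity_set - companion_set)  # in intensity file only
    unused = sorted(companion_set - intensity_set)   # in companion file only

    # Names that would match if case and surrounding whitespace were ignored.
    normalised = {name.strip().casefold(): name for name in companion_set}
    near_misses = {
        name: normalised[name.strip().casefold()]
        for name in missing
        if name.strip().casefold() in normalised
    }
    return {
        "missing_from_companion": missing,
        "unused_in_intensity": unused,
        "near_misses": near_misses,
    }


report = sample_name_mismatches(
    ["Sample_A", "Sample_B", "sample_c "],
    ["Sample_A", "Sample_B", "Sample_C"],
)
print(report)
```

With pandas, the two arguments could be, for example, the SampleName column of the intensity table and the sample_name column of experiment_design.csv. Run the same comparison against sample_metadata.csv.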

Imputed

Imputed tells Mass Dynamics whether the value is a measured value or a missing/imputed value.

Value Meaning
0 Measured value
1 Missing or imputed value

The Imputed column must only contain 0 or 1.

Missing values

Missing measurements should be included in the file as:

Field Value
MetaboliteIntensity 0.0
Imputed 1

Do not leave missing measurements blank.

Do not omit rows for missing measurements.

Do not use Imputed = 0 for a zero value unless you are certain the zero represents a real measured value.

In most metabolomics workflows, a value of 0.0 should be treated as missing and marked with Imputed = 1.
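
The rules above can be sketched in pandas as follows. The function name is hypothetical, and the sketch assumes the common case described here, where every blank or zero intensity is treated as missing. If some zeros in your data are genuine measurements, handle those rows separately.

```python
import pandas as pd


def encode_missing(long_df: pd.DataFrame) -> pd.DataFrame:
    """Represent missing measurements as MetaboliteIntensity = 0.0, Imputed = 1."""
    out = long_df.copy()
    out["MetaboliteIntensity"] = pd.to_numeric(
        out["MetaboliteIntensity"], errors="coerce"
    )

    # Blank, non-numeric, and zero values are all treated as missing here.
    missing = out["MetaboliteIntensity"].isna() | (out["MetaboliteIntensity"] == 0)
    out.loc[missing, "MetaboliteIntensity"] = 0.0

    # Keep any existing flags (for example, values imputed upstream),
    # and force the flag to 1 wherever the measurement is missing.
    if "Imputed" not in out.columns:
        out["Imputed"] = 0
    out.loc[missing, "Imputed"] = 1
    out["Imputed"] = out["Imputed"].astype(int)
    return out


example = pd.DataFrame(
    {
        "MetaboliteId": ["HMDB0000122"] * 3,
        "SampleName": ["Sample_A", "Sample_B", "Sample_C"],
        "MetaboliteIntensity": [15320.5, None, 0.0],
    }
)
print(encode_missing(example))
```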

Full matrix requirement

The metabolite intensity file must contain a complete matrix.

This means every MetaboliteId must have exactly one row for every SampleName.

For example, if you have:

Number of metabolites or features  Number of samples  Required rows
3                                  4                  12
100                                20                 2,000
1,000                              50                 50,000

A missing measurement is still included as a row. It should be represented as MetaboliteIntensity = 0.0 and Imputed = 1.
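
If a long-format table is missing some metabolite and sample combinations, the gaps can be filled before upload. This is a minimal sketch with a hypothetical function name. It handles only the four required columns, so any extra annotation columns on the added rows would still need to be filled in from the other rows for the same MetaboliteId.

```python
import pandas as pd


def complete_matrix(long_df: pd.DataFrame) -> pd.DataFrame:
    """Ensure exactly one row for every MetaboliteId and SampleName pair."""
    keys = ["MetaboliteId", "SampleName"]
    if long_df.duplicated(subset=keys).any():
        raise ValueError("Duplicate rows found for a metabolite and sample pair")

    # Every combination of the metabolites and samples present in the table.
    full_index = pd.MultiIndex.from_product(
        [long_df["MetaboliteId"].unique(), long_df["SampleName"].unique()],
        names=keys,
    )
    completed = long_df.set_index(keys).reindex(full_index).reset_index()

    # Rows created by the reindex have no Imputed flag yet: mark them as missing.
    added = completed["Imputed"].isna()
    completed.loc[added, "MetaboliteIntensity"] = 0.0
    completed.loc[added, "Imputed"] = 1
    completed["Imputed"] = completed["Imputed"].astype(int)
    return completed


partial = pd.DataFrame(
    {
        "MetaboliteId": ["HMDB0000122", "HMDB0000122", "HMDB0000094"],
        "SampleName": ["Sample_A", "Sample_B", "Sample_A"],
        "MetaboliteIntensity": [15320.5, 14220.7, 8421.3],
        "Imputed": [0, 0, 0],
    }
)
full = complete_matrix(partial)
expected_rows = full["MetaboliteId"].nunique() * full["SampleName"].nunique()
print(len(full), "rows;", expected_rows, "expected")
```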

Visual example of the long format

If your wide matrix looks like this:

MetaboliteId MetaboliteName Sample_A Sample_B Sample_C
HMDB0000122 Glucose 15320.5 0.0 14220.7
HMDB0000094 Citrate 8421.3 9105.9 8770.2

It should be converted into long format:

MetaboliteId MetaboliteName SampleName MetaboliteIntensity Imputed
HMDB0000122 Glucose Sample_A 15320.5 0
HMDB0000122 Glucose Sample_B 0.0 1
HMDB0000122 Glucose Sample_C 14220.7 0
HMDB0000094 Citrate Sample_A 8421.3 0
HMDB0000094 Citrate Sample_B 9105.9 0
HMDB0000094 Citrate Sample_C 8770.2 0

In this example, the Glucose measurement for Sample_B is missing, so it is uploaded as 0.0 with Imputed = 1.

Extra metabolite metadata columns

You can include extra metabolite annotation columns in the intensity file.

Examples include:

Column Purpose
MetaboliteName Human-readable metabolite or feature name
Description Short description
HMDB HMDB identifier
KEGG KEGG Compound identifier
ChEBI ChEBI identifier
PubChemCID PubChem Compound ID
mz Mass-to-charge value
RetentionTime Retention time
Formula Molecular formula
SMILES Structure string
Pathway Pathway annotation
AnnotationStatus Identification or annotation confidence

Extra columns are imported as metabolite-level metadata.

The value in each metadata column must stay the same for every row with the same MetaboliteId.

For example, if MetaboliteId = HMDB0000122 has MetaboliteName = Glucose, then every row for HMDB0000122 must also have MetaboliteName = Glucose.

Only MetaboliteIntensity and Imputed should vary between samples for the same MetaboliteId.
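
A quick way to check this rule is to count the distinct values of each annotation column within each MetaboliteId. The sketch below uses a hypothetical function name and treats every column other than the four required ones as metabolite-level metadata.

```python
import pandas as pd

# Columns that identify a row or are allowed to vary between samples.
PER_SAMPLE_COLUMNS = {"MetaboliteId", "SampleName", "MetaboliteIntensity", "Imputed"}


def inconsistent_metadata(long_df: pd.DataFrame) -> dict:
    """Map each metadata column to the MetaboliteIds where its value varies."""
    meta_cols = [col for col in long_df.columns if col not in PER_SAMPLE_COLUMNS]
    problems = {}
    for col in meta_cols:
        # dropna=False so that a blank in some rows also counts as a difference.
        distinct = long_df.groupby("MetaboliteId")[col].nunique(dropna=False)
        varying = distinct[distinct > 1].index.tolist()
        if varying:
            problems[col] = varying
    return problems


example = pd.DataFrame(
    {
        "MetaboliteId": ["HMDB0000122", "HMDB0000122", "HMDB0000094", "HMDB0000094"],
        "MetaboliteName": ["Glucose", "glucose", "Citrate", "Citrate"],
        "SampleName": ["Sample_A", "Sample_B", "Sample_A", "Sample_B"],
        "MetaboliteIntensity": [15320.5, 0.0, 8421.3, 9105.9],
        "Imputed": [0, 1, 0, 0],
    }
)
print(inconsistent_metadata(example))
```

An empty dictionary means every annotation column is consistent within each MetaboliteId.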

Example experiment_design.csv

sample_name,condition
Sample_A,Control
Sample_B,Control
Sample_C,Treatment

Example sample_metadata.csv

sample_name,condition,batch,timepoint
Sample_A,Control,1,Baseline
Sample_B,Control,1,Baseline
Sample_C,Treatment,2,Week_4

You can include additional sample-level annotations in sample_metadata.csv, such as batch, dose, timepoint, sex, tissue, treatment, or disease state.

Anything that varies by sample belongs in sample_metadata.csv, not in the metabolite intensity file.

Converting a wide matrix into MD Format — Metabolite

If your data is currently a wide matrix, you need to convert it into long format before upload.

Example Python conversion:

________________________________________________________________________________________________

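import pandas as pd

# Read the wide matrix: one row per metabolite, one column per sample.
wide = pd.read_csv("my_metabolomics_wide.tsv", sep="\t")

# Identifier column, plus the metabolite annotation columns to carry over.
# Edit meta_cols so it lists the annotation columns present in your own file.
id_col = "MetaboliteId"
meta_cols = ["MetaboliteName", "Description"]

# Every remaining column is treated as a sample.
sample_cols = [
    col for col in wide.columns
    if col not in [id_col, *meta_cols]
]

# Reshape to long format: one row per metabolite per sample.
long_df = wide.melt(
    id_vars=[id_col, *meta_cols],
    value_vars=sample_cols,
    var_name="SampleName",
    value_name="MetaboliteIntensity",
)

# Blank measurements become 0.0, and every 0.0 is flagged as missing (Imputed = 1).
long_df["MetaboliteIntensity"] = long_df["MetaboliteIntensity"].fillna(0.0)
long_df["Imputed"] = (long_df["MetaboliteIntensity"] == 0).astype(int)

# Write a tab-separated file ready for upload.
long_df.to_csv(
    "md_format_metabolite.tsv",
    sep="\t",
    index=False
)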

________________________________________________________________________________________________

This produces one row per metabolite or feature per sample.

Upload checklist

Before uploading, check that:

  • The file contains MetaboliteId, MetaboliteIntensity, SampleName, and Imputed
  • MetaboliteIntensity values are non-logged
  • Imputed only contains 0 or 1
  • Missing values are represented as 0.0 with Imputed = 1
  • Every metabolite or feature appears once for every sample
  • Extra metabolite metadata columns are consistent within each MetaboliteId
  • SampleName matches sample_name in experiment_design.csv and sample_metadata.csv
  • experiment_design.csv is included
  • sample_metadata.csv is included
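
Several items on this checklist can be tested automatically before upload. The sketch below is one possible pre-upload check, not a Mass Dynamics tool: the function name and messages are hypothetical, and it does not cover the non-logged or metadata consistency checks.

```python
import pandas as pd

REQUIRED_COLUMNS = ["MetaboliteId", "MetaboliteIntensity", "SampleName", "Imputed"]


def checklist_problems(long_df: pd.DataFrame, design_sample_names) -> list:
    """Return a list of checklist failures; an empty list means none were found."""
    missing_cols = [col for col in REQUIRED_COLUMNS if col not in long_df.columns]
    if missing_cols:
        return [f"Missing required columns: {missing_cols}"]

    problems = []
    if not long_df["Imputed"].isin([0, 1]).all():
        problems.append("Imputed contains values other than 0 or 1")
    if long_df["MetaboliteIntensity"].isna().any():
        problems.append("MetaboliteIntensity contains blank values")

    # Full matrix: exactly one row per metabolite per sample.
    per_pair = long_df.groupby(["MetaboliteId", "SampleName"]).size()
    expected = long_df["MetaboliteId"].nunique() * long_df["SampleName"].nunique()
    if len(long_df) != expected or (per_pair != 1).any():
        problems.append("Not exactly one row per metabolite per sample")

    if set(long_df["SampleName"]) != set(design_sample_names):
        problems.append("SampleName values do not match sample_name values")
    return problems


table = pd.DataFrame(
    {
        "MetaboliteId": ["HMDB0000122", "HMDB0000122", "HMDB0000094", "HMDB0000094"],
        "SampleName": ["Sample_A", "Sample_B", "Sample_A", "Sample_B"],
        "MetaboliteIntensity": [15320.5, 0.0, 8421.3, 9105.9],
        "Imputed": [0, 1, 0, 0],
    }
)
print(checklist_problems(table, ["Sample_A", "Sample_B"]))
```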

Known limitations

Pathway enrichment and entity mapping depend on the identifiers provided in your upload.

Mass Dynamics links entities using the IDs and metadata supplied with the data. For best results across multiple uploads, use stable and consistent identifiers across datasets.

Steps to upload your metabolomics data using the MD Format

  1. Go to the Dataset Upload page
  2. Select "Choose Files" or "Choose Folder", depending on how your files are stored
  3. One file is needed at this step. Do not worry about the file name, as Mass Dynamics detects the data from the column names. Make sure the file follows the structure described in this article and is uploaded with a "tsv" or "csv" file extension.
  4. After choosing the relevant files from your file system, click "Next"
  5. Mass Dynamics should now detect that you are using the MD Format and report the number of files found
  6. Upload your populated sample metadata CSV file, or click "Next" and upload the sample metadata later
  7. Enter your Experiment Name and Experiment Description
  8. Click "Create Experiment" and wait until your files upload
  9. Get started with analyzing your data!