The correspondence problem for metabonomics datasets

K Magnus Aberg; Erik Alm; Ralf J O Torgrip

doi:10.1007/s00216-009-2628-9

Anal Bioanal Chem. 2009 May;394(1):151-62. doi: 10.1007/s00216-009-2628-9. Epub 2009 Feb 7.

Authors

K Magnus Aberg¹, Erik Alm, Ralf J O Torgrip

Affiliation

¹ Department of Analytical Chemistry, BioSysteMetrics Group, Stockholm University, 10691, Stockholm, Sweden. magnus.aberg@anchem.su.se

PMID: 19198812
DOI: 10.1007/s00216-009-2628-9

Abstract

In metabonomics it is difficult to tell which peak is which in datasets with many samples. This is known as the correspondence problem. Data from different samples are not synchronised, i.e., the peak from one metabolite does not appear in exactly the same place in all samples. For datasets with many samples, this problem is nontrivial, because each sample contains hundreds to thousands of peaks that shift and are identified ambiguously. Statistical analysis of the data assumes that peaks from one metabolite are found in one column of a data table. For every error in the data table, the statistical analysis loses power and the risk of missing a biomarker increases. It is therefore important to solve the correspondence problem by synchronising samples, yet no existing method solves it once and for all. In this review, we analyse the correspondence problem, discuss current state-of-the-art methods for synchronising samples, and predict the properties of future methods.
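
To make the data-table assumption concrete, the following is a minimal sketch of one naive way to place shifted peaks into shared columns. It is not a method taken from the review. The function name `build_peak_table`, the tolerance `tol`, and the example peak lists are all hypothetical, and the sketch assumes each peak is given as a (position, intensity) pair on a common axis.

```python
def build_peak_table(samples, tol):
    """Group peaks from all samples into columns by position (naive sketch).

    samples: list of peak lists, one per sample; each peak is (position, intensity).
    tol: hypothetical tolerance; a peak joins the current group if it lies
         within tol of that group's first (lowest-position) peak.
    Returns (centers, table), where table[i][j] is the intensity of sample i
    in column j, or None if sample i has no peak in that column.
    """
    # Flatten to (position, intensity, sample_index) and sort by position.
    peaks = sorted(
        (pos, inten, i)
        for i, sample in enumerate(samples)
        for pos, inten in sample
    )

    # Greedy grouping along the position axis.
    groups = []
    for pos, inten, i in peaks:
        if groups and pos - groups[-1][0][0] <= tol:
            groups[-1].append((pos, inten, i))
        else:
            groups.append([(pos, inten, i)])

    # Column position = mean position of the peaks in the group.
    centers = [sum(p for p, _, _ in g) / len(g) for g in groups]

    table = [[None] * len(groups) for _ in samples]
    for j, group in enumerate(groups):
        for pos, inten, i in group:
            # Ambiguous case: one sample has two peaks in the same column.
            # This sketch simply keeps the larger one.
            if table[i][j] is None or inten > table[i][j]:
                table[i][j] = inten
    return centers, table


# Hypothetical example: three samples, two metabolites, small shifts.
example_samples = [
    [(1.00, 10.0), (2.50, 5.0)],
    [(1.02, 12.0), (2.47, 4.0)],
    [(0.99, 9.0)],
]
example_centers, example_table = build_peak_table(example_samples, tol=0.05)
```

A fixed tolerance like this breaks down exactly where the abstract says the problem becomes nontrivial: when shifts exceed the spacing between neighbouring peaks, peaks from different metabolites end up in one column or peaks from one metabolite are split across several.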

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Databases, Factual*
Metabolomics / methods*