Enhancing metabolomics research through data mining

doi:10.1016/j.jprot.2015.01.019

J Proteomics. 2015 Sep 8;127(Pt B):275-88. doi: 10.1016/j.jprot.2015.01.019. Epub 2015 Feb 7.

Authors

Ibon Martínez-Arranz¹, Rebeca Mayo¹, Miriam Pérez-Cormenzana¹, Itziar Mincholé¹, Lorena Salazar², Cristina Alonso¹, José M Mato³

Affiliations

¹ OWL, Parque Tecnológico de Bizkaia, Derio, Bizkaia, Spain.
² Osarten kooperatiba elkartea, Mondragón, Guipúzcoa, Spain.
³ CIC bioGUNE, CIBERehd, Parque Tecnológico de Bizkaia, Derio, Bizkaia, Spain. Electronic address: director@cicbiogune.es.

PMID: 25668325

Abstract

Metabolomics research, like other disciplines that use high-throughput technologies, generates a large amount of data for every sample. Although handling these data is a challenge and one of the biggest bottlenecks of the metabolomics workflow, it is also the key to obtaining valuable results. This work provides methodological data mining guidelines, systematically describing the steps to follow in metabolomics data exploration. Refinement of the instrumental raw data in the pre-processing step and assessment of the statistical assumptions in pre-treatment directly affect the results of subsequent univariate and multivariate analyses. A study of aging in a healthy population was selected to illustrate this data mining process. Multivariate analysis of variance (MANOVA) and linear regression methods were used to analyze the metabolic changes underlying aging. These two multivariate methods were chosen to illustrate two rather different treatments of age: as a categorical variable and as a continuous variable.
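The two views of age described above can be contrasted in a short sketch. It assumes Wilks' lambda as the MANOVA test statistic (the abstract does not say which statistic the authors used), uses ordinary least-squares slopes for the continuous view, and runs on a small hypothetical data set. The helper names (`det`, `wilks_lambda`, `ols_slope`) and the age cut-offs are illustrative and are not taken from the paper.

```python
# Minimal sketch (hypothetical data): age as a categorical factor via a
# one-way MANOVA statistic (Wilks' lambda), and age as a continuous covariate
# via a per-metabolite least-squares slope. Pure Python, no p-values computed.
from collections import defaultdict


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sscp(rows, center):
    """Sum-of-squares-and-cross-products matrix of rows around `center`."""
    p = len(center)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - c for x, c in zip(row, center)]
        for i in range(p):
            for j in range(p):
                out[i][j] += d[i] * d[j]
    return out


def wilks_lambda(X, groups):
    """Wilks' lambda = det(W) / det(T): W is the pooled within-group SSCP,
    T the total SSCP (T = W + B). Values near 0 mean strong separation."""
    by_group = defaultdict(list)
    for row, g in zip(X, groups):
        by_group[g].append(row)
    p = len(X[0])
    W = [[0.0] * p for _ in range(p)]
    for rows in by_group.values():
        Wg = sscp(rows, mean_vector(rows))
        for i in range(p):
            for j in range(p):
                W[i][j] += Wg[i][j]
    T = sscp(X, mean_vector(X))
    return det(W) / det(T)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: 9 subjects x 2 metabolite intensities.
ages = [25, 30, 35, 50, 55, 60, 70, 75, 80]
X = [[1.0, 2.1], [1.2, 2.0], [0.9, 2.3],
     [1.5, 1.8], [1.7, 1.9], [1.6, 1.6],
     [2.1, 1.4], [2.3, 1.5], [2.0, 1.2]]

# Categorical view: bin age (cut-offs are illustrative only).
age_groups = ["<40" if a < 40 else "40-64" if a < 65 else "65+" for a in ages]
lam = wilks_lambda(X, age_groups)

# Continuous view: one slope per metabolite (column of X).
slopes = [ols_slope(ages, list(col)) for col in zip(*X)]

print(f"Wilks' lambda: {lam:.3f}")
print("slopes per year:", [round(s, 4) for s in slopes])
```

A Wilks' lambda close to 0 indicates that the age groups differ strongly in their joint metabolite profile, while the sign of each slope shows whether a metabolite rises or falls with age. The direct determinant form needs a non-singular within-group matrix, so it only works when the number of subjects minus the number of groups exceeds the number of metabolites; larger panels would typically call for dimension reduction or per-metabolite tests.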

Biological significance: Metabolomics is a discipline that involves analyzing large amounts of data to extract relevant information. Researchers in this field must overcome the challenges posed by complex data processing and statistical analysis. A wide range of tasks has to be carried out, from minimizing batch-to-batch and other systematic variation during pre-processing to applying common data analysis techniques that rely on statistical assumptions. In this work, real data from a metabolic profiling study of aging were used to illustrate the proposed workflow and to suggest a set of guidelines for analyzing metabolomics data. This article is part of a Special Issue entitled: HUPO 2014.
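The two ends of that task range, removing batch-to-batch variation and checking a statistical assumption, can be illustrated on a single metabolite. The abstract does not state how batch effects were corrected, so this sketch assumes one common strategy: each batch contains quality-control (QC) samples, and every value is divided by the median QC intensity of its batch. It then assumes a log transform as the pre-treatment step and uses sample skewness as a simple symmetry check. The function names (`normalize_by_qc`, `skewness`) and all numbers are hypothetical.

```python
# Minimal sketch (hypothetical data): inter-batch normalization of one
# metabolite against per-batch QC medians, then a log transform whose effect
# on skewness is checked before parametric tests are applied.
import math
from statistics import mean, median, pstdev


def normalize_by_qc(values, batches, is_qc):
    """Divide each value by the median QC intensity of its batch, then
    multiply by the median of the batch QC medians to keep the original scale."""
    qc_median = {}
    for b in set(batches):
        qc = [v for v, bb, q in zip(values, batches, is_qc) if bb == b and q]
        qc_median[b] = median(qc)
    reference = median(qc_median.values())
    return [v / qc_median[b] * reference for v, b in zip(values, batches)]


def skewness(xs):
    """Biased sample skewness: mean of cubed z-scores (0 for symmetric data)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


# Batch B reads about twice as high as batch A (a systematic drift).
values = [100, 102, 80, 120, 95, 200, 204, 160, 240, 190]
batches = ["A"] * 5 + ["B"] * 5
is_qc = [True, True, False, False, False] * 2

normalized = normalize_by_qc(values, batches, is_qc)
samples_a = normalized[2:5]
samples_b = normalized[7:10]
print("batch A samples:", [round(v, 1) for v in samples_a])
print("batch B samples:", [round(v, 1) for v in samples_b])

# Right-skewed concentrations: log-transform and compare skewness.
conc = [1, 2, 2, 3, 3, 4, 10, 25]
log_conc = [math.log(c) for c in conc]
print(f"skewness raw={skewness(conc):.2f} log={skewness(log_conc):.2f}")
```

After normalization, the study samples of both batches land on the same scale, which is the point of inter-batch correction. The log transform is one common way to bring skewed intensities closer to the symmetry that parametric tests assume; in practice a formal normality test such as Shapiro-Wilk would usually accompany the skewness check.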

Keywords: Aging; Inter-batch normalization; Linear regression; MANOVA; Metabolomics; Statistical assumptions.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Data Mining / methods*
Metabolomics / methods*