Metabolomics research, like other disciplines that use high-throughput technologies, generates a large amount of data for every sample. Although handling these data is a challenge and one of the biggest bottlenecks of the metabolomics workflow, it is also the key to obtaining valuable results. This work provides methodological data mining guidelines that systematically describe the steps to follow in metabolomics data exploration. Refinement of the instrumental raw data in the pre-processing step and assessment of the statistical assumptions in the pre-treatment step directly affect the results of subsequent univariate and multivariate analyses. A study of aging in a healthy population was selected to illustrate this data mining process. Multivariate analysis of variance (MANOVA) and linear regression were used to analyze the metabolic changes underlying aging. These two methods were selected to illustrate the treatment of age from two different perspectives: as a categorical variable and as a continuous variable.
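As a minimal sketch of the two perspectives on age described above, the following Python fragment analyzes one hypothetical metabolite in two ways: a least-squares fit with age as a continuous predictor, and a one-way ANOVA F statistic with age binned into groups. The data, the age cut-offs (40 and 60 years), and the function names are illustrative assumptions, not values from the study. The single-metabolite ANOVA is a univariate stand-in for MANOVA, which tests several metabolites jointly.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def one_way_f(groups):
    """F statistic of a one-way ANOVA over a list of non-empty groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def bin_by_age(ages, values, cutoffs=(40, 60)):
    """Split values into age groups using hypothetical cut-offs (in years)."""
    groups = [[] for _ in range(len(cutoffs) + 1)]
    for age, value in zip(ages, values):
        index = sum(age >= c for c in cutoffs)  # number of cut-offs reached
        groups[index].append(value)
    return groups


# Hypothetical intensities of one metabolite in nine subjects.
ages = [25, 31, 38, 44, 52, 57, 63, 70, 76]
intensity = [10.1, 10.4, 10.9, 11.6, 12.2, 12.5, 13.4, 13.9, 14.8]

# Age as a continuous variable: slope of intensity per year of age.
slope, intercept = fit_line(ages, intensity)
# Age as a categorical variable: do the group means differ?
f_stat = one_way_f(bin_by_age(ages, intensity))

print(f"slope per year: {slope:.3f}, intercept: {intercept:.2f}")
print(f"one-way ANOVA F: {f_stat:.2f}")
```

The continuous view uses every subject's exact age, whereas the categorical view depends on where the cut-offs are placed, which is one reason the two analyses can lead to different conclusions.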
Biological significance: Metabolomics is a discipline that involves analyzing a large amount of data to extract relevant information. Researchers in this field must overcome the challenges of complex data processing and statistical analysis. A wide range of tasks has to be carried out, from minimizing batch-to-batch and systematic variations during pre-processing to applying common data analysis techniques that rely on statistical assumptions. In this work, a metabolic profiling study of aging based on real data was used to illustrate the proposed workflow and to suggest a set of guidelines for analyzing metabolomics data. This article is part of a Special Issue entitled: HUPO 2014.
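The batch-to-batch correction mentioned above can take many forms. As one simple illustration, and only as an assumption about what such a step might look like, the sketch below rescales each batch so that its median intensity matches the median of all samples. The function name and the data are hypothetical, and the study itself may use a different normalization.

```python
from statistics import median


def median_scale_batches(intensities, batches):
    """Scale each batch so its median equals the median of all samples."""
    global_median = median(intensities)

    # Collect the values belonging to each batch label.
    batch_values = {}
    for value, batch in zip(intensities, batches):
        batch_values.setdefault(batch, []).append(value)

    # One multiplicative correction factor per batch.
    factors = {b: global_median / median(v) for b, v in batch_values.items()}
    return [value * factors[batch] for value, batch in zip(intensities, batches)]


# Hypothetical intensities of one metabolite measured in two batches,
# where batch "B" reads systematically higher than batch "A".
intensities = [100.0, 110.0, 120.0, 200.0, 220.0, 240.0]
batches = ["A", "A", "A", "B", "B", "B"]

normalized = median_scale_batches(intensities, batches)
print(normalized)
```

After scaling, both batches share the same median, so a later comparison between subjects is less likely to reflect the batch in which they were measured.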
Keywords: Aging; Inter-batch normalization; Linear regression; MANOVA; Metabolomics; Statistical assumptions.
Copyright © 2015 Elsevier B.V. All rights reserved.