Comparison of pre-processing methodologies for Illumina 450k methylation array data in familial analyses

Clin Epigenetics. 2016 Jul 16:8:75. doi: 10.1186/s13148-016-0241-2. eCollection 2016.

Abstract

Background: Human methylome mapping in health and disease states has largely relied on Illumina Human Methylation 450k array (450k array) technology. Accompanying this has been the necessary evolution of analysis pipelines to facilitate data processing. The majority of these pipelines, however, cater for experimental designs where matched 'controls' or 'normal' samples are available. Experimental designs where no appropriate 'reference' exists remain challenging. Herein, we use data generated from our study of the inheritance of methylome profiles in families to evaluate the performance of eight normalisation pre-processing methods. Fifty individual samples representing four families were interrogated on five 450k array BeadChips. Eight normalisation methods were tested using qualitative and quantitative metrics, to assess efficacy and suitability.

Results: Stratified quantile normalisation (QN) combined with ComBat was consistently found to be the most appropriate approach when assessed using density, MDS and cluster plots. This was supported quantitatively by ANOVA on the first principal component, where the effect of batch dropped from p < 0.01 to p = 0.97 after stratified QN and ComBat. Median absolute differences between replicated samples were lowest after stratified QN and ComBat, as were the standard error measures on known imprinted regions. Biological information was preserved after normalisation, as indicated by the maintenance of a significant association between a known mQTL and methylation (p = 1.05e-05).
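
The replicate-concordance metric mentioned above can be illustrated with a minimal sketch. It assumes that "median absolute difference" means the median, across probes, of the absolute difference in beta values between two technical replicates of the same sample; the abstract does not give the exact definition. The function name and all beta values below are hypothetical and are not taken from the study.

```python
from statistics import median


def median_absolute_difference(betas_a, betas_b):
    """Median of |a - b| across probes for two replicates of one sample.

    Assumed definition for illustration; not confirmed by the abstract.
    """
    if len(betas_a) != len(betas_b):
        raise ValueError("replicates must cover the same probes")
    return median(abs(a - b) for a, b in zip(betas_a, betas_b))


# Hypothetical beta values (methylation fractions between 0 and 1) for five
# probes, measured twice for the same sample, before and after normalisation.
raw_rep1 = [0.10, 0.52, 0.88, 0.33, 0.71]
raw_rep2 = [0.16, 0.45, 0.93, 0.25, 0.78]
norm_rep1 = [0.12, 0.50, 0.90, 0.30, 0.74]
norm_rep2 = [0.13, 0.48, 0.91, 0.28, 0.75]

mad_raw = median_absolute_difference(raw_rep1, raw_rep2)
mad_norm = median_absolute_difference(norm_rep1, norm_rep2)

# A smaller value after normalisation suggests the replicates agree better.
print(f"raw: {mad_raw:.3f}, normalised: {mad_norm:.3f}")
```

Under this reading, a normalisation method that lowers the value for replicate pairs is reducing technical variation, which is how the comparison between the eight methods could be ranked on this metric.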

Conclusions: A strategy combining stratified QN with ComBat is appropriate for analyses in which no reference sample is available but preservation of biological variation is paramount. There is great potential for use of 450k array data to further our understanding of the methylome in a variety of similar settings. Such advances will rely on the determination of appropriate methodologies for processing these data, such as those established here.

Keywords: 450k; Array; Familial data; Methylation; Normalisation; Pre-processing pipeline.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • CpG Islands
  • DNA Methylation*
  • Databases, Genetic
  • Female
  • Genome, Human*
  • Heredity
  • Humans
  • Male
  • Middle Aged
  • Oligonucleotide Array Sequence Analysis / methods
  • Oligonucleotide Array Sequence Analysis / standards*
  • Quantitative Trait Loci
  • Sequence Analysis, DNA / methods
  • Sequence Analysis, DNA / standards*
  • Software
  • Young Adult