Comparison of different normalization assumptions for analyses of DNA methylation data from the cancer genome

Gene. 2012 Sep 10;506(1):36-42. doi: 10.1016/j.gene.2012.06.075. Epub 2012 Jul 4.

Abstract

Some researchers normalize DNA methylation array data to remove technical artifacts introduced by differences in sample preparation, array processing, and other experimental factors. Others analyze methylation arrays without normalization, reasoning that current normalization methods may distort real differences between normal and cancer samples: cancer genomes may be extensively hypomethylated, and the total amount of CpG methylation might differ substantially among samples. In this study, using eight datasets generated with the Infinium HumanMethylation27 assay, we systematically analyzed the global distribution of DNA methylation changes in cancer relative to normal controls and its effect on data normalization when selecting differentially methylated (DM) genes. More DM genes were found in the Quantile/Lowess-normalized data than in the non-normalized data. The DM genes additionally selected in the Quantile/Lowess-normalized data showed significantly consistent methylation states in an independent dataset for the same cancer, indicating that these extra DM genes were real biological signals related to the disease. These results suggest that normalization can increase the power to detect DM genes in the context of diagnostic markers, which are usually characterized by relatively large effect sizes. We also evaluated the reproducibility of DM discoveries for a particular cancer type: most of the DM genes additionally detected in one dataset showed the same direction of methylation change in the other dataset for the same cancer type, indicating that they were real biological signals in that dataset as well. Furthermore, some DM genes detected in different studies of a particular cancer type were significantly reproducible at the functional level.
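
As a rough illustration of what quantile normalization does to array data of this kind, the Python sketch below forces every sample in a probes-by-samples matrix of methylation values to share one common distribution. It is not the authors' pipeline, which the abstract does not describe; the function name `quantile_normalize` and the toy values are assumptions made for this example.

```python
import numpy as np

def quantile_normalize(beta):
    """Quantile-normalize a probes x samples matrix (one column per sample).

    Hypothetical helper for illustration only; ties are broken by position.
    """
    beta = np.asarray(beta, dtype=float)
    # Rank of each value within its own sample (column).
    order = np.argsort(beta, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean of the sorted values across samples.
    reference = np.sort(beta, axis=0).mean(axis=1)
    # Replace each value with the reference value at its rank.
    return reference[ranks]

# Toy matrix (made-up values): 3 probes x 2 samples.
toy = np.array([[0.1, 0.4],
                [0.5, 0.2],
                [0.9, 0.9]])
normalized = quantile_normalize(toy)
```

After this step the sorted values of every sample are identical, which is exactly why the procedure could mask a genuine global shift such as widespread hypomethylation in cancer samples.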

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Colorectal Neoplasms / genetics
  • DNA Methylation*
  • Data Interpretation, Statistical
  • Databases, Nucleic Acid
  • Genome, Human
  • Humans
  • Kidney Neoplasms / genetics
  • Lung Neoplasms / genetics
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Reproducibility of Results
  • Stomach Neoplasms / genetics