Modeling and analysis of multi-library, multi-group SAGE data with application to a study of mouse cerebellum

Biometrics. 2007 Sep;63(3):777-86. doi: 10.1111/j.1541-0420.2006.00733.x.

Abstract

A serial analysis of gene expression (SAGE) library is a collection of thousands of short DNA "tags," each of which represents a distinct messenger RNA (mRNA) transcript. Existing methods analyze either single-library data (i.e., one library per group) or one tag at a time. In a multi-library setting, lumping all libraries in a group together to form a single "mega" library is clearly unsatisfactory, yet it is done frequently because alternative methods are lacking. Because the tag counts within each library are drawn from a multinomial distribution and are therefore interdependent, analyzing thousands of tags one at a time is inadequate: this practice not only ignores the dependency but also raises the problem of adjusting for multiple testing. This article addresses both issues so that all tags from multi-library groups can be analyzed jointly. The proposed methods are also geared toward multi-group data. Focusing on the problem of identifying differentially expressed genes, we establish a Bayesian formulation. Under this formulation, separating the differentially expressed genes from the majority of similarly expressed ones is treated as a model selection problem, and the reversible jump Markov chain Monte Carlo method is adapted for this purpose. The method is applied to a set of mouse libraries to uncover genes associated with aging in the cerebellum. Our gene ontology (GO) analysis of the selected genes classifies them into several GO categories that appear to be functionally relevant to aging.
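The dependency among tags follows from the multinomial sampling assumption. In a library of N tags with tag proportions p_1, ..., p_K, the counts must sum to N, so Cov(n_i, n_j) = -N p_i p_j for i ≠ j. The Python sketch below simulates libraries under that assumption and checks the negative covariance empirically. It also shows what "lumping" libraries into a mega library amounts to. The proportions, library size, and function names (`simulate_library`, `multinomial_cov`, `empirical_cov`, `pool_libraries`) are hypothetical and illustrative only. The sketch covers the sampling model and nothing more; it is not the authors' Bayesian formulation or their reversible jump MCMC sampler.

```python
import random
from collections import Counter


def simulate_library(proportions, library_size, rng):
    """Draw one hypothetical SAGE library: sample library_size tags with
    replacement according to the tag proportions (one multinomial draw)."""
    tags = rng.choices(range(len(proportions)), weights=proportions, k=library_size)
    counts = Counter(tags)
    return [counts.get(i, 0) for i in range(len(proportions))]


def multinomial_cov(proportions, library_size, i, j):
    """Theoretical covariance of the counts of tags i and j within one library."""
    if i == j:
        return library_size * proportions[i] * (1 - proportions[i])
    return -library_size * proportions[i] * proportions[j]


def empirical_cov(libraries, i, j):
    """Sample covariance of tag counts i and j across simulated libraries."""
    n = len(libraries)
    xi = [lib[i] for lib in libraries]
    xj = [lib[j] for lib in libraries]
    mi, mj = sum(xi) / n, sum(xj) / n
    return sum((a - mi) * (b - mj) for a, b in zip(xi, xj)) / (n - 1)


def pool_libraries(libraries):
    """Lump several libraries into one 'mega' library by summing tag counts,
    which discards any library-to-library variation."""
    return [sum(col) for col in zip(*libraries)]


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]  # hypothetical tag proportions
    libs = [simulate_library(p, 500, rng) for _ in range(1000)]
    print("theoretical cov(n_0, n_1):", multinomial_cov(p, 500, 0, 1))
    print("empirical   cov(n_0, n_1):", empirical_cov(libs, 0, 1))
    print("mega library from 3 libraries:", pool_libraries(libs[:3]))
```

Because every library's counts are constrained to the same total, an excess of one tag forces a deficit elsewhere. Testing each tag separately ignores this constraint, which is the dependency the abstract refers to.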

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Aging / metabolism*
  • Animals
  • Bayes Theorem
  • Cerebellum / metabolism*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Databases, Protein*
  • Gene Expression Profiling / methods*
  • Information Storage and Retrieval / methods*
  • Mice
  • Models, Biological*
  • Models, Statistical
  • Nerve Tissue Proteins / metabolism*

Substances

  • Nerve Tissue Proteins