Methodological aspects of the genetic dissection of gene expression

Bioinformatics. 2005 May 15;21(10):2383-93. doi: 10.1093/bioinformatics/bti241.

Abstract

Motivation: Dissection of the genetics underlying gene expression utilizes techniques from microarray analyses as well as quantitative trait loci (QTL) mapping. Available QLT mapping methods are not tailored for the highly automated analyses required to deal with the thousand of gene transcripts encountered in the mapping of QTL affecting gene expression (sometimes referred to as eQTL). This report focuses on the adaptation of QTL mapping methodology to perform automated mapping of QTL affecting gene expression.

Results: The analyses of expression data on > 12,000 gene transcripts in BXD recombinant inbred mice found, on average, 629 QTL exceeding the genome-wide 5% threshold. Using additional information on trait repeatabilities and QTL location, 168 of these were classified as 'high confidence' QTL. Current sample sizes of genetical genomics studies make it possible to detect a reasonable number of QTL using simple genetic models, but considerably larger studies are needed to evaluate more complex genetic models. After extensive analyses of real data and additional simulated data (altogether > 300,000 genome scans) we make the following recommendations for detection of QTL for gene expression: (1) For populations with an unbalanced number of replicates on each genotype, weighted least squares should be preferred above ordinary least squares. Weights can be based on repeatability of the trait and the number of replicates. (2) A genome scan based on multiple marker information but analysing only at marker locations is a good approximation to a full interval mapping procedure. (3) Significance testing should be based on empirical genome-wide significance thresholds that are derived for each trait separately. (4) The significant QTL can be separated into high and low confidence QTL using a false discovery rate that incorporates prior information such as transcript repeatabilities and co-localization of gene-transcripts and QTL. (5) Including observations on the founder lines in the QTL analysis should be avoided as it inflates the test statistic and increases the Type I error. (6) To increase the computational efficiency of the study, use of parallel computing is advised. These recommendations are summarized in a possible strategy for mapping of QTL in a least squares framework.

Availability: The software used for this study is available on request from the authors.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Chromosome Mapping / methods*
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / genetics*
  • Mice
  • Mice, Inbred C57BL
  • Models, Genetic*
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods
  • Proteome / genetics
  • Quantitative Trait Loci / genetics*
  • Transcription Factors / genetics*

Substances

  • Proteome
  • Transcription Factors