Vertebrate gene predictions and the problem of large genes

Nat Rev Genet. 2003 Sep;4(9):741-9. doi: 10.1038/nrg1160.

Abstract

To find unknown protein-coding genes, annotation pipelines use a combination of ab initio gene prediction and similarity to experimentally confirmed genes or proteins. Here, we show that although the ab initio predictions have an intrinsically high false-positive rate, they also have a consistently low false-negative rate. The incorporation of similarity information is meant to reduce the false-positive rate, but in doing so it increases the false-negative rate. The crucial variable is gene size (including introns): genes of the most extreme sizes, especially very large genes, are most likely to be incorrectly predicted.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Exons
  • Gene Expression
  • Genetic Techniques / statistics & numerical data
  • Genome, Human
  • Humans
  • Introns
  • Models, Genetic*
  • Organ Specificity
  • Predictive Value of Tests
  • Vertebrates / genetics*