Global identification of human transcribed sequences with genome tiling arrays

Science. 2004 Dec 24;306(5705):2242-6. doi: 10.1126/science.1103388. Epub 2004 Nov 11.

Abstract

Elucidating the transcribed regions of the genome constitutes a fundamental aspect of human biology, yet this remains an outstanding problem. To comprehensively identify coding sequences, we constructed a series of high-density oligonucleotide tiling arrays representing sense and antisense strands of the entire nonrepetitive sequence of the human genome. Transcribed sequences were located across the genome via hybridization to complementary DNA samples, reverse-transcribed from polyadenylated RNA obtained from human liver tissue. In addition to identifying many known and predicted genes, we found 10,595 transcribed sequences not detected by other methods. A large fraction of these are located in intergenic regions distal from previously annotated genes and exhibit significant homology to other mammalian proteins.
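
The abstract does not say how hybridization signal was turned into transcribed-sequence calls. As a purely illustrative sketch (not the authors' method), one simple way to call candidate transcribed segments from tiling-array data is to look for runs of adjacent probes whose intensity exceeds a cutoff. The function name `call_segments`, the `threshold` value, and the `min_probes` run length are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def call_segments(
    positions: Sequence[int],
    intensities: Sequence[float],
    threshold: float,
    min_probes: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) probe positions for each run of at least
    `min_probes` consecutive probes whose intensity exceeds `threshold`.

    Illustrative only: threshold and min_probes are assumed parameters,
    not values taken from the paper.
    """
    segments: List[Tuple[int, int]] = []
    run_start = None  # index of the first probe in the current run, if any

    for i, value in enumerate(intensities):
        if value > threshold:
            if run_start is None:
                run_start = i
        else:
            # The run (if any) ended at probe i - 1; keep it if long enough.
            if run_start is not None and i - run_start >= min_probes:
                segments.append((positions[run_start], positions[i - 1]))
            run_start = None

    # Close a run that extends to the last probe.
    n = len(intensities)
    if run_start is not None and n - run_start >= min_probes:
        segments.append((positions[run_start], positions[n - 1]))

    return segments


# Toy data: probe start coordinates along one strand and made-up intensities.
toy_positions = [0, 50, 100, 150, 200, 250, 300, 350]
toy_intensities = [1.0, 9.0, 8.0, 7.0, 1.0, 9.0, 1.0, 1.0]
toy_segments = call_segments(toy_positions, toy_intensities, threshold=5.0)
```

Requiring several adjacent probes above the cutoff, rather than accepting any single bright probe, is one common way to reduce false calls from cross-hybridization; in this toy input the isolated bright probe at position 250 is ignored.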

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • Computational Biology
  • Conserved Sequence
  • CpG Islands
  • DNA, Complementary
  • DNA, Intergenic
  • Databases, Genetic
  • Exons
  • Genome, Human*
  • Humans
  • Introns
  • Mice
  • Nucleic Acid Hybridization
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Probes
  • Proteins / chemistry
  • Proteins / genetics
  • RNA, Messenger / genetics
  • Reproducibility of Results
  • Reverse Transcriptase Polymerase Chain Reaction
  • Sequence Homology, Nucleic Acid
  • Transcription, Genetic*

Substances

  • DNA, Complementary
  • DNA, Intergenic
  • Oligonucleotide Probes
  • Proteins
  • RNA, Messenger