Nuclear insertions of mitochondrial origin (NUMTs) can be useful tools in evolution and population studies. However, due to their similarity to mitochondrial DNA (mtDNA), NUMTs may also be a source of contamination in mtDNA studies. The main goal of this work is to present a database of NUMTs based on the latest version of the human genome, the GRCh37 draft. A total of 755 insertions were identified. There are 33 paralogous sequences with over 80% sequence similarity and longer than 500 bp. The non-identical positions between paralogous sequences are listed for the first time. As an application example, the database is used to evaluate the impact of NUMT contamination in cancer studies. The evaluation reveals that 220 of the 256 positions with zero hits in the current mtDNA phylogeny could in fact be traced to one or more nuclear insertions of mtDNA. This is because they lie at positions where mtDNA and nuclear DNA (nDNA) are not identical. In silico primer validation of each reviewed cancer study detected a risk of co-amplification of mtDNA and nDNA in some cases, whereas in others no such risk was identified. This application to cancer studies demonstrates the potential of our NUMT database as a valuable new tool for validating mtDNA mutations described in different contexts. Moreover, given the amount of information provided for each nuclear insertion, this database should play an important role in designing evolutionary, phylogenetic and epidemiological studies.
Copyright © 2011 Elsevier B.V. and Mitochondria Research Society. All rights reserved.