QTLMiner: QTL database curation by mining tables in literature

Bioinformatics. 2015 May 15;31(10):1689-91. doi: 10.1093/bioinformatics/btv016. Epub 2015 Jan 12.

Abstract

Motivation: Figures and tables in biomedical literature record vast amounts of important experimental results. In scientific papers, for example, quantitative trait locus (QTL) information is usually presented in tables. However, most popular text-mining methods focus on extracting knowledge from unstructured free text. To our knowledge, there is no published work on mining tables in biomedical literature. In this article, we propose a method to extract QTL information from tables and plain text found in the literature. Heterogeneous and complex tables are converted into a structured database and combined with information extracted from plain text. Our method could greatly reduce the labor involved in database curation.

Results: We applied our method to the curation of a soybean QTL database, extracting 2278 records from 228 papers with a precision of 96.9% and a recall of 83.3%; the F value for the method is 89.6%.
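
The reported F value can be checked against the stated precision and recall. The sketch below assumes the F value is the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F value" is F1; the source does not say so.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the Results paragraph.
precision = 0.969
recall = 0.833

f1 = f_measure(precision, recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 89.6%"
```

Under that assumption the computed value rounds to the reported 89.6%, so the three figures are mutually consistent.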

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Mining / methods*
  • Databases, Nucleic Acid*
  • Glycine max / genetics
  • Publications
  • Quantitative Trait Loci*
  • Software*