Automatic identification and classification of noun argument structures in biomedical literature

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1639-48. doi: 10.1109/TCBB.2012.111.

Abstract

The accelerating increase in the biomedical literature makes keeping up with recent advances challenging for researchers thus making automatic extraction and discovery of knowledge from this vast literature a necessity. Building such systems requires automatic detection of lexico-semantic event structures governed by the syntactic and semantic constraints of human languages in sentences of biomedical texts. The lexico-semantic event structures in sentences are centered around the predicates and most semantic role labeling (SRL) approaches focus only on the arguments of verb predicates and neglect argument taking nouns which also convey information in a sentence. In this article, a noun argument structure (NAS) annotated corpus named BioNom and a SRL system to identify and classify these structures is introduced. Also, a genetic algorithm-based feature selection (GAFS) method is introduced and global inference is applied to significantly improve the performance of the NAS Bio SRL system.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology / methods*
  • Data Mining / methods*
  • Databases, Factual*
  • Models, Genetic
  • Natural Language Processing*
  • Semantics
  • Software
  • Support Vector Machine