The Classification of Scientific Literature for Its Topical Tracking on a Small Human-Prepared Dataset

Stud Health Technol Inform. 2020 Jun 26;272:191-194. doi: 10.3233/SHTI200526.

Abstract

The number of scientific publications is growing constantly, which makes processing them extremely time-consuming. We hypothesized that user-defined literature tracking may be augmented by machine learning (ML) on article summaries. In a pilot study, a specific dataset of 671 article abstracts was obtained, and nineteen binary classification options using ML techniques on various text representations were proposed. For each classification option, 300 tests with resampling were performed. The best classification option achieved AUC = 0.78, demonstrating proof of concept in general and indicating potential for improving the solution.

Keywords: Text classification; artificial intelligence; machine learning; natural language processing; neurosurgery; topic modeling.
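
The evaluation scheme summarized in the abstract (a binary classifier on a text representation, scored by AUC over repeated resampled tests) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the bag-of-words representation, the naive Bayes classifier, the 30% test fraction, the function names, and the toy abstracts. The abstract does not state which of the nineteen options used which model or representation.

```python
import math
import random
from collections import Counter


def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train_nb(texts, labels):
    """Fit a bag-of-words naive Bayes model (hypothetical stand-in classifier)."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter(labels)
    for text, y in zip(texts, labels):
        word_counts[y].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def score_nb(model, text):
    """Log-odds that `text` is relevant (class 1), with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    n1, n0, v = sum(word_counts[1].values()), sum(word_counts[0].values()), len(vocab)
    log_odds = math.log(doc_counts[1] + 1) - math.log(doc_counts[0] + 1)
    for w in text.lower().split():
        if w in vocab:  # words never seen in training are ignored
            log_odds += math.log((word_counts[1][w] + 1) / (n1 + v))
            log_odds -= math.log((word_counts[0][w] + 1) / (n0 + v))
    return log_odds


def resampled_auc(texts, labels, n_runs=300, test_fraction=0.3, seed=0):
    """Mean AUC over repeated random train/test splits; returns (mean, valid runs)."""
    rng = random.Random(seed)
    idx = list(range(len(texts)))
    cut = int(len(idx) * test_fraction)
    aucs = []
    for _ in range(n_runs):
        rng.shuffle(idx)
        test, train = idx[:cut], idx[cut:]
        y_test = [labels[i] for i in test]
        if len(set(y_test)) < 2:
            continue  # AUC is undefined when the test split holds one class only
        model = train_nb([texts[i] for i in train], [labels[i] for i in train])
        aucs.append(auc(y_test, [score_nb(model, texts[i]) for i in test]))
    if not aucs:
        raise ValueError("no split contained both classes")
    return sum(aucs) / len(aucs), len(aucs)


def make_docs(pool, n):
    """Toy 'abstracts': each one omits a different word from the pool."""
    return [" ".join(w for j, w in enumerate(pool) if j != i % len(pool)) for i in range(n)]


# Toy data only: 10 'relevant' and 10 'irrelevant' pseudo-abstracts.
texts = (make_docs(["glioma", "resection", "craniotomy", "aneurysm"], 10)
         + make_docs(["galaxy", "telescope", "quasar", "orbit"], 10))
labels = [1] * 10 + [0] * 10

mean_auc, n_valid = resampled_auc(texts, labels)
print(f"mean AUC over {n_valid} valid resamples: {mean_auc:.2f}")
```

The toy classes share no vocabulary, so the score printed here says nothing about real abstracts. The sketch only shows how a mean AUC over 300 resampled tests could be computed for one classification option.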

MeSH terms

  • Humans
  • Machine Learning*
  • Natural Language Processing
  • Pilot Projects