Inter-Rater Reliability of Unstructured Text Labeling: Artificially vs. Naturally Intelligent Approaches

Stud Health Technol Inform. 2021 May 27;281:118-122. doi: 10.3233/SHTI210132.

Abstract

Demand for technologies that label unstructured medical text is expected to grow as interest in artificial intelligence and natural language processing rises in the medical domain. Our study aimed to assess the agreement between experts who retrospectively judged the presence of pulmonary embolism (PE) in neurosurgical cases from electronic health records, and to assess the utility of a machine learning approach for automating this process. We observed moderate agreement between 3 independent raters on PE detection (Light's kappa = 0.568, p = 0). Labeling sentences with the method we proposed earlier might improve machine learning results (accuracy = 0.97, ROC AUC = 0.98), even in cases on which the 3 independent raters could not agree. Medical text labeling techniques might be more efficient when strict rules and semi-automated approaches are implemented. Machine learning might be a good option for unstructured text labeling when the reliability of textual data is properly addressed. This project was supported by RFBR grant 18-29-22085.
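For readers unfamiliar with the agreement statistic reported above, Light's kappa for more than two raters is commonly defined as the mean of Cohen's kappa over all rater pairs. The following is a minimal sketch under that definition, not the authors' code: the function names and the toy ratings are hypothetical, and the sketch does not compute the p-value reported in the abstract.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters labeling the same cases."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(a)
    # Observed agreement: share of cases with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined. Returning NaN is a choice made for this sketch.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def light_kappa(ratings):
    """Light's kappa: mean pairwise Cohen's kappa across all raters."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("at least two raters are required")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = PE present, 0 = PE absent, one list per rater.
    rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    rater_b = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
    rater_c = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    print(f"Light's kappa = {light_kappa([rater_a, rater_b, rater_c]):.3f}")
```

Averaging pairwise values keeps the computation transparent, but it can hide a single outlying rater, so inspecting the individual pairwise kappas alongside the mean may be worthwhile.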

Keywords: Machine Learning; Natural Language Processing; Neurosurgery; Pulmonary Embolism.

MeSH terms

  • Artificial Intelligence*
  • Electronic Health Records
  • Machine Learning
  • Natural Language Processing*
  • Reproducibility of Results
  • Retrospective Studies