Comparison of Word and Character Level Information for Medical Term Identification Using Convolutional Neural Networks and Transformers

Sandaru Seneviratne; Artem Lenskiy; Christopher Nolan; Eleni Daskalaki; Hanna Suominen

doi:10.3233/SHTI210717


Stud Health Technol Inform. 2021 Dec 15;284:249-253. doi: 10.3233/SHTI210717.

Authors

Sandaru Seneviratne¹, Artem Lenskiy¹, Christopher Nolan², Eleni Daskalaki¹, Hanna Suominen¹ ³ ⁴

Affiliations

¹ School of Computing, The Australian National University (ANU), Australia.
² ANU Medical School and John Curtin School of Medical Research, ANU, Australia.
³ Data61, Commonwealth Scientific and Industrial Research Organisation, Australia.
⁴ Department of Computing, University of Turku, Finland.

PMID: 34920520

Abstract

Complexity and domain-specificity make medical text hard to understand for patients and their next of kin. To simplify such text, this paper explored how word and character level information can be leveraged to identify medical terms when training data is limited. We created a dataset of medical and general terms using the Human Disease Ontology from BioPortal and Wikipedia pages. Our results from 10-fold cross-validation indicated that convolutional neural networks (CNNs) and transformers perform competitively. The best F score of 93.9% was achieved by a CNN trained on both word and character level embeddings. Statistical significance tests demonstrated that general word embeddings provide rich word representations for medical term identification. Consequently, when deep learning architectures are used, focusing on word-level information is favorable for medical term identification.
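The abstract reports that the best F score came from a CNN trained on both word and character level embeddings, but it does not describe the architecture. The following is a minimal sketch of one common way to combine the two, assuming a single 1-D convolution over character embeddings with ReLU and max-over-time pooling, concatenated with a word embedding and passed to a logistic output that scores a term as medical or general. All sizes, the random embedding tables, the averaging of word vectors for multi-word terms, and the names `char_features`, `term_features` and `predict_medical` are hypothetical illustrations, not the authors' implementation. Nothing is trained here, so the printed probabilities carry no meaning.

```python
import math
import random
from typing import Dict, List

Vector = List[float]

# Hypothetical hyperparameters, chosen only for illustration.
CHAR_DIM = 8       # size of each character embedding
WORD_DIM = 16      # size of the word embedding
NUM_FILTERS = 4    # number of character-level convolution filters
KERNEL_WIDTH = 3   # characters covered by each filter window

rng = random.Random(0)


def random_vector(dim: int) -> Vector:
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


# Toy embedding tables: a random vector is created on first lookup.
# In practice the word table would hold pretrained (e.g. general-domain)
# embeddings and the character table would be learned during training.
char_table: Dict[str, Vector] = {}
word_table: Dict[str, Vector] = {}


def lookup(table: Dict[str, Vector], key: str, dim: int) -> Vector:
    if key not in table:
        table[key] = random_vector(dim)
    return table[key]


# Each filter spans KERNEL_WIDTH consecutive character embeddings.
filters = [[random_vector(CHAR_DIM) for _ in range(KERNEL_WIDTH)]
           for _ in range(NUM_FILTERS)]
filter_bias = random_vector(NUM_FILTERS)

# Logistic output layer over the concatenated word + character features.
out_weights = random_vector(WORD_DIM + NUM_FILTERS)
out_bias = 0.0


def char_features(term: str) -> Vector:
    """1-D convolution over characters, ReLU, then max-over-time pooling."""
    chars = [lookup(char_table, c, CHAR_DIM) for c in term.lower()]
    # Pad short terms with zero vectors so at least one window fits.
    while len(chars) < KERNEL_WIDTH:
        chars.append([0.0] * CHAR_DIM)
    pooled = []
    for f, filt in enumerate(filters):
        # Starting from 0.0 makes max(best, s) equal the max of ReLU(s).
        best = 0.0
        for start in range(len(chars) - KERNEL_WIDTH + 1):
            s = filter_bias[f]
            for k in range(KERNEL_WIDTH):
                s += sum(w * x for w, x in zip(filt[k], chars[start + k]))
            best = max(best, s)
        pooled.append(best)
    return pooled


def term_features(term: str) -> Vector:
    """Concatenate a word-level vector with character-level CNN features."""
    # Assumption for this sketch: multi-word terms use the mean word vector.
    words = term.lower().split()
    word_vecs = [lookup(word_table, w, WORD_DIM) for w in words]
    mean_vec = [sum(col) / len(word_vecs) for col in zip(*word_vecs)]
    return mean_vec + char_features(term)


def predict_medical(term: str) -> float:
    """Sigmoid score that `term` is medical (untrained, so arbitrary)."""
    x = term_features(term)
    z = out_bias + sum(w * v for w, v in zip(out_weights, x))
    return 1.0 / (1.0 + math.exp(-z))


for t in ["myocardial infarction", "flu", "bicycle"]:
    print(f"{t!r}: {len(term_features(t))} features, "
          f"p(medical) = {predict_medical(t):.3f}")
```

Each term maps to a fixed-length vector of WORD_DIM + NUM_FILTERS = 20 features regardless of its length, because max-over-time pooling reduces the variable-length character sequence to one value per filter. In a real setup the embeddings, filters and output weights would be learned from the labeled term dataset and evaluated, for example, with 10-fold cross-validation.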

Keywords: Terminology; deep learning; text simplification; word embedding.

MeSH terms

Humans
Neural Networks, Computer*
Research Design*