Exploring a method for extracting concerns of multiple breast cancer patients in the domain of patient narratives using BERT and its optimization by domain adaptation using masked language modeling

PLoS One. 2024 Sep 6;19(9):e0305496. doi: 10.1371/journal.pone.0305496. eCollection 2024.

Abstract

Narratives posted on the internet by patients contain a vast amount of information about various concerns. This study aimed to extract multiple concerns from interviews with breast cancer patients using the natural language processing (NLP) model bidirectional encoder representations from transformers (BERT). A total of 508 interview transcriptions of breast cancer patients written in Japanese were labeled with five types of concern: "treatment," "physical," "psychological," "work/financial," and "family/friends." The labeled texts were used to create a multi-label classifier by fine-tuning a pre-trained BERT model. We also created several classifiers in which domain adaptation by masked language modeling was applied before fine-tuning, using (1) breast cancer patients' blog articles and (2) breast cancer patients' interview transcriptions. The performance of the classifiers was evaluated in terms of precision through 5-fold cross-validation. The multi-label classifiers with only fine-tuning had precision values of over 0.80 for two of the five concerns, "physical" and "work/financial." In contrast, precision for "treatment" was low, at approximately 0.25. For the classifiers using domain adaptation, precision for this label ranged from 0.40 to 0.51, an improvement of more than 0.2 in some cases. This study showed that combining domain adaptation with a multi-label classifier on target data made it possible to efficiently extract multiple concerns from interviews.
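
The evaluation described in the abstract (per-label precision under 5-fold cross-validation) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `per_label_precision` and `k_fold_indices`, the toy label vectors, and the use of contiguous, unshuffled folds are all assumptions made for illustration; the abstract does not say how folds were formed or how predicted probabilities were thresholded.

```python
from typing import Dict, List, Optional, Sequence

# The five concern labels named in the abstract.
LABELS = ["treatment", "physical", "psychological", "work/financial", "family/friends"]


def per_label_precision(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    labels: Sequence[str] = LABELS,
) -> Dict[str, Optional[float]]:
    """Precision = TP / (TP + FP), computed independently for each label.

    Each row is a binary indicator vector (one text can carry several labels).
    Returns None for a label that was never predicted, since precision is
    undefined there (an assumption of this sketch, not of the study).
    """
    result: Dict[str, Optional[float]] = {}
    for j, name in enumerate(labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 0)
        result[name] = tp / (tp + fp) if (tp + fp) > 0 else None
    return result


def k_fold_indices(n_samples: int, k: int = 5) -> List[List[int]]:
    """Split range(n_samples) into k contiguous test folds of near-equal size.

    Hypothetical helper: a real pipeline would usually shuffle or stratify.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Toy data (invented for illustration): 4 texts, 5 labels each.
y_true = [[1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 1, 1], [1, 0, 0, 0, 0]]
y_pred = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0]]

precision = per_label_precision(y_true, y_pred)
folds = k_fold_indices(508, k=5)  # 508 transcriptions, as in the abstract
```

In the toy data, "treatment" is predicted twice but is correct only once, so its precision is 0.5. With 508 samples, this split yields test folds of 102, 102, 102, 101, and 101 texts. In a full pipeline, each fold would be held out in turn while the classifier is fine-tuned on the remaining four, and the per-label precision values would then be aggregated across folds.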

MeSH terms

  • Breast Neoplasms* / psychology
  • Female
  • Humans
  • Narration
  • Natural Language Processing*

Grants and funding

This work was supported by JSPS KAKENHI Grant Number 21H03170 and JST CREST Grant Number JPMJCR22N1, Japan. For more information on JSPS and JST, please visit https://www.jsps.go.jp and https://www.jst.go.jp, respectively. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.