Objectives: This study aimed to use natural language processing (NLP) as a supplement to International Classification of Diseases, Ninth Revision (ICD-9) codes and laboratory values in an automated algorithm to better identify and risk-stratify patients with cirrhosis.
Background: Identification of patients with cirrhosis by manual data collection is laborious, whereas identification by ICD-9 codes alone can be inaccurate. NLP, a novel computerized approach to analyzing electronic free text, has been used to automatically identify patient cohorts with gastrointestinal pathologies such as inflammatory bowel disease. This methodology has not yet been applied to cirrhosis.
Study design: This retrospective cohort study was conducted at the University of California, Los Angeles Health, an academic medical center. A total of 5343 University of California, Los Angeles primary care patients with ICD-9 codes for chronic liver disease were identified between March 2013 and January 2015. An algorithm incorporating NLP of radiology reports, ICD-9 codes, and laboratory data determined whether these patients had cirrhosis. The charts of 168 of the 5343 patients, selected at random, were manually reviewed as a gold standard for comparison. Positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity were calculated for the algorithm and for each of its steps.
Results: The algorithm's PPV, NPV, sensitivity, and specificity were 91.78%, 96.84%, 95.71%, and 93.88%, respectively. The NLP portion was the most important component of the algorithm with PPV, NPV, sensitivity, and specificity of 98.44%, 93.27%, 90.00%, and 98.98%, respectively.
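The four reported measures follow from a 2x2 comparison of the algorithm's output against the manual chart review. The sketch below shows that arithmetic in Python. The cell counts are not given in the abstract; they are an assumption, back-calculated as the only whole-number split of the 168 reviewed charts (70 with cirrhosis, 98 without) that reproduces all of the reported percentages. The names ConfusionCounts and diagnostic_metrics are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: algorithm result vs. manual chart review."""
    tp: int  # algorithm positive, chart review positive
    fp: int  # algorithm positive, chart review negative
    fn: int  # algorithm negative, chart review positive
    tn: int  # algorithm negative, chart review negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return PPV, NPV, sensitivity, and specificity as fractions."""
    return {
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# ASSUMPTION: counts back-calculated from the reported percentages
# (n = 168); they are not stated in the abstract itself.
full_algorithm = ConfusionCounts(tp=67, fp=6, fn=3, tn=92)
nlp_step = ConfusionCounts(tp=63, fp=1, fn=7, tn=97)

if __name__ == "__main__":
    for label, counts in (("full algorithm", full_algorithm),
                          ("NLP step", nlp_step)):
        metrics = diagnostic_metrics(counts)
        formatted = ", ".join(f"{name} {100 * value:.2f}%"
                              for name, value in metrics.items())
        print(f"{label}: {formatted}")
```

Under these assumed counts, the NLP step alone misclassifies 8 of the 168 charts (1 false positive, 7 false negatives), while the full algorithm misclassifies 9 (6 false positives, 3 false negatives): the additional steps recover missed cases at the cost of some specificity.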
Conclusions: NLP is a powerful tool that can be combined with administrative and laboratory data to identify patients with cirrhosis within a population.