Under-specification as the source of ambiguity and vagueness in narrative phenotype algorithm definitions

Jingzhi Yu; Jennifer A Pacheco; Anika S Ghosh; Yuan Luo; Chunhua Weng; Ning Shang; Barbara Benoit; David S Carrell; Robert J Carroll; Ozan Dikilitas; Robert R Freimuth; Vivian S Gainer; Hakon Hakonarson; George Hripcsak; Iftikhar J Kullo; Frank Mentch; Shawn N Murphy; Peggy L Peissig; Andrea H Ramirez; Nephi Walton; Wei-Qi Wei; Luke V Rasmussen

doi:10.1186/s12911-022-01759-z

BMC Med Inform Decis Mak. 2022 Jan 28;22(1):23. doi: 10.1186/s12911-022-01759-z.

Authors

Jingzhi Yu¹, Jennifer A Pacheco², Anika S Ghosh², Yuan Luo², Chunhua Weng³, Ning Shang³, Barbara Benoit⁴, David S Carrell⁵, Robert J Carroll⁶, Ozan Dikilitas⁷, Robert R Freimuth⁸, Vivian S Gainer⁴, Hakon Hakonarson⁹, George Hripcsak³, Iftikhar J Kullo⁷, Frank Mentch⁹, Shawn N Murphy⁴, Peggy L Peissig¹⁰, Andrea H Ramirez⁶, Nephi Walton¹¹, Wei-Qi Wei⁶, Luke V Rasmussen¹²

Affiliations

¹ Center for Health Information Partnerships (CHIP), Northwestern University Feinberg School of Medicine, 625 N. Michigan Ave, Suite 1500, Chicago, IL, 60611, USA. k.yu@northwestern.edu.
² Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
³ Department of Biomedical Informatics, Columbia University, New York, NY, USA.
⁴ Research IS and Computing, Massachusetts General Hospital Brigham, Somerville, MA, USA.
⁵ Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA.
⁶ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
⁷ Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA.
⁸ Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
⁹ Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
¹⁰ Biomedical Informatics Research Center, Marshfield Clinic Research Institute, Marshfield, WI, USA.
¹¹ Intermountain Precision Genomics, Intermountain Healthcare, St. George, UT, USA.
¹² Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.

Abstract

Introduction: A common method for disseminating electronic health record (EHR)-based phenotype algorithms is to provide a narrative description of the algorithm logic, often accompanied by flowcharts. A challenge with this mode of dissemination is the potential for under-specification in the algorithm definition, which leads to ambiguity and vagueness.

Methods: This study examines incidents of under-specification that occurred during the implementation of 34 narrative phenotyping algorithms in the electronic Medical Record and Genomics (eMERGE) network. We reviewed the online communication history between algorithm developers and implementers within the Phenotype Knowledge Base (PheKB) platform, where questions could be raised and answered regarding the intended implementation of a phenotype algorithm.

Results: We developed a taxonomy of under-specification categories via an iterative review process between two groups of annotators. Under-specifications that lead to ambiguity and vagueness were consistently found across narrative phenotype algorithms developed by all involved eMERGE sites.

Discussion and conclusion: Our findings highlight that under-specification impedes the accurate and efficient implementation of current narrative phenotyping algorithms. We propose approaches for mitigating these issues, as well as improved methods for disseminating EHR phenotyping algorithms.

Keywords: Algorithm; Natural Language; Ambiguity; Electronic Health Records (EHR); Phenotyping; Under-Specification; Vagueness.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms*
Electronic Health Records*
Genomics
Humans
Knowledge Bases
Phenotype
