Disease misclassification in electronic healthcare database studies: Deriving validity indices-A contribution from the ADVANCE project

Kaatje Bollaerts; Alexandros Rekkas; Tom De Smedt; Caitlin Dodd; Nick Andrews; Rosa Gini

doi:10.1371/journal.pone.0231333


PLoS One. 2020 Apr 22;15(4):e0231333. doi: 10.1371/journal.pone.0231333. eCollection 2020.

Authors

Kaatje Bollaerts¹, Alexandros Rekkas¹,², Tom De Smedt¹, Caitlin Dodd², Nick Andrews³, Rosa Gini⁴

Affiliations

¹ P95 Epidemiology and Pharmacovigilance, Leuven, Belgium.
² Erasmus Medical Centre Rotterdam, Rotterdam, Netherlands.
³ Statistics, Modelling, and Economics Department, Public Health England, Colindale, London, United Kingdom.
⁴ Agenzia regionale di sanità della Toscana, Florence, Italy.

Abstract

There is a strong and continuously growing interest in using large electronic healthcare databases to study health outcomes and the effects of pharmaceutical products. However, concerns about disease misclassification (i.e., errors in classifying disease status) and its impact on study results are legitimate. Validation is therefore increasingly recognized as an essential component of database research. In this work, we elucidate the interrelations between the true prevalence of a disease in a database population (i.e., the prevalence assuming no disease misclassification), the observed prevalence subject to disease misclassification, and the most common validity indices: sensitivity, specificity, and positive and negative predictive value. Based on these interrelations, we obtained analytical expressions that derive all the validity indices and the true prevalence from the observed prevalence and any combination of two other parameters. The analytical expressions have several uses. Most notably, they can be used to estimate the observed prevalence adjusted for outcome misclassification from any combination of two validity indices, and to derive from one another validity indices that would otherwise be difficult to obtain. To allow researchers to use the analytical expressions easily, we additionally developed a user-friendly and freely available web application.
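The interrelations summarized above can be illustrated with a short sketch. It assumes the database population is described by the four cell proportions of a 2×2 table (true positives a, false positives b, false negatives c, true negatives d, summing to 1). Fixing the observed prevalence P = a + b leaves only a and c unknown, and each of the true prevalence, sensitivity, specificity, PPV and NPV contributes one linear equation in a and c, so any two of them determine all the others unless the pair is degenerate. For the sensitivity/specificity pair the solution coincides with the Rogan-Gladen estimator, π = (P + Sp − 1)/(Se + Sp − 1), which is undefined when Se + Sp = 1. The names `derive`, `Validity` and `_row` are hypothetical, and this linear-system formulation is an illustrative choice, not the authors' code, their exact derivation, or the ADVANCE web application.

```python
from dataclasses import dataclass

# Hypothetical sketch, not the authors' implementation or the ADVANCE web app.
# Cells of the 2x2 table as proportions of the database population:
#   a = true positives, b = false positives, c = false negatives, d = true negatives.
# With the observed prevalence P fixed, b = P - a and d = 1 - P - c,
# so only a and c are unknown.


@dataclass(frozen=True)
class Validity:
    true_prev: float
    obs_prev: float
    se: float
    sp: float
    ppv: float
    npv: float


def _row(name: str, value: float, p_obs: float) -> tuple[float, float, float]:
    """Coefficients (alpha, beta, gamma) of the equation alpha*a + beta*c = gamma."""
    if name == "true_prev":  # pi = a + c
        return 1.0, 1.0, value
    if name == "se":  # Se = a / (a + c)
        return 1.0 - value, -value, 0.0
    if name == "sp":  # Sp = d / (b + d), where b + d = 1 - a - c
        return value, -(1.0 - value), p_obs + value - 1.0
    if name == "ppv":  # PPV = a / P
        return 1.0, 0.0, value * p_obs
    if name == "npv":  # NPV = d / (1 - P)
        return 0.0, 1.0, (1.0 - value) * (1.0 - p_obs)
    raise ValueError(f"unknown parameter: {name}")


def derive(p_obs: float, **known: float) -> Validity:
    """Derive all indices from the observed prevalence and any two of
    true_prev, se, sp, ppv, npv (hypothetical helper)."""
    if not 0.0 < p_obs < 1.0:
        raise ValueError("observed prevalence must lie strictly between 0 and 1")
    if len(known) != 2:
        raise ValueError("supply exactly two of: true_prev, se, sp, ppv, npv")
    (n1, v1), (n2, v2) = known.items()
    a1, b1, g1 = _row(n1, v1, p_obs)
    a2, b2, g2 = _row(n2, v2, p_obs)
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError(f"{n1} and {n2} do not identify the 2x2 table")
    # Cramer's rule for the 2x2 linear system in (a, c).
    a = (g1 * b2 - g2 * b1) / det
    c = (a1 * g2 - a2 * g1) / det
    b = p_obs - a
    d = 1.0 - p_obs - c
    # Negative cells mean the supplied values cannot come from one 2x2 table.
    if min(a, b, c, d) < -1e-12 or a + c <= 0.0 or b + d <= 0.0:
        raise ValueError("inputs are mutually inconsistent")
    return Validity(
        true_prev=a + c,
        obs_prev=p_obs,
        se=a / (a + c),
        sp=d / (b + d),
        ppv=a / p_obs,
        npv=d / (1.0 - p_obs),
    )
```

For example, `derive(0.125, se=0.8, sp=0.95)` recovers a true prevalence of 0.1, a PPV of 0.64 and an NPV of about 0.977, and `derive(0.125, ppv=0.64, npv=0.855 / 0.875)` goes the other way, returning a sensitivity of 0.8 and a specificity of 0.95. Combinations whose implied cell proportions are negative, such as `se=0.8, sp=0.8` with an observed prevalence of 0.125, are rejected rather than silently producing an impossible prevalence.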

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Databases, Factual*
Disease / classification*
Electronic Health Records
Humans
User-Computer Interface

Grants and funding

Financial Disclosure: This research was funded by the Innovative Medicines Initiative (IMI) Joint Undertaking through the ADVANCE project [№ 115557]. The IMI is a joint initiative (public-private partnership) of the European Commission and the European Federation of Pharmaceutical Industries and Associations (EFPIA) to improve the competitive situation of the European Union in the field of pharmaceutical research. The IMI provided support in the form of salaries for KB, TDS, CD and RG but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. AR and NA did not receive any financial compensation for their contribution to this research. The specific roles of the authors are articulated in the ‘author contributions’ section.