Objective: To adapt and validate an algorithm for identifying transgender and gender diverse (TGD) patients in electronic health record (EHR) data.
Methods: We adapted a previously unvalidated, multistep hierarchical algorithm for identifying TGD persons in administrative claims data and validated it against self-reported gender identity in an EHR data set.
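To make "multistep, hierarchical" concrete, the sketch below checks criteria in a fixed order and reports the first step that flags a patient. It is an illustrative sketch under stated assumptions, not the authors' algorithm: the code sets (`TGD_DX_CODES`, `TGD_PX_CODES`), the step order, the `Patient` record, and the rule that a hormone prescription counts only when it is discordant with the recorded sex (`GAHT_DISCORDANT_SEX`) are all hypothetical, because the abstract does not list the actual codes or decision rules.

```python
from dataclasses import dataclass, field
from typing import Optional

# HYPOTHETICAL placeholder code sets; the abstract does not list the real codes.
TGD_DX_CODES = {"DX_TGD_A", "DX_TGD_B"}
TGD_PX_CODES = {"PX_GA_SURGERY"}
# HYPOTHETICAL rule: a hormone counts as gender-affirming only when it is
# prescribed to a patient whose recorded sex is the value mapped here.
GAHT_DISCORDANT_SEX = {"testosterone": "F", "estradiol": "M"}


@dataclass
class Patient:
    recorded_sex: str                                  # "M" or "F" as recorded
    dx_codes: set = field(default_factory=set)         # diagnosis codes
    px_codes: set = field(default_factory=set)         # procedure codes
    prescriptions: set = field(default_factory=set)    # drug names


def classify_tgd(p: Patient) -> Optional[str]:
    """Return the first (hypothetical) step that flags the patient, else None.

    Later steps are consulted only when earlier steps do not flag the patient.
    """
    if p.dx_codes & TGD_DX_CODES:            # step 1: TGD-related diagnosis
        return "diagnosis"
    if p.px_codes & TGD_PX_CODES:            # step 2: TGD-related procedure
        return "procedure"
    if any(GAHT_DISCORDANT_SEX.get(drug) == p.recorded_sex
           for drug in p.prescriptions):     # step 3: gender-affirming hormones
        return "hormone"
    return None
```

For example, `classify_tgd(Patient("F", prescriptions={"testosterone"}))` returns `"hormone"`, while testosterone recorded for a male patient returns `None`, so under this assumed rule a concordant prescription (e.g. for hypogonadism) is not flagged.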
Results: In an EHR data set of 52 746 adults with self-reported gender identity (the gold standard), the algorithm, which identifies TGD persons from TGD-related diagnosis and procedure codes and gender-affirming hormone therapy prescription data, had a sensitivity of 87.3% (95% confidence interval [CI] 86.4-88.2), specificity of 98.7% (95% CI 98.6-98.8), positive predictive value (PPV) of 88.7% (95% CI 87.9-89.4), and negative predictive value (NPV) of 98.5% (95% CI 98.4-98.6). The area under the curve (AUC) was 0.930 (95% CI 0.925-0.935). The additional steps that classify patients as presumably TGD men versus TGD women based on prescription data also performed well: sensitivity of 97.6%, specificity of 92.7%, PPV of 93.2%, and NPV of 97.4%, with an AUC of 0.95 (95% CI 0.94-0.96).
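The four reported measures all come from a 2×2 table of algorithm result versus self-reported gender identity. For a binary classifier evaluated at a single operating point, the ROC curve runs through (0, 0), (1 − specificity, sensitivity), and (1, 1), so its trapezoidal area equals (sensitivity + specificity) / 2. The reported AUCs are consistent with this: (0.873 + 0.987) / 2 = 0.930 and (0.976 + 0.927) / 2 ≈ 0.95, although the abstract does not state how the AUC was computed. The sketch below shows these formulas; the function names and the toy counts in the usage note are hypothetical, not study data.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 validation metrics against a gold standard."""
    sensitivity = tp / (tp + fn)   # true positives among gold-standard positives
    specificity = tn / (tn + fp)   # true negatives among gold-standard negatives
    ppv = tp / (tp + fp)           # gold-standard positives among flagged patients
    npv = tn / (tn + fn)           # gold-standard negatives among unflagged patients
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "auc": single_threshold_auc(sensitivity, specificity),
    }


def single_threshold_auc(sensitivity: float, specificity: float) -> float:
    # Trapezoidal area under the ROC polyline (0,0) -> (1-spec, sens) -> (1,1),
    # which simplifies to (sensitivity + specificity) / 2.
    return (sensitivity + specificity) / 2


# Check the relationship against the sensitivities/specificities in the abstract.
print(round(single_threshold_auc(0.873, 0.987), 3))  # identification step
print(round(single_threshold_auc(0.976, 0.927), 2))  # men-versus-women step
```

With made-up counts `diagnostic_metrics(tp=8, fp=2, fn=2, tn=88)`, sensitivity and PPV are both 0.8 and specificity and NPV are both 88/90, which illustrates that PPV and NPV, unlike sensitivity and specificity, depend on how common the condition is in the data set.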
Conclusions: In the absence of self-reported gender identity data, an algorithm that identifies TGD patients in administrative data using TGD-related diagnosis and procedure codes and gender-affirming hormone prescriptions performs well.
Keywords: diagnosis codes; electronic health record; gender identity; transgender.
© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.