Background: This study assessed whether deep learning applied to routine outpatient chest X-rays (CXRs) can identify individuals at high risk for incident chronic obstructive pulmonary disease (COPD).
Methods: Using cancer screening trial data, we previously developed a convolutional neural network (CXR-Lung-Risk) to predict lung-related mortality from a CXR image. In this study, we externally validated CXR-Lung-Risk for predicting incident COPD from routine CXRs. We identified outpatients without lung cancer, COPD, or emphysema who had a CXR taken in 2013-2014 at a Mass General Brigham site in Boston, Massachusetts. The primary outcome was 6-year incident COPD. Discrimination was assessed using the area under the receiver operating characteristic curve (AUC) and compared with that of the TargetCOPD clinical risk score. All analyses were stratified by smoking status. A secondary analysis was conducted in the Project Baseline Health Study (PBHS) to test associations of CXR-Lung-Risk with pulmonary function and protein abundance.
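As an illustration of the discrimination metric named in the Methods, the following is a minimal sketch of a rank-based AUC calculation: the probability that a randomly chosen individual who develops the outcome receives a higher score than a randomly chosen individual who does not. The function name `auc`, the variable names, and the toy values are hypothetical and are not study data. The abstract does not state how the AUCs or p-values were computed, so this should not be read as the authors' implementation.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: fraction of (case, non-case) pairs in which the
    case has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values for illustration only (NOT study data).
labels = [0, 0, 1, 1, 0, 1]                      # 1 = outcome occurred
baseline_score = [0.2, 0.4, 0.3, 0.6, 0.5, 0.7]  # hypothetical clinical score
combined_score = [0.1, 0.3, 0.5, 0.8, 0.4, 0.9]  # hypothetical combined score

auc_baseline = auc(baseline_score, labels)
auc_combined = auc(combined_score, labels)
print(f"baseline AUC = {auc_baseline:.2f}, combined AUC = {auc_combined:.2f}")
```

In a stratified analysis such as the one described, a calculation of this kind would be run separately within each smoking-status stratum. Testing whether two AUCs differ requires a paired comparison method, which this sketch does not cover.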
Findings: The primary analysis consisted of 12,550 ever-smokers (mean age 62·4±6·8 years, 48·9% male, 12·4% rate of 6-year COPD) and 15,298 never-smokers (mean age 63·0±8·1 years, 42·8% male, 3·8% rate of 6-year COPD). CXR-Lung-Risk had additive predictive value beyond the TargetCOPD score for 6-year incident COPD in both ever-smokers (CXR-Lung-Risk + TargetCOPD AUC: 0·73 [95% CI: 0·72-0·74] vs. TargetCOPD alone AUC: 0·66 [0·65-0·68], p<0·01) and never-smokers (CXR-Lung-Risk + TargetCOPD AUC: 0·70 [0·67-0·72] vs. TargetCOPD alone AUC: 0·60 [0·57-0·62], p<0·01). In secondary analyses of 2,097 individuals in the PBHS, CXR-Lung-Risk was associated with worse pulmonary function and with abundance of SCGB3A2 (secretoglobin family 3A member 2) and LYZ (lysozyme), proteins involved in pulmonary physiology.
Interpretation: In external validation, a deep learning model applied to a routine CXR image identified individuals at high risk for incident COPD, beyond known risk factors.
Funding: The Project Baseline Health Study and this analysis were funded by Verily Life Sciences, San Francisco, California.
ClinicalTrials.gov identifier: NCT03154346.