Predicting Clostridioides difficile infection outcomes with explainable machine learning

Gregory R Madden; Rachel H Boone; Emmanuel Lee; Costi D Sifri; William A Petri Jr

doi:10.1016/j.ebiom.2024.105244

EBioMedicine. 2024 Aug:106:105244. doi: 10.1016/j.ebiom.2024.105244. Epub 2024 Jul 17.

Authors

Gregory R Madden¹, Rachel H Boone², Emmanuel Lee³, Costi D Sifri⁴, William A Petri Jr²

Affiliations

¹ Division of Infectious Diseases & International Health, Department of Medicine, University of Virginia School of Medicine, Charlottesville, VA, USA; Office of Hospital Epidemiology/Infection Prevention & Control, University of Virginia School of Medicine, Charlottesville, VA, USA. Electronic address: grm7q@virginia.edu.
² Department of Microbiology, Immunology, and Cancer Biology, University of Virginia, Charlottesville, VA, USA.
³ University of Virginia School of Medicine, Charlottesville, VA, USA.
⁴ Division of Infectious Diseases & International Health, Department of Medicine, University of Virginia School of Medicine, Charlottesville, VA, USA; Office of Hospital Epidemiology/Infection Prevention & Control, University of Virginia School of Medicine, Charlottesville, VA, USA.

Abstract

Background: Clostridioides difficile infection results in life-threatening short-term outcomes and the potential for subsequent recurrent infection. Predicting these outcomes at diagnosis, when important clinical decisions need to be made, has proven to be a difficult task.

Methods: 52 clinical features drawn from existing models or the literature were collected retrospectively within ±48 h of diagnosis among 1660 inpatient infections. A modified desirability of outcome ranking (DOOR) was designed to encompass clinically important severe events attributable to the acute infection (intensive care transfer due to sepsis, shock, colectomy/ileostomy, mortality) and/or 60-day recurrence. A deep neural network was constructed and interpreted using SHapley Additive exPlanations (SHAP). High-importance features were used to train a reduced, shallow network, and its performance was compared to existing conventional models (7 severity, 7 recurrence; after summing DOOR probabilities to align with conventional binary outputs) using area under the ROC curve (AUROC) and DeLong tests.
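The step of summing DOOR probabilities into a binary score and then computing an AUROC can be illustrated with a minimal sketch. This is not the authors' code: the four category names, the patient probabilities, and the labels below are hypothetical, since the abstract does not list the actual DOOR ranks. The AUROC is computed with the rank-based (Mann-Whitney) definition, counting ties as half a win.

```python
from typing import Dict, Sequence, Set

# Hypothetical DOOR categories (assumption; the actual ranks are not given here).
SEVERE: Set[str] = {"severe_only", "severe_and_recurrence"}
RECURRENT: Set[str] = {"recurrence_only", "severe_and_recurrence"}


def binary_probability(door_probs: Dict[str, float], positive: Set[str]) -> float:
    """Sum the predicted probabilities of every DOOR category in `positive`."""
    return sum(p for category, p in door_probs.items() if category in positive)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outscores a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up model outputs for four patients (illustrative only).
predictions = [
    {"neither": 0.70, "recurrence_only": 0.20, "severe_only": 0.05, "severe_and_recurrence": 0.05},
    {"neither": 0.20, "recurrence_only": 0.10, "severe_only": 0.50, "severe_and_recurrence": 0.20},
    {"neither": 0.50, "recurrence_only": 0.30, "severe_only": 0.10, "severe_and_recurrence": 0.10},
    {"neither": 0.30, "recurrence_only": 0.10, "severe_only": 0.40, "severe_and_recurrence": 0.20},
]
observed_severe = [0, 1, 1, 0]  # made-up observed outcomes

severity_scores = [binary_probability(p, SEVERE) for p in predictions]
severity_auc = auroc(observed_severe, severity_scores)
```

Passing `RECURRENT` instead of `SEVERE` to `binary_probability` would yield the corresponding recurrence score from the same DOOR output, which is what lets a single ordinal model be compared against separate binary severity and recurrence models.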

Findings: The full (52-feature) model achieved an out-of-sample AUROC of 0.823 for severity and 0.678 for recurrence. SHAP identified 13 unique, highly important features (age, hypotension, initial treatment, onset, PCR cycle threshold, number of prior episodes, antibiotic exposure, fever, hypotension, pressors, leukocytosis, creatinine, lactate) that were used to train a reduced model, which performed similarly to the full model (severity AUROC difference P = 0.130; recurrence P = 0.426) and significantly better than the top severity model (reduced model predicting severity 0.837, ATLAS 0.749; P = 0.001). The reduced model also outperformed the top recurrence model, but the difference was not statistically significant (reduced model recurrence AUROC 0.653, IDSA Recurrence Risk Criteria 0.595; P = 0.196). The final, reduced model was deployed as a web application with real-time SHAP explanations.
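SHAP explanations are built on Shapley values, which split the gap between a model's prediction and a baseline prediction across the input features. A minimal sketch of that idea follows, assuming a toy three-feature risk function with invented coefficients. It enumerates every feature subset exactly, which is feasible only for a handful of features; it is not the authors' network, nor the approximation used by the SHAP library.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_risk(features: Sequence[float]) -> float:
    """Hypothetical risk score with made-up coefficients (not from the study)."""
    age, lactate, creatinine = features
    return 0.01 * age + 0.2 * lactate + 0.1 * creatinine + 0.05 * lactate * creatinine


patient = [80.0, 4.0, 2.0]    # made-up age, lactate, creatinine
reference = [60.0, 1.0, 1.0]  # made-up baseline patient
contributions = shapley_values(toy_risk, patient, reference)
```

The contributions sum to `toy_risk(patient) - toy_risk(reference)`, the additivity property that makes per-feature explanations of a single prediction possible. Ranking features by the average magnitude of such values across many patients is one common way to pick a high-importance subset.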

Interpretation: Our final model outperformed existing severity and recurrence models; however, it requires external validation. A DOOR output allows specific clinical questions to be asked with explainable predictions that can be feasibly implemented with limited computing resources.

Funding: National Institutes of Health, National Institute of Allergy and Infectious Diseases.

Keywords: Clostridioides difficile infection; Machine learning; Outcome model; Prediction model.

MeSH terms

Aged
Area Under Curve
Clostridioides difficile*
Clostridium Infections* / diagnosis
Clostridium Infections* / microbiology
Female
Humans
Machine Learning*
Male
Middle Aged
Prognosis
ROC Curve
Recurrence
Retrospective Studies

Grants and funding

R01 AI152477/AI/NIAID NIH HHS/United States