Predicting Clostridioides difficile infection outcomes with explainable machine learning

EBioMedicine. 2024 Aug:106:105244. doi: 10.1016/j.ebiom.2024.105244. Epub 2024 Jul 17.

Abstract

Background: Clostridioides difficile infection results in life-threatening short-term outcomes and the potential for subsequent recurrent infection. Predicting these outcomes at diagnosis, when important clinical decisions need to be made, has proven to be a difficult task.

Methods: Fifty-two clinical features drawn from existing models or the literature were collected retrospectively within ±48 h of diagnosis for 1660 inpatient infections. A modified desirability of outcome ranking (DOOR) was designed to encompass clinically important severe events attributable to the acute infection (intensive care transfer due to sepsis, shock, colectomy/ileostomy, mortality) and/or 60-day recurrence. A deep neural network was constructed and interpreted using SHapley Additive exPlanations (SHAP). High-importance features were used to train a reduced, shallow network, and its performance was compared with that of existing conventional models (7 severity, 7 recurrence; after summing DOOR probabilities to align with conventional binary outputs) using area under the ROC curve (AUROC) and DeLong tests.
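The step of summing DOOR probabilities into a binary score and then scoring it by AUROC can be illustrated with a minimal sketch. The four rank names, the example probabilities, and the labels below are hypothetical, since the abstract does not list the exact DOOR ranks, and the AUROC is computed with the plain pairwise (Mann-Whitney) definition rather than whatever software the authors used.

```python
from typing import Dict, Sequence, Set

# Hypothetical DOOR ranks (assumption: not taken from the paper).
SEVERE_RANKS: Set[str] = {"severe_only", "severe_and_recurrence"}
RECURRENCE_RANKS: Set[str] = {"recurrence_only", "severe_and_recurrence"}


def collapse(probs: Dict[str, float], ranks: Set[str]) -> float:
    """Sum the predicted probabilities of every DOOR rank in `ranks`."""
    return sum(p for rank, p in probs.items() if rank in ranks)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up model outputs for four infections (each row sums to 1).
predictions = [
    {"no_event": 0.70, "recurrence_only": 0.20, "severe_only": 0.07, "severe_and_recurrence": 0.03},
    {"no_event": 0.20, "recurrence_only": 0.10, "severe_only": 0.50, "severe_and_recurrence": 0.20},
    {"no_event": 0.58, "recurrence_only": 0.30, "severe_only": 0.08, "severe_and_recurrence": 0.04},
    {"no_event": 0.60, "recurrence_only": 0.25, "severe_only": 0.10, "severe_and_recurrence": 0.05},
]
severe_labels = [0, 1, 1, 0]  # made-up observed severe outcomes

severity_scores = [collapse(p, SEVERE_RANKS) for p in predictions]
recurrence_scores = [collapse(p, RECURRENCE_RANKS) for p in predictions]
severity_auroc = auroc(severe_labels, severity_scores)
```

Collapsing in this way yields one score per binary question, so the same multi-rank output can be compared against conventional severity and recurrence models separately. A DeLong test for the difference between two AUROCs is not shown here.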

Findings: The full (52-feature) model achieved an out-of-sample AUROC of 0.823 for severity and 0.678 for recurrence. SHAP identified 13 unique, highly important features (age, hypotension, initial treatment, onset, PCR cycle threshold, number of prior episodes, antibiotic exposure, fever, hypotension, pressors, leukocytosis, creatinine, lactate) that were used to train a reduced model, which performed similarly to the full model (severity AUROC difference P = 0.130; recurrence P = 0.426) and significantly better than the top severity model (reduced model severity AUROC 0.837, ATLAS 0.749; P = 0.001). The reduced model also outperformed the top recurrence model, but the difference was not statistically significant (reduced model recurrence AUROC 0.653, IDSA Recurrence Risk Criteria 0.595; P = 0.196). The final, reduced model was deployed as a web application with real-time SHAP explanations.
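To make the idea of ranking features by SHAP importance concrete, the sketch below computes exact Shapley values for a single prediction of a toy risk function. Everything in it is an assumption for illustration: the three features, the weights, the baseline values, and the function itself are invented, and the study's network and SHAP tooling are not reproduced. In practice Shapley values for neural networks are typically approximated rather than enumerated as here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values: features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


FEATURES = ["age", "creatinine", "lactate"]  # hypothetical subset of features


def toy_risk(v: Sequence[float]) -> float:
    """Invented risk score with one interaction term (not the paper's model)."""
    age, creatinine, lactate = v
    return 0.02 * age + 0.3 * creatinine + 0.5 * lactate + 0.1 * creatinine * lactate


patient = [80.0, 2.0, 4.0]    # made-up patient values
reference = [60.0, 1.0, 1.0]  # made-up baseline (e.g. a cohort average)

phi = shapley_values(toy_risk, patient, reference)
ranked = [name for _, name in sorted(zip((abs(p) for p in phi), FEATURES), reverse=True)]
```

The values in `phi` sum to the difference between the patient's score and the baseline score, which is the additivity property that lets each prediction be broken down feature by feature. Averaging absolute values across many patients gives a global ranking from which a reduced feature set could be chosen.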

Interpretation: Our final model outperformed existing severity and recurrence models; however, it requires external validation. A DOOR output allows specific clinical questions to be asked with explainable predictions that can be feasibly implemented with limited computing resources.

Funding: National Institutes of Health-Institute of Allergy and Infectious Diseases.

Keywords: Clostridioides difficile infection; Machine learning; Outcome model; Prediction model.

MeSH terms

  • Aged
  • Area Under Curve
  • Clostridioides difficile*
  • Clostridium Infections* / diagnosis
  • Clostridium Infections* / microbiology
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Prognosis
  • ROC Curve
  • Recurrence
  • Retrospective Studies