Generalizability of machine learning in predicting antimicrobial resistance in E. coli: a multi-country case study in Africa

BMC Genomics. 2024 Mar 18;25(1):287. doi: 10.1186/s12864-024-10214-4.

Abstract

Background: Antimicrobial resistance (AMR) remains a significant global health threat, particularly impacting low- and middle-income countries (LMICs). These regions often grapple with limited healthcare resources and limited access to advanced diagnostic tools. Consequently, there is a pressing need for innovative approaches that can enhance AMR surveillance and management. Machine learning (ML), though underutilized in these settings, presents a promising avenue. This study leverages ML models trained on whole-genome sequencing data from England, where such data is more readily available, to predict AMR in E. coli, targeting key antibiotics such as ciprofloxacin, ampicillin, and cefotaxime. A crucial part of our work involved validating these models on an independent dataset from Africa, specifically from Uganda, Nigeria, and Tanzania, to ascertain their applicability and effectiveness in LMICs.

Results: Model performance varied across antibiotics. The Support Vector Machine excelled in predicting ciprofloxacin resistance (87% accuracy, F1 Score: 0.57), the Light Gradient Boosting Machine in predicting cefotaxime resistance (92% accuracy, F1 Score: 0.42), and Gradient Boosting in predicting ampicillin resistance (58% accuracy, F1 Score: 0.66). In validation with data from Africa, Logistic Regression showed high accuracy for ampicillin (94%, F1 Score: 0.97), while Random Forest and the Light Gradient Boosting Machine were effective for ciprofloxacin (50% accuracy, F1 Score: 0.56) and cefotaxime (45% accuracy, F1 Score: 0.54), respectively. Key mutations associated with AMR were identified for these antibiotics.
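
The reported accuracy and F1 scores diverge in both directions (for example, high accuracy with a low F1 score, or low accuracy with a higher F1 score), which is what one would expect when resistant and susceptible isolates are unevenly represented. The following is a minimal sketch of that effect, not the study's evaluation code: the confusion-matrix counts are hypothetical and chosen only for illustration, and the function names are assumptions.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all isolates classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the resistant class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts (NOT from the study), 100 isolates each.
# Case A: resistant isolates are rare (10 of 100).
rare = dict(tp=4, fp=2, fn=6, tn=88)
# Case B: resistant isolates are common (55 of 100).
common = dict(tp=50, fp=40, fn=5, tn=5)

for label, c in (("rare resistance", rare), ("common resistance", common)):
    acc = accuracy(c["tp"], c["fp"], c["fn"], c["tn"])
    f1 = f1_score(c["tp"], c["fp"], c["fn"])
    print(f"{label}: accuracy={acc:.2f}, F1={f1:.2f}")
# prints:
# rare resistance: accuracy=0.92, F1=0.50
# common resistance: accuracy=0.55, F1=0.69
```

In case A the many correctly classified susceptible isolates inflate accuracy while most resistant isolates are missed; in case B the model finds most resistant isolates but misclassifies most susceptible ones. Reading accuracy and F1 together, alongside the class balance of each dataset, gives a fuller picture than either metric alone.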

Conclusion: As the threat of AMR continues to rise, the successful application of these models, particularly on genomic datasets from LMICs, signals a promising avenue for improving AMR prediction to support large AMR surveillance programs. This work thus not only expands our current understanding of the genetic underpinnings of AMR but also provides a robust methodological framework that can guide future research and applications in the fight against AMR.

Keywords: E. coli; Africa; Antimicrobial resistance; Machine learning; Whole-genome sequencing.

MeSH terms

  • Ampicillin
  • Anti-Bacterial Agents* / pharmacology
  • Anti-Bacterial Agents* / therapeutic use
  • Cefotaxime
  • Ciprofloxacin / pharmacology
  • Ciprofloxacin / therapeutic use
  • Drug Resistance, Bacterial / genetics
  • Escherichia coli* / genetics
  • Machine Learning
  • Nigeria

Substances

  • Anti-Bacterial Agents
  • Ciprofloxacin
  • Ampicillin
  • Cefotaxime