Strategies for integrating artificial intelligence into mammography screening programmes: a retrospective simulation analysis

Lancet Digit Health. 2024 Nov;6(11):e803-e814. doi: 10.1016/S2589-7500(24)00173-0.

Abstract

Background: Integrating artificial intelligence (AI) into mammography screening can support radiologists and improve programme metrics, yet the potential of different strategies for integrating the technology remains understudied. We compared programme-level performance metrics of seven AI integration strategies.

Methods: We performed a retrospective comparative evaluation of seven strategies for integrating AI into mammography screening using datasets generated from screening programmes in Germany (n=1 657 068), the UK (n=223 603) and Sweden (n=22 779). The commercially available AI model used was Vara version 2.10, trained from scratch on German data. We simulated the performance of each strategy in terms of cancer detection rate (CDR), recall rate, and workload reduction, and compared the metrics with those of the screening programmes. We also assessed the distribution of the stages and grades of the cancers detected by each strategy and the AI model's ability to correctly localise those cancers.
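
The three programme-level metrics named above can be illustrated with a minimal sketch. The units for CDR (per 1000 examinations) and recall rate (per 100 examinations) follow the Findings; the definition of workload reduction as the share of human reads avoided relative to the baseline programme is an assumption made here for illustration, and all function and parameter names are hypothetical rather than taken from the study.

```python
def cancer_detection_rate(detected_cancers: int, examinations: int) -> float:
    """Screen-detected cancers per 1000 examinations."""
    return 1000 * detected_cancers / examinations


def recall_rate(recalls: int, examinations: int) -> float:
    """Recalled examinations per 100 examinations."""
    return 100 * recalls / examinations


def workload_reduction(baseline_reads: int, strategy_reads: int) -> float:
    """Percentage of human reads avoided relative to the baseline.

    Assumption: workload is counted as the number of human reads.
    """
    return 100 * (1 - strategy_reads / baseline_reads)


# Toy numbers (not study data): 100 000 examinations, double reading at baseline.
examinations = 100_000
print(cancer_detection_rate(632, examinations))   # cancers per 1000 examinations
print(recall_rate(4_110, examinations))           # recalls per 100 examinations
print(workload_reduction(2 * examinations, examinations))  # one of two reads removed
```

In this toy setting, removing one of the two human reads for every examination gives a 50% reduction; the value reported for a given strategy depends on how the study counted reads.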

Findings: Compared with the German screening programme (CDR 6·32 per 1000 examinations, recall rate 4·11 per 100 examinations), replacement of both readers (standalone AI strategy) achieved a non-inferior CDR of 6·37 (95% CI 6·10-6·64) at a recall rate of 3·80 (95% CI 3·67-3·93), whereas single reader replacement achieved a CDR of 6·49 (6·31-6·67), a recall rate of 4·01 (3·92-4·10), and a 49% workload reduction. Programme-level decision referral achieved a CDR of 6·85 (6·61-7·11), a recall rate of 3·55 (3·43-3·68), and an 84% workload reduction. Compared with the UK programme CDR of 8·19, the reader-level decision referral, programme-level decision referral, and deferral to single reader strategies achieved CDRs of 8·24 (7·82-8·71), 8·59 (8·12-9·06), and 8·28 (7·86-8·71), respectively, without increasing recall and while reducing workload by 37%, 81%, and 95%, respectively. On the Swedish dataset, programme-level decision referral increased the CDR by 17·7% without increasing recall while reducing reading workload by 92%.
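
To put the German results on the same relative scale as the Swedish figure, the point estimates above can be converted into percentage changes against the programme baseline. The sketch below is illustrative arithmetic on the reported point estimates only; the relative changes it prints are derived here and are not figures reported in the abstract, they ignore the confidence intervals, and the function and variable names are hypothetical.

```python
def relative_change_percent(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return 100 * (value - baseline) / baseline


# Point estimates from the Findings (German dataset).
german_programme = {"cdr": 6.32, "recall": 4.11}
programme_level_referral = {"cdr": 6.85, "recall": 3.55}

for metric in ("cdr", "recall"):
    change = relative_change_percent(
        german_programme[metric], programme_level_referral[metric]
    )
    print(f"{metric}: {change:+.1f}%")
# cdr: +8.4%
# recall: -13.6%
```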

Interpretation: The decision referral strategies offered the largest improvements in cancer detection rate and the largest reductions in recall rate, and all strategies except normal triaging showed potential to improve screening metrics.

Funding: Vara.

MeSH terms

  • Aged
  • Artificial Intelligence*
  • Breast Neoplasms* / diagnosis
  • Breast Neoplasms* / diagnostic imaging
  • Computer Simulation
  • Early Detection of Cancer* / methods
  • Female
  • Germany
  • Humans
  • Mammography* / methods
  • Mass Screening / methods
  • Middle Aged
  • Retrospective Studies
  • Sweden
  • United Kingdom
  • Workload