Using decision curve analysis to benchmark performance of a magnetic resonance imaging-based deep learning model for prostate cancer risk assessment

Eur Radiol. 2020 Dec;30(12):6867-6876. doi: 10.1007/s00330-020-07030-1. Epub 2020 Jun 26.

Abstract

Objectives: To benchmark the performance of a calibrated 3D convolutional neural network (CNN) applied to multiparametric MRI (mpMRI) for risk assessment of clinically significant prostate cancer (csPCa) using decision curve analysis (DCA).

Methods: We retrospectively analyzed 499 patients who had positive mpMRI (PI-RADSv2 ≥ 3) and MRI-targeted biopsy. The training cohort comprised 449 men, including a calibration set of 50 men. The biopsy decision strategies compared were: using risk estimates from the CNN (original and calibrated), performing biopsy only in men with PI-RADSv2 ≥ 4, and additionally performing biopsy in men with PI-RADSv2 3 and PSA density (PSAd) ≥ 0.15 ng/ml/ml. Discrimination, calibration and clinical usefulness in the unseen test cohort (n = 50) were assessed using the C-statistic, calibration plots and DCA, respectively.
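The biopsy decision strategies described in the Methods can be written as simple rules. The following is a minimal sketch for illustration only, not the authors' code: the function and parameter names are hypothetical, and the CNN strategy is assumed to mean "biopsy when the predicted risk meets or exceeds a chosen risk threshold", which is how DCA conventionally treats a risk model.

```python
def biopsy_pirads_only(pirads: int) -> bool:
    """Biopsy only men with PI-RADSv2 >= 4."""
    return pirads >= 4


def biopsy_pirads_plus_psad(pirads: int, psad: float,
                            psad_cutoff: float = 0.15) -> bool:
    """Biopsy men with PI-RADSv2 >= 4, and additionally men with
    PI-RADSv2 3 whose PSA density is >= 0.15 ng/ml/ml."""
    return pirads >= 4 or (pirads == 3 and psad >= psad_cutoff)


def biopsy_by_cnn_risk(risk: float, threshold: float) -> bool:
    """Assumed CNN rule: biopsy when the predicted csPCa risk
    (original or calibrated) is at or above the risk threshold."""
    return risk >= threshold
```

For example, under these rules a man with PI-RADSv2 3 and PSAd 0.10 ng/ml/ml would be spared biopsy by both PI-RADS-based strategies, while the CNN strategy would depend on his predicted risk relative to the chosen threshold.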

Results: The calibrated CNN achieved moderate calibration (Hosmer-Lemeshow calibration test, p = 0.41) and good discrimination (C = 0.85). DCA revealed consistently higher net benefit and net reduction in biopsies for the calibrated CNN compared with the original CNN, PI-RADSv2 ≥ 4 and the combined strategy of PI-RADSv2 and PSAd. Original CNN predictions were severely miscalibrated (p < 0.0001), resulting in net harm compared with a 'biopsy all patients' strategy. At risk thresholds ≥ 10%, the calibrated CNN and the combined strategy reduced the number of biopsies by an estimated 201 and 55, respectively, per 1000 men at risk, without missing csPCa, whereas the original CNN and PI-RADSv2 ≥ 4 could not achieve a net reduction in biopsies.
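For readers unfamiliar with the quantities reported above, the sketch below shows how net benefit and net reduction in biopsies are conventionally computed in decision curve analysis. It uses the standard DCA definitions rather than anything taken from the paper's code; the function names are hypothetical and the four-patient data set in the usage note is invented purely to make the arithmetic checkable.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of 'biopsy if risk >= threshold':
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    odds = threshold / (1 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_biopsy_all(y_true, threshold):
    """Net benefit of biopsying every patient (the reference strategy)."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1 - threshold)
    return prevalence - (1 - prevalence) * odds


def net_reduction_per_1000(y_true, risk, threshold):
    """Net biopsies avoided per 1000 patients relative to 'biopsy all',
    after accounting for any cancers that would be missed.
    A negative value means net harm compared with biopsying everyone."""
    odds = threshold / (1 - threshold)
    gain = net_benefit(y_true, risk, threshold) - net_benefit_biopsy_all(y_true, threshold)
    return gain / odds * 1000
```

As a toy check with invented data, take outcomes `[1, 1, 0, 0]` and predicted risks `[0.9, 0.8, 0.05, 0.6]` at a 10% threshold: one of four patients is spared biopsy and no cancer is missed, so `net_reduction_per_1000` returns approximately 250. A miscalibrated model whose risks all sit above the threshold would spare no one and yield zero net reduction.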

Conclusions: DCA revealed that our calibrated 3D-CNN resulted in fewer unnecessary biopsies compared with using PI-RADSv2 alone or in combination with PSAd. CNN calibration is important in achieving clinical utility.

Key points:

  • A 3D deep learning model applied to multiparametric MRI may help to prevent unnecessary prostate biopsies in patients eligible for MRI-targeted biopsy.
  • Owing to miscalibration, original risk estimates by the deep learning model require prior calibration to enable clinical utility.
  • Decision curve analysis confirmed a net benefit of using our calibrated deep learning model for biopsy decisions compared with alternative strategies, including PI-RADSv2 alone and in combination with prostate-specific antigen density.

Keywords: Artificial intelligence; Decision analysis; Deep Learning; Magnetic resonance imaging; Prostatic neoplasms.

MeSH terms

  • Algorithms
  • Benchmarking
  • Biopsy / methods*
  • Calibration
  • Deep Learning*
  • Humans
  • Image Processing, Computer-Assisted
  • Machine Learning
  • Magnetic Resonance Imaging*
  • Male
  • Normal Distribution
  • Observer Variation
  • Prostate-Specific Antigen / blood
  • Prostatic Neoplasms / diagnostic imaging*
  • Prostatic Neoplasms / pathology
  • Retrospective Studies
  • Risk Assessment / methods*

Substances

  • Prostate-Specific Antigen