Ensuring fair, safe, and interpretable artificial intelligence-based prediction tools in a real-world oncological setting

Commun Med (Lond). 2023 Jun 22;3(1):88. doi: 10.1038/s43856-023-00317-6.

Abstract

Background: Cancer patients often experience treatment-related symptoms which, if uncontrolled, may require emergency department (ED) admission. We developed models to identify patients with breast or genitourinary cancer at risk of an ED visit within 30 days, and we demonstrated the development, validation, and proactive in-production monitoring of an artificial intelligence-based predictive model during a 3-month simulated deployment at a cancer hospital in the United States.

Methods: We used routinely collected electronic health record data to develop our predictive models. We evaluated candidate models, including a variational autoencoder k-nearest neighbors (VAE-kNN) algorithm, and examined model behavior using a sample of 84,138 observations from 28,369 patients. We assessed the model during a 77-day production period of exposure to live data, using a proactive monitoring process with predefined metrics.
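The abstract does not give implementation details, but a minimal Python sketch of a VAE-kNN pipeline, assuming tabular EHR-derived features, could look like the following. The architecture, latent dimension, neighbor count, and synthetic data are illustrative assumptions, not the published model.

# Minimal VAE-kNN sketch: a variational autoencoder compresses tabular
# EHR-style features into a low-dimensional latent space, and a k-nearest
# neighbors classifier in that space estimates 30-day ED visit risk.
import numpy as np
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

class VAE(nn.Module):
    def __init__(self, n_features, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_features)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction error plus KL divergence to a standard normal prior.
    recon_err = ((recon - x) ** 2).sum(dim=1).mean()
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
    return recon_err + kl

# Synthetic data standing in for EHR-derived features (labs, vitals, visit counts).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40)).astype(np.float32)
y = rng.integers(0, 2, size=1000)  # 30-day ED visit outcome (0/1)

# Train the VAE on the feature matrix (unsupervised).
vae = VAE(n_features=X.shape[1])
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
x_t = torch.from_numpy(X)
for _ in range(50):
    opt.zero_grad()
    recon, mu, logvar = vae(x_t)
    loss = vae_loss(x_t, recon, mu, logvar)
    loss.backward()
    opt.step()

# Use the latent means as the embedding and fit a kNN classifier on them.
with torch.no_grad():
    Z = vae.mu(vae.encoder(x_t)).numpy()
knn = KNeighborsClassifier(n_neighbors=25)
knn.fit(Z[:800], y[:800])
risk = knn.predict_proba(Z[800:])[:, 1]
print("AUC:", roc_auc_score(y[800:], risk))

In this kind of pipeline, the encoder's latent means act as a compact representation of each observation, and the kNN step then estimates risk from the outcomes of the most similar historical observations.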

Results: Performance of the VAE-kNN algorithm is exceptional (area under the receiver-operating characteristic curve, AUC = 0.80) and remains stable across demographic and disease groups over the production period (AUC 0.74-0.82). Our monitoring process detects issues in data feeds, creating immediate insight into future model performance.
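As one hedged example of how stability across groups can be tracked as a predefined metric, the sketch below computes per-group AUC over a window of scored production data and flags any group that drops below a threshold. The column names, groups, and the 0.74 threshold are assumptions for illustration, not the paper's actual monitoring metrics.

# Subgroup AUC monitoring sketch for a production scoring window.
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_auc(df: pd.DataFrame, group_col: str, min_threshold: float = 0.74):
    """Return per-group AUC and whether it falls below the alert threshold."""
    rows = []
    for group, g in df.groupby(group_col):
        if g["ed_visit_30d"].nunique() < 2:  # AUC undefined when only one class is present
            continue
        auc = roc_auc_score(g["ed_visit_30d"], g["predicted_risk"])
        rows.append({"group": group, "auc": round(auc, 3), "alert": auc < min_threshold})
    return pd.DataFrame(rows)

# Hypothetical scored production data.
scored = pd.DataFrame({
    "disease_group": ["breast", "breast", "GU", "GU", "breast", "GU"] * 50,
    "ed_visit_30d": [0, 1, 0, 1, 0, 1] * 50,
    "predicted_risk": [0.2, 0.7, 0.3, 0.6, 0.1, 0.8] * 50,
})
print(subgroup_auc(scored, "disease_group"))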

Conclusions: Our algorithm demonstrates exceptional performance at predicting the risk of 30-day ED visits. Using a proactive monitoring approach, we confirm that model outputs are equitable and stable over time.

Plain language summary

Patients with cancer often need to visit the hospital emergency department (ED), for example due to treatment side effects. Predicting these visits might help us better manage the treatment of patients who are at risk. Here, we develop a computer-based tool to identify patients with cancer who are at risk of an unplanned ED visit within 30 days. We use health record data from over 28,000 patients who had visited a single cancer hospital in the US to create and test the model. The model performs well and is consistent across different demographic and disease groups. We monitor model behavior over time and show that it is stable. The approach we take to monitoring model performance may be a particularly useful contribution toward implementing similar predictive models in the clinic and checking that they are performing as intended.