Ethical debates amidst flawed healthcare artificial intelligence metrics

Jack Gallifant; Danielle S Bitterman; Leo Anthony Celi; Judy W Gichoya; Joao Matos; Liam G McCoy; Robin L Pierce

doi:10.1038/s41746-024-01242-1

NPJ Digit Med. 2024 Sep 11;7(1):243. doi: 10.1038/s41746-024-01242-1.

Authors

Jack Gallifant^{1,2}, Danielle S Bitterman^{3,4,5}, Leo Anthony Celi^{6,7,8}, Judy W Gichoya⁹, Joao Matos^{1,10,11}, Liam G McCoy¹², Robin L Pierce¹³

Affiliations

¹ Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA.
² Department of Critical Care, Guy's and St Thomas' NHS Foundation Trust, London, UK.
³ Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA.
⁴ Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA.
⁵ Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
⁶ Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA. lceli@mit.edu.
⁷ Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA. lceli@mit.edu.
⁸ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA. lceli@mit.edu.
⁹ Department of Radiology, Emory University School of Medicine, Georgia, USA.
¹⁰ Faculty of Engineering, University of Porto, Porto, Portugal.
¹¹ Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal.
¹² Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Canada.
¹³ The Law School, Faculty of Humanities, Arts, and Social Sciences, University of Exeter, Exeter, UK.

Abstract

Healthcare AI faces an ethical dilemma between selective and equitable deployment, exacerbated by flawed performance metrics. These metrics inadequately capture real-world complexities and biases, leading to premature assertions of effectiveness. Improved evaluation practices, including continuous monitoring and silent evaluation periods, are crucial. To address these fundamental shortcomings, a paradigm shift in AI assessment is needed, prioritizing actual patient outcomes over conventional benchmarking.
