Statistics of Generative AI & Non-Generative Predictive Analytics Machine Learning in Medicine

Hooman H Rashidi; Bo Hu; Joshua Pantanowitz; Nam Tran; Silvia Liu; Alireza Chamanzar; Mert Gur; Chung-Chou H Chang; Yanshan Wang; Ahmad Tafti; Liron Pantanowitz; Matthew G Hanna

doi:10.1016/j.modpat.2024.100663

Mod Pathol. 2024 Nov 21:100663. doi: 10.1016/j.modpat.2024.100663. Online ahead of print.

Authors

Hooman H Rashidi¹, Bo Hu², Joshua Pantanowitz³, Nam Tran⁴, Silvia Liu⁵, Alireza Chamanzar⁶, Mert Gur⁷, Chung-Chou H Chang⁸, Yanshan Wang⁹, Ahmad Tafti⁹, Liron Pantanowitz¹⁰, Matthew G Hanna¹¹

Affiliations

¹ Department of Pathology, University of Pittsburgh Medical Center, PA, USA; Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh School of Medicine, Pittsburgh, PA, USA. Electronic address: rashidihh@upmc.edu.
² Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH.
³ School of Medicine, University of Pittsburgh, PA, USA.
⁴ Department of Pathology, UC Davis, School of Medicine, Sacramento, CA.
⁵ Department of Pathology, University of Pittsburgh Medical Center, PA, USA.
⁶ Electrical and Computer Engineering Department, Carnegie Mellon University; Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
⁷ Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Department of Mechanical Engineering, Istanbul Technical University, Istanbul, 34437, Turkey.
⁸ Department of Medicine and Biostatistics, University of Pittsburgh School of Medicine, PA, USA.
⁹ Department of Health Information Management, University of Pittsburgh; Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
¹⁰ Department of Pathology, University of Pittsburgh Medical Center, PA, USA; Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
¹¹ Department of Pathology, University of Pittsburgh Medical Center, PA, USA; Computational Pathology and AI Center of Excellence (CPACE), University of Pittsburgh School of Medicine, Pittsburgh, PA, USA. Electronic address: hannamg3@upmc.edu.

PMID: 39579984

Abstract

The rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) in medicine has prompted medical professionals to increasingly familiarize themselves with related topics. Doing so also requires grasping the statistical principles that govern the design, validation, and reproducibility of AI/ML models. Uniquely, the practice of pathology and medicine produces vast amounts of data that can be exploited by AI/ML. The emergence of generative AI, especially in the area of large language models and multimodal frameworks, represents approaches that are starting to transform medicine. Fundamentally, generative and traditional (e.g., non-generative predictive analytics) ML techniques rely on certain common statistical measures to function. However, unique to generative AI are metrics such as, but not limited to, perplexity and the BiLingual Evaluation Understudy (BLEU) score, which provide a means to assess the quality of generated samples and are typically unfamiliar to most medical practitioners. In contrast, non-generative predictive analytics ML often employs more familiar metrics tailored to specific tasks, as seen in typical classification studies (e.g., confusion matrix measures such as accuracy, sensitivity, and F1-score, as well as ROC-AUC) or regression studies (e.g., root mean square error [RMSE] and R-squared). To this end, the goal of this review article (as part 4 of our AI review series) is to provide an overview and comparison of the statistical measures and methodologies employed in both generative AI and traditional (i.e., non-generative predictive analytics) ML, along with their strengths and known limitations. By understanding their similarities and differences along with their respective applications, we will become better stewards of this transformative space, which ultimately enables us to address our current and future needs and challenges in a more responsible and scientifically sound manner.
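As a rough illustration of the metric families named in the abstract, the following minimal Python sketch computes a few of them from their standard textbook definitions. It is not taken from the article: the function names and all input numbers are hypothetical, chosen only to make the arithmetic easy to follow, and the functions do not guard against empty inputs or zero denominators.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Confusion-matrix measures for a binary classifier (hypothetical helper)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)  # also called recall
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and sensitivity (recall).
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}

def rmse(y_true, y_pred):
    """Root mean square error for a regression model."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def perplexity(token_probs):
    """Perplexity of a generated sequence: the exponential of the mean
    negative log-probability the model assigned to each token."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

# Toy inputs, invented for illustration only.
clf = classification_metrics(tp=40, fp=10, fn=10, tn=40)
reg_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
print(clf, reg_error, ppl)
```

With these toy inputs, a model that assigns probability 0.25 to every token has a perplexity of about 4, which can be read as the model being as uncertain as if it were choosing uniformly among four options at each step. Lower perplexity indicates a better fit to the text, whereas for accuracy, sensitivity, and F1-score higher values are better.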

Keywords: BLEU; F1-score; PR (Precision-Recall) curve; ROC-AUC; accuracy; perplexity; precision; recall.

Publication types

Review