Statistics of Generative AI & Non-Generative Predictive Analytics Machine Learning in Medicine

Mod Pathol. 2024 Nov 21:100663. doi: 10.1016/j.modpat.2024.100663. Online ahead of print.

Abstract

The rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) in medicine has prompted medical professionals to increasingly familiarize themselves with related topics. This also demands a grasp of the underlying statistical principles that govern the design, validation, and reproducibility of these tools. Uniquely, the practice of pathology and medicine produces vast amounts of data that can be exploited by AI/ML. The emergence of generative AI, especially in the area of large language models and multimodal frameworks, represents a class of approaches that is starting to transform medicine. Fundamentally, both generative and traditional (i.e., non-generative predictive analytics) ML techniques rely on certain common statistical measures to function. However, unique to generative AI are metrics such as, but not limited to, perplexity and the Bilingual Evaluation Understudy (BLEU) score, which provide a means to determine the quality of generated samples and are typically unfamiliar to most medical practitioners. In contrast, non-generative predictive analytics ML often employs more familiar metrics tailored to specific tasks, as seen in typical classification studies (i.e., confusion matrix measures such as accuracy, sensitivity, F1-score, ROC-AUC, etc.) or regression studies (i.e., root mean square error [RMSE], R-squared, etc.). To this end, the goal of this review article (part 4 of our AI review series) is to provide an overview and comparison of the statistical measures and methodologies employed in both generative AI and traditional (i.e., non-generative predictive analytics) ML, along with their strengths and known limitations. By understanding their similarities and differences, along with their respective applications, we will become better stewards of this transformative space, which ultimately enables us to better address our current and future needs and challenges in a more responsible and scientifically sound manner.
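As a concrete illustration of the contrast drawn above, the short Python sketch below computes several of the metrics named in this abstract: accuracy, sensitivity, F1-score, and ROC-AUC for classification; RMSE and R-squared for regression; and perplexity for a generative language model. The toy labels, predicted probabilities, and token probabilities are hypothetical values chosen purely for demonstration; scikit-learn supplies the predictive analytics metrics, while perplexity is computed directly as the exponential of the average negative log-likelihood.

    # Hypothetical toy data; all values below are illustrative only.
    import numpy as np
    from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                                 roc_auc_score, mean_squared_error, r2_score)

    # Non-generative predictive analytics: classification metrics
    y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # ground-truth labels
    y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # model's hard predictions
    y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1])  # predicted probabilities

    print("accuracy:   ", accuracy_score(y_true, y_pred))
    print("sensitivity:", recall_score(y_true, y_pred))  # recall of the positive class
    print("F1-score:   ", f1_score(y_true, y_pred))
    print("ROC-AUC:    ", roc_auc_score(y_true, y_prob))

    # Non-generative predictive analytics: regression metrics
    y_obs = np.array([2.0, 3.5, 5.0, 7.5])
    y_fit = np.array([2.2, 3.1, 5.4, 7.0])
    print("RMSE:     ", np.sqrt(mean_squared_error(y_obs, y_fit)))
    print("R-squared:", r2_score(y_obs, y_fit))

    # Generative AI: perplexity
    # Perplexity is the exponential of the average negative log-likelihood
    # a language model assigns to the observed tokens (lower is better).
    token_probs = np.array([0.25, 0.10, 0.60, 0.05])  # model probabilities of each observed token
    perplexity = np.exp(-np.mean(np.log(token_probs)))
    print("perplexity:", perplexity)

A BLEU score would be obtained analogously from n-gram overlap between generated and reference text (for example, via nltk.translate.bleu_score.sentence_bleu), but is omitted here for brevity.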

Keywords: BLEU; F1-score; PR (Precision-Recall) curve; ROC-AUC; accuracy; perplexity; precision; recall.

Publication types

  • Review