Evaluating ChatGPT's competency in radiation oncology: A comprehensive assessment across clinical scenarios

Radiother Oncol. 2024 Nov 19:202:110645. doi: 10.1016/j.radonc.2024.110645. Online ahead of print.

Abstract

Purpose: Artificial intelligence (AI) and machine learning present an opportunity to enhance clinical decision-making in radiation oncology. This study aims to evaluate the competency of ChatGPT, an AI language model, in interpreting clinical scenarios and to assess its oncology knowledge.

Methods and materials: A series of clinical cases covering 12 disease sites was designed. Questions were grouped into domains: epidemiology, staging and workup, clinical management, treatment planning, cancer biology, physics, and surveillance. Royal College-certified radiation oncologists (ROs) reviewed the cases and provided solutions. Using a standardized rubric, ROs scored ChatGPT's responses on 3 criteria: conciseness (focused answers), completeness (addressing all aspects of the question), and correctness (alignment with expert opinion). Scores ranged from 0 to 5 for each criterion, for a total possible score of 15 per question.
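The scoring arithmetic described above can be illustrated with a minimal Python sketch. This is not the authors' tooling: the names `RubricScore`, `max_possible`, and `percent_by_criterion` are hypothetical, and the example scores are invented for illustration only. The sketch assumes each criterion is scored from 0 to 5 and that a percentage is the points awarded divided by the points available.

```python
from dataclasses import dataclass

# Rubric from the Methods: three criteria, each scored 0 to 5.
CRITERIA = ("conciseness", "completeness", "correctness")
MAX_PER_CRITERION = 5


@dataclass(frozen=True)
class RubricScore:
    """Scores for one answered question (hypothetical container)."""
    conciseness: float
    completeness: float
    correctness: float

    def __post_init__(self):
        # Reject values outside the 0-5 range stated in the rubric.
        for name in CRITERIA:
            value = getattr(self, name)
            if not 0 <= value <= MAX_PER_CRITERION:
                raise ValueError(f"{name} must be in [0, {MAX_PER_CRITERION}], got {value}")

    @property
    def total(self):
        """Sum over the three criteria (maximum 15 per question)."""
        return sum(getattr(self, name) for name in CRITERIA)


def max_possible(n_questions):
    """Maximum attainable total for n_questions (15 points each)."""
    return n_questions * len(CRITERIA) * MAX_PER_CRITERION


def percent_by_criterion(scores):
    """Percentage of available points earned on each criterion."""
    available = len(scores) * MAX_PER_CRITERION
    return {
        name: 100 * sum(getattr(s, name) for s in scores) / available
        for name in CRITERIA
    }


# Invented example scores for two questions (not data from the study).
example = [RubricScore(5, 4, 4), RubricScore(5, 3, 5)]
overall = sum(s.total for s in example)
print(f"{overall}/{max_possible(len(example))}")
print(percent_by_criterion(example))
```

Under these assumptions, each question contributes at most 15 points, so the denominator of an overall score grows by 15 with every question answered.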

Results: Across 12 cases, 182 questions were answered with a total AI score of 2317/2730 (84 %). Scores by criteria were: completeness (79 %, range: 70-99 %), conciseness (92 %, range: 83-99 %), and correctness (81 %, range: 72-92 %). AI performed best in the domains of epidemiology (93 %) and cancer biology (93 %) and performed reasonably well in staging and workup (89 %), physics (86 %), and surveillance (82 %). Weaker domains included treatment planning (78 %) and clinical management (81 %). Statistical differences were driven by variations in the completeness (p < 0.01) and correctness (p = 0.04) criteria, whereas conciseness scored universally high (p = 0.91). These trends were consistent across disease sites.

Conclusions: ChatGPT showed potential as a tool in radiation oncology, demonstrating a high degree of accuracy in several oncologic domains. However, this study highlights its limitations, namely incorrect and incomplete answers in complex cases.

Keywords: Artificial intelligence; Large language models; Radiation oncology.