Enhancements in artificial intelligence for medical examinations: A leap from ChatGPT 3.5 to ChatGPT 4.0 in the FRCS trauma & orthopaedics examination

Surgeon. 2024 Nov 28:S1479-666X(24)00150-1. doi: 10.1016/j.surge.2024.11.008. Online ahead of print.

Abstract

Introduction: ChatGPT is a sophisticated AI model capable of generating human-like text based on the input it receives. In previous studies, ChatGPT 3.5 was unable to pass the FRCS (Tr&Orth) examination, a shortcoming attributed to a lack of higher-order judgement. Enhancements in ChatGPT 4.0 warrant an evaluation of its performance.

Methodology: Questions from the UK-based December 2022 In-Training examination were input into ChatGPT 3.5 and 4.0. Methodology from a prior study was replicated to maintain consistency, allowing for a direct comparison between the two model versions. The performance threshold remained at 65.8 %, aligning with the November 2022 sitting of Section 1 of the FRCS (Tr&Orth).

Results: ChatGPT 4.0 achieved a passing score (73.9 %), indicating an improvement in its ability to analyse clinical information and make decisions reflective of a competent trauma and orthopaedic consultant. ChatGPT 3.5 scored 38.1 % lower than version 4.0, a statistically significant difference (p < 0.0001; chi-square test). The breakdown by subspecialty further demonstrated version 4.0's enhanced understanding and application in complex clinical scenarios. ChatGPT 4.0 also showed a statistically significant improvement over its predecessor in answering image-based questions (p = 0.0069).
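The chi-square comparison reported above can be illustrated with a short sketch. Note the question counts below are hypothetical (the abstract does not report them); they are chosen only so the proportions roughly match the reported scores, and the statistic is computed with the standard Pearson formula for a 2×2 contingency table, not the authors' actual analysis.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 contingency table [[a, b], [c, d]], e.g. correct/incorrect
    answers for two models."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical counts for illustration only: suppose the exam had 119
# questions, ChatGPT 4.0 answered 88 correctly (~73.9 %) and
# ChatGPT 3.5 answered 43 correctly (~36 %).
stat = chi_square_2x2(88, 31, 43, 76)

# With 1 degree of freedom, a statistic above 10.83 corresponds to
# p < 0.001, consistent with a highly significant difference.
print(f"chi-square statistic: {stat:.1f}")
```

Under these assumed counts the statistic is about 34.4, far beyond the 1-degree-of-freedom critical value, which is the kind of result that yields the very small p-value reported.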

Conclusion: ChatGPT 4.0's success in passing Section One of the FRCS (Tr&Orth) examination highlights the rapid evolution of AI technologies and their potential applications in healthcare and education.

Keywords: Artificial intelligence; ChatGPT; FRCS; Medical education; Trauma & orthopaedics.