Comparison of ChatGPT versions in informing patients with rotator cuff injuries

Ali Eray Günay; Alper Özer; Alparslan Yazıcı; Gökhan Sayer

doi:10.1016/j.jseint.2024.04.016

JSES Int. 2024 May 6;8(5):1016-1018. doi: 10.1016/j.jseint.2024.04.016. eCollection 2024 Sep.

Authors

Ali Eray Günay¹, Alper Özer¹, Alparslan Yazıcı², Gökhan Sayer³

Affiliations

¹ Department of Orthopedics and Traumatology, Kayseri City Training and Research Hospital, Kayseri, Turkey.
² Department of Orthopedics and Traumatology, Develi State Hospital, Kayseri, Turkey.
³ Department of Orthopedics and Traumatology, Bursa City Training and Research Hospital, Bursa, Turkey.

Abstract

Background: This study aimed to evaluate whether Chat Generative Pretrained Transformer (ChatGPT) can be recommended as a resource for informing patients who are planning rotator cuff repair, and to assess the differences between ChatGPT versions 3.5 and 4.0 in terms of information content and readability.

Methods: In August 2023, 13 questions commonly asked by patients with rotator cuff disease were posed to ChatGPT 3.5 and ChatGPT 4.0 by 3 surgeons experienced in rotator cuff surgery, each using a computer with a different Internet Protocol (IP) address. The answers from both versions were converted to text, and their quality and readability were examined.

Results: The average Journal of the American Medical Association score for both versions was 0, and the average DISCERN score was 61.6. A statistically significant and strong correlation was found between ChatGPT 3.5 and 4.0 DISCERN scores. There was excellent agreement in DISCERN scores for both versions among the 3 evaluators. ChatGPT 3.5 was found to be less readable than ChatGPT 4.0.

Conclusion: The information provided by the ChatGPT conversational system was rated as high quality, but its reliability had significant shortcomings because the answers lacked citations. Although ChatGPT 4.0 had higher readability scores, both versions were considered difficult to read.

Keywords: Arthroscopy; Artificial intelligence; ChatGPT; OpenAI; Rotator cuff; Shoulder.