Applying GPT-4 to the Plastic Surgery Inservice Training Examination

J Plast Reconstr Aesthet Surg. 2023 Dec;87:78-82. doi: 10.1016/j.bjps.2023.09.027. Epub 2023 Sep 14.

Abstract

Background: The recently introduced Generative Pre-trained Transformer 4 (GPT-4) has shown potential as a superior successor to GPT-3.5, the model that powered ChatGPT at its release. GPT-4 is widely regarded as more reliable and more creative than GPT-3.5.

Objective: In conjunction with our prior manuscript, we sought to determine whether GPT-4 could serve as a tool for plastic surgery graduate medical education by evaluating its performance on the Plastic Surgery Inservice Training Examination (PSITE).

Methods: Sample assessment questions from the 2022 PSITE were obtained from the American Council of Academic Plastic Surgeons website and manually entered into GPT-4. GPT-4's responses were qualitatively assessed using the properties of natural coherence. Incorrect answers were classified into the following categories: informational, logical, or explicit fallacy.

Results: Of 242 questions, GPT-4 answered 187 correctly, an accuracy rate of 77.3%. GPT-4 used logical reasoning in 95.0% of questions, internal information in 98.3%, and external information in 97.5%. When questions were separated into those answered correctly and incorrectly, GPT-4's use of logical reasoning differed significantly between the two groups.
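The headline figures can be checked with simple arithmetic. The sketch below recomputes the accuracy from 187 correct answers out of 242 and, under the assumption (not stated in the abstract) that the reasoning and information percentages also use all 242 questions as the denominator, lists the whole-number counts that would round to each reported percentage. The names `TOTAL_QUESTIONS`, `CORRECT`, and `implied_counts` are hypothetical and used only for illustration.

```python
# Recompute the reported accuracy of GPT-4 on the PSITE sample questions.
TOTAL_QUESTIONS = 242  # from the abstract
CORRECT = 187          # from the abstract

accuracy_pct = round(100 * CORRECT / TOTAL_QUESTIONS, 1)
incorrect = TOTAL_QUESTIONS - CORRECT
print(f"Accuracy: {accuracy_pct}% ({incorrect} incorrect)")

# Assumption: each percentage below was computed over all 242 questions.
reported = {
    "logical reasoning": 95.0,
    "internal information": 98.3,
    "external information": 97.5,
}


def implied_counts(pct: float, total: int) -> list[int]:
    """Return every integer count k such that round(100 * k / total, 1) == pct."""
    return [k for k in range(total + 1) if round(100 * k / total, 1) == pct]


for label, pct in reported.items():
    print(f"{label}: {pct}% -> count(s) {implied_counts(pct, TOTAL_QUESTIONS)}")
```

With 242 questions, each additional question moves a percentage by about 0.41 points, so under this assumption each reported value maps to exactly one count.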

Conclusion: GPT-4 has been shown to be more accurate and reliable than GPT-3.5 for plastic surgery resident education. Users should consider incorporating the tool to enhance their educational curricula. Those who adopt such models may be better equipped to deliver high-quality care to their patients.

Keywords: AI; Artificial intelligence; ChatGPT; Resident education.

MeSH terms

  • Curriculum
  • Education, Medical, Graduate
  • Humans
  • Inservice Training
  • Plastic Surgery Procedures*
  • Surgery, Plastic*