Digital Ink and Surgical Dreams: Perceptions of Artificial Intelligence-Generated Essays in Residency Applications

J Surg Res. 2024 Sep:301:504-511. doi: 10.1016/j.jss.2024.06.020. Epub 2024 Jul 22.

Abstract

Introduction: Large language models like Chat Generative Pre-Trained Transformer (ChatGPT) are increasingly used in academic writing. Faculty may consider the use of artificial intelligence (AI)-generated responses a form of cheating. We sought to determine whether general surgery residency faculty could distinguish AI-generated from human-written responses to a text prompt, hypothesizing that faculty would not be able to do so reliably.

Methods: Ten essays were generated using the text prompt "Tell us in 1-2 paragraphs why you are considering the University of Rochester for General Surgery residency" (current trainees: n = 5; ChatGPT: n = 5). Ten blinded faculty reviewers rated essays (ten-point Likert scale) on the following criteria: desire to interview, relevance to the general surgery residency, overall impression, and AI- or human-generated. Scores and identification error rates were compared between the groups.

Results: There were no significant differences between groups in percentage of total points (ChatGPT 66.0 ± 13.5%, human 70.0 ± 23.0%, P = 0.508) or identification error rates (ChatGPT 40.0 ± 35.0%, human 20.0 ± 30.0%, P = 0.175). All but one of the essays were identified incorrectly by at least two reviewers. Essays identified as human-generated received higher overall impression scores (area under the curve: 0.82 ± 0.04, P < 0.01).

Conclusions: Whether the use of AI tools for academic purposes should constitute academic dishonesty is controversial. We demonstrate that human-written and AI-generated essays are similar in quality, but there is bias against presumed AI-generated essays. Faculty are not able to reliably differentiate human-written from AI-generated essays, so this bias may be misdirected. AI tools are becoming ubiquitous, and their use is not easily detected. Faculty must expect these tools to play increasing roles in medical education.

Keywords: Ethics; Generative artificial intelligence; Large language model; Surgical education.

MeSH terms

  • Artificial Intelligence*
  • Faculty, Medical / psychology
  • General Surgery* / education
  • Humans
  • Internship and Residency* / methods
  • Writing