Examining the responsible use of zero-shot AI approaches to scoring essays

Matthew Johnson; Mo Zhang

doi:10.1038/s41598-024-79208-2

Sci Rep. 2024 Dec 3;14(1):30064. doi: 10.1038/s41598-024-79208-2.

Authors

Matthew Johnson¹, Mo Zhang²

Affiliations

¹ Educational Testing Service, Research Division, 08541, Princeton, New Jersey, USA. msjohnson@ets.org.
² Educational Testing Service, Research Division, 08541, Princeton, New Jersey, USA.

Abstract

The promise of AI to alleviate the burdens of grading and potentially enhance writing instruction is an exciting prospect. However, we believe it is crucial to emphasize that the accuracy of AI is only one component of its responsible use in education. Various governmental agencies, such as NIST in the US, and non-governmental agencies like the UN, UNESCO, and OECD have published guidance on the responsible use of AI, which we have synthesized to develop our principles for the responsible use of AI in assessments at ETS. Our principles include fairness and bias mitigation; privacy & security; transparency, explainability, and accountability; educational impact & integrity; and continuous improvement. The accuracy of AI scoring is one component of our principles related to educational impact & integrity. In this work, we share our thoughts on fairness & bias mitigation, and transparency & explainability. We demonstrate an empirical evaluation of zero-shot scoring using GPT-4o, with an emphasis on fairness evaluations and explainability of these automated scoring models.

Keywords: AI; Educational Measurement; Explainability; Fairness; Scoring.