The promise of AI to alleviate the burden of grading and potentially enhance writing instruction is an exciting prospect. However, we believe it is crucial to emphasize that accuracy is only one component of the responsible use of AI in education. Governmental agencies such as NIST in the US, and intergovernmental organizations such as the UN, UNESCO, and the OECD, have published guidance on the responsible use of AI, which we have synthesized into our principles for the responsible use of AI in assessments at ETS. Our principles include fairness and bias mitigation; privacy and security; transparency, explainability, and accountability; educational impact and integrity; and continuous improvement. The accuracy of AI scoring falls under the principle of educational impact and integrity. In this work, we share our thoughts on fairness and bias mitigation and on transparency and explainability. We present an empirical evaluation of zero-shot scoring using GPT-4o, with an emphasis on fairness evaluations and the explainability of these automated scoring models.
Keywords: AI; Educational Measurement; Explainability; Fairness; Scoring.
© 2024 Educational Testing Service.