Background: The ability of experts' item difficulty ratings to predict test takers' actual performance is an important aspect of licensure examinations. Expert judgment serves as a primary source of information for making decisions in advance that determine the pass rate of test takers. The nature of the raters involved in predicting item difficulty is therefore central to setting credible standards. Accordingly, this study aimed to assess and compare raters' predicted and actual difficulty of Multiple-Choice Questions (MCQs) on the undergraduate medicine licensure examination (UGMLE) in Ethiopia.
Method: Responses of 815 examinees to 200 MCQs were used in this study. The study also included item difficulty ratings from seven physicians who participated in the standard setting of the UGMLE. Analyses were then conducted to examine variation in experts' ratings when predicting the actual difficulty levels observed among examinees. Descriptive statistics were used to profile the mean rater-predicted and actual difficulty values of the MCQs, and ANOVA was used to compare mean differences in item difficulty predictions between raters. Additionally, regression analysis was used to examine interrater variation in item difficulty predictions relative to actual difficulty and to compute the proportion of variance in actual difficulty explained by raters' predictions.
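The exact analysis pipeline is not reported in this abstract; the following is a minimal sketch of how such an analysis could be run, assuming a hypothetical item-level table with one row per MCQ, an "actual" column holding the proportion of examinees answering each item correctly, and one predicted-difficulty column per rater. All file and column names are illustrative, not taken from the study.

```python
# Minimal sketch of the described workflow (hypothetical data layout).
import pandas as pd
from scipy import stats
import statsmodels.api as sm

# Hypothetical file: one row per MCQ, columns "actual", "rater_1" ... "rater_7".
ratings = pd.read_csv("ugmle_item_ratings.csv")
rater_cols = [c for c in ratings.columns if c.startswith("rater_")]

# Descriptive profile of predicted vs. actual item difficulty.
print(ratings[rater_cols + ["actual"]].describe())

# One-way ANOVA comparing mean item difficulty ratings across raters.
f_stat, p_value = stats.f_oneway(*[ratings[c] for c in rater_cols])
print(f"ANOVA across raters: F = {f_stat:.2f}, p = {p_value:.3f}")

# Multiple regression of actual difficulty on raters' predictions;
# R-squared gives the proportion of variance explained.
X = sm.add_constant(ratings[rater_cols])
model = sm.OLS(ratings["actual"], X).fit()
print(model.summary())
print(f"Variance explained (R^2): {model.rsquared:.2f}")
```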
Results: The mean difference between raters' predictions and examinees' actual performance was inconsistent across the exam domains. The study revealed a statistically significant strong positive correlation between actual and predicted item difficulty in exam domains eight and eleven. In exam domains seven and twelve, however, the correlation was very weak, positive, and not statistically significant. The multiple comparison analysis showed significant differences in mean item difficulty ratings between raters. In the regression analysis, experts' item difficulty ratings explained 33% of the variance in the actual difficulty of UGMLE items. The regression model also showed a statistically significant moderate positive multiple correlation (R = 0.57), F(6, 193) = 15.58, p = 0.001.
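As a consistency check (not stated in the original abstract), the reported explained variance follows directly from the multiple correlation: R squared = 0.57 x 0.57, which is approximately 0.33, meaning raters' predictions account for roughly a third of the variance in actual item difficulty.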
Conclusion: This study demonstrated the complexity of assessing the difficulty level of MCQs in the UGMLE and highlighted the benefits of obtaining experts' ratings in advance. To ensure that exam scores remain sufficiently reliable and valid, raters' prediction accuracy on the UGMLE must be improved; to achieve this, techniques that align with evolving assessment methodologies should be developed.
Keywords: Ethiopia; Expert judgment; Licensure examination; Undergraduate medicine.
© 2024. The Author(s).