Introduction: Planning a high-stakes clinical examination requires the evaluation of several psychometric and logistical variables. The authors conducted generalisability and decision studies to answer the following research questions in the context of the surgical long case: (1) Does adding a third examiner improve the reliability of the examination? (2) Is global marking more reliable than an itemised marking template? (3) What would be the impact on reliability of reducing the number of examinees that each panel of examiners is required to assess?
Methods: A third examiner and global marking were introduced. Separate generalisability and decision studies were carried out for the two- and three-examiner models and for the itemised and global scores.
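A decision (D) study projects how reliability would change if the number of cases or examiners per candidate were altered, using variance components estimated in the generalisability (G) study. The sketch below assumes a fully crossed, random-effects candidate (p) x case (c) x examiner (e) design and the relative (norm-referenced) G coefficient; the study's actual design may differ, for example if examiners are nested within cases or panels. The function name relative_g_coefficient and all variance components are hypothetical placeholders chosen for illustration and are not estimates from this study.

```python
# Hypothetical sketch: D-study projection for a fully crossed p x c x e design.
# The variance components below are illustrative placeholders, NOT the
# estimates reported in this study.


def relative_g_coefficient(var, n_cases, n_examiners):
    """Projected relative G coefficient for a given number of cases and examiners.

    var maps variance-component names to values:
      p   - candidate (universe-score) variance
      pc  - candidate x case interaction
      pe  - candidate x examiner interaction
      pce - candidate x case x examiner interaction, confounded with residual error
    """
    # Relative error variance: each interaction term is averaged over the
    # facets it involves, so more cases/examiners shrink the error.
    relative_error = (
        var["pc"] / n_cases
        + var["pe"] / n_examiners
        + var["pce"] / (n_cases * n_examiners)
    )
    return var["p"] / (var["p"] + relative_error)


# Hypothetical variance components (arbitrary units).
VARIANCE = {"p": 0.50, "pc": 0.40, "pe": 0.10, "pce": 0.60}

if __name__ == "__main__":
    # Compare two- and three-examiner models across a range of case numbers.
    for n_cases in (1, 2, 3, 4):
        two = relative_g_coefficient(VARIANCE, n_cases, 2)
        three = relative_g_coefficient(VARIANCE, n_cases, 3)
        print(f"cases={n_cases}: two examiners {two:.3f}, "
              f"three examiners {three:.3f}, gain {three - two:.3f}")
```

With these placeholder values, two cases and two examiners give 0.5 / (0.5 + 0.4) ≈ 0.556, and a third examiner raises this to 0.600. The size of any real gain depends entirely on the estimated variance components and the design.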
Results: The introduction of a third examiner produced a modest gain in reliability of 0.05 to 0.07. The gain was larger when each candidate was assessed on a greater number of clinical cases. Global and itemised scores provided equivalent reliability (generalisability coefficient 0.74 to 0.89).
Conclusion: Introducing an additional examiner yielded only a modest improvement in the reliability of the surgical long case. Although global scoring and the itemised marking template were comparably reliable, the itemised template may offer opportunities for individualised feedback to examinees.