Rationale: After the sample size of a randomized clinical trial (RCT) is set by the power requirement of its primary endpoint, investigators select secondary endpoints while unable to further adjust sample size. How the sensitivity and specificity of the instrument used to measure these outcomes, together with the expected underlying event rates, affect an RCT's power to detect significant differences in these outcomes is poorly understood.
Objectives: Motivated by the design of an RCT of neuromuscular blockade in acute respiratory distress syndrome, we examined how power to detect a difference in secondary endpoints varies with the sensitivity and specificity of the instrument used to measure such outcomes.
Methods: We derived a general formula and Stata code for calculating an RCT's power to detect differences in binary outcomes when such outcomes are measured with imperfect sensitivity and specificity. The formula informed the choice of instrument for measuring post-traumatic stress-like symptoms in the Reevaluation of Systemic Early Neuromuscular Blockade RCT (www.clinicaltrials.gov identifier NCT02509078).
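The kind of calculation described in Methods can be sketched as follows. This is a minimal illustration in Python, not the authors' Stata code or necessarily their exact formula. It assumes nondifferential misclassification (the same sensitivity and specificity in both arms), equal arm sizes, and a two-sided normal-approximation test of two proportions. The function names, sample size, event rates, and instrument characteristics are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def observed_rate(true_rate, sensitivity, specificity):
    """Rate an imperfect instrument is expected to report:
    true positives plus false positives."""
    return sensitivity * true_rate + (1 - specificity) * (1 - true_rate)


def power_two_proportions(p_control, p_treatment, n_per_arm,
                          sensitivity=1.0, specificity=1.0, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test when the
    binary outcome is measured with imperfect sensitivity and specificity.
    Assumes equal arm sizes and the same misclassification in both arms;
    the negligible opposite rejection tail is ignored."""
    q0 = observed_rate(p_control, sensitivity, specificity)
    q1 = observed_rate(p_treatment, sensitivity, specificity)
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    q_bar = (q0 + q1) / 2
    se_null = sqrt(2 * q_bar * (1 - q_bar) / n_per_arm)          # pooled, under H0
    se_alt = sqrt((q0 * (1 - q0) + q1 * (1 - q1)) / n_per_arm)   # unpooled, under H1
    return normal.cdf((abs(q1 - q0) - z_alpha * se_null) / se_alt)


# Hypothetical inputs, NOT the published values for any real instrument.
N_PER_ARM = 500
P_CONTROL, P_TREATMENT = 0.25, 0.18

power_perfect = power_two_proportions(P_CONTROL, P_TREATMENT, N_PER_ARM)
power_sensitive = power_two_proportions(P_CONTROL, P_TREATMENT, N_PER_ARM,
                                        sensitivity=0.90, specificity=0.80)
power_specific = power_two_proportions(P_CONTROL, P_TREATMENT, N_PER_ARM,
                                       sensitivity=0.75, specificity=0.97)

print(f"perfect instrument:        {power_perfect:.2f}")
print(f"more sensitive instrument: {power_sensitive:.2f}")
print(f"more specific instrument:  {power_specific:.2f}")
```

With these made-up inputs, the more specific instrument retains more power than the more sensitive one, and both fall short of a perfect instrument.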
Measurements and main results: On the basis of published sensitivities and specificities, the Impact of Events Scale-Revised was predicted to measure a 36% symptom rate, whereas the Post-Traumatic Stress Symptoms instrument was predicted to measure a 23% rate, if the true underlying rate of post-traumatic stress symptoms were 25%. Despite its lower sensitivity, the briefer Post-Traumatic Stress Symptoms instrument provided superior power to detect a difference in rates between trial arms, owing to its higher specificity.
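One way to see why specificity can outweigh sensitivity here is the standard misclassification identity below. It is offered as an illustration and may differ from the formula derived in the paper. The symbols are assumptions introduced for this sketch: $p_0$ and $p_1$ are the true event rates in the two arms, $q_0$ and $q_1$ are the rates the instrument is expected to report, and $Se$ and $Sp$ are its sensitivity and specificity, assumed equal in both arms.

```latex
\begin{align}
q_i &= Se \, p_i + (1 - Sp)(1 - p_i), \qquad i \in \{0, 1\}, \\
q_1 - q_0 &= (Se + Sp - 1)\,(p_1 - p_0).
\end{align}
```

The observed difference shrinks by the factor $Se + Sp - 1$, to which sensitivity and specificity contribute equally. When the true rate is well below 50%, however, most participants are true negatives, so each point of lost specificity adds more false positives than a point of lost sensitivity removes true positives. This pushes the observed rate toward 50%, where the binomial variance $q_i(1 - q_i)$ is largest, and the larger variance lowers power for the same attenuated difference.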
Conclusions: Examining instruments' power to detect differences in outcomes may guide their selection when multiple instruments exist, each with different sensitivities and specificities.
Keywords: bias; clinical trials; critical care outcomes; sensitivity; specificity.