Background: Many articles in the surgical literature have been faulted for committing a type 2 error, that is, concluding that there is no difference when the study was "underpowered." However, it is unknown whether the current power standard of 0.8 is reasonable in surgical science.
Methods: PubMed was searched for abstracts published in Surgery, JAMA Surgery, and Annals of Surgery from January 1, 2012 to December 31, 2016. Abstracts indexed with the Medical Subject Heading terms randomized controlled trial (RCT) or observational study (OBS) and limited to humans were included (n = 403). Articles were excluded if all reported findings were statistically significant (n = 193) or if the presented data were insufficient to calculate power (n = 141).
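The abstract does not state which formula was used to calculate power from the reported data. As an illustration only, the sketch below assumes a two-sided z-test comparing two independent proportions under a normal approximation. The function name `two_proportion_power` and its parameters are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test for two independent proportions.

    Assumption (not from the abstract): normal approximation with an
    unpooled standard error computed from the observed proportions.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        raise ValueError("standard error is zero; power is undefined here")

    z = NormalDist()                       # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = abs(p1 - p2) / se              # standardized observed difference

    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical example: event rates of 10% vs 15% with 100 patients per arm.
example_power = two_proportion_power(0.10, 0.15, 100, 100)
```

With these made-up inputs the approximation gives a power of about 0.19, well below the 0.8 standard. When the two proportions are equal, the function returns alpha, as expected for a test with no true difference.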
Results: A total of 69 manuscripts (59 RCTs and 10 OBSs) were assessed. Overall, the median power was 0.16 (interquartile range [IQR] 0.08-0.32). The median power was 0.16 for RCTs (IQR 0.08-0.32) and 0.14 for OBSs (IQR 0.09-0.22). Only 4 studies (5.8%) reached or exceeded the current 0.8 standard. Two-thirds of our study sample had an a priori power calculation (n = 41).
Conclusions: High-impact surgical science was routinely unable to reach the arbitrary power standard of 0.8. The academic surgical community should reconsider the power threshold as it applies to surgical investigations. We contend that the blueprint for the redesign should include benchmarking the power of articles on a gradient scale, instead of aiming for an unreasonable threshold.
Keywords: Health services research; Innovation; Negative study; Power; Surgical science; Type 2 error.
Copyright © 2019 Elsevier Inc. All rights reserved.