Is the Power Threshold of 0.8 Applicable to Surgical Science?-Empowering the Underpowered Study

Yanik J Bababekov; Ya-Ching Hung; Yu-Tien Hsu; Brooks V Udelsman; Jessica L Mueller; Hsu-Ying Lin; Sahael M Stapleton; David C Chang

doi:10.1016/j.jss.2019.03.062

J Surg Res. 2019 Sep:241:235-239. doi: 10.1016/j.jss.2019.03.062. Epub 2019 Apr 28.

Authors

Yanik J Bababekov¹, Ya-Ching Hung², Yu-Tien Hsu², Brooks V Udelsman², Jessica L Mueller², Hsu-Ying Lin², Sahael M Stapleton², David C Chang²

Affiliations

¹ Department of Surgery, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts. Electronic address: ybababekov@partners.org.
² Department of Surgery, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts.

PMID: 31035137
DOI: 10.1016/j.jss.2019.03.062

Abstract

Background: Many articles in the surgical literature were faulted for committing type 2 error, or concluding no difference when the study was "underpowered". However, it is unknown if the current power standard of 0.8 is reasonable in surgical science.

Methods: PubMed was searched for abstracts published in Surgery, JAMA Surgery, and Annals of Surgery from January 1, 2012 to December 31, 2016. Abstracts indexed with the Medical Subject Heading terms randomized controlled trial (RCT) or observational study (OBS) and limited to humans were included (n = 403). Articles were excluded if all reported findings were statistically significant (n = 193) or if the presented data were insufficient to calculate power (n = 141).
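The abstract does not state which formula was used to calculate power from the reported data. As a minimal sketch only, assuming a two-sided z-test comparing two independent proportions under the normal approximation, power could be estimated from a study's group sizes and event rates as follows. The function name two_proportion_power and its parameters are hypothetical and are not taken from the article.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test for two independent proportions.

    Assumption: normal approximation with a pooled standard error under the
    null hypothesis and an unpooled standard error under the alternative.
    Not valid when a proportion is exactly 0 or 1 in both groups.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test

    # Pooled proportion and standard error under the null (p1 == p2)
    p_bar = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))

    # Standard error under the alternative (observed proportions)
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    delta = abs(p1 - p2)
    # Probability of rejecting in either tail when the true difference is delta
    upper = z.cdf((delta - z_crit * se_null) / se_alt)
    lower = z.cdf((-delta - z_crit * se_null) / se_alt)
    return upper + lower


# Illustrative values only (not data from the article):
# event rates of 10% vs 15% with 100 patients per arm
example_power = two_proportion_power(0.10, 0.15, 100, 100)
print(f"Estimated power: {example_power:.2f}")
```

With these illustrative inputs the estimated power falls well below 0.8, which shows how a modest difference in event rates between small groups can leave a study underpowered by the conventional standard.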

Results: A total of 69 manuscripts (59 RCTs and 10 OBSs) were assessed. Overall, the median power was 0.16 (interquartile range [IQR] 0.08-0.32). The median power was 0.16 for RCTs (IQR 0.08-0.32) and 0.14 for OBSs (IQR 0.09-0.22). Only 4 studies (5.8%) reached or exceeded the current 0.8 standard. Two-thirds of our study sample had an a priori power calculation (n = 41).

Conclusions: High-impact surgical science was routinely unable to reach the arbitrary power standard of 0.8. The academic surgical community should reconsider the power threshold as it applies to surgical investigations. We contend that the blueprint for the redesign should include benchmarking the power of articles on a gradient scale, instead of aiming for an unreasonable threshold.

Keywords: Health services research; Innovation; Negative study; Power; Surgical science; Type 2 error.

MeSH terms

Data Interpretation, Statistical
Humans
Randomized Controlled Trials as Topic / standards*
Randomized Controlled Trials as Topic / statistics & numerical data
Research Design / standards*
Research Design / statistics & numerical data
Sample Size
Specialties, Surgical*