Moving beyond the classic difference-in-differences model: a simulation study comparing statistical methods for estimating effectiveness of state-level policies

Beth Ann Griffin; Megan S Schuler; Elizabeth A Stuart; Stephen Patrick; Elizabeth McNeer; Rosanna Smart; David Powell; Bradley D Stein; Terry L Schell; Rosalie Liccardo Pacula

doi:10.1186/s12874-021-01471-y

BMC Med Res Methodol. 2021 Dec 13;21(1):279. doi: 10.1186/s12874-021-01471-y.

Affiliations

¹ RAND Corporation, 1200 South Hayes Street, Arlington, VA, 22202, USA. bethg@rand.org.
² RAND Corporation, 1200 South Hayes Street, Arlington, VA, 22202, USA.
³ Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA.
⁴ Vanderbilt University Medical Center and School of Medicine, Nashville, TN, 37232, USA.
⁵ RAND Corporation, Santa Monica, CA, 90401, USA.
⁶ RAND Corporation, Pittsburgh, PA, 15213, USA.
⁷ University of Southern California, Los Angeles, CA, 90089, USA.

Abstract

Background: Reliable evaluations of state-level policies are essential for identifying effective policies and informing policymakers' decisions. State-level policy evaluations commonly use a difference-in-differences (DID) study design, yet within this framework, statistical model specification varies notably across studies. More guidance is needed about which statistical models perform best when estimating how state-level policies affect outcomes.

Methods: Motivated by applied state-level opioid policy evaluations, we implemented an extensive simulation study to compare the statistical performance of multiple variations of the two-way fixed effect models traditionally used for DID under a range of simulation conditions. We also explored the performance of autoregressive (AR) and generalized estimating equation (GEE) models. We simulated policy effects on annual state-level opioid mortality rates and assessed statistical performance using various metrics, including directional bias, magnitude bias, and root mean squared error. We also reported Type I error rates and the rate of correctly rejecting the null hypothesis (i.e., power), given the prevalence of frequentist null hypothesis significance testing in the applied literature.
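The simulation logic described in the Methods can be illustrated with a minimal sketch. It is not the authors' code or data: it assumes a purely synthetic state-by-year panel with independent normal noise, a single classic two-way fixed effects estimator fit by ordinary least squares, and classical (non-clustered) standard errors. All function names, parameter values, and the data-generating process are hypothetical. Running it with a true effect of zero gives a Type I error rate, and running it with a non-zero effect gives a correct rejection rate (power).

```python
import numpy as np

def simulate_panel(rng, n_states=50, n_years=10, n_treated=15,
                   start_year=5, effect=0.0):
    """Hypothetical data-generating process: state effects + year effects
    + policy effect + independent normal noise (an assumption, not real data)."""
    state_fx = rng.normal(0.0, 1.0, size=n_states)
    year_fx = rng.normal(0.0, 0.5, size=n_years)
    treated = rng.choice(n_states, size=n_treated, replace=False)
    policy = np.zeros((n_states, n_years))
    policy[treated, start_year:] = 1.0  # policy switches on and stays on
    noise = rng.normal(0.0, 1.0, size=(n_states, n_years))
    y = state_fx[:, None] + year_fx[None, :] + effect * policy + noise
    return y, policy

def twfe_estimate(y, policy):
    """Classic two-way fixed effects DID fit by OLS on dummy variables.
    Returns the policy coefficient and its classical (non-clustered) SE."""
    n_states, n_years = y.shape
    state_idx = np.repeat(np.arange(n_states), n_years)
    year_idx = np.tile(np.arange(n_years), n_states)
    state_d = np.eye(n_states)[state_idx]      # one dummy per state
    year_d = np.eye(n_years)[year_idx][:, 1:]  # drop first year (collinearity)
    X = np.column_stack([policy.ravel(), state_d, year_d])
    yv = y.ravel()
    beta, *_ = np.linalg.lstsq(X, yv, rcond=None)
    resid = yv - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0], np.sqrt(cov[0, 0])

def performance(true_effect, n_sims=200, seed=0):
    """Repeat simulate-and-fit, then summarize bias, RMSE, and rejection rate."""
    rng = np.random.default_rng(seed)
    est = np.empty(n_sims)
    reject = np.empty(n_sims, dtype=bool)
    for i in range(n_sims):
        y, policy = simulate_panel(rng, effect=true_effect)
        b, se = twfe_estimate(y, policy)
        est[i] = b
        reject[i] = abs(b / se) > 1.96  # two-sided test at the 5% level
    return {
        "bias": est.mean() - true_effect,
        "rmse": np.sqrt(np.mean((est - true_effect) ** 2)),
        "rejection_rate": reject.mean(),  # Type I error if true_effect == 0
    }

if __name__ == "__main__":
    print("null effect:", performance(true_effect=0.0))
    print("non-null effect:", performance(true_effect=1.0))
```

Because the noise in this sketch is independent across years, the classical standard errors are appropriate here. Real state-level mortality series are serially correlated, which is one reason error rates can look very different in applied settings.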

Results: Most linear models resulted in minimal bias. However, non-linear models and population-weighted versions of classic linear two-way fixed effect and linear GEE models yielded considerable bias (60 to 160%). Further, root mean squared error was minimized by linear AR models when we examined crude mortality rates and by negative binomial models when we examined raw death counts. In the context of frequentist hypothesis testing, many models yielded high Type I error rates and very low rates of correctly rejecting the null hypothesis (<10%), raising concerns of spurious conclusions about policy effectiveness in the opioid literature. When considering performance across models, the linear AR models were optimal in terms of directional bias, root mean squared error, Type I error, and correct rejection rates.

Conclusions: The findings highlight notable limitations of commonly used statistical models for DID designs, which are widely used in opioid policy studies and in state policy evaluations more broadly. In contrast, the optimal model we identified, the AR model, is rarely used in state policy evaluation. We urge applied researchers to move beyond the classic DID paradigm and adopt AR models.
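For readers unfamiliar with the AR approach, one common form of a linear autoregressive specification regresses each state's outcome on its own prior-year outcome, a policy indicator, and year effects. The abstract does not give the exact specification used in the study, so the sketch below is an assumption: a first-order lag, a level-coded policy indicator, no state fixed effects, and synthetic data generated from that same model. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_ar_panel(rng, n_states=50, n_years=12, n_treated=15,
                      start_year=6, rho=0.6, effect=0.5):
    """Hypothetical AR(1) panel: y_t = rho * y_{t-1} + year effect
    + effect * policy_t + independent normal noise."""
    year_fx = rng.normal(0.0, 0.5, size=n_years)
    treated = rng.choice(n_states, size=n_treated, replace=False)
    policy = np.zeros((n_states, n_years))
    policy[treated, start_year:] = 1.0
    y = np.empty((n_states, n_years))
    y[:, 0] = rng.normal(0.0, 1.0, size=n_states)
    for t in range(1, n_years):
        shock = rng.normal(0.0, 1.0, size=n_states)
        y[:, t] = rho * y[:, t - 1] + year_fx[t] + effect * policy[:, t] + shock
    return y, policy

def fit_linear_ar(y, policy):
    """OLS of the current outcome on the lagged outcome, the policy
    indicator, and a full set of year dummies (which act as intercepts)."""
    n_states, n_years = y.shape
    y_now = y[:, 1:].ravel()    # outcomes in years 1..T-1
    y_lag = y[:, :-1].ravel()   # same states, one year earlier
    pol = policy[:, 1:].ravel()
    year_idx = np.tile(np.arange(n_years - 1), n_states)
    year_d = np.eye(n_years - 1)[year_idx]
    X = np.column_stack([pol, y_lag, year_d])
    beta, *_ = np.linalg.lstsq(X, y_now, rcond=None)
    return {"policy_effect": beta[0], "rho": beta[1]}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y, policy = simulate_ar_panel(rng)
    print(fit_linear_ar(y, policy))
```

In this level-coded form the policy coefficient is the immediate, one-year effect. With a lagged outcome in the model, the effect accumulates over subsequent years, so the coefficient should not be read as the long-run change in the outcome.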

Keywords: Difference-in-differences; Opioid; Overdose; Policy evaluations; Simulation; State-level policy.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Analgesics, Opioid*
Computer Simulation
Humans
Linear Models
Models, Statistical*
Policy

Substances

Analgesics, Opioid

Grants and funding

P50 DA046351/DA/NIDA NIH HHS/United States