In health services research, it is common to encounter semicontinuous data, which are characterized by a point mass at zero and a right-skewed continuous distribution on the positive values. Health expenditures are one example: the zeros represent the subpopulation of patients who do not use health services, while the continuous distribution describes the level of expenditures among health services users. Longitudinal semicontinuous data are typically analyzed with two-part mixture models that include random effects. One component models the probability of health services use, and a second component models the distribution of log-scale positive expenditures among users. However, because the second part conditions on a nonzero response, it is not straightforward to obtain interpretable covariate effects for the combined population of health services users and nonusers, even though these effects are often of greatest interest to investigators. Here, we propose a marginalized two-part model for longitudinal data that allows investigators to estimate the effect of covariates on the overall population mean. The model also provides estimates of the overall population mean on the original, untransformed scale, and many covariate effects have both a population-average and a subject-specific interpretation. Because estimation is Bayesian, the model retains the flexibility to include complex random-effect structures and makes it easy to estimate functions of the overall mean. We illustrate the approach by evaluating the effect of a copayment increase on health care expenditures in the Veterans Affairs health care system over a four-year period.
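The central quantity is the overall mean, which combines both parts: E[Y] = Pr(Y > 0) · E[Y | Y > 0]. The sketch below shows this computation with a log-skew-normal positive part. It then shows how a marginalized parameterization can solve for the positive-part location that reproduces a target overall mean. This is a minimal illustration under stated assumptions. It uses the (location `xi`, scale `omega`, shape `alpha`) skew-normal parameterization, omits covariates and random effects, and all function names are hypothetical. It is not presented as the exact specification used in the paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_skew_normal_mean(xi: float, omega: float, alpha: float) -> float:
    """Mean of Y when log(Y) ~ SkewNormal(xi, omega, alpha).

    Uses the skew-normal moment generating function evaluated at t = 1:
    E[Y] = 2 * exp(xi + omega^2 / 2) * Phi(delta * omega),
    with delta = alpha / sqrt(1 + alpha^2). When alpha = 0, this reduces
    to the log-normal mean exp(xi + omega^2 / 2).
    """
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    return 2.0 * math.exp(xi + 0.5 * omega ** 2) * std_normal_cdf(delta * omega)


def overall_mean(p_use: float, xi: float, omega: float, alpha: float) -> float:
    """Overall mean across users and nonusers: Pr(Y > 0) * E[Y | Y > 0]."""
    return p_use * log_skew_normal_mean(xi, omega, alpha)


def location_from_overall_mean(nu: float, p_use: float,
                               omega: float, alpha: float) -> float:
    """Hypothetical marginalized step: choose xi so that overall_mean(...) == nu.

    Solves nu = p_use * 2 * exp(xi + omega^2 / 2) * Phi(delta * omega) for xi.
    """
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    return (math.log(nu) - math.log(p_use) - 0.5 * omega ** 2
            - math.log(2.0 * std_normal_cdf(delta * omega)))


if __name__ == "__main__":
    # Illustrative values only, not estimates from any data set.
    p, omega, alpha = 0.6, 1.2, 2.0
    xi = location_from_overall_mean(100.0, p, omega, alpha)
    print(overall_mean(p, xi, omega, alpha))  # recovers the target mean of 100
```

In a conventional two-part model, a covariate's overall effect on the mean combines its effect on the probability of use with its effect on the conditional mean among users, so no single coefficient gives it directly. A marginalized model reverses this. It places the covariates on the overall mean (for example, through a log link on `nu`) and treats the positive-part location as a derived quantity.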
Keywords: Semicontinuous data; copayment increase; health care expenditures; log-skew-normal distribution; marginalized models; two-part models.