Objectives: To study the value of combining individual- and neighborhood-level sociodemographic data for predicting study participation, and to assess how baseline selection affects the distribution of metabolic and lifestyle risk factors in the Swedish CardioPulmonary bioImage Study (SCAPIS).
Methods: We linked sociodemographic register data to SCAPIS participants (n = 30,154; aged 50-64 years) and to a random sample of the study's target population (n = 59,909). We assessed the classification ability of participation models based on individual-level data, neighborhood-level data, and a combination of both. We used standardized mean differences (SMDs) to examine how reweighting the sample to match the population affected the baseline means of 32 cardiopulmonary risk factors. Absolute SMDs > 0.10 were considered meaningful.
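The reweighting-and-SMD step can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme or the SMD formula, so both are assumptions here: the weights are taken as the participation odds inverse (1 - p) / p, which reweights participants toward the comparison sample when p comes from a model fitted to the stacked participant and population-sample data, and the SMD uses the common pooled-standard-deviation denominator. The function names and the toy data are hypothetical and not taken from SCAPIS.

```python
import math
from typing import Sequence


def odds_weights(p_participate: Sequence[float]) -> list[float]:
    """Hypothetical weights (1 - p) / p that shift participants toward the
    population sample, assuming p is a predicted participation probability
    from a model fitted to stacked participant + population-sample data."""
    return [(1.0 - p) / p for p in p_participate]


def weighted_mean_var(x: Sequence[float], w: Sequence[float]) -> tuple[float, float]:
    """Weighted mean and (biased) weighted variance of x."""
    total = sum(w)
    mean = sum(wi * xi for xi, wi in zip(x, w)) / total
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / total
    return mean, var


def reweighting_smd(x: Sequence[float], p_participate: Sequence[float]) -> float:
    """SMD between the reweighted and the unweighted participant mean of one
    risk factor, using a pooled SD (an assumed, commonly used convention)."""
    m0, v0 = weighted_mean_var(x, [1.0] * len(x))
    m1, v1 = weighted_mean_var(x, odds_weights(p_participate))
    pooled_sd = math.sqrt((v0 + v1) / 2.0)
    return (m1 - m0) / pooled_sd


# Toy data (illustrative only): days per week with alcohol consumption,
# and hypothetical predicted participation probabilities.
drink_days = [0, 1, 2, 3, 4, 5]
p_hat = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
d = reweighting_smd(drink_days, p_hat)
print(f"SMD = {d:.3f}; meaningful (|SMD| > 0.10): {abs(d) > 0.10}")
```

In this toy example, people with higher drinking frequency are more likely to participate, so they receive smaller weights and the reweighted mean falls below the unweighted one, giving a negative SMD.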
Results: Combining individual- and neighborhood-level data yielded a model with better classification ability (AUC: 71.3%) than models using only individual-level data (AUC: 66.9%) or only neighborhood-level data (AUC: 65.5%). Reweighting the participants using both individual- and neighborhood-level data changed the risk factor distributions more than reweighting with either data source alone. The only meaningful change concerned the self-reported frequency of alcohol consumption, which appears to be higher in the SCAPIS sample than in the target population. The remaining risk factors did not change meaningfully.
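The AUC values above can be read as the probability that a randomly chosen participant receives a higher predicted participation score than a randomly chosen member of the comparison group, with 50% corresponding to chance. The following is a minimal sketch of that rank-based (Mann-Whitney) definition; the function name and the toy scores are hypothetical, and the abstract does not state how the AUCs were actually computed.

```python
from itertools import product
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive case (e.g. a participant) has the higher predicted score.
    Ties count one half. O(n*m), so suitable for illustration only."""
    wins = 0.0
    pairs = 0
    for s_pos, s_neg in product(scores_pos, scores_neg):
        pairs += 1
        if s_pos > s_neg:
            wins += 1.0
        elif s_pos == s_neg:
            wins += 0.5
    return wins / pairs


# Toy predicted participation probabilities (illustrative only).
participants = [0.9, 0.7, 0.6, 0.4]
comparison_group = [0.8, 0.5, 0.3, 0.2]
print(f"AUC = {auc(participants, comparison_group):.1%}")
```

Here 12 of the 16 pairs are ordered correctly, so the AUC is 75%; an AUC of 71.3% similarly means that about 71% of such pairs are ranked correctly by the combined model.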
Conclusions: Both individual- and neighborhood-level characteristics are informative when assessing study selection effects. Although the average impact of selection on baseline risk factor distributions appears small, these findings can inform future analyses of cardiopulmonary outcomes in the SCAPIS cohort.