[Simulation on design-based and model-based methods in descriptive analysis of complex samples]

Zhonghua Yu Fang Yi Xue Za Zhi. 2015 Jan;49(1):50-5.
[Article in Chinese]

Abstract

Objective: To compare design-based and model-based methods in the descriptive analysis of complex samples.

Methods: A total of 1 000 simulated samples were drawn by multistage random sampling from the data of the 2010 China Chronic Disease and Risk Factors Surveillance. In each simulated sample, cases were randomly deleted with probability proportional to age, so that the sample age structure deviated systematically from that of the target population. Mean systolic blood pressure (SBP) and the prevalence of raised blood pressure, together with their 95% confidence intervals (95%CI), were estimated using a design-based method and two model-based methods (the routine method and a multilevel model). The mean squared error (MSE) of the estimators from these 3 methods was computed to evaluate their validity. The performance of statistical inference was compared using the probability that the 95%CI covered the true parameter (the population mean SBP and the population prevalence of raised blood pressure).
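The effect of deleting cases with probability proportional to age can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synthetic population, the linear age-SBP relation, the deletion rule age/max_age, and the use of post-stratification by age decade as the weighting adjustment. The abstract does not state which weights the authors used, and the multistage sampling step is omitted here.

```python
import random
from collections import Counter, defaultdict

def make_population(n, rng):
    """Synthetic population in which SBP rises with age (illustrative numbers only)."""
    people = []
    for _ in range(n):
        age = rng.randint(18, 79)
        sbp = 100 + 0.5 * age + rng.gauss(0, 10)
        people.append((age, sbp))
    return people

def delete_proportional_to_age(cases, rng, max_age=80):
    """Delete each case with probability age / max_age (assumed rule),
    so older cases are removed more often than younger ones."""
    return [(age, sbp) for age, sbp in cases if rng.random() >= age / max_age]

def age_group(age):
    """Age decade used as the post-stratum (1 for 18-19, ..., 7 for 70-79)."""
    return age // 10

def unweighted_mean(cases):
    return sum(sbp for _, sbp in cases) / len(cases)

def poststratified_mean(cases, population_shares):
    """Weight each age-group mean by that group's known population share."""
    by_group = defaultdict(list)
    for age, sbp in cases:
        by_group[age_group(age)].append(sbp)
    return sum(
        share * sum(by_group[g]) / len(by_group[g])
        for g, share in population_shares.items()
    )

rng = random.Random(2010)
population = make_population(20000, rng)
true_mean = unweighted_mean(population)

# Known age structure of the target population.
counts = Counter(age_group(age) for age, _ in population)
shares = {g: c / len(population) for g, c in counts.items()}

biased = delete_proportional_to_age(population, rng)
naive = unweighted_mean(biased)                  # ignores the distorted age structure
adjusted = poststratified_mean(biased, shares)   # restores the population age structure

print(f"true {true_mean:.2f}  naive {naive:.2f}  adjusted {adjusted:.2f}")
```

In this sketch the unweighted mean is pulled downward because older cases, who have higher SBP, are under-represented after deletion, while the weighted estimate stays close to the population value.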

Results: The MSE of the mean estimator was 6.41 for the routine method, 1.38 for the design-based method, and 5.86 for the multilevel model; the probability of the 95%CI covering the true parameter was 24.7%, 97.5%, and 84.3%, respectively. The routine method and the multilevel model therefore probably led to an increased probability of type I error in statistical inference. The MSE of the prevalence estimator was 4.80 for the design-based method, far lower than for the routine method (20.9) and the multilevel model (17.2). The probability of the 95%CI covering the true prevalence was only 29.4% for the routine method and 86.4% for the multilevel model, both lower than for the design-based method (97.3%).
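The two evaluation criteria reported here follow standard definitions and can be sketched in Python as below. The function names and the toy inputs are hypothetical and are not taken from the study.

```python
def mean_squared_error(estimates, true_value):
    """Average squared distance between the simulated estimates and the true parameter."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

def coverage_probability(intervals, true_value):
    """Share of simulated confidence intervals (lower, upper) that contain the true parameter."""
    covered = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return covered / len(intervals)

# Toy inputs standing in for the estimates and 95%CIs from simulated samples.
toy_estimates = [1.0, 2.0, 3.0]
toy_intervals = [(0.0, 1.0), (1.0, 3.0), (2.5, 4.0)]

mse = mean_squared_error(toy_estimates, true_value=2.0)
coverage = coverage_probability(toy_intervals, true_value=2.0)
```

Under these definitions, one minus the coverage is the share of samples in which the interval misses the true value. A coverage of 24.7% therefore corresponds to a miss rate of 75.3%, against a nominal 5%, which is the sense in which low coverage points to an inflated type I error.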

Conclusion: Compared with the routine method and the multilevel model, the design-based method performed best in both point estimation and confidence interval construction. The design-based method should be the first choice for the descriptive analysis of complex samples whose structure is systematically biased.

MeSH terms

  • Blood Pressure
  • China
  • Humans
  • Hypertension*
  • Models, Statistical*
  • Prevalence*