Risk adjustment for regional healthcare funding allocations with ensemble methods: an empirical study and interpretation

Eur J Health Econ. 2024 Sep;25(7):1117-1131. doi: 10.1007/s10198-023-01656-w. Epub 2024 Jan 3.

Abstract

We experiment with recent ensemble machine learning methods in estimating healthcare costs, utilizing Finnish data containing rich individual-level information on healthcare costs, socioeconomic status and diagnostic data from multiple registries. Our data are a random 10% sample (553,675 observations) from the Finnish population in 2017. Using annual healthcare cost in 2017 as the response variable, we compare the performance of Random Forest, Gradient Boosting Machine (GBM) and eXtreme Gradient Boosting (XGBoost) to linear regression. As machine learning methods are often seen as unsuitable in risk adjustment applications because of their relative opaqueness, we also introduce visualizations from the machine learning literature to help interpret the contribution of individual variables to the prediction. Our results show that ensemble machine learning methods can improve predictive performance, with all of them significantly outperforming linear regression, and that a certain level of interpretation can be provided for them. We also find that individual-level socioeconomic variables improve prediction accuracy and that their effect is larger for machine learning methods. However, we find that the predictions used for funding allocations are sensitive to model selection, highlighting the need for comprehensive robustness testing when estimating risk adjustment models used in applications.
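
The sensitivity of funding allocations to model selection can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state the allocation rule, so the sketch assumes a simple rule in which each region's share of the budget is proportional to the sum of predicted individual costs in that region. The region names, the toy predictions and the helper functions `regional_shares` and `max_share_shift` are all hypothetical.

```python
from collections import defaultdict


def regional_shares(regions, predicted_costs):
    """Return each region's share of total predicted cost.

    Assumption (not from the study): funding is allocated in proportion
    to the sum of predicted individual costs per region.
    """
    totals = defaultdict(float)
    for region, cost in zip(regions, predicted_costs):
        totals[region] += cost
    grand_total = sum(totals.values())
    return {region: total / grand_total for region, total in totals.items()}


def max_share_shift(shares_a, shares_b):
    """Largest absolute change in any region's share between two models."""
    return max(abs(shares_a[r] - shares_b[r]) for r in shares_a)


# Hypothetical toy data: five individuals in two regions, with predicted
# annual costs from two hypothetical models fitted to the same people.
regions = ["North", "North", "South", "South", "South"]
pred_linear = [1000.0, 3000.0, 2000.0, 2000.0, 2000.0]
pred_ensemble = [1500.0, 4500.0, 1000.0, 1500.0, 1500.0]

shares_linear = regional_shares(regions, pred_linear)
shares_ensemble = regional_shares(regions, pred_ensemble)
shift = max_share_shift(shares_linear, shares_ensemble)

print("Shares under linear model:  ", shares_linear)
print("Shares under ensemble model:", shares_ensemble)
print("Largest change in a regional share:", shift)
```

In this toy case both models predict the same total cost, yet the regional shares differ, which is the kind of model dependence a robustness check of the allocations would need to surface.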

Keywords: Healthcare costs; Interpretation; Machine learning; Predictive modeling; Risk adjustment; Socioeconomic information.

MeSH terms

  • Empirical Research
  • Female
  • Finland
  • Health Care Costs / statistics & numerical data
  • Humans
  • Linear Models
  • Machine Learning*
  • Male
  • Risk Adjustment* / methods
  • Socioeconomic Factors