Personalized prediction of incident hospitalization for cardiovascular disease in patients with hypertension using machine learning

BMC Med Res Methodol. 2022 Dec 17;22(1):325. doi: 10.1186/s12874-022-01814-3.

Abstract

Background: Prognostic information for patients with hypertension is largely based on population averages. The purpose of this study was to compare the performance of four machine learning approaches for personalized prediction of incident hospitalization for cardiovascular disease among newly diagnosed hypertensive patients.

Methods: Using province-wide linked administrative health data in Alberta, we analyzed a cohort of 259,873 newly diagnosed hypertensive patients from 2009 to 2015 who collectively had 11,863 incident hospitalizations for heart failure, myocardial infarction, and stroke. Linear multi-task logistic regression, neural multi-task logistic regression, random survival forest, and Cox proportional hazards models were used to determine the number of event-free survivors at each time point and to construct individual event-free survival probability curves. Predictive performance was evaluated by root mean squared error, mean absolute error, concordance index, and Brier score.
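As an illustration of the first two metrics, the following is a minimal sketch of how predicted and observed numbers of event-free survivors at each time point could be compared by root mean squared error and mean absolute error. It is not the study's code: the function names and the toy data are hypothetical, and the observed counts assume no censoring, which is a simplification.

```python
import math

def expected_survivors(surv_probs):
    """Sum individual survival probabilities at each time index.

    surv_probs[i][k] is the predicted probability that patient i is
    event-free at the k-th time point (hypothetical toy input).
    """
    n_times = len(surv_probs[0])
    return [sum(curve[k] for curve in surv_probs) for k in range(n_times)]

def observed_survivors(event_times, time_grid):
    """Count patients still event-free at each grid time.

    Simplifying assumption: every patient has an observed event time
    (no censoring).
    """
    return [sum(1 for e in event_times if e > t) for t in time_grid]

def rmse(pred, obs):
    """Root mean squared error across time points."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

def mae(pred, obs):
    """Mean absolute error across time points."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)

# Toy data: four patients, three time points (illustrative values only).
time_grid = [1, 2, 3]
event_times = [0.5, 2.5, 4.0, 5.0]
surv_probs = [
    [0.5, 0.25, 0.0],
    [1.0, 0.75, 0.5],
    [1.0, 1.0, 0.75],
    [1.0, 1.0, 1.0],
]

predicted = expected_survivors(surv_probs)
observed = observed_survivors(event_times, time_grid)
curve_rmse = rmse(predicted, observed)
curve_mae = mae(predicted, observed)
```

In this sketch, lower values mean the predicted survivor counts track the observed counts more closely over follow-up.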

Results: The random survival forest model had the lowest root mean squared error (33.94) and the lowest mean absolute error (28.37). The machine learning methods provided similar discrimination and calibration in the personalized survival prediction of hospitalizations for cardiovascular events in patients with hypertension. The neural multi-task logistic regression model had the highest concordance index (0.8149) and the lowest Brier score (0.0242) for personalized survival prediction.
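For readers unfamiliar with the discrimination and calibration metrics reported here, the sketch below shows one common way to compute a concordance index (Harrell's definition) and a Brier score at a single time point. It is an illustration under stated assumptions, not the study's implementation: the function names and toy data are hypothetical, and the Brier score omits the inverse-probability-of-censoring weights that a full analysis would normally use.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    and times[i] < times[j]. It is concordant when the patient with
    the earlier event has the higher predicted risk; ties count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

def brier_score(times, surv_at_t, t):
    """Mean squared gap between event-free status at t and predicted S(t).

    Simplifying assumption: no patient is censored before t, so the
    status at t is known for everyone and no weighting is applied.
    """
    status = [1.0 if obs_time > t else 0.0 for obs_time in times]
    return sum((s - p) ** 2 for s, p in zip(status, surv_at_t)) / len(times)

# Toy data (illustrative values only): follow-up time, event indicator
# (1 = hospitalized, 0 = censored), predicted risk score, and predicted
# event-free probability at t = 5.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
surv_at_5 = [0.1, 0.4, 0.7, 0.9]

c_index = concordance_index(times, events, risks)
brier_at_5 = brier_score(times, surv_at_5, t=5)
```

A concordance index closer to 1 indicates better ranking of patients by risk, while a Brier score closer to 0 indicates predicted probabilities closer to the observed outcomes.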

Conclusions: This is the first study of personalized survival prediction for cardiovascular diseases among hypertensive patients using administrative data. The four models tested in this analysis exhibited similar discrimination and calibration in personalized survival prediction for patients with hypertension.

Keywords: Administrative health data; Cardiovascular disease; Machine learning; Personalized prediction; Hypertension patients.

MeSH terms

  • Cardiovascular Diseases* / epidemiology
  • Hospitalization
  • Humans
  • Hypertension* / diagnosis
  • Hypertension* / epidemiology
  • Machine Learning
  • Proportional Hazards Models