A machine learning tool for identifying newly diagnosed heart failure in individuals with known diabetes in primary care

ESC Heart Fail. 2024 Oct 20. doi: 10.1002/ehf2.15115. Online ahead of print.

Abstract

Aims: We aimed to develop a machine learning (ML) model that uses diagnostic data to identify new cases of congestive heart failure (CHF) among individuals with diabetes in primary health care (PHC).

Methods: We used a sex- and age-matched case-control design. Cases of new CHF were identified across all outpatient care settings in 2015-2022 (n = 9098). We included individuals aged 30 years and above, stratified by sex and by age group (30-65 years and >65 years). Five controls per case (45 490 in total) were sampled from individuals in 2015-2022 who had no CHF diagnosis at any time between 2010 and 2022. From the stochastic gradient boosting (SGB) model, we obtained a ranking of the 10 most important factors related to newly diagnosed CHF in individuals with diabetes, each with its normalized relative influence (NRI) score and a corresponding odds ratio of marginal effects (ORME). The area under the curve (AUC) was calculated.
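NRI and AUC are standard summaries that can be computed independently of the boosting library used. The sketch below assumes that raw relative-influence scores are already available from a fitted SGB model and that the model outputs one predicted probability per person; the function names and factor names are hypothetical and this is not the authors' code. NRI is taken here as each factor's share of the total influence in percent, and AUC is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control.

```python
from typing import Dict, Sequence


def normalized_relative_influence(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw relative-influence scores so that they sum to 100 (percent)."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("relative-influence scores must have a positive sum")
    return {name: 100.0 * score / total for name, score in raw.items()}


def auc_mann_whitney(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case (label 1) scores higher than
    a random control (label 0); case-control ties count as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical raw influence scores; factor names are illustrative only.
    print(normalized_relative_influence({"visits_12m": 3.0, "atrial_fib": 1.0}))
    # -> {'visits_12m': 75.0, 'atrial_fib': 25.0}
    print(auc_mann_whitney([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # -> 0.75
```

The pairwise loop costs O(cases × controls), which is fine for illustration; with tens of thousands of people, a rank-based formula or a library routine would be the practical choice.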

Results: We identified 488 and 3240 new cases of CHF among women aged 30-65 years and >65 years, respectively, and 1196 and 4174 new cases among men in the same age groups. Among the 10 most important factors for newly diagnosed CHF in the four groups (defined by sex and by younger or older age), four were predictive in all groups: the number of visits in the 12 months before diagnosis (NRI 44.3%-55.9%), coronary artery disease (NRI 2.9%-7.8%), atrial fibrillation and flutter (NRI 6.6%-12.2%), and 'abnormalities of breathing' (ICD-10 code R06) (NRI 2.6%-4.4%). For younger women, a diagnosis of chronic obstructive pulmonary disease (COPD) (NRI 2.7%) contributed to the prediction, while for older women, oedema (NRI 3.1%) and number of years with diabetes (NRI 3.5%) contributed. For men in both age groups, chronic renal disease was predictive (NRI 3.9%-5.1%). The model predicted CHF well among patients with diabetes, with an AUC of around 0.85, sensitivity above 0.783 and specificity above 0.708 in all four groups.
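Sensitivity and specificity depend on the probability cut-off used to label a person as a predicted CHF case, and the abstract does not state how that cut-off was chosen. As an assumption, the sketch below picks the observed score that maximizes Youden's J (sensitivity + specificity - 1); the function names and toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Predict CHF when score >= threshold; return (sensitivity, specificity).

    Label 1 marks a case; any other label is treated as a control.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the observed score that maximizes Youden's J = sens + spec - 1.

    Assumption: this selection rule is illustrative, not taken from the study.
    """
    best_threshold: Optional[float] = None
    best_j = -1.0
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(labels, scores, threshold)
        j = sens + spec - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_threshold, best_j = threshold, j
    if best_threshold is None:
        raise ValueError("scores must not be empty")
    return best_threshold


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    t = youden_threshold(labels, scores)
    print(t, sensitivity_specificity(labels, scores, t))  # -> 0.4 (1.0, 0.5)
```

Other rules, such as fixing a minimum sensitivity and maximizing specificity under it, would trade the two measures off differently.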

Conclusions: An SGB model using routinely collected primary care data on diagnoses and number of visits can accurately predict the risk of a heart failure diagnosis in individuals with diabetes. Age and sex differences in predictive factors warrant further examination.

Keywords: Cardiovascular diseases; Congestive heart failure; Diabetes mellitus; Machine learning; Primary care.
