A machine learning tool for identifying patients with newly diagnosed diabetes in primary care

Prim Care Diabetes. 2024 Oct;18(5):501-505. doi: 10.1016/j.pcd.2024.06.010. Epub 2024 Jun 28.

Abstract

Background and aim: Early identification of diabetes is crucial. We aimed to create a predictive model using machine learning (ML) to identify new cases of diabetes in primary health care (PHC).

Methods: We conducted a case-control study using data on PHC visits, with sex-, age-, and PHC-matched controls. Stochastic gradient boosting was used to construct a model for predicting cases of diabetes based on the number of consultations and on diagnostic codes from PHC consultations during the year before the index (diagnosis) date. Variable importance was estimated using the normalized relative influence (NRI) score. Risks of having diabetes were calculated using odds ratios of marginal effects (ORME). Four groups defined by age and sex were studied: men and women aged 35-64 years and ≥ 65 years.
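
As a rough illustration of the two summary measures named in the Methods, the following minimal Python sketch rescales raw influence scores so that they sum to 100 % and computes an odds ratio from two predicted probabilities. The abstract does not give the exact definitions used in the study, so this rests on assumptions: NRI is taken to be each variable's share of the total influence, and ORME is taken to be the ratio of the odds of the mean predicted probability with a diagnostic code present versus absent. All function names and input values are hypothetical and are not study data.

```python
from typing import Dict


def normalize_relative_influence(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw, non-negative influence scores so they sum to 100 (%)."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("raw influences must have a positive sum")
    return {name: 100.0 * value / total for name, value in raw.items()}


def odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio_of_marginal_effects(p_with: float, p_without: float) -> float:
    """Assumed ORME: odds of the mean predicted probability with a code
    present, divided by the odds with the code absent."""
    return odds(p_with) / odds(p_without)


# Hypothetical raw influences from a fitted boosting model (not study values).
raw_influence = {"hypertension": 5.0, "obesity": 2.0, "other": 13.0}
nri = normalize_relative_influence(raw_influence)

# Hypothetical mean predicted probabilities (not study values).
orme = odds_ratio_of_marginal_effects(p_with=0.6, p_without=0.4)
```

With these made-up inputs, the variable labelled "hypertension" accounts for a quarter of the total influence, and an ORME above 1 would indicate higher predicted odds of diabetes when the code is present.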

Results: The most important predictive factors were hypertension (NRI 21.4-29.7 %) and obesity (NRI 4.8-15.2 %). The NRI for the other top ten diagnoses and administrative codes generally ranged from 1.0 to 4.2 %.

Conclusions: Our data confirm the known risk patterns for predicting a new diagnosis of diabetes and the need to test blood glucose frequently. To assess the full potential of ML for risk prediction in clinical practice, future studies could include clinical data on lifestyle patterns, laboratory tests, and prescribed medication.

Keywords: Artificial intelligence; Diabetes; Gradient boosting; Normalized relative influence; Prediction; Primary care.

MeSH terms

  • Adult
  • Aged
  • Biomarkers / blood
  • Case-Control Studies
  • Decision Support Techniques
  • Diabetes Mellitus* / diagnosis
  • Diabetes Mellitus* / epidemiology
  • Diagnosis, Computer-Assisted
  • Early Diagnosis
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Predictive Value of Tests*
  • Primary Health Care*
  • Risk Assessment
  • Risk Factors

Substances

  • Biomarkers