Development of a machine learning tool to predict the risk of incident chronic kidney disease using health examination data

Front Public Health. 2024 Nov 1;12:1495054. doi: 10.3389/fpubh.2024.1495054. eCollection 2024.

Abstract

Background: Chronic kidney disease (CKD) is characterized by a decreased glomerular filtration rate or renal injury (especially proteinuria) for at least 3 months. The early detection and treatment of CKD, a major global public health concern, before the onset of symptoms is important. This study aimed to develop machine learning models to predict the risk of developing CKD within 1 and 5 years using health examination data.

Methods: Data were collected from patients who underwent annual health examinations between 2017 and 2022. Among the 30,273 participants included in the study, 1,372 had CKD. Demographic characteristics, body mass index, blood pressure, blood and urine test results, and questionnaire responses were used to predict the risk of CKD development at 1 and 5 years. This study examined three outcomes: (1) incident estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m², (2) the development of proteinuria, and (3) a composite of either incident eGFR <60 mL/min/1.73 m² or the development of proteinuria. Logistic regression (LR), conditional logistic regression, a neural network, and a recurrent neural network were used to develop the prediction models.
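The three outcome definitions can be illustrated with a minimal Python sketch. It is not the authors' code: the `Exam` record, the `outcome_labels` function, and the rule that "incident" means absent at baseline and present at follow-up are assumptions made for illustration, and the cutoff for a positive urine protein test is left to the caller because the abstract does not state it.

```python
from dataclasses import dataclass

EGFR_THRESHOLD = 60.0  # mL/min/1.73 m², the cutoff named in the abstract


@dataclass
class Exam:
    """One health examination (hypothetical record layout)."""
    egfr: float        # estimated GFR in mL/min/1.73 m²
    proteinuria: bool  # True if the urine protein test counts as positive (cutoff assumed)


def incident(at_baseline: bool, at_followup: bool) -> bool:
    # Assumption: an event is "incident" only if it was absent at baseline.
    return (not at_baseline) and at_followup


def outcome_labels(baseline: Exam, followup: Exam) -> dict:
    """Return the three binary outcomes for one participant."""
    low_egfr = incident(baseline.egfr < EGFR_THRESHOLD,
                        followup.egfr < EGFR_THRESHOLD)
    new_proteinuria = incident(baseline.proteinuria, followup.proteinuria)
    return {
        "egfr_lt_60": low_egfr,
        "proteinuria": new_proteinuria,
        "composite": low_egfr or new_proteinuria,
    }


if __name__ == "__main__":
    baseline = Exam(egfr=72.0, proteinuria=False)
    followup = Exam(egfr=55.0, proteinuria=False)
    print(outcome_labels(baseline, followup))
```

Under these assumptions, a participant whose eGFR was already below 60 at baseline cannot contribute an incident eGFR event but can still contribute an incident proteinuria event.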

Results: All models had predictive values, sensitivities, and specificities >0.8 for predicting the onset of CKD in 1 year when the outcome was eGFR <60 mL/min/1.73 m². The area under the receiver operating characteristic curve (AUROC) was >0.9. With LR and a neural network, the specificities were 0.749 and 0.739 and AUROCs were 0.889 and 0.890, respectively, for predicting onset within 5 years. The AUROCs of most models were approximately 0.65 when the outcome was eGFR <60 mL/min/1.73 m² or proteinuria. The predictive performance of all models exhibited a significant decrease when eGFR was not included as an explanatory variable (AUROCs: 0.498–0.732).
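For readers less familiar with the reported metrics, the following Python sketch shows how sensitivity, specificity, and AUROC are computed from true labels and model scores. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions and are not taken from the study; the AUROC is computed by the standard pairwise (Mann-Whitney) definition, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """Probability that a random case scores higher than a random non-case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(sensitivity_specificity(y_true, y_pred))
    print(auroc(y_true, scores))
```

An AUROC near 0.5, such as the lower end of the range reported when eGFR is excluded, means the scores rank cases and non-cases no better than chance.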

Conclusion: Machine learning models can predict the risk of CKD, and eGFR plays a crucial role in predicting the onset of CKD. However, it is difficult to predict the onset of proteinuria based solely on health examination data. Further studies must be conducted to predict the decline in eGFR and increase in urine protein levels.

Keywords: chronic kidney disease; estimated glomerular filtration rate; health examination; proteinuria; recurrent neural network.

MeSH terms

  • Adult
  • Aged
  • Female
  • Glomerular Filtration Rate*
  • Humans
  • Incidence
  • Logistic Models
  • Machine Learning*
  • Male
  • Middle Aged
  • Proteinuria / diagnosis
  • Proteinuria / epidemiology
  • Renal Insufficiency, Chronic* / epidemiology
  • Risk Assessment / methods
  • Risk Factors

Grants and funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.