A machine learning classifier-based approach for diabetes mellitus risk prediction

Biomed Phys Eng Express. 2024 Oct 10. doi: 10.1088/2057-1976/ad857b. Online ahead of print.

Abstract

Diabetes mellitus (DM) can be life-threatening, and its risk is closely tied to the dietary habits and lifestyle choices of individuals. Diabetes is characterised by elevated levels of glucose in the blood and an excess of protein in the blood. Poor eating habits and lifestyles are largely responsible for the rise in overweight, obesity, and related conditions. This study investigated a range of techniques and algorithms for forecasting diabetes-related risk. Eight machine learning (ML) algorithms were evaluated on a diabetes dataset: support vector classifier, gradient boosting, multilayer perceptron, random forest, K-nearest neighbors, logistic regression, extreme gradient boosting, and decision tree. To enhance the models' ability to predict diabetes, we propose using feature engineering (FE) and feature scaling. We used the Mendeley dataset on diabetes to assess predictive capacity and implemented the eight classification techniques in Python. Random Forest reached 99.21% and Gradient Boosting 99.61%, while Extreme Gradient Boosting and Decision Tree achieved the highest F1 score (99.81%), accuracy (99.80%), precision (99.81%), and recall (99.81%) of all classification approaches.
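The abstract names feature scaling and four evaluation metrics without defining them. The following is a minimal sketch of both, using only the Python standard library. It assumes z-score standardization as the scaling method and binary labels (1 = diabetic, 0 = non-diabetic). The abstract specifies neither, and the function names and toy labels are hypothetical, not the authors' code or data.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score scaling (assumed method): subtract the mean, divide by the population std."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary labelling (assumed setup)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not results from the study.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
metrics = binary_metrics(y_true, y_pred)
scaled = standardize([1.0, 2.0, 3.0])
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high. This is why the abstract's F1, precision, and recall values for the best models sit so close together.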

Keywords: Decision Tree; Diabetes Mellitus; Extreme Gradient Boosting; Feature Engineering; Feature Scaling; Machine Learning; Type 2.