
Early-Stage Diabetes Risk Prediction Utilizing Machine Learning with Explainable AI from Polynomial and Binning Feature Generation

Published: March 24, 2026

Overview

IEEE Xplore, 21 January 2025. Publisher: IEEE

Detailed Description

Abstract


Diabetes is a chronic disease that affects a significant portion of the global population. It occurs when the body cannot produce enough insulin or cannot effectively use the insulin it produces, leading to elevated blood glucose levels. Diabetes is a major contributor to several severe health conditions, including heart disease, stroke, kidney failure, and nerve damage, so early detection is crucial for mitigating these risks and improving patient outcomes. In response to the increasing prevalence of diabetes, we developed an automated system for Early-Stage Diabetes Risk Prediction (ESDRP). This study uses a dataset of 520 instances described by 16 features. We applied multiple Machine Learning (ML) models, namely XGBoost (XGB), Bootstrap Aggregating (BAG), Adaptive Boosting (AdaBoost), Light Gradient Boosting Machine (LGBM), and Gradient Boosting Decision Trees (GBDT), both with and without feature generation; specifically, we explored polynomial and binning feature generation. Polynomial feature generation combined with XGB yielded the highest performance, achieving an accuracy of 99.22%, precision of 100%, recall of 98.15%, specificity of 99.06%, and F1-score of 100%. All ML models were also evaluated using confusion matrices (CM) and ROC curves, and their performance averaged over 10-fold cross-validation indicated robust predictive capability. Furthermore, to establish trust in the model's predictions, we incorporated two explainable AI (XAI) methods, LIME and SHAP. These techniques helped us understand feature importance and the models' decision-making process, and enhanced the reliability of our results. Our automated system aims to help individuals of all ages and healthcare systems identify early-stage diabetes risk, thereby supporting informed decision-making and improving global health outcomes.
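The abstract names the pipeline's ingredients (polynomial and binning feature generation, 10-fold cross-validation, and confusion-matrix metrics) but not their settings. The stdlib-only Python sketch below illustrates each ingredient under assumptions that are ours, not the paper's: a degree-2 polynomial expansion without a bias term, equal-width binning, and contiguous, unshuffled folds. All function names are hypothetical, and the classifiers themselves (XGB, LGBM, and so on) are not reproduced. In practice, scikit-learn's `PolynomialFeatures` and `KBinsDiscretizer` provide comparable transforms.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers: polynomial degree, binning strategy, and fold scheme
# are illustrative assumptions, not the settings used in the paper.


def polynomial_features(row: Sequence[float], degree: int = 2) -> List[float]:
    """Return all monomials of the inputs up to `degree`, without a bias term.

    For [a, b] and degree 2 this yields [a, b, a*a, a*b, b*b].
    """
    out: List[float] = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(row)), d):
            prod = 1.0
            for i in combo:
                prod *= row[i]
            out.append(prod)
    return out


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each value to a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins: List[int] = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value lands exactly on the upper edge; clamp it into the last bin.
        bins.append(min(idx, n_bins - 1))
    return bins


def kfold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous (unshuffled, unstratified) folds."""
    folds = []
    base, extra = divmod(n, k)
    start = 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) for binary labels where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics_from_confusion(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, specificity, and F1 from confusion-matrix counts."""
    def ratio(num: float, den: float) -> float:
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Toy, made-up labels purely for illustration (not data from the study).
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 0, 1]
    print(metrics_from_confusion(*confusion_counts(y_true, y_pred)))
```

With the toy labels in the `__main__` block there are no false positives, so precision and specificity both come out as 1.0 while recall is 2/3. Under 10-fold cross-validation, the same metric formulas would be applied to each test fold and the results averaged.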