<?xml version="1.0" encoding="UTF-8"?>



<records>

  <record>
    <language>eng</language>
    <publisher>Oriental Scientific Publishing Company</publisher>
    <journalTitle>Biomedical and Pharmacology Journal</journalTitle>
    <issn>0974-6242</issn>
    <publicationDate>2026-05-01</publicationDate>
    <volume>19</volume>
    <issue>2</issue>
    <startPage></startPage>
    <endPage></endPage>
    <publisherRecordId>71407</publisherRecordId>
    <documentType>article</documentType>
    <title language="eng">An Ensemble Machine Learning Approach for Diabetes Classification and Prediction with Minimal Error: A Comprehensive Exploratory Data Analysis</title>

    <authors>
      <author>
        <name>Vishal Verma</name>
        <affiliationId>1</affiliationId>
      </author>
      <author>
        <name>Malik Shahzad Ahmed Iqbal</name>
        <affiliationId>2</affiliationId>
      </author>
      <author>
        <name>Satish Kumar</name>
        <affiliationId>3</affiliationId>
      </author>
      <author>
        <name>Vandna Rani Verma</name>
        <affiliationId>4</affiliationId>
      </author>
      <author>
        <name>Alka Agrawal</name>
        <affiliationId>1</affiliationId>
      </author>
    </authors>

    <affiliationsList>
      <affiliationName affiliationId="1">Department of Information Technology, Babasaheb Bhimrao Ambedkar University, Lucknow, India</affiliationName>
      <affiliationName affiliationId="2">Department of Computer Science and Engineering, Acharya University, Karakul, Uzbekistan</affiliationName>
      <affiliationName affiliationId="3">Department of Computer Application, Integral University, Lucknow, India</affiliationName>
      <affiliationName affiliationId="4">Department of Computer Science and Engineering, Galgotias College of Engineering and Technology, Greater Noida, India</affiliationName>
    </affiliationsList>

    <abstract language="eng">Diabetes is a chronic metabolic disease that remains a serious public health problem worldwide. It leads to multiple health problems, including high blood pressure, skin disorders, heart disease, kidney disease, and eye damage. Many individuals with diabetes remain undiagnosed for a long time, so early prediction is crucial for timely treatment. The primary goal of this research is to construct an effective and reliable prediction model for early-stage diabetes that combines ensemble learning with data balancing techniques. The study uses a machine learning (ML) ensemble classifier, the Extra Trees Classifier (ETC), together with twelve data balancing techniques, namely SMOTE-ENN, RandomOverSampler, InstanceHardnessThreshold, SMOTETomek, SMOTE, KMeansSMOTE, BorderlineSMOTE, AllKNN, RandomUnderSampler, NeighbourhoodCleaningRule, NearMiss, and TomekLinks, to predict whether a patient has diabetes. The study uses a recent and extensive dataset, the Diabetes Prediction Dataset (DPD), taken from the Kaggle repository, which consists of 100,000 instances. The findings demonstrate that SMOTE-ENN combined with the Extra Trees Classifier (SEETC) performs particularly well. The proposed SEETC model achieved 0.997, 0.993, 0.993, 0.997, 0.995, 0.995, 0.995, 0.3, and 0.7, respectively, in Precision<sub>Neg</sub> (P<sub>Neg</sub>), Precision<sub>Pos</sub> (P<sub>Pos</sub>), Recall<sub>Neg</sub> (R<sub>Neg</sub>), Recall<sub>Pos</sub> (R<sub>Pos</sub>), F1_Score<sub>Neg</sub> (F1<sub>Neg</sub>), F1_Score<sub>Pos</sub> (F1<sub>Pos</sub>), Accuracy (Acc), Type 1 Error Rate (T1E Rate), and Type 2 Error Rate (T2E Rate). The proposed model has been evaluated against other existing models, including Decision Tree (DT), Random Forest (RF), AdaBoost Classifier (ABC), and XGBoost (XGB). The results indicate that integrating SMOTE-ENN with ETC yields superior performance, and this integration is expected to be highly beneficial for diabetes prediction.</abstract>

    <fullTextUrl format="html">https://biomedpharmajournal.org/vol19no2/an-ensemble-machine-learning-approach-for-diabetes-classification-and-prediction-with-minimal-error-a-comprehensive-exploratory-data-analysis/</fullTextUrl>

    <keywords language="eng">
      <keyword>Ensemble</keyword>
      <keyword>Extra Tree Classifier</keyword>
      <keyword>Healthcare</keyword>
      <keyword>Imbalanced data</keyword>
      <keyword>Machine learning</keyword>
      <keyword>SMOTE-ENN</keyword>
    </keywords>
  </record>
</records>