Clustering Medical Conditions in Patient Records Using Unsupervised Learning Techniques: A Comparative Study

Bhupesh Rawat (1), Himanshu Pant (1), Ankur Bist (2)

(1) Department of School of Computing, Graphic Era Hill University, Bhimtal, India.
(2) Department of Computer Science and Engineering (CSE), Graphic Era Hill University, Bhimtal, India.

Biomedical and Pharmacology Journal, vol. 18, no. 3, pp. 2050-2060. Published 2025-09-30 by Oriental Scientific Publishing Company. ISSN 0974-6242.
DOI: 10.13005/bpj/3236
URL: https://biomedpharmajournal.org/vol18no3/clustering-medical-conditions-in-patient-records-using-unsupervised-learning-techniques-a-comparative-study/

Abstract

The expansion of electronic health records (EHRs) presents an unparalleled opportunity to identify clinically significant patient trends via unsupervised learning. This study assesses three clustering methods (K-Means, DBSCAN, and Hierarchical Clustering) applied to EHR data, with PCA used for dimensionality reduction. Performance is evaluated with the Silhouette Score (0.183 for K-Means), the Davies-Bouldin Index (1.594), and the Calinski-Harabasz Index (245.7). K-Means identified four distinct clusters, including a high-risk group comprising 25% of patients and characterized by increased tumor size (1262 mm) and mitotic activity (0.20/HPF); SHAP analysis indicated tumor morphology as the principal factor influencing cluster assignment. DBSCAN was ineffective at identifying density-based clusters, and Hierarchical Clustering exhibited inadequate separation (Silhouette: 0.130), so K-Means performed best of the three, enabling data-driven patient stratification for personalized treatment strategies and optimized resource allocation. These findings highlight the promise of unsupervised learning for healthcare analytics; however, subsequent research should incorporate temporal data and clinical ontologies to improve interpretability.

Keywords: Clustering; DBSCAN; EHR; K-Means; Medical Records; Patient Profiling; PCA; Unsupervised Learning
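
The abstract compares methods mainly by Silhouette Score, so a small worked example may help in reading those values. The sketch below is not the authors' implementation: it is a minimal standard-library Python version of the usual silhouette definition, s = (b - a) / max(a, b), where a is a point's mean distance to the rest of its own cluster and b is its smallest mean distance to any other cluster. The function name, the Euclidean metric, and the four toy points are assumptions made for illustration; no patient data from the study is reproduced.

```python
import math
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, hence the len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(math.dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy data: two tight pairs of points, far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

good = silhouette_score(toy_points, [0, 0, 1, 1])  # labels match the pairs
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # labels split each pair

print(f"well-separated labelling: {good:.3f}")
print(f"mismatched labelling:     {bad:.3f}")
```

The score always lies between -1 and 1: values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster. On that scale, the reported 0.183 (K-Means) and 0.130 (Hierarchical Clustering) both sit much closer to 0 than to 1.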