Comparative analysis of machine learning models for coronary artery disease prediction with optimized feature selection

Olawade, David; Soladoye, Afeez A.; Omodunbi, Bolaji A.; Aderinto, Nicholas; Adeyanju, Ibrahim A.

Olawade, David ORCID: https://orcid.org/0000-0003-0188-9836, Soladoye, Afeez A., Omodunbi, Bolaji A., Aderinto, Nicholas and Adeyanju, Ibrahim A. (2025) Comparative analysis of machine learning models for coronary artery disease prediction with optimized feature selection. International Journal of Cardiology, 436. p. 133443.

1-s2.0-S0167527325004863-main.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.1016/j.ijcard.2025.133443

Abstract

Background
Coronary artery disease (CAD) is a major global cause of death, necessitating early, accurate prediction for better management. Traditional diagnostics are often invasive, costly, and less accessible. Machine learning (ML) offers a non-invasive alternative, but high-dimensional data and redundancy can hinder performance. This study integrates Bald Eagle Search Optimization (BESO) for feature selection to improve CAD classification using multiple ML models.
Methods
Two publicly available datasets, Framingham (4200 instances, 15 features) and Z-Alizadeh Sani (304 instances, 55 features), were used. The former predicts 10-year CAD risk, while the latter classifies current CAD status. Data preprocessing included missing value imputation, normalization, categorical encoding, and class balancing using the Synthetic Minority Over-sampling Technique (SMOTE). We employed a 70–30 holdout validation strategy with empirical hyperparameter optimization, providing more reliable final model development than cross-validation. BESO was applied to optimize feature selection, significantly outperforming traditional methods like recursive feature elimination (RFE) and LASSO. Six ML models were trained and evaluated, counting each SVM kernel separately: k-nearest neighbours (KNN), logistic regression, support vector machines (SVM) with linear, polynomial, and radial basis function (RBF) kernels, and random forest.
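The holdout and class-balancing steps described above can be illustrated with a minimal sketch. It is not the authors' code: the helper names `stratified_holdout` and `smote_like`, the synthetic two-feature data, the 20/80 class ratio, and the choice to oversample only the training split are all assumptions made for illustration (the abstract does not state the order of splitting and balancing). The oversampler is a simplified SMOTE-style interpolation written in plain Python, not the reference implementation.

```python
import random


def stratified_holdout(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def smote_like(X, y, minority, k=3, seed=0):
    """Balance a binary problem by interpolating between minority samples.

    Each synthetic row lies on the segment between a minority row and one
    of its k nearest minority neighbours. Needs at least two minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)


# Hypothetical data: 100 rows, 2 features in [0, 1), 20% positive class.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y = [1] * 20 + [0] * 80

# 70-30 holdout, stratified by class.
train_idx, test_idx = stratified_holdout(y, test_fraction=0.3)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# Assumption: balance the training split only, so the test split
# contains no synthetic rows.
X_bal, y_bal = smote_like(X_train, y_train, minority=1)
```

Oversampling after the split keeps synthetic rows out of the held-out 30%, so the reported test accuracy would reflect only real instances. In practice a maintained implementation such as the SMOTE class in imbalanced-learn would normally be used instead of a hand-written one.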
Results
Random Forest (RF) achieved the highest performance across both datasets. In the Framingham dataset, RF recorded 90 % accuracy, significantly outperforming traditional clinical risk scores (71–73 % accuracy). Linear models performed better on the Z-Alizadeh Sani dataset (90 % accuracy) than on Framingham (66 %), indicating that dataset characteristics strongly influence model efficacy.
Conclusion
BESO significantly enhances feature selection, with RF emerging as the optimal classifier (92 % accuracy) and substantially outperforming established clinical risk scores. This study highlights the potential of AI-driven CAD diagnosis, supporting early detection and improved patient outcomes. Future work should focus on prospective validation and clinical implementation.