Olawade, David ORCID: https://orcid.org/0000-0003-0188-9836, Osborne, Augustus, Soladoye, Afeez A., Oluwadare, Olaitan E., Awogbindin, Emmanuel O. and Wada, Ojima Z.
(2026)
Smart insurance analytics: A novel ensemble feature selection approach to unlock health insurance coverage predictions in Sierra Leone.
International Journal of Medical Informatics, 211.
p. 106313.
Text: 1-s2.0-S1386505626000535-main.pdf - Published Version. Available under License Creative Commons Attribution.
Abstract
Background
Predicting health insurance uptake is a critical challenge for policymakers and insurance providers seeking to optimise coverage strategies and resource allocation. In Sierra Leone, health insurance uptake remains extremely low, and understanding its determinants is vital for achieving universal health coverage goals.
Objective
To develop and evaluate an innovative ensemble feature selection methodology for health insurance uptake prediction, establishing new performance benchmarks through systematic comparison of multiple machine learning algorithms using comprehensive validation strategies.
Methods
This study employed supervised machine learning to predict health insurance uptake among 15,574 women using data from the 2019 Sierra Leone Demographic and Health Survey (SLDHS). We implemented an ensemble feature selection approach that requires consensus across three techniques: Adaptive Ant Colony Optimisation, Recursive Feature Elimination, and Backward Feature Elimination. Seven algorithms were systematically compared: Logistic Regression, Support Vector Machines, K-Nearest Neighbours, Random Forest, Gradient Boosting, XGBoost, and LightGBM. The Synthetic Minority Over-sampling Technique (SMOTE) addressed class imbalance, whilst validation employed nested 5-fold cross-validation, 10-fold cross-validation, and hold-out testing to prevent information leakage.
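The consensus step described above can be illustrated with a minimal sketch. It assumes that "consensus" means keeping only the features chosen by every selector, which the abstract does not spell out. The selectors themselves are not implemented here, and the function name `consensus_features` and all feature names are hypothetical placeholders, not variables reported by the study.

```python
def consensus_features(selections):
    """Return the features chosen by every selector, in sorted order.

    `selections` maps a selector name to the list of features it kept.
    Assumption: consensus is taken as the strict intersection of all lists.
    """
    if not selections:
        return []
    feature_sets = [set(chosen) for chosen in selections.values()]
    return sorted(set.intersection(*feature_sets))


# Hypothetical selector outputs; the feature names are illustrative only
# and are not taken from the study.
selections = {
    "adaptive_aco": ["age", "education", "wealth_index", "region"],
    "rfe": ["education", "wealth_index", "region", "parity"],
    "backward_elimination": ["education", "wealth_index", "residence"],
}

selected = consensus_features(selections)
print(selected)  # ['education', 'wealth_index']
```

A strict intersection is the most conservative reading of "consensus"; a majority vote across the three selectors would be an equally plausible alternative.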
Results
Random Forest achieved exceptional performance on hold-out testing, scoring 0.9973 for accuracy, precision, recall, and F1-score, with a perfect ROC AUC of 1.0000. XGBoost delivered comparable results, with 0.9914 across all four metrics and a ROC AUC of 0.9998. Backward Feature Elimination consistently yielded superior results across ensemble methods. However, the near-perfect performance warrants cautious interpretation and requires external validation to confirm generalizability.
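For readers less familiar with the reported metrics, the sketch below shows the standard binary definitions of accuracy, precision, recall, and F1-score computed from confusion-matrix counts. The counts are hypothetical and do not reproduce the study's results. The abstract also does not state which averaging scheme was used, so the single-positive-class form shown here is an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Undefined ratios (zero denominators) are returned as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical confusion-matrix counts, for illustration only.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```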
Conclusions
This research establishes new performance benchmarks for health insurance prediction that significantly exceed those reported in the existing literature, with direct implications for health insurance policy and practice in Sierra Leone. The innovative ensemble feature selection methodology provides a robust framework for enhancing prediction accuracy across healthcare applications, offering immediate practical value for stakeholders. Future work should prioritize external validation, explainability analysis, and temporal stability assessment to ensure readiness for practical deployment.
| Item Type: | Article |
|---|---|
| Status: | Published |
| DOI: | 10.1016/j.ijmedinf.2026.106313 |
| School/Department: | London Campus |
| URI: | https://ray.yorksj.ac.uk/id/eprint/13965 |