Siddalingappa, Rashmi ORCID: https://orcid.org/0000-0001-9786-8436, Hussain, Showket, S, Deepa, Dheerendra, Pradeep, Gornale, Shivanand, L, Muralidhara B. and Kothandan, Gugan
(2026)
Precision Medicine Gene Network Analyser: part I—cancer driver gene identification through network topology and ensemble machine learning.
Genomics & Informatics, 24 (12).
Text: Siddalingappa_et_al-2026-Genomics_&_Informatics.pdf - Published Version. Available under License Creative Commons Attribution.
Abstract
Purpose
Precision oncology depends on identifying cancer driver genes and linking them to targeted therapies. Current methods using curated gene sets or generic classifiers often miss biologically relevant patterns in complex gene interaction networks.
Methods
We developed the Precision Medicine Gene Network Analyser, which integrates network topology analysis with machine learning for cancer gene identification. The dataset comprised 699 cancer driver genes (COSMIC Cancer Gene Census) and 15,050 background genes, mapped to a high-confidence protein–protein interaction network from STRING (456,300 edges, 15,749 nodes). Network features, including degree, betweenness, PageRank, k-core, and clustering coefficient, were extracted. To address class imbalance, we propose the Imbalance Aware Network Integrator (IANI), which combines balanced resampling and ensemble models (logistic regression, random forest, gradient boosting) with deep neural networks trained using focal loss, and optimises the decision threshold for maximum F1-score. Hub genes were defined using a statistical cutoff of mean outdegree + 2 × SD (standard deviation).
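The hub cutoff and the F1-based threshold selection described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy gene labels and degrees, the toy scores, the use of the population standard deviation, and the grid of 1000 candidate thresholds are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def hub_cutoff(degrees):
    """Return mean + 2 * SD of the degree values.

    Assumption: population SD (pstdev); the abstract does not say
    whether the population or sample SD was used.
    """
    values = list(degrees.values())
    return mean(values) + 2 * pstdev(values)


def find_hubs(degrees):
    """Return the sorted names of nodes whose degree exceeds the cutoff."""
    cutoff = hub_cutoff(degrees)
    return sorted(name for name, d in degrees.items() if d > cutoff)


def best_f1_threshold(scores, labels, steps=1000):
    """Sweep thresholds on a uniform grid and keep the first one with the highest F1.

    scores: predicted probabilities; labels: 1 = cancer gene, 0 = background.
    """
    best_t, best_f1 = 0.5, 0.0
    for i in range(1, steps):
        t = i / steps
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # F1 is zero or undefined without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy network: nine low-degree nodes and one highly connected node.
toy_degrees = {"G1": 2, "G2": 3, "G3": 2, "G4": 3, "G5": 2,
               "G6": 3, "G7": 2, "G8": 3, "G9": 2, "G10": 40}
print(find_hubs(toy_degrees))

# Hypothetical classifier scores and true labels.
toy_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
toy_labels = [0, 0, 1, 0, 1, 1]
print(best_f1_threshold(toy_scores, toy_labels))
```

On imbalanced data such as this (roughly 4% positives), the F1-optimal threshold need not be 0.5, which is consistent with the reported operating point of 0.466.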
Results
On a test set of 3,150 samples (140 cancer, 3,010 non-cancer genes), the optimised ensemble improved ROC-AUC from 0.84 to 0.96, precision from 0.78 to 0.90, and recall from 0.42 to 0.81 (F1 = 0.85) at a threshold of 0.466. Hub analysis identified 689 hubs with fourfold enrichment of cancer genes (16.1% vs. 4.4%, p < 10⁻²⁰), showing higher betweenness centrality (p < 0.001). Key features such as degree (0.32), betweenness (0.24), and PageRank (0.19) contributed 75% of the model’s performance. Top hubs (TP53: 758, EGFR: 512, AKT1: 415 connections) showed 60–67% cancer gene enrichment, with pathway clustering in p53 signalling (75%) and cell cycle regulation (67.7%).
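The reported figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the abstract; the function name is an assumption, and the rounding is chosen to match the precision at which the abstract reports each value.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# F1 implied by the reported precision (0.90) and recall (0.81).
print(round(f1_score(0.90, 0.81), 2))   # 0.85

# Ratio of cancer-gene fractions among hubs vs. the comparison rate (16.1% vs. 4.4%).
print(round(16.1 / 4.4, 1))             # 3.7

# Combined contribution of degree, betweenness, and PageRank.
print(round(0.32 + 0.24 + 0.19, 2))     # 0.75

# Share of cancer genes in the test set (140 of 3,150), in percent.
print(round(100 * 140 / 3150, 1))       # 4.4
```

The enrichment ratio of about 3.7 is what the abstract rounds to "fourfold", and the 4.4% comparison rate matches the share of cancer genes in the test set.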
Conclusion
Integrating protein interaction topology with imbalance-aware machine learning achieved a ROC-AUC of 0.96 for discriminating cancer driver genes from background genes. This work provides a basis for the upcoming phases of the Precision Medicine Gene Network Analyser: drug-gene mapping and patient-specific therapy prediction.
| Item Type: | Article |
|---|---|
| Status: | Published |
| DOI: | 10.1186/s44342-026-00074-7 |
| Subjects: | Q Science > Q Science (General); Q Science > Q Science (General) > Q325 Machine learning |
| School/Department: | London Campus |
| URI: | https://ray.yorksj.ac.uk/id/eprint/15048 |