A Narrative Review of GAN-Based Synthetic Data Generation in Disease Prediction

Ganesan, Swathi; Somasiri, Nalinda

Ganesan, Swathi ORCID: https://orcid.org/0000-0002-6278-2090 and Somasiri, Nalinda ORCID: https://orcid.org/0000-0001-6311-2251 (2026) A Narrative Review of GAN-Based Synthetic Data Generation in Disease Prediction. Journal of Data Science and Intelligent Systems.

JDSIS62028542_R1.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.47852/bonviewJDSIS62028542

Abstract

Synthetic data generation has emerged as an important approach in healthcare for addressing data scarcity, disease class imbalance, and privacy restrictions that limit access to patient data. Among generative approaches, generative adversarial networks (GANs) have gained increasing attention because of their ability to generate realistic samples from complex data distributions, including medical imaging, electronic health records (EHRs), laboratory data, and phenotype codes. This narrative review focuses on the evolution of major GAN architectures and their application in disease prediction. The original GAN introduced the adversarial paradigm, while Deep Convolutional GAN (DCGAN) advanced image generation and became widely used in MRI, CT, and histopathology tasks. Wasserstein GAN variants (WGAN and WGAN-GP) improved training stability and proved better suited to tabular and structured healthcare data such as EHRs. More specialized architectures, including Conditional Tabular GAN (CTGAN) and Medical GAN (medGAN), further extended synthetic data generation to mixed-type datasets and sparse diagnostic records. The review also examines evaluation practices based on data fidelity, downstream utility, and privacy preservation, including differential privacy and resistance to membership inference attacks. Overall, the literature shows that GAN-generated synthetic data can support disease prediction research, but important challenges remain in benchmarking, reproducibility, interpretability, and ethical deployment. Emerging directions include hybrid GAN-diffusion models, federated training strategies, and standardized evaluation frameworks to support clinically reliable and privacy-preserving adoption.

Item Type:	Article
Status:	Published
DOI:	10.47852/bonviewJDSIS62028542
Subjects:	T Technology > T Technology (General)
School/Department:	London Campus
URI:	https://ray.yorksj.ac.uk/id/eprint/15256

Deposit and Record Details

ID Code:	15256
Depositing User:	Ganesan, Swathi
Deposited On:	18 Jun 2026 10:51
Last Modified:	19 Jun 2026 19:45