Highlights:
- This article systematically reviews the application of a deep learning method—Generative Adversarial Networks (GANs)—in the auxiliary diagnosis of Alzheimer’s Disease (AD) and the processing of neuroimaging data (image denoising, image segmentation, data augmentation, and modality conversion);
- This article finds that, compared to other methods, GANs exhibit higher classification accuracy in AD auxiliary diagnosis tasks and perform better in AD-related data processing tasks, and analyzes from a computer science perspective why GANs outperform other methods in these tasks;
- This article discusses the clinical value of GAN applications in AD-related tasks, points out the limitations of current research, and provides suggestions and prospects for future work.
Introduction:
Alzheimer’s Disease (AD) is a neurodegenerative disease that primarily affects the elderly, characterized by declines in memory, cognitive function, and behavioral function, severely impacting patients’ daily lives. Mild Cognitive Impairment (MCI) lies between normal aging and AD. Approximately 10%-15% of MCI patients progress to AD each year, while some MCI patients remain stable or even recover to a healthy state. As there is currently no effective treatment for AD, clinical treatment has shifted to identifying and intervening in patients at early stages of AD, helping stabilize their condition and slow disease progression. Therefore, accurately classifying the disease status of AD—determining whether a patient is normal, stable MCI (sMCI), or progressive MCI (pMCI)—will help identify high-risk individuals for targeted treatment measures, delaying disease progression and reducing the incidence of AD.
In recent years, neuroimaging research has provided clinical evidence for identifying imaging biomarkers for diagnosing and treating mental disorders, while artificial intelligence has been widely applied to the diagnosis of AD and the processing of its images. Many studies have used deep learning methods for feature extraction and classification of imaging data from normal individuals, MCI patients, and AD patients, achieving promising results. However, existing deep learning methods, typified by Convolutional Neural Networks (CNNs), require large amounts of high-quality image data to meet the demands of brain imaging research for AD image modality conversion, denoising, and segmentation. Due to technical and cost constraints, available AD-related brain imaging data remain relatively scarce, so a deep learning method capable of high-quality image processing from limited data is needed.
Generative Adversarial Networks (GANs), proposed by Goodfellow et al. in 2014, are deep learning models primarily used for image processing. A standard GAN model consists of a generator and a discriminator, where the generator produces images based on input data, images, or random noise, while the discriminator is responsible for determining whether an image is “real”, meaning whether it is a real image or one generated by the generator (Figure 1). GANs train neural networks through a minimax game between the generator and discriminator, during which the images generated by the generator increasingly resemble real images to “fool” the discriminator, while the discriminator’s ability to judge the authenticity of images also improves continuously. Currently, GANs are applied in the field of brain imaging mainly in the following aspects: 1) denoising low-dose Positron Emission Tomography (PET) images to obtain high-quality images; 2) accurately segmenting brain images to facilitate further processing; 3) augmenting image data and converting modalities. Additionally, GANs can provide high-quality image data for the feature extraction step in classification frameworks, improving the effectiveness of classification algorithms, which is beneficial for the clinical classification of AD.

Figure 1 Basic structure of GAN and its applications in neuroimaging.
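The minimax game described above can be made concrete with the standard GAN losses. A minimal NumPy sketch, assuming the discriminator outputs probabilities in (0, 1) and using the commonly adopted non-saturating form of the generator loss; the networks themselves are omitted:

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy; p = discriminator outputs in (0, 1), y = labels."""
    eps = 1e-12  # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def discriminator_loss(d_real, d_fake):
    # The discriminator is rewarded for D(real) -> 1 and D(G(z)) -> 0.
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    # The generator is rewarded for "fooling" the discriminator: D(G(z)) -> 1.
    return bce(d_fake, np.ones_like(d_fake))
```

Training alternates between minimizing `discriminator_loss` over the discriminator's parameters and `generator_loss` over the generator's, which is the adversarial dynamic the article describes.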
Currently, some reviews have reported the role of GANs in the medical field, but applications of GANs in the specific disease of AD have not received much attention. This article aims to systematically review the research on the application of GANs in the auxiliary diagnosis and processing of neuroimaging in AD, and to assess the methods used in the current research (including data sources and modalities, GAN models, comparison methods, and quantitative indicators) from the perspective of clinical practitioners, analyze the limitations of current research, and propose prospects for future research.
Research Methods:
This systematic review was conducted following the PRISMA guidelines. The authors searched the PubMed, Cochrane Library, EMBASE, Web of Science, and IEEE Xplore databases using keywords such as Alzheimer, AD, dementia, mild cognitive, F-18-FDG, FDGPET, amyloid, Tau-PET, generative model, and generative adversarial network, to find English literature on the application of GANs in AD before August 2020, ultimately including 15 articles.
Research Findings:
1. Application of GANs in AD-related neuroimaging data processing
● Image Denoising: Three of the included studies aimed to find effective machine learning methods to convert low-dose PET images into images of full-dose quality, achieving higher Peak Signal-to-Noise Ratio (PSNR) and improving clinical diagnostic accuracy. Comparison of results across methods showed that GAN-based denoising significantly outperformed the other denoising methods (Figure 2). The included studies also indicated that denoised images could help subsequent deep learning classifiers achieve more accurate disease classification. Moreover, an increasing number of healthy controls and younger MCI patients are being enrolled in AD-related clinical trials, and frequent full-dose PET scans would increase their cumulative radiation dose, raising potential radiation exposure risks. Through denoising, therefore, researchers can not only obtain high-quality images and accurate diagnostic information but also reduce potential health harms to patients and healthy controls.

Figure 2 Compared to other methods, GAN achieves superior PSNR in AD brain imaging denoising.
Note: PSNR is directly proportional to the quality of image denoising; m-CCA, multilevel CCA; AcCNN, autocontext CNN; CCA, canonical correlation analysis; MVPL, models without perceptual loss; MWV, models with perceptual loss computed from VGG16.
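PSNR, the outcome measure in Figure 2, is defined directly from the mean squared error between a reference image and the denoised image. A minimal sketch (the 255 peak value assumes 8-bit images; PET data would use its own dynamic range):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; higher means the denoised
    image is closer to the reference."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```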
● Image Segmentation: In terms of data segmentation, precise segmentation of brain images is beneficial for locating AD state features. Shi et al. (2019) and Oh et al. (2020) compared the segmentation accuracy of different methods for hippocampal subregions and brain gray and white matter, finding that using the Dice Similarity Coefficient (DSC) as the outcome indicator, GAN achieved higher segmentation accuracy (Figure 3).

Figure 3 Comparison of DSC obtained from various methods applied to AD-related image segmentation, showing that GAN’s image segmentation effect is superior to other methods.
Note: DSC is directly proportional to image segmentation quality; UGN, UG-net; P2P, Pix2pix unet; HD, h-dense unet method; UN, U-net.
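DSC, the outcome measure in Figure 3, quantifies the overlap between a predicted segmentation mask and the ground-truth mask. A minimal sketch for binary masks:

```python
import numpy as np

def dice(seg: np.ndarray, gt: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks (1 = perfect overlap)."""
    seg = seg.astype(bool)
    gt = gt.astype(bool)
    denom = seg.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(seg, gt).sum() / denom
```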
● Image Augmentation: A major challenge in using deep learning is the lack of sufficient data to train classification frameworks. Given that PET is relatively expensive and difficult to obtain, and that PET imaging data is particularly important for AD classification, the issue of data scarcity is especially pronounced in AD research. In the included studies, Kang et al. (2020) used Conditional GAN (cGAN) to synthesize 18F florbetaben PET images that closely resemble real data using random noise, helping to address the data shortage issue in the development of AD-related deep learning frameworks.
● Modality Conversion: Currently, an increasing number of deep learning models use multimodal data to assist in AD diagnosis. Among the commonly used neuroimaging data for AD, MRI contains more structural information, while PET provides more metabolic and quantitative analysis value. The combined application of MRI and PET can provide clinicians with more comprehensive diagnostic information. Clinically, to supplement the insufficiency of certain modality data, modality conversion is often performed. Kang et al. (2018) applied GAN to generate individualized PET templates, accurately normalizing amyloid PET images for objective assessment and statistical analysis without using corresponding 3D-MR images. Compared to methods based on average templates, GAN-based and Convolutional Auto-Encoder (CAE)-based methods exhibited higher mutual information and lower mean squared error. Choi et al. (2018) used GAN to convert between different modal medical images, generating realistic structural MR images from florbetapir PET images, achieving good results in quantifying cortical amyloid burden.
2. Application of GANs in AD Auxiliary Diagnosis
● Currently, there are no drugs that can effectively treat the symptoms of patients who have progressed to AD or prevent its further development, and the clinical trial failure rate for AD drugs is 99.6%. The focus of clinical research has therefore shifted to early diagnosis and intervention. In this systematic review, we found that GANs can effectively distinguish AD patients from cognitively normal controls, and that GAN-based classification frameworks identify high-risk individuals (pMCI) among MCI patients (sMCI and pMCI) with higher accuracy than other algorithms (Figure 4). This provides opportunities to delay disease progression and reduce the incidence of AD.

Figure 4 GAN’s accuracy (ACC) in AD auxiliary diagnosis is superior to other methods.
Note: LM3IL-C, GAN that uses only complete MRI and PET data; RBM, real image-based method; ICP, indirect conversion prediction; DCP, direct conversion prediction (a CNN classifier); TA, traditional augmentation.
We also found that most studies on AD classification establish a two-stage deep learning framework (Figure 5). The first stage synthesizes medical images or extracts relevant features, and the second stage builds a classifier. Researchers use GANs to synthesize images and extract features while employing other algorithms (such as CNNs) to construct the classifier. This structure fully exploits the advantages of GANs in image processing. For higher accuracy, accurate feature extraction is often more important than the choice of classification algorithm, and GAN-based processing of AD brain imaging data can extract relevant features that better support the second-stage classification.

Figure 5 General framework of GAN applications in AD auxiliary diagnosis research.
3. Reasons Why GANs Outperform Other Methods in AD Neuroimaging Tasks
In this systematic review, we found that GANs exhibit superior performance compared to other methods in feature extraction, modality conversion, image denoising, and segmentation tasks related to AD neuroimaging data processing, which may be attributed to their better adaptability to the processing of brain imaging data such as MRI and PET.
Firstly, brain imaging data are often complex and high-dimensional, and GAN’s adversarial structure gives it an advantage over other deep learning methods in processing such data. Traditional CNNs often require massive computation to fit high-dimensional data models, producing generated images of relatively poor quality that are often affected by artifacts such as aliasing. GANs do not require a predetermined distribution for the data: in theory, any differentiable function can be used to construct the generator and discriminator, and GANs can be combined with existing deep learning networks (such as CNNs). This allows GANs to train directly on samples without custom, complex loss functions, approximating the true probability distribution of high-dimensional data to arbitrary precision and generating high-quality images.
Secondly, because experienced imaging physicians are needed to analyze the data, specialized equipment is required for acquisition, and patients must be followed long-term, the amount of AD-related brain imaging data available for training is relatively small and often subject to class imbalance between groups. GANs can learn the underlying data distribution from a limited number of available images and generate high-quality images, giving them an advantage when training on small or imbalanced datasets, whereas traditional deep learning and machine learning methods depend on stronger priors (such as large amounts of data) and are more prone to overfitting on small datasets.
These technical advantages make GANs more suitable for AD-related brain imaging data processing and can explain the findings of this systematic review.
4. Research Methods Used in Included Studies
● Data Sources: Data quality has always been a focus of deep learning research. Although GANs can process high-quality image data, data from different sources can also affect the effectiveness of AD classification and other clinical applications.
We reviewed the training and validation datasets used in the included studies and found that for most studies, the training data came from the relatively complete public database, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Figure 6D). Meanwhile, some researchers collaborated with medical institutions to use clinically collected data for training deep learning models. However, using clinical cases collected solely from one institution usually results in a small sample size, which can lead to poor generation effects and overfitting issues. Therefore, we recommend using large, high-quality public databases (such as ADNI) to expand sample sizes, thus improving the generation effects and generalization capabilities of GAN models.
For validation data, we noted that most studies used only internal validation to assess GAN performance: the data used for training and validation came from the same dataset. Relying solely on internal validation may leave deep learning models with insufficient generalization performance in real-world, high-volume clinical settings, so validating on separate datasets is very important. Researchers may consider training deep learning frameworks on data from large databases and validating them on clinically collected data from their own institutions.
● Data Modalities: We examined the input/output data modalities used in the included studies (Figure 6 A&B). Most studies used only single-modality data (PET or MRI) for AD classification or image processing, with few using multimodal data. Two included studies employed a two-stage deep learning architecture: in the first stage, a GAN generated missing PET images from MRI images, and the generated PET images and original MRI images were then input into a CNN classifier. This approach trained better than using MRI data alone, demonstrating the effectiveness of multimodal classification methods.
Currently, most studies on PET-MRI modality conversion use single-modal structural MRI images (such as T1-weighted MRI), which may not accurately synthesize PET images. Wang et al. (2019) innovatively combined T1-weighted MRI with Diffusion Tensor Imaging (DTI) to synthesize PET images. This multimodal MRI-based modality conversion method achieved good results, providing insights for future research. However, this study had a small sample size and lacked follow-up research after expanding the sample size.
Additionally, we analyzed the training sample sizes used in the included studies for different modal data (Figure 6C). In the included studies, those using PET images typically had smaller sample sizes, likely due to the high cost of obtaining PET and the relative scarcity of paired MRI-PET data in public datasets like ADNI.
Furthermore, nearly half of the included studies trained on partial 2D slices of MRI or PET images instead of whole 3D images, which may lose spatial information and produce discontinuous estimates. However, studies have found that using whole 3D images may greatly increase the number of parameters in GAN models, thereby affecting generation efficiency. How to train GANs effectively on three-dimensional image data therefore still requires further research.

Figure 6 (A) Data dimensions, (B) data modalities, (C) data amounts, and (D) data sources used in the included studies.
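The 2D-versus-3D trade-off noted above can be illustrated with parameter counts: moving from 2D to 3D convolutions multiplies each kernel's weights by the extra spatial dimension, which is why whole volumes are often decomposed into 2D slices for training. A rough sketch (the 64-channel layer and 96-voxel volume are illustrative assumptions, not taken from the included studies):

```python
import numpy as np

# One convolutional layer, 64 input channels -> 64 output channels:
params_2d = 64 * 64 * 3 * 3        # 3x3 kernels:   36,864 weights
params_3d = 64 * 64 * 3 * 3 * 3    # 3x3x3 kernels: 110,592 weights, 3x per layer

# Training on 2D slices instead of the whole volume: take axial slices of a
# hypothetical 96x96x96 MRI volume, at the cost of inter-slice context.
volume = np.zeros((96, 96, 96))
axial_slices = [volume[:, :, k] for k in range(volume.shape[2])]
```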
● GAN Network Structures: This study reviewed the structural features of the GANs used for different image processing tasks in the included studies, finding that most studies on image-to-image tasks (image denoising, image segmentation, and modality conversion) employed cGAN models. This supervised model, proposed by Mirza et al. (2014), constrains the generator and discriminator with a conditional variable C to generate specified target images. In image-to-image tasks on medical images, the input image serves as the conditional variable C, allowing cGAN to process it accordingly and produce the desired output image. This GAN model has been shown to perform well in medical image denoising, segmentation, and modality conversion, which aligns with the results of this systematic review.
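cGAN's conditioning on the variable C is typically implemented by concatenating C with the other inputs along the channel axis, for both networks. A shape-level sketch (the single-channel 64×64 slice is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Condition C: the input image (e.g. a low-dose PET slice), 1 channel, 64x64.
condition = rng.standard_normal((1, 64, 64))
noise = rng.standard_normal((1, 64, 64))

# The generator sees [noise, C] stacked along the channel axis ...
g_input = np.concatenate([noise, condition], axis=0)   # shape (2, 64, 64)

# ... and the discriminator judges (C, candidate) pairs, so it learns
# "is this realistic *for this particular input*", not just "is this realistic".
candidate = rng.standard_normal((1, 64, 64))           # real or generated image
d_input = np.concatenate([condition, candidate], axis=0)
```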
In image feature extraction, most studies used the WGAN model proposed by Arjovsky et al. (2017), which minimizes the distance between the real distribution and the generated distribution, thus better extracting meaningful features from images to complete the feature extraction task.
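The distance WGAN minimizes is the Wasserstein-1 distance, estimated by a "critic" f under a Lipschitz constraint. The resulting losses are plain means of critic scores; a minimal sketch:

```python
import numpy as np

def wgan_critic_loss(critic_real, critic_fake):
    # The critic maximizes E[f(real)] - E[f(fake)], an estimate of the
    # Wasserstein-1 distance (given the Lipschitz constraint),
    # so its minimization loss is the negation.
    return -(np.mean(critic_real) - np.mean(critic_fake))

def wgan_generator_loss(critic_fake):
    # The generator minimizes -E[f(fake)], pushing generated samples
    # toward regions the critic scores as real.
    return -np.mean(critic_fake)
```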
In noise-to-image tasks, most studies chose the DCGAN model proposed by Radford et al. (2015), which combines CNN with GAN to improve training stability and the quality of generated images, widely applied in medical image data augmentation.
Additionally, the GAN models used by Pan et al. (2018) and Kim et al. (2020) suggest new directions for future research. The cycleGAN used by Pan et al. (2018), proposed by Zhu et al. (2017), creatively employs two sets of generators and discriminators to learn the mapping between two unpaired datasets, completing modality conversion without paired data, and thus holds great clinical application potential and research value.
Kim et al. (2020) used BEGAN, proposed by Berthelot et al. (2017), which generates data by estimating distribution errors rather than the differences between generated data and actual data, thereby improving generation stability. However, its application effect in high-resolution images is poor, and its application in medical images is limited, requiring further research.
● Quantitative Assessment: To ensure that the developed algorithms can be applied in clinical practice, quantitative assessments and comparison methods are worth attention. Quantitative assessments can detect factors that reduce generalization performance and evaluate the applicability of training datasets. Most studies adopted common classification and image quality assessment indicators, such as accuracy (ACC), Area Under the ROC Curve (AUC), PSNR, DSC, etc., with only two studies involving clinical imaging physicians in image quality evaluation. However, from the results of the included studies, the quantitative evaluation indicators for studies with the same application purpose are not uniform. This is also one of the reasons why this article only conducts a systematic review without meta-analysis. Future research can propose reference assessment indicators for different purposes, which would facilitate horizontal comparisons between studies.
● Comparison Methods: In terms of method comparison, the comparisons conducted in the included studies can be categorized into the following types: ① comparison with real data (for studies on synthesized images); ② comparison with their own algorithm after removing a specific part; ③ comparison with generators without adversarial training; ④ comparison with mature algorithms that are not GANs; ⑤ comparison with expert manual classifications (only one study). Some studies applied to AD classification did not directly compare the GAN algorithm but focused on the classifier algorithm in the second step. We suggest that future research can choose at least three methods from the above to enhance the reliability of the research. At the same time, from the results of the included studies, the evaluation process of the algorithms lacks the involvement of clinical physicians, limiting the transition of research from experiments to clinical applications. Therefore, we strongly recommend recruiting clinical physicians to evaluate the algorithms in future research. Specifically, from the perspective of quantitative assessment, clinical physicians could score images after processing such as denoising and segmentation and then compare the scores of different algorithms.
5. Limitations of Current Research and Future Prospects
GAN algorithms themselves have some common issues, such as training difficulty. During training, the generator and discriminator often fail to balance well, which can lead to mode collapse and gradient vanishing issues, causing the generator to stop training after learning only part of the data distribution pattern without converging to a global Nash equilibrium. Additionally, GANs require good initialization during training; otherwise, the learned distribution may still be far from the true distribution, leading to cyclical, oscillatory, or diverging behaviors. Furthermore, GAN generators can only learn end-to-end mapping functions that lack explicit expressions, resulting in poor interpretability of GANs and unclear correspondence between latent space and generated images, making it a “black box” for researchers. Some researchers have proposed optimized GAN models to address the above issues (such as cGAN, WGAN, and cycleGAN), but GANs still need further optimization to fully realize their generative performance.
The application of GANs in brain diseases still has some limitations. GANs are at present mainly used for processing AD-related medical images; their application to other mental disorders (such as schizophrenia, autism, and attention deficit hyperactivity disorder) is still lacking. Deep learning methods (such as CNNs) have gradually been applied to imaging data from these disorders with some promising results, but their ability to handle functional MRI and other high-dimensional neuroimaging data needs improvement. GANs, which outperform other traditional deep learning methods on high-dimensional data, therefore hold great promise for these diseases. Additionally, the application of GANs to AD classification can be extended to bioinformatics, such as using GANs to analyze AD molecular data; their ability to augment data in image processing can be transferred to bioinformatics research. Furthermore, in this systematic review we found that researchers rarely attend to the clinical information contained in images (such as amyloid status) when conducting AD-related studies. In the future, algorithm researchers applying GANs to AD-related tasks should cooperate closely with radiologists to ensure that the clinical information in images remains consistent before and after processing.
Conclusion:
This systematic review demonstrates the application value of GANs in the auxiliary diagnosis of AD and related neuroimaging data processing. Compared to other methods, GAN classification is more accurate, denoised image quality is higher, and image segmentation is more precise. In the future, researchers need to consider using better data and GAN architectures, comparing algorithms with the heterogeneity of clinical practice, and recruiting clinical physicians to evaluate the effectiveness of the algorithms.
The corresponding author of this article is Chen Taolin, PhD in Cognitive Neuroscience (Psychology) from Beijing Normal University and postdoctoral fellow in Clinical Medicine from Sichuan University, currently an associate researcher at the West China MRI Research Center, West China Hospital of Sichuan University. He is a member of the Neuroimaging Professional Committee of the Chinese Cognitive Science Society, a member of the EEG-related Technology Professional Committee of the Chinese Psychological Society, and an executive committee member of the Electrophysiology and Rehabilitation Group of the Brain Function Detection and Regulation Rehabilitation Professional Committee of the Chinese Rehabilitation Medicine Association. He has long been engaged in research on the interrelations of emotion and cognition and their neural mechanisms in healthy populations and patients with mental disorders.
• Citation of This Article
Changxing Qu, Yinxi Zou, Qingyi Dai, Yingqiao Ma, Jinbo He, Qihong Liu, Weihong Kuang, Zhiyun Jia, Taolin Chen, Qiyong Gong, Advancing diagnostic performance and clinical applicability of deep learning-driven generative adversarial networks for Alzheimer’s disease, Psychoradiology, Volume 1, Issue 4, December 2021, Pages 225–248, https://doi.org/10.1093/psyrad/kkab017
Publication Purpose:
1. Publish the latest important research results and progress in the field of international psychoradiology;
2. Provide new theoretical foundations, methods, and experience sharing for the diagnosis, treatment, and prevention of mental disorders;
3. Provide an academic exchange platform for domestic and international clinicians, researchers, etc., serving the development of medical career;
4. Fill the gap in the journal of the discipline field, maintaining China’s leading position in this field.
Currently, we have assembled an international editorial board of 41 renowned experts from home and abroad. The editors-in-chief are Professor Gong Qiyong of West China Hospital of Sichuan University, Professor Keith Maurice Kendrick of the University of Electronic Science and Technology of China, and Academician Lu Lin of Peking University. The editorial board members have multidisciplinary backgrounds, including Academician Chen Lin and Academician Su Guohui of the Chinese Academy of Sciences; Academician Sir Colin Blakemore, Fellow of the Royal Society and of the UK Academy of Medical Sciences; Professor Trevor Robbins of the UK Academy of Medical Sciences; and renowned scholars such as Bharat Biswal and Gary Glover of the US academies of medicine and biomedical engineering.
psyrad@psychoradiology.org