Overview of GAN Models and Medical Image Fusion Applications

The “Outcome Overview” series of articles aims to disseminate important results from conferences and journals in the field of image and graphics, allowing readers to quickly follow recent academic developments in their native language through short articles. We welcome your attention and submissions.

◆ ◆ ◆ ◆

GAN Review: Models and Medical Image Fusion Applications

Zhou Tao , Li Qi , Lu Huiling , Cheng Qianru , Zhang Xiangxiang

School of Computer Science and Engineering, Northern Minzu University; Key Laboratory of Intelligent Processing of Images and Graphics of the National Ethnic Affairs Commission, Northern Minzu University; School of Science, Ningxia Medical University

Information Fusion 2022
Writers: Li Qi, Zhou Tao

*Corresponding Author: Li Qi

Recommending Directors: Zhou Tao, Zhang Yanning
Original Title: GAN review: Models and medical image fusion applications
Original Link: https://doi.org/10.1016/j.inffus.2022.10.017

◆ ◆ ◆ ◆

Abstract

Generative Adversarial Networks (GANs) are a hot research topic in deep generative models and have been widely applied in the field of medical image fusion. This article summarizes GAN models from four aspects: First, it explains the basic principles of GANs in terms of the basic model and the training process; Second, it classifies GAN models along three directions (probability distribution distance, overall network architecture, and neural network structure) and summarizes typical models of recent years from eight dimensions: f-divergence-based methods, IPM-based methods, single-generator dual-discriminator GANs, multi-generator single-discriminator GANs, multi-generator multi-discriminator GANs, conditional constraint GANs, GANs with convolutional neural network structures, and GANs with autoencoder neural network structures; Third, it explores the advantages and applications of GANs in the field of medical image fusion from three aspects; Fourth, it discusses the main challenges faced by GAN models in general and in medical image fusion in particular, and looks ahead to future development directions. This article systematically summarizes the various GAN models and their advantages and challenges in medical image fusion, which is of great significance for future research on GANs.

1. Introduction

Generative models have achieved great success in applications such as image processing, density estimation, and style transfer, but with the significant increase in the number and dimensionality of observable samples, they have gradually been replaced by deep generative models with multiple hidden layers. Deep generative models have been successfully applied in computer vision, natural language processing, image generation, and semi-supervised learning, providing a good paradigm for unsupervised learning. Among them, the Generative Adversarial Network (GAN), proposed by Goodfellow et al. in 2014, is currently a hot research topic in deep generative models. A GAN implicitly estimates the density function of the data distribution and exploits the powerful fitting ability of neural networks to generate, through adversarial training, samples that conform to the real data distribution. Given this excellent performance, GANs have gained widespread attention from researchers in the field of medical image fusion. This article analyzes and discusses GAN models and their advantages and applications in medical image fusion; the overall structure is shown in Figure 1.


Fig 1 An overview of the structure of this paper from Section 2 to Section 5. SGDD GAN denotes Single-Generator and Dual-Discriminators GAN; MGSD GAN denotes Multi-Generators and Single-Discriminator GAN; MGMD GAN denotes Multi-Generators and Multi-Discriminators GAN; CC GAN denotes Conditional Constraint GAN; SGSD GAN denotes Single-Generator and Single-Discriminator GAN.

2. Basic Principles of GANs

Inspired by zero-sum games in game theory, GANs treat the generation problem as a game between a generator and a discriminator. The basic model is shown in Figure 2.

Fig 2 Basic model of GAN. L_G denotes the loss function of the generator G; L_D denotes the loss function of the discriminator D.
Compared with other generative models, GANs have the following four advantages: First, compared with Boltzmann machines, GANs compute gradients using only backpropagation, and the learning process requires no approximate inference, making it fast and accurate; Second, compared with Variational Autoencoders, GANs require no variational lower bound and introduce no approximation bias, so they can generate sharp images; Third, compared with PixelRNN, which generates one pixel at a time, GANs generate data in parallel, reducing the time needed to produce samples; Fourth, compared with Nonlinear Independent Component Analysis, GANs place no constraint on the dimension of the generator's input and can train any type of generator network, providing a flexible framework.

At the same time, GANs have the following four disadvantages: First, convergence is difficult; gradient descent guarantees a Nash equilibrium only for convex functions, and in practical training it is hard to keep the generator and discriminator well synchronized, so the training process is unstable and convergence is hard to reach. Second, mode collapse; once the generator can produce realistic samples under a certain parameter setting, its learning ability degrades and it keeps generating the same sample point, so it captures only part of the modes of the real data and the generated samples lack diversity. Third, gradient vanishing; when the discriminator is trained to the point where it always distinguishes real samples from generated ones correctly, no matter how realistic the generated samples are, the gradient passed back to the generator approaches zero and the generator can no longer learn. Fourth, the model is difficult to control; the standard GAN takes only a random vector as the generator's input, so it cannot be constrained to generate samples with specified features.
The training process of GANs is a two-player minimax game between the generator G and the discriminator D, with the objective function defined in formula (1). During training, one model is fixed while the other model's parameters are updated, and the two models are trained alternately. First, the discriminator learns the distribution of the real samples x; once D has some understanding of this distribution, it is used to judge the authenticity of the generated samples G(z). G continuously improves its generation ability through D's feedback, while D continuously improves its discrimination ability by learning the distribution of x. Through continuous adversarial training, D maximizes the probability of correctly judging the source of its input, while G maximizes the similarity between G(z) and x. Ultimately, G and D reach a Nash equilibrium in which D cannot tell whether its input comes from real or generated samples, at which point G can be considered to have learned the distribution of the real samples.
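For reference, formula (1) refers to the standard minimax objective introduced by Goodfellow et al. (2014):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]   (1)

As a concrete illustration of the alternating update described above, the following is a minimal sketch in PyTorch; the framework choice, network sizes, learning rates, and data dimension are illustrative assumptions, not specifications from the reviewed paper.

```python
# Minimal sketch of alternating GAN training (illustrative, not the paper's code).
import torch
import torch.nn as nn

latent_dim, data_dim = 100, 784  # assumed sizes for illustration

# Generator G: maps a random vector z to a fake sample G(z).
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),
)
# Discriminator D: outputs the probability that its input is a real sample.
D = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_x):
    batch = real_x.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Update D with G fixed: push D(x) toward 1 and D(G(z)) toward 0.
    z = torch.randn(batch, latent_dim)
    fake_x = G(z).detach()              # detach so this step does not update G
    loss_D = bce(D(real_x), ones) + bce(D(fake_x), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Update G with D fixed: push D(G(z)) toward 1 (non-saturating G loss).
    z = torch.randn(batch, latent_dim)
    loss_G = bce(D(G(z)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()

# Usage sketch: call train_step(real_batch) repeatedly over batches of real data.
```

The detach() call keeps the generator fixed while the discriminator is updated, and labeling G(z) as "real" in the generator step corresponds to the commonly used non-saturating generator loss rather than directly minimizing log(1 - D(G(z))).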

