With the rapid development of deep learning, significant progress has also been made in the field of generative models. Generative Adversarial Networks (GAN) are an unsupervised learning method based on the two-player zero-sum game of game theory. A GAN consists of a generator network and a discriminator network and is trained through adversarial learning. In recent years, GAN has become a hot research direction: it has not only achieved good results in the image domain but has also emerged in natural language processing (NLP) and other fields. This paper elaborates on the basic principles of GAN, its training process, and the problems of the original GAN, and then details the principles and structures of GAN variants obtained by modifying the loss function, changing the network structure, or both. These include: Conditional GAN (CGAN), Wasserstein GAN (WGAN) based on the Wasserstein distance and its gradient-penalty version (WGAN-GP), Information Maximizing GAN (InfoGAN) based on mutual information theory, Sequence GAN (SeqGAN), Pix2Pix, Cycle-consistent GAN (CycleGAN) and its augmented version (Augmented CycleGAN). It then summarizes the basic principles and structures of GAN and its variants in computer vision, speech, and NLP, including: Face Aging CGAN (Age-cGAN), Two-Pathway GAN (TP-GAN), Disentangled Representation learning GAN (DR-GAN), DualGAN, GeneGAN, and Speech Enhancement GAN (SEGAN). It also introduces applications of GAN in medicine, data augmentation, and other fields, including: Data Augmentation GAN (DAGAN), Medical GAN (MedGAN), and the unsupervised pixel-level domain adaptation method (PixelDA). Finally, it looks ahead to future development trends and directions of GAN.
http://www.aas.net.cn/cn/article/doi/10.16383/j.aas.c180831
Since 2012, the rapid development of deep learning has driven rapid advances in artificial intelligence research. Artificial intelligence is now in a phase of rapid growth, with many researchers investing effort and capital in the field, and its progress is plain to see: drones have entered everyday life, and Google's AlphaGo has defeated top human Go players, both illustrating how quickly deep learning has advanced in recent years. The trajectory of AlphaGo shows that, since 2016, its target opponent has no longer been top human players but its own previous versions, opening up a new field for itself. AlphaGo uses Monte Carlo tree search together with two deep neural networks, a value network and a policy network, to evaluate positions and select moves[1].
Moreover, the development of deep learning is inseparable from neural networks, which can be regarded as the soul of deep learning. Their wide range of application scenarios has greatly expanded both the depth and breadth of deep learning research. The generative adversarial networks discussed in this review use neural networks in both the generator and the discriminator, and many of the application fields mentioned later rely heavily on them. In recent years, research on neural networks has achieved remarkable results in fields such as image processing, speech recognition, and natural language processing. However, neural networks also suffer from drawbacks such as a large number of parameters and difficulty of training, and corresponding improvements continue to emerge. With the rapid increase in computing power, neural networks with more parameters can be trained faster.
Among generative models, Generative Adversarial Networks (GAN)[2] are a special case. Their introduction has not only raised the development of many fields to new heights but has also pushed artificial intelligence toward an era of true "intelligence". It can be said that GAN is "dreaming": in nature only mammals dream, which reflects the significance of GAN for artificial intelligence (AI). GAN can be viewed as a network structure built on the adversarial idea. Although many variants of GAN with wide-ranging applications have emerged, its core concept has remained unchanged, namely the adversarial idea. For the origin of this idea one can refer to the adversarial concept described by Wang Kunfeng et al.[3], who point out that adversarial behavior is inherent in games and competition. The adversarial idea of GAN is to introduce, during data generation, a discriminator that distinguishes real data from generated data, so that the generator and discriminator compete with each other: the discriminator strives to tell real data from generated data, while the generator improves itself to produce data that can fool the discriminator. When the discriminator can no longer tell real from fake data, the generator is considered to have achieved a good generation effect. The introduction of this adversarial idea is of great importance for the development of generative models.
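Formally, the adversarial idea is expressed in the original GAN paper[2] as a two-player minimax game over a value function V(D, G), where D(x) is the discriminator's estimated probability that x comes from the real data distribution p_data, and G(z) maps a noise vector z drawn from a prior p_z to a generated sample:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

The discriminator maximizes V by assigning high scores to real samples and low scores to generated ones, while the generator minimizes V by making D(G(z)) as large as possible.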
GAN is an unsupervised generative model. Machine learning models are mainly divided into two categories: generative models and discriminative models. The two differ mainly in the following respects (see the brief sketch after this list): 1) The generative model models the joint distribution P(X, Y) of the data X and labels Y and derives the conditional distribution P(Y | X) from it, whereas the discriminative model learns P(Y | X) (or a decision function) directly.
2) If a new category needs to be added, the generative model only needs to compute the joint probability distribution of the new category, whereas the discriminative model has to be retrained on all of the data.
3) In terms of error rate, the asymptotic error of the generative model is higher than that of the discriminative model, but its sample complexity is lower: its error rate converges with far fewer samples.
4) For unlabeled data, the generative model (e.g., Deep Belief Network, DBN) can better utilize the information contained in the data itself.
5) Discriminative models typically need to solve convex optimization problems.
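As a concrete illustration of this generative-versus-discriminative distinction, the short sketch below trains a generative classifier (Gaussian naive Bayes, which models P(X | Y) and P(Y), and hence the joint distribution) and a discriminative classifier (logistic regression, which models P(Y | X) directly) on the same data. The scikit-learn library, the Iris dataset, and the specific models are illustrative choices, not part of the original comparison.

```python
# Illustrative sketch: a generative classifier (Gaussian naive Bayes) vs. a
# discriminative classifier (logistic regression) trained on the same data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression   # models P(Y | X) directly
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB            # models P(X | Y) and P(Y), i.e. the joint distribution

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

generative = GaussianNB().fit(X_train, y_train)
discriminative = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("Generative (naive Bayes) test accuracy:      ", generative.score(X_test, y_test))
print("Discriminative (logistic regr.) test accuracy:", discriminative.score(X_test, y_test))
```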
The above is a brief comparison of generative and discriminative models; the discussion now turns to generative models themselves, which mainly include Variational Autoencoders (VAE) and GAN.
First, VAE[4] is a generative model based on the variational principle in deep learning. It assumes that the observed data x is generated from a latent variable z: an encoder network approximates the posterior q(z | x), a decoder network models p(x | z), and the two are trained jointly by maximizing a variational lower bound on the data log-likelihood log p(x).
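To make the variational principle concrete, the sketch below shows the standard VAE components: an encoder that outputs the mean and log-variance of q(z | x), the reparameterization trick, a decoder for p(x | z), and the negative variational lower bound as the training loss. It is a minimal PyTorch sketch with illustrative layer sizes (assuming flattened 28x28 inputs scaled to [0, 1]), not the exact architecture of [4].

```python
# Minimal VAE sketch in PyTorch (illustrative sizes; assumes flattened 28x28 inputs in [0, 1]).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)        # mean of q(z | x)
        self.logvar = nn.Linear(h_dim, z_dim)    # log-variance of q(z | x)
        self.dec1 = nn.Linear(z_dim, h_dim)
        self.dec2 = nn.Linear(h_dim, x_dim)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, so gradients can flow through mu and logvar.
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    # Negative variational lower bound: reconstruction term + KL(q(z | x) || N(0, I)).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```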
Compared with VAE, GAN does not use a variational lower bound. If the discriminator is trained well, the generator can learn the probability distribution of the training samples perfectly; in other words, GAN is asymptotically consistent, whereas VAE is a biased estimator. As the name suggests, GAN consists of two sub-models, a generator and a discriminator. The two networks can be likened to a counterfeiter (the generator) and a police officer (the discriminator): the counterfeiter's task is to produce fake currency realistic enough to convince the police that it is real, while the police's task is to tell real currency from counterfeit. Eventually, the police can no longer distinguish real currency from fake. The ultimate optimization goal of the generator and discriminator is to reach a Nash equilibrium[5].
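The alternating adversarial training described above can be sketched as follows. This is a minimal PyTorch sketch using the non-saturating generator loss that is common in practice; the MLP architectures, layer sizes, and optimizer settings are illustrative assumptions, not the setup of [2].

```python
# Minimal GAN training sketch in PyTorch: alternate discriminator and generator updates.
import torch
import torch.nn as nn

z_dim, x_dim = 64, 784  # illustrative sizes (e.g. flattened 28x28 images)

G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    # `real`: a batch of real samples with shape (batch_size, x_dim).
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)

    # 1) Discriminator ("police"): push D(real) toward 1 and D(fake) toward 0.
    fake = G(torch.randn(real.size(0), z_dim)).detach()  # detach: do not update G here
    loss_D = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # 2) Generator ("counterfeiter"): push D(G(z)) toward 1 to fool the discriminator.
    fake = G(torch.randn(real.size(0), z_dim))
    loss_G = bce(D(fake), ones)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```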
Since the two generative models each have their own advantages, they can be combined: GAN can generate high-quality images with clear, distinctive features, while VAE reconstructs the original image, with the encoder producing a latent vector that preserves the features of the original image while following a Gaussian distribution. VAE-GAN[6] realizes this idea by letting the GAN discriminator learn feature representations while the VAE handles the reconstruction objective. Its structure is shown in Figure 1. The benefit of combining VAE and GAN is that high-quality images can be generated while the model remains stable.
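Concretely, in the VAE-GAN objective of [6] the pixel-space reconstruction term of the plain VAE is replaced by a reconstruction error measured on features from an intermediate discriminator layer $\mathrm{Dis}_l$, and the overall training loss combines a prior (KL) term, a feature-space reconstruction term, and the adversarial term:

$$\mathcal{L} = \mathcal{L}_{\mathrm{prior}} + \mathcal{L}_{\mathrm{llike}}^{\mathrm{Dis}_l} + \mathcal{L}_{\mathrm{GAN}}$$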
So far, the main applications of GAN are concentrated in three major fields. In image processing, significant results have been achieved in face recognition and synthesis, image super-resolution, and image-to-image translation; in speech processing, GAN has also seen progress, for example in speech enhancement and speech recognition; in addition, GAN has made headway in natural language processing, for example in machine translation, bilingual dictionaries, and discourse analysis.
Beyond these three major fields, this paper also summarizes some novel applications in other areas, such as human pose estimation, defense against malware attacks, applications in physics, medical data processing, and autonomous driving.
Since Goodfellow proposed GAN in 2014, and especially in recent years, articles and applications related to GAN have grown explosively. On the one hand, various application scenarios pose challenging problems for the development of GAN, prompting researchers to design new GAN structures, models, and training algorithms tailored to those scenarios in order to solve problems in computer vision, natural language processing, and speech processing; on the other hand, new GAN theories and models have expanded the breadth and depth of artificial intelligence applications across many fields. This motivates us to summarize and analyze recent research progress and important literature on GAN applications.
This paper first introduces nine widely used GANs and their variants, then provides a detailed overview of GAN applications in computer vision, natural language processing, and speech processing, and finally offers an exploratory discussion of future development trends and research directions for GAN.