Source: Xinzhi Yuan
The selected papers in this article provide an easy-to-read introduction to GANs, helping you understand the fundamentals of GAN technology.
[ Introduction ]Generative Adversarial Networks (GANs) are one of the most interesting and popular applications in deep learning. This article lists 10 papers on GANs that will provide you with a great introduction to GANs and help you understand the fundamentals of state-of-the-art technology.
The 10 GAN papers selected in this article include:
- DCGANs
- Improved Techniques for Training GANs
- Conditional GANs
- Progressively Growing GANs
- BigGAN
- StyleGAN
- CycleGAN
- Pix2Pix
- StackGAN
- Generative Adversarial Networks
DCGANs – Radford et al. (2015)
I recommend starting your GAN journey with the DCGAN paper. This paper demonstrates how convolutional layers can be used with GANs and provides a series of architectural guidelines. It also discusses issues such as visualizing GAN features, latent space interpolation, using discriminator features to train classifiers, and evaluating results. All these issues will inevitably arise in your GAN research.
In summary, the DCGAN paper is a must-read GAN paper because it defines the architecture in a very clear way, making it easy to start with some code and begin to form an intuition for developing GANs.
DCGAN Model: Generator Architecture with Upsampling Convolutional Layers
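The generator's upsampling path can be understood with a little shape arithmetic. The sketch below (a rough illustration, assuming the paper's 64×64 configuration with 4×4 kernels, stride 2, padding 1) shows how four fractionally-strided convolutions carry a projected 4×4 feature map up to a 64×64 image:

```python
# Sketch: spatial-size arithmetic for a DCGAN-style generator (64x64 config).
# Each fractionally-strided ("transposed") conv layer upsamples by 2:
#   out = (in - 1) * stride - 2 * padding + kernel
def deconv_out(size, kernel=4, stride=2, padding=1):
    return (size - 1) * stride - 2 * padding + kernel

# The noise vector z (100-d) is first projected to a 4x4 feature map,
# then repeatedly upsampled until the target image resolution is reached.
size, sizes = 4, [4]
for _ in range(4):          # four stride-2 layers: 4 -> 8 -> 16 -> 32 -> 64
    size = deconv_out(size)
    sizes.append(size)
print(sizes)                # [4, 8, 16, 32, 64]
```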
Paper:
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Alec Radford, Luke Metz, Soumith Chintala
https://arxiv.org/abs/1511.06434
Improved Techniques for Training GANs – Salimans et al. (2016)
This paper (co-authored by Ian Goodfellow) provides a series of recommendations based on the architectural guidelines listed in the aforementioned DCGAN paper. This paper will help you understand the best hypotheses for GAN instability. Additionally, it offers many other techniques for stabilizing DCGAN training, including feature matching, minibatch discrimination, historical averaging, one-sided label smoothing, and virtual batch normalization. Using these techniques to build a simple DCGAN implementation is a great exercise that helps deepen your understanding of GANs.
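Two of those techniques are simple enough to sketch in a few lines. Below is a toy numpy illustration (the "feature extractor" is a hypothetical stand-in, not the paper's network) of one-sided label smoothing and of the feature-matching loss, where the generator matches the mean of an intermediate discriminator feature layer rather than fooling the discriminator directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# One-sided label smoothing: real labels become 0.9 while fake labels
# stay 0.0, which keeps the discriminator from growing overconfident.
real_labels = np.full(8, 0.9)
fake_labels = np.zeros(8)

# Feature matching: a stand-in for an intermediate discriminator layer.
W = rng.standard_normal((16, 32))
def features(x):
    return np.tanh(x @ W)

real_batch = rng.standard_normal((8, 16))   # toy "real" data
fake_batch = rng.standard_normal((8, 16))   # toy generator output
fm_loss = float(np.sum((features(real_batch).mean(axis=0)
                        - features(fake_batch).mean(axis=0)) ** 2))
print(round(fm_loss, 4))    # squared distance between mean feature vectors
```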
Paper:
Improved Techniques for Training GANs
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen
https://arxiv.org/abs/1606.03498
Conditional GANs – Mirza and Osindero (2014)
This is a great paper that reads smoothly. Conditioning underpins many state-of-the-art GANs: the paper shows how integrating the class labels of the data makes GAN training more stable. The idea of conditioning GANs on prior information is a recurring theme in subsequent GAN research, and it is especially important for papers focusing on image-to-image or text-to-image tasks.
Conditional GAN Architecture: Class labels y are concatenated as inputs to the network along with the random noise vector z.
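The conditioning mechanism itself is just a concatenation, as a minimal sketch shows (dimensions here are illustrative, not taken from the paper):

```python
import numpy as np

# Sketch: the conditional GAN generator input -- the class label y
# (one-hot) is concatenated with the random noise vector z.
rng = np.random.default_rng(0)
num_classes, z_dim = 10, 100          # illustrative sizes

z = rng.standard_normal(z_dim)        # random noise vector
y = np.zeros(num_classes)
y[3] = 1.0                            # one-hot label for class 3

generator_input = np.concatenate([z, y])
print(generator_input.shape)          # (110,)
```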
Paper:
Conditional Generative Adversarial Nets
Mehdi Mirza, Simon Osindero
https://arxiv.org/abs/1411.1784
Progressively Growing GANs – Karras et al. (2017)
Progressively Growing GAN (PG-GAN) has outstanding results and a creative approach to GAN problems, making it a must-read paper.
This GAN paper comes from NVIDIA Research, proposing to train GANs in a progressively growing manner, achieving stunning generated images using a progressively growing GAN network (called PG-GAN) and a carefully curated CelebA-HQ dataset. The authors state that this approach not only stabilizes training but also produces the highest quality images to date.
The key idea is to progressively grow the generator and discriminator: starting from low resolution, new layers are added to model increasingly fine details as training progresses. “Progressive Growing” refers to first training a 4×4 network, then an 8×8, and so on, up to 1024×1024. This speeds up training and significantly stabilizes the training process, resulting in very high-quality generated images.
Multi-scale architecture of Progressively Growing GAN, where the model grows from 4×4 to 1024×1024.
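When a new resolution is added, the paper fades it in rather than switching abruptly. A minimal sketch of that fade-in (using plain nearest-neighbour upsampling as a stand-in for the network's upsampling path):

```python
import numpy as np

# Sketch: fading in a newly added resolution. While new 2x layers are
# added, their output is blended with an upsampled copy of the previous
# resolution; the blend weight alpha ramps from 0 to 1 during training.
def upsample_nn(img):
    # nearest-neighbour 2x upsample
    return img.repeat(2, axis=0).repeat(2, axis=1)

def fade_in(low_res, new_layer_out, alpha):
    return (1.0 - alpha) * upsample_nn(low_res) + alpha * new_layer_out

low = np.ones((4, 4))       # output of the stabilized 4x4 stage
new = np.zeros((8, 8))      # output of the freshly added 8x8 layers
blended = fade_in(low, new, alpha=0.25)
print(blended.shape, blended.mean())   # (8, 8) 0.75: still mostly the old path
```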
Paper:
Progressive Growing of GANs for Improved Quality, Stability, and Variation
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen
https://arxiv.org/abs/1710.10196
BigGAN – Brock et al. (2019)
The BigGAN model produces some of the highest-quality images generated on ImageNet to date. It is challenging to implement on a local machine, and BigGAN combines many components, such as self-attention, spectral normalization, and the cGAN projection discriminator, which are better explained in their respective papers. However, this paper provides a good overview of the ideas underlying the current state of the art, making it well worth reading.
Images generated by BigGAN
Paper:
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Andrew Brock, Jeff Donahue, Karen Simonyan
https://arxiv.org/abs/1809.11096
StyleGAN – Karras et al. (2019)
The StyleGAN model can be considered state of the art, especially in its use of latent-space control. It borrows a mechanism called Adaptive Instance Normalization (AdaIN) from neural style transfer to inject the mapped latent code into the generator. The combination of the mapping network and AdaIN conditioning spread across every layer of the generator makes it difficult to implement a StyleGAN on your own, but it remains a great paper containing many interesting ideas.
StyleGAN architecture, allowing latent space control.
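AdaIN itself is compact: normalize each feature map per channel, then rescale and shift it with style parameters derived from the mapped latent code. A minimal numpy sketch (the style parameters here are fixed constants rather than outputs of StyleGAN's mapping network):

```python
import numpy as np

# Sketch: Adaptive Instance Normalization (AdaIN).
def adain(x, y_scale, y_bias, eps=1e-5):
    # x: (channels, height, width); style params: one value per channel
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / (sigma + eps)           # per-channel normalization
    return y_scale[:, None, None] * x_norm + y_bias[:, None, None]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))         # toy feature maps
out = adain(feat, y_scale=np.full(8, 2.0), y_bias=np.full(8, 0.5))
# The style parameters now set the statistics: mean ~0.5, std ~2.0
print(round(float(out.mean()), 3), round(float(out.std()), 3))
```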
Paper:
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras, Samuli Laine, Timo Aila
https://arxiv.org/abs/1812.04948
CycleGAN – Zhu et al. (2017)
The CycleGAN paper is different from the previous six papers listed because it discusses the image-to-image translation problem rather than the image synthesis problem from random vectors. CycleGAN specifically addresses the case of image-to-image translation without paired training samples. However, due to the elegance of the Cycle-Consistency loss formula and the insights into stabilizing GAN training, this is a great paper. CycleGAN has many cool applications, such as super-resolution and style transfer, e.g., turning images of horses into zebras.
The main idea behind Cycle Consistency Loss: a sentence translated from French to English and back to French should be the same as the original sentence.
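The cycle-consistency loss can be written down in a few lines. Below is a toy numpy sketch with hypothetical stand-in "translators" G: X→Y and F: Y→X (CycleGAN's generators are of course learned networks, not fixed functions):

```python
import numpy as np

# Sketch: the L1 cycle-consistency loss -- translating forward with G
# and back with F should reconstruct the original input.
G = lambda x: x + 1.0          # hypothetical X -> Y translator
F = lambda y: y - 1.0          # hypothetical Y -> X translator

x = np.full((4, 4), 3.0)       # toy "image" from domain X
cycle_loss = float(np.abs(F(G(x)) - x).mean())   # L1 reconstruction error
print(cycle_loss)              # 0.0 here, since F undoes G exactly
```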
Paper:
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
https://arxiv.org/abs/1703.10593
Pix2Pix – Isola et al. (2016)
Pix2Pix is another GAN model for image-to-image translation. This framework uses paired training samples and employs various configurations in the GAN model. I found the most interesting part of this paper to be the discussion about PatchGAN. PatchGAN judges whether images are real or fake by observing 70×70 regions instead of looking at the entire image. The model also showcases an interesting U-Net style generator architecture and the use of ResNet style skip connections within the generator model. Pix2Pix has many cool applications, such as converting sketches into realistic photos.
Image-to-Image translation using paired training samples.
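The 70×70 figure for PatchGAN falls out of receptive-field arithmetic over the discriminator's conv stack. The sketch below assumes the paper's 70×70 configuration of 4×4 convolutions (three with stride 2, then two with stride 1):

```python
# Sketch: why the PatchGAN discriminator "sees" 70x70 patches.
# (kernel, stride) pairs for the conv stack, assumed per the 70x70 config:
layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

rf, jump = 1, 1
for kernel, stride in layers:
    rf += (kernel - 1) * jump   # receptive-field growth from this layer
    jump *= stride              # effective step between adjacent outputs
print(rf)                       # 70: each output score judges a 70x70 region
```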
Paper:
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
https://arxiv.org/abs/1611.07004
StackGAN – Zhang et al. (2017)
The StackGAN paper takes a different angle from the two image-to-image papers above; within this list it is most similar to Conditional GANs and Progressively Growing GANs. Like Progressively Growing GANs, the StackGAN model works at multiple scales: it first outputs a 64×64 image and then uses it as prior information to generate a 256×256 image.
StackGAN generates images from natural-language text by conditioning on text embeddings that are augmented to capture visual variations. It is a very interesting paper, and combining the latent-space control shown in StyleGAN with the natural-language interface defined in StackGAN would surely produce impressive results.
Text-embedded StackGAN multi-scale architecture
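The two-stage pipeline can be sketched with placeholder generators that only get the shapes and the data flow right (everything below is a hypothetical stand-in; the real stages are trained networks):

```python
import numpy as np

# Sketch: StackGAN's two-stage flow. Stage I draws a coarse 64x64 image
# from the text embedding plus noise; Stage II conditions on that coarse
# image (and the embedding again) to produce the 256x256 result.
rng = np.random.default_rng(0)

def stage1(text_emb, z):
    return rng.random((64, 64, 3))              # coarse-image placeholder

def stage2(coarse_img, text_emb):
    # stand-in "refinement": just a 4x nearest-neighbour upsample
    return coarse_img.repeat(4, axis=0).repeat(4, axis=1)

text_emb = rng.standard_normal(128)             # e.g. an embedded caption
z = rng.standard_normal(100)                    # noise vector

coarse = stage1(text_emb, z)
fine = stage2(coarse, text_emb)
print(coarse.shape, fine.shape)                 # (64, 64, 3) (256, 256, 3)
```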
Paper:
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
https://arxiv.org/abs/1612.03242
Generative Adversarial Networks – Goodfellow et al. (2014)
Ian Goodfellow’s original GAN paper is a must-read for anyone studying GANs. This paper defines the GAN framework and discusses the “non-saturating” loss function. It also provides derivations for the optimal discriminator, which has frequently appeared in GAN papers in recent years. The paper also experimentally verifies the effectiveness of GANs on the MNIST, TFD, and CIFAR-10 image datasets.
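The "non-saturating" trick is easy to see numerically. When the discriminator confidently rejects a fake (D(G(z)) near 0), the minimax generator loss log(1 − D(G(z))) has an almost flat gradient, while the non-saturating alternative −log D(G(z)) still provides a strong learning signal. A small numpy illustration:

```python
import numpy as np

# Sketch: saturating vs. non-saturating generator loss from the paper.
d_fake = np.array([0.001, 0.01, 0.1, 0.5])   # discriminator outputs on fakes

saturating = np.log(1.0 - d_fake)            # minimax form
non_saturating = -np.log(d_fake)             # recommended alternative

# Gradient magnitudes w.r.t. D(G(z)) show why the second form trains better:
grad_sat = 1.0 / (1.0 - d_fake)              # |d/dD log(1 - D)|
grad_nonsat = 1.0 / d_fake                   # |d/dD (-log D)|
print(grad_sat[0], grad_nonsat[0])           # tiny vs. large signal at D ~ 0
```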
Paper:
Generative Adversarial Networks
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
https://arxiv.org/abs/1406.2661
Original link:
https://towardsdatascience.com/must-read-papers-on-gans-b665bbae3317
Editor: Huang Jiyan
Proofreader: Lin Yilin