Generative Adversarial Networks (GANs) are among the most interesting and popular research areas in deep learning. This article lists 10 papers on GANs that will give you a great introduction to GANs and help you understand the foundations of state-of-the-art techniques.
The 10 selected GAN papers include:
- DCGANs
- Improved Techniques for Training GANs
- Conditional GANs
- Progressively Growing GANs
- BigGAN
- StyleGAN
- CycleGAN
- Pix2Pix
- StackGAN
- Generative Adversarial Networks
DCGANs — Radford et al. (2015)
I recommend starting your GAN journey with the DCGAN paper. This paper demonstrates how convolutional layers can be used with GANs and provides a series of architectural guidelines. It also discusses issues such as visualizing GAN features, latent space interpolation, using discriminator features to train classifiers, and evaluating results. All these issues will inevitably arise in your GAN research.
In summary, the DCGAN paper is a must-read GAN paper because it defines the architecture in a very clear way, making it easy to start with some code and begin to develop intuition for GANs.
DCGAN model: Generator architecture with upsampling convolutional layers
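To make these guidelines concrete, here is a minimal PyTorch sketch of a DCGAN-style generator, assuming a 64×64 output; the layer widths are illustrative choices rather than the paper's exact configuration:

```python
# A DCGAN-style generator following the paper's guidelines: transposed
# convolutions for upsampling, batch norm, ReLU in the generator, tanh output.
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, z_dim=100, feat=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, feat * 8, 4, 1, 0, bias=False),  # 1x1 -> 4x4
            nn.BatchNorm2d(feat * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),  # 4x4 -> 8x8
            nn.BatchNorm2d(feat * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),  # 8x8 -> 16x16
            nn.BatchNorm2d(feat * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),  # 16x16 -> 32x32
            nn.BatchNorm2d(feat),
            nn.ReLU(True),
            nn.ConvTranspose2d(feat, channels, 4, 2, 1, bias=False),  # 32x32 -> 64x64
            nn.Tanh(),  # maps pixel values to [-1, 1]
        )

    def forward(self, z):
        # z: (N, z_dim) noise vector, reshaped to a 1x1 spatial map
        return self.net(z.view(z.size(0), -1, 1, 1))
```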
Paper:
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Alec Radford, Luke Metz, Soumith Chintala
https://arxiv.org/abs/1511.06434
Improved Techniques for Training GANs — Salimans et al. (2016)
This paper (co-authored by Ian Goodfellow) builds on the architectural guidelines from the DCGAN paper above. It will help you understand the leading hypotheses about why GAN training is unstable, and it offers many techniques for stabilizing DCGAN training, including feature matching, minibatch discrimination, historical averaging, one-sided label smoothing, and virtual batch normalization. Adding these techniques to a simple DCGAN implementation is a great exercise that helps deepen your understanding of GANs.
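Two of these tricks are easy to show in code. Below is a minimal PyTorch sketch, assuming a discriminator that outputs raw logits; `disc_features` is a hypothetical hook returning an intermediate discriminator feature map:

```python
import torch
import torch.nn.functional as F

def d_loss_one_sided_smoothing(d_real_logits, d_fake_logits):
    # One-sided label smoothing: real targets become 0.9, fake targets stay 0.
    real_targets = torch.full_like(d_real_logits, 0.9)
    fake_targets = torch.zeros_like(d_fake_logits)
    return (F.binary_cross_entropy_with_logits(d_real_logits, real_targets)
            + F.binary_cross_entropy_with_logits(d_fake_logits, fake_targets))

def g_feature_matching_loss(disc_features, real_images, fake_images):
    # Feature matching: instead of fooling the final output, the generator
    # matches the mean of an intermediate discriminator feature map.
    f_real = disc_features(real_images).mean(dim=0).detach()
    f_fake = disc_features(fake_images).mean(dim=0)
    return F.mse_loss(f_fake, f_real)
```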
Paper:
Improved Techniques for Training GANs
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen
https://arxiv.org/abs/1606.03498
Conditional GANs — Mirza and Osindero (2014)
This is a great paper that reads smoothly. Conditional GANs underpin many state-of-the-art models. The paper shows how integrating the class labels of the data can stabilize GAN training. Conditioning GANs on such prior information has become a recurring theme in subsequent GAN research, and it is especially important for papers focusing on image-to-image or text-to-image tasks.
Conditional GAN architecture: Class labels y are concatenated as inputs to the network along with the random noise vector z
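A minimal PyTorch sketch of this conditioning idea, with illustrative layer sizes (the paper itself uses an MLP generator on MNIST):

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=100, n_classes=10, img_dim=784):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256),
            nn.ReLU(True),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        # The conditioning step: concatenate the label embedding with z.
        x = torch.cat([z, self.label_emb(labels)], dim=1)
        return self.net(x)

# Usage: imgs = ConditionalGenerator()(torch.randn(16, 100), torch.randint(0, 10, (16,)))
```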
Paper:
Conditional Generative Adversarial Nets
Mehdi Mirza, Simon Osindero
https://arxiv.org/abs/1411.1784
Progressively Growing GANs — Karras et al. (2017)
Progressively Growing GAN (PG-GAN) delivers stunning results and takes a creative approach to the problems of GAN training, making it a must-read paper.
This GAN paper from NVIDIA Research proposes training GANs in a progressively growing manner, producing remarkable images with the PG-GAN architecture and a carefully curated CelebA-HQ dataset. The authors state that this approach not only stabilizes training but also yields some of the highest-quality images produced to date.
The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, new layers are added to model increasingly fine details as training progresses. "Progressive growing" means first training a 4×4 network, then an 8×8, doubling the resolution until reaching 1024×1024. This accelerates training and greatly stabilizes it, resulting in very high-quality generated images.
Multi-scale architecture of Progressive Growing GAN: the model grows from 4×4 to 1024×1024
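When a new resolution is added, the paper fades the new layers in smoothly rather than switching abruptly. A minimal PyTorch sketch of that blending, with illustrative names (not the authors' code):

```python
import torch.nn.functional as F

def faded_output(prev_block_rgb, new_block_rgb, alpha):
    # prev_block_rgb: image from the old, lower-resolution pathway
    # new_block_rgb:  image from the newly added, higher-resolution block
    # alpha is ramped from 0 to 1 over training as the new block takes over.
    upsampled = F.interpolate(prev_block_rgb, scale_factor=2, mode='nearest')
    return (1.0 - alpha) * upsampled + alpha * new_block_rgb
```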
Paper:
Progressive Growing of GANs for Improved Quality, Stability, and Variation
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen
https://arxiv.org/abs/1710.10196
BigGAN — Brock et al. (2019)
The BigGAN model is one of the highest-quality models for generating images on ImageNet. It is difficult to train on local hardware, and BigGAN combines many components, such as self-attention, spectral normalization, and cGAN with a projection discriminator, which are better explained in their respective papers. However, this paper gives a good overview of the ideas from those foundational papers that make up the current state of the art, so it is very worthwhile to read.
Image generated by BigGAN
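Of these components, spectral normalization is the easiest to try yourself, since PyTorch ships it as a layer wrapper. A minimal sketch:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(in_ch, out_ch):
    # Wrapping the layer is all that is needed: a power-iteration estimate
    # of the largest singular value is updated on every forward pass,
    # constraining the layer's spectral norm to roughly 1.
    return spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1))
```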
Paper:
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Andrew Brock, Jeff Donahue, Karen Simonyan
https://arxiv.org/abs/1809.11096
StyleGAN — Karras et al. (2019)
The StyleGAN model can be considered state-of-the-art, particularly in its control of the latent space. The model borrows Adaptive Instance Normalization (AdaIN), a mechanism from neural style transfer: a mapping network first transforms the latent vector z into an intermediate code, which then modulates every layer of the generator through AdaIN. The combination of the mapping network and AdaIN conditioning throughout the generator makes StyleGAN challenging to implement on your own, but it remains a great paper containing many interesting ideas.
StyleGAN architecture, allowing latent space control
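A minimal PyTorch sketch of the AdaIN operation as StyleGAN uses it, with illustrative shapes; `w` stands for the intermediate latent code produced by the mapping network:

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        # The learned affine transform ("A" in the paper) maps w to
        # a per-channel scale and bias.
        self.affine = nn.Linear(style_dim, channels * 2)

    def forward(self, x, w):
        scale, bias = self.affine(w).chunk(2, dim=1)
        x = self.norm(x)
        return scale[:, :, None, None] * x + bias[:, :, None, None]
```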
Paper:
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras, Samuli Laine, Timo Aila
https://arxiv.org/abs/1812.04948
CycleGAN — Zhu et al. (2017)
The CycleGAN paper differs from the six papers above in that it addresses image-to-image translation rather than image synthesis from a random vector. More specifically, CycleGAN tackles image-to-image translation without paired training samples. The elegance of the cycle-consistency loss formulation, and the insight it provides into stabilizing GAN training, make this a great paper. CycleGAN has many cool applications, such as super-resolution and style transfer, for example turning images of horses into zebras.
The main idea behind Cycle-Consistency Loss: a sentence translated from French to English and then back to French should be the same as the original.
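A minimal PyTorch sketch of the cycle-consistency term, assuming two generator modules `G` (X to Y) and `F_inv` (Y to X); the weight `lambda_cyc = 10` follows the paper:

```python
import torch.nn.functional as F

def cycle_consistency_loss(G, F_inv, real_x, real_y, lambda_cyc=10.0):
    # Translating X -> Y -> X (and Y -> X -> Y) should recover the input,
    # enforced with an L1 penalty.
    reconstructed_x = F_inv(G(real_x))
    reconstructed_y = G(F_inv(real_y))
    return lambda_cyc * (F.l1_loss(reconstructed_x, real_x)
                         + F.l1_loss(reconstructed_y, real_y))
```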
Paper:
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
https://arxiv.org/abs/1703.10593
Pix2Pix — Isola et al. (2016)
Pix2Pix is another GAN model for image-to-image translation. This framework uses paired training samples and explores various configurations of the GAN model. The most interesting part of the paper is the discussion of PatchGAN: instead of judging an entire image, PatchGAN classifies 70×70 regions of the image as real or fake. The model also showcases a U-Net-style generator with skip connections between mirrored encoder and decoder layers. Pix2Pix has many cool applications, such as converting sketches into realistic photos.
Image-to-Image translation using paired training samples
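A minimal PyTorch sketch of the 70×70 PatchGAN discriminator, following the C64-C128-C256-C512 layer pattern described in the paper; the input is the conditioning image concatenated with the real or generated image:

```python
import torch.nn as nn

def patchgan_70x70(in_channels=6):  # 6 = conditioning image + output image
    def block(in_ch, out_ch, stride):
        return [nn.Conv2d(in_ch, out_ch, 4, stride, 1),
                nn.BatchNorm2d(out_ch),
                nn.LeakyReLU(0.2, True)]
    layers = [nn.Conv2d(in_channels, 64, 4, 2, 1), nn.LeakyReLU(0.2, True)]
    layers += block(64, 128, 2) + block(128, 256, 2) + block(256, 512, 1)
    layers += [nn.Conv2d(512, 1, 4, 1, 1)]  # one real/fake logit per patch
    # For a 256x256 input this produces a 30x30 grid of logits, each with
    # a 70x70 receptive field on the input.
    return nn.Sequential(*layers)
```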
Paper:
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
https://arxiv.org/abs/1611.07004
StackGAN — Zhang et al. (2017)
The StackGAN paper takes a different angle from the earlier papers in this list; of those papers, it is most similar to Conditional GANs and Progressively Growing GANs. Like Progressively Growing GANs, StackGAN works at multiple scales: it first outputs an image at 64×64 resolution, then uses it as prior information to generate an image at 256×256 resolution.
StackGAN generates images from natural language text. It achieves this by conditioning on text embeddings augmented to capture visual features. This is a very interesting paper, and it would be fascinating to see the latent space control shown in StyleGAN combined with the natural-language interface defined in StackGAN.
Idea behind the multi-scale architecture of StackGAN based on text embeddings
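A minimal sketch of the two-stage sampling pipeline; `stage1` and `stage2` are hypothetical modules standing in for the paper's Stage-I and Stage-II generators:

```python
def stackgan_sample(stage1, stage2, text_embedding, z):
    # Stage-I: coarse 64x64 image from the text embedding plus noise.
    low_res = stage1(z, text_embedding)            # (N, 3, 64, 64)
    # Stage-II: refine the coarse image, conditioned on the same text.
    high_res = stage2(low_res, text_embedding)     # (N, 3, 256, 256)
    return low_res, high_res
```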
Paper:
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
https://arxiv.org/abs/1612.03242
Generative Adversarial Networks — Goodfellow et al. (2014)
Ian Goodfellow’s original GAN paper is a must-read for anyone researching GANs. This paper defines the GAN framework and discusses the “non-saturating” loss function. It also derives the optimal discriminator, D*(x) = p_data(x) / (p_data(x) + p_g(x)), a derivation that frequently reappears in recent GAN papers. The paper also experimentally validates the effectiveness of GANs on the MNIST, TFD, and CIFAR-10 image datasets.
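A minimal PyTorch sketch of the two generator losses the paper discusses, for a discriminator that outputs raw logits:

```python
import torch.nn.functional as F

def g_loss_saturating(d_fake_logits):
    # Original minimax objective: minimize log(1 - D(G(z))).
    # Its gradient vanishes early in training, when D easily rejects fakes.
    return F.logsigmoid(-d_fake_logits).mean()

def g_loss_non_saturating(d_fake_logits):
    # The paper's practical alternative: maximize log D(G(z)),
    # i.e. minimize -log D(G(z)), which keeps gradients strong.
    return -F.logsigmoid(d_fake_logits).mean()
```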
Paper:
Generative Adversarial Networks
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
https://arxiv.org/abs/1406.2661
Source: Xinzhi Yuan