New Intelligence Report
Source: towardsdatascience
Author: Connor Shorten | Editor: Xiao Qin
[New Intelligence Guide]Generative Adversarial Networks (GANs) are one of the most fascinating and popular applications in deep learning. This article lists 10 papers on GANs that will provide you with a great introduction to GANs and help you understand the foundations of state-of-the-art techniques.
The 10 selected GAN papers include:
- DCGANs
- Improved Techniques for Training GANs
- Conditional GANs
- Progressively Growing GANs
- BigGAN
- StyleGAN
- CycleGAN
- Pix2Pix
- StackGAN
- Generative Adversarial Networks
I recommend starting your GAN journey with the DCGAN paper. This paper demonstrates how convolutional layers can be used with GANs and provides a series of architectural guidelines for this. It also discusses issues such as visualizing GAN features, latent space interpolation, using discriminator features to train classifiers, and evaluating results. All these issues are bound to arise in your GAN research.
In summary, the DCGAN paper is a must-read GAN paper because it defines the architecture in a very clear way, making it easy to start with some code and begin to develop an intuition for building GANs.
DCGAN Model: Generator Architecture with Upsampling Convolutional Layers
Paper:
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Alec Radford, Luke Metz, Soumith Chintala
https://arxiv.org/abs/1511.06434
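One of the DCGAN guidelines is to replace pooling with strided (transposed) convolutions. As a quick sanity check of how that architecture grows an image, here is a small sketch (my illustration, not code from the paper) assuming the common DCGAN setup of 4×4 kernels, stride 2, and padding 1, starting from z projected to a 4×4 feature map:

```python
def conv_transpose_out(size, kernel=4, stride=2, pad=1):
    # output spatial size of a strided transposed convolution
    return (size - 1) * stride - 2 * pad + kernel

size = 4  # z is first projected and reshaped to a 4x4 feature map
for _ in range(4):  # four upsampling layers
    size = conv_transpose_out(size)
print(size)  # 64: four doublings take 4x4 up to a 64x64 image
```

Each layer exactly doubles the resolution, which is why DCGAN generators are usually drawn as a 4→8→16→32→64 pyramid.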
This paper (co-authored by Ian Goodfellow) builds on the architectural guidelines from the DCGAN paper above with a series of practical recommendations. It will help you understand the leading hypotheses for why GAN training is unstable. It also introduces several techniques for stabilizing training, including feature matching, minibatch discrimination, historical averaging, one-sided label smoothing, and virtual batch normalization. Implementing these techniques on top of a simple DCGAN is a great exercise that deepens your understanding of GANs.
Paper:
Improved Techniques for Training GANs
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen
https://arxiv.org/abs/1606.03498
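As a taste of one of these techniques, here is a minimal NumPy sketch (my illustration, not the paper's code) of one-sided label smoothing: the discriminator's target for real samples is softened from 1.0 to 0.9, while the target for fakes stays at 0, which discourages the discriminator from becoming overconfident:

```python
import numpy as np

def d_loss(real_logits, fake_logits, smooth=0.9):
    # binary cross-entropy discriminator loss with one-sided label smoothing:
    # real targets become `smooth` (e.g. 0.9); fake targets remain 0
    real_p = 1.0 / (1.0 + np.exp(-real_logits))
    fake_p = 1.0 / (1.0 + np.exp(-fake_logits))
    real_loss = -np.mean(smooth * np.log(real_p) + (1 - smooth) * np.log(1 - real_p))
    fake_loss = -np.mean(np.log(1 - fake_p))
    return real_loss + fake_loss

# an overconfident discriminator (large real logit) is penalized more
# under smoothing than without it
print(d_loss(np.array([5.0]), np.array([-5.0]), smooth=0.9))
```

Note the asymmetry: only the real targets are smoothed, since smoothing fake targets would reinforce the generator's current samples.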
This is a very good, easy-to-read paper. Conditional GANs underpin many state-of-the-art models. The paper shows how conditioning on the class labels of the data makes GAN training more stable. Conditioning GANs on prior information has been a recurring theme in subsequent GAN research, and is especially important for papers focused on image-to-image or text-to-image tasks.
Conditional GAN Architecture: Class label y is concatenated with the random noise vector z as input to the network
Paper:
Conditional Generative Adversarial Nets
Mehdi Mirza, Simon Osindero
https://arxiv.org/abs/1411.1784
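The conditioning mechanism in the figure above is simple enough to sketch directly. Below is a minimal NumPy illustration (the dimensions are my assumptions, not the paper's) of concatenating a one-hot class label y with the noise vector z before feeding the generator:

```python
import numpy as np

def conditional_input(z, y, num_classes=10):
    # concatenate the noise vector z with a one-hot encoding of label y,
    # as in the Conditional GAN architecture
    onehot = np.eye(num_classes)[y]
    return np.concatenate([z, onehot], axis=1)

z = np.random.randn(16, 100)             # batch of 16 noise vectors
y = np.random.randint(0, 10, size=16)    # one class label per sample
x = conditional_input(z, y)
print(x.shape)  # (16, 110): 100 noise dims + 10 label dims
```

The discriminator receives the same label information alongside its image input, so both networks learn class-conditional behavior.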
Progressively Growing GAN (PG-GAN) has impressive results and a creative approach to GAN problems, making it a must-read paper.
This GAN paper from NVIDIA Research proposes training GANs in a progressively growing manner: using a progressively grown network (called PG-GAN) and a carefully curated CelebA-HQ dataset, it achieves stunning generated images. The authors state that this approach not only stabilizes training but also produces the highest-quality images to date.
The key idea is to progressively increase the generator and discriminator: starting from low resolution, new layers are added as training progresses to model increasingly fine details. “Progressive Growing” refers to first training a 4×4 network, then an 8×8, continuously increasing until reaching 1024×1024. This both speeds up training and greatly stabilizes the training process while producing images of very high quality.
The multi-scale architecture of Progressively Growing GAN, where the model increases from 4×4 to 1024×1024
Paper:
Progressive Growing of GANs for Improved Quality, Stability, and Variation
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen
https://arxiv.org/abs/1710.10196
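Two pieces of the method are easy to sketch: the doubling resolution schedule and the "fade-in" used when a new resolution block is added, where the new layer's output is blended with an upsampled version of the old output. A simplified illustration (not NVIDIA's code):

```python
def resolutions(start=4, end=1024):
    # the doubling schedule used by progressive growing: 4, 8, ..., 1024
    res = [start]
    while res[-1] < end:
        res.append(res[-1] * 2)
    return res

def fade_in(old, new, alpha):
    # blend the upsampled output of the old (lower-resolution) path with the
    # new layer's output; alpha ramps from 0 to 1 as training progresses
    return (1 - alpha) * old + alpha * new

print(resolutions())  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

Ramping alpha slowly gives the freshly added layers time to learn before they fully take over, which is a big part of why training stays stable.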
Related Reading:
The most realistic GAN to date: NVIDIA’s progressive growing method trains GANs to generate unprecedented high-definition images
The BigGAN model produces some of the highest-quality images of any ImageNet-trained generator. It is difficult to reproduce on a local machine, and BigGAN combines many components, such as self-attention, spectral normalization, and the cGAN projection discriminator, that are better explained in their own papers. However, this paper provides a good overview of the foundational ideas behind the current state of the art, making it very worthwhile to read.
Images generated by BigGAN
Paper:
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Andrew Brock, Jeff Donahue, Karen Simonyan
https://arxiv.org/abs/1809.11096
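Of those components, spectral normalization is the easiest to illustrate: a layer's weight matrix is divided by its largest singular value, estimated cheaply by power iteration, which constrains the Lipschitz constant of the discriminator. A NumPy sketch (my illustration, not the authors' implementation, which keeps a running estimate across training steps):

```python
import numpy as np

def spectral_norm(W, iters=200):
    # estimate the largest singular value of W by power iteration,
    # then return W rescaled so its spectral norm is ~1
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # converged estimate of the top singular value
    return W / sigma

W = np.random.default_rng(1).standard_normal((8, 8))
Wn = spectral_norm(W)
print(np.linalg.svd(Wn, compute_uv=False)[0])  # close to 1.0
```

In practice one power-iteration step per training update is enough, since the weights change slowly between steps.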
The StyleGAN model can be considered state-of-the-art, particularly for its control over the latent space. It borrows a mechanism called Adaptive Instance Normalization (AdaIN) from neural style transfer: a mapping network first transforms the latent vector z into an intermediate style representation, which then conditions the generator through AdaIN layers. The combination of the mapping network and the AdaIN conditions distributed throughout the generator makes StyleGAN challenging to implement on your own, but it remains a great paper containing many interesting ideas.
StyleGAN architecture allowing latent space control
Paper:
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras, Samuli Laine, Timo Aila
https://arxiv.org/abs/1812.04948
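The AdaIN operation itself is compact: normalize each feature map per sample, then rescale and shift it with style-derived parameters. A NumPy sketch (shapes are my assumptions for illustration):

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-5):
    # Adaptive Instance Normalization: normalize each (sample, channel)
    # feature map over its spatial dims, then apply a style-specific
    # scale and bias
    mean = x.mean(axis=(2, 3), keepdims=True)
    std = x.std(axis=(2, 3), keepdims=True)
    return style_scale * (x - mean) / (std + eps) + style_bias

x = np.random.randn(2, 8, 4, 4)        # (batch, channels, height, width)
ys = np.random.randn(2, 8, 1, 1)       # per-channel style scale
yb = np.random.randn(2, 8, 1, 1)       # per-channel style bias
out = adain(x, ys, yb)
print(out.shape)  # (2, 8, 4, 4)
```

Because the normalization wipes out the incoming statistics, each AdaIN layer lets the style vector fully re-specify the per-channel mean and variance at that depth, which is what gives StyleGAN its scale-by-scale control.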
The CycleGAN paper differs from the previous six papers listed because it addresses image-to-image translation rather than image synthesis from a random vector. CycleGAN specifically handles the case where no paired training samples are available. Thanks to the elegance of the cycle-consistency loss formulation and the insights it offers into stabilizing GAN training, this is a great paper. CycleGAN has many cool applications, such as super-resolution and style transfer, for example converting images of horses to zebras.
The main idea behind the cycle-consistency loss: a sentence translated from French to English and then back to French should match the original sentence
Paper:
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
https://arxiv.org/abs/1703.10593
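The cycle-consistency idea translates directly into a loss term: after mapping an image through G (X→Y) and back through F (Y→X), the round trip should reproduce the input. The paper uses an L1 penalty, which can be sketched in one line (my illustration, not the authors' code):

```python
import numpy as np

def cycle_consistency_loss(x, x_reconstructed):
    # L1 distance between an image x and its round-trip translation F(G(x));
    # the full CycleGAN objective adds the symmetric term G(F(y)) vs y
    return np.mean(np.abs(x - x_reconstructed))

x = np.random.rand(64, 64, 3)           # a stand-in "image"
perfect_round_trip = x.copy()
print(cycle_consistency_loss(x, perfect_round_trip))  # 0.0
```

This term is what keeps the two unpaired translators honest: without it, G could map every horse to the same plausible zebra and still fool the discriminator.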
Pix2Pix is another image-to-image translation GAN model. Unlike CycleGAN, this framework uses paired training samples, and it employs several interesting configurations within the GAN model. While reading this paper, I found the discussion of PatchGAN to be the most interesting part. Rather than judging the entire image at once, PatchGAN classifies 70×70 regions of the image as real or fake. The model also showcases an interesting U-Net-style generator architecture, with skip connections between the encoder and decoder layers. Pix2Pix has many cool applications, such as converting sketches into realistic photos.
Image-to-Image translation using paired training samples
Paper:
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
https://arxiv.org/abs/1611.07004
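The "70×70" figure is just the receptive field of one output unit of the discriminator, and you can verify it by walking backwards through the layers. The sketch below (my illustration) assumes the 70×70 PatchGAN stack described in the paper: three stride-2 convolutions followed by two stride-1 convolutions, all with 4×4 kernels:

```python
def receptive_field(layers):
    # walk backwards from a single output unit:
    # rf_in = (rf_out - 1) * stride + kernel
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

# (kernel, stride) per layer of the assumed 70x70 PatchGAN discriminator
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan))  # 70
```

So each scalar in the discriminator's output map scores one 70×70 patch of the input, and the patch scores are averaged into the final loss.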
The StackGAN paper reads quite differently from the earlier papers in this list; of those, it is most similar to Conditional GANs and Progressively Growing GANs. Like Progressively Growing GANs, StackGAN operates at multiple scales: it first outputs an image at 64×64 resolution, then uses that output as prior information to generate a 256×256 image.
StackGAN generates images from natural language text by conditioning on text embeddings that capture visual features. This is a very interesting paper, and combining the latent-space control shown in StyleGAN with the natural-language interface defined in StackGAN would surely be exciting.
The idea behind the multi-scale architecture of StackGAN based on text embeddings
Paper:
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
https://arxiv.org/abs/1612.03242
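The two-stage data flow can be sketched in terms of shapes. The snippet below is only a stand-in to show the resolutions involved: where it uses nearest-neighbor upsampling, the real Stage-II is a full generator conditioned on both the coarse image and the text embedding:

```python
import numpy as np

def stage2_refine(coarse):
    # stand-in for Stage-II: here just 4x nearest-neighbor upsampling,
    # whereas the real model is a text-conditioned generator that adds detail
    return coarse.repeat(4, axis=0).repeat(4, axis=1)

coarse = np.zeros((64, 64, 3))   # Stage-I output resolution
fine = stage2_refine(coarse)
print(fine.shape)  # (256, 256, 3): Stage-II output resolution
```

The key point is that Stage-II never starts from scratch; it receives a rough 64×64 layout and only has to correct defects and add fine texture.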
Ian Goodfellow’s original GAN paper is a must-read for anyone researching GANs. This paper defines the GAN framework and discusses the “non-saturating” loss function. It also derives the optimal discriminator, a derivation that reappears frequently in recent GAN papers. The paper experimentally validates the effectiveness of GANs on the MNIST, TFD, and CIFAR-10 image datasets.
Paper:
Generative Adversarial Networks
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
https://arxiv.org/abs/1406.2661
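Both of those contributions fit in a few lines. The sketch below (my illustration, with p denoting the discriminator's output probability on generated samples) contrasts the saturating minimax generator loss with the paper's non-saturating alternative, and writes out the derived optimal discriminator:

```python
import numpy as np

def g_loss_saturating(fake_p):
    # minimax generator loss log(1 - D(G(z))): its gradient vanishes
    # when D confidently rejects the fakes (fake_p near 0)
    return np.mean(np.log(1 - fake_p))

def g_loss_non_saturating(fake_p):
    # the paper's practical alternative: maximize log D(G(z)) instead,
    # which gives strong gradients exactly when the generator is poor
    return -np.mean(np.log(fake_p))

def optimal_discriminator(p_data, p_g):
    # the paper's derived optimum: D*(x) = p_data(x) / (p_data(x) + p_g(x))
    return p_data / (p_data + p_g)

print(optimal_discriminator(0.5, 0.5))  # 0.5: D is maximally uncertain
```

When the generator matches the data distribution (p_g = p_data), the optimal discriminator outputs 0.5 everywhere, which is exactly the equilibrium the minimax game aims for.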
Original link:
https://towardsdatascience.com/must-read-papers-on-gans-b665bbae3317