New Intelligence Report
Source: towardsdatascience
Author: Connor Shorten | Editor: Xiao Qin
[New Intelligence Guide]Generative Adversarial Networks (GANs) are one of the most fascinating and popular applications in deep learning. This article lists 10 papers on GANs that will provide you with a great introduction to GANs and help you understand the foundations of state-of-the-art techniques.
The 10 selected GAN papers include:
- DCGANs
- Improved Techniques for Training GANs
- Conditional GANs
- Progressively Growing GANs
- BigGAN
- StyleGAN
- CycleGAN
- Pix2Pix
- StackGAN
- Generative Adversarial Networks
I recommend starting your GAN journey with the DCGAN paper. This paper demonstrates how convolutional layers can be used with GANs and provides a series of architectural guidelines for this. It also discusses issues such as visualizing GAN features, latent space interpolation, using discriminator features to train classifiers, and evaluating results. All these issues are bound to arise in your GAN research.
In summary, the DCGAN paper is a must-read GAN paper because it defines the architecture in a very clear way, making it easy to start with some code and begin to develop an intuition for building GANs.
DCGAN Model: Generator Architecture with Upsampling Convolutional Layers
Paper:
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Alec Radford, Luke Metz, Soumith Chintala
https://arxiv.org/abs/1511.06434
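One of the DCGAN guidelines is to replace pooling with strided (transposed) convolutions. As a quick sanity check of how that architecture grows an image, here is a small sketch (my illustration, not code from the paper) assuming the common DCGAN setup of 4×4 kernels, stride 2, and padding 1, starting from z projected to a 4×4 feature map:

```python
def conv_transpose_out(size, kernel=4, stride=2, pad=1):
    # output spatial size of a strided transposed convolution
    return (size - 1) * stride - 2 * pad + kernel

size = 4  # z is first projected and reshaped to a 4x4 feature map
for _ in range(4):  # four upsampling layers
    size = conv_transpose_out(size)
print(size)  # 64: four doublings take 4x4 up to a 64x64 image
```

Each layer exactly doubles the resolution, which is why DCGAN generators are usually drawn as a 4→8→16→32→64 pyramid.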
This paper (co-authored by Ian Goodfellow) builds on the architectural guidelines from the DCGAN paper above with a series of practical recommendations. It will help you understand the leading hypotheses for why GAN training is unstable. It also introduces several techniques for stabilizing training, including feature matching, minibatch discrimination, historical averaging, one-sided label smoothing, and virtual batch normalization. Implementing these techniques on top of a simple DCGAN is a great exercise that deepens your understanding of GANs.
Paper:
Improved Techniques for Training GANs
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen
https://arxiv.org/abs/1606.03498
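As a taste of one of these techniques, here is a minimal NumPy sketch (my illustration, not the paper's code) of one-sided label smoothing: the discriminator's target for real samples is softened from 1.0 to 0.9, while the target for fakes stays at 0, which discourages the discriminator from becoming overconfident:

```python
import numpy as np

def d_loss(real_logits, fake_logits, smooth=0.9):
    # binary cross-entropy discriminator loss with one-sided label smoothing:
    # real targets become `smooth` (e.g. 0.9); fake targets remain 0
    real_p = 1.0 / (1.0 + np.exp(-real_logits))
    fake_p = 1.0 / (1.0 + np.exp(-fake_logits))
    real_loss = -np.mean(smooth * np.log(real_p) + (1 - smooth) * np.log(1 - real_p))
    fake_loss = -np.mean(np.log(1 - fake_p))
    return real_loss + fake_loss

# an overconfident discriminator (large real logit) is penalized more
# under smoothing than without it
print(d_loss(np.array([5.0]), np.array([-5.0]), smooth=0.9))
```

Note the asymmetry: only the real targets are smoothed, since smoothing fake targets would reinforce the generator's current samples.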
This is a very good, easy-to-read paper. Conditional GANs underpin many state-of-the-art models. The paper shows how conditioning on the class labels of the data makes GAN training more stable. Conditioning GANs on prior information has been a recurring theme in subsequent GAN research, and is especially important for papers focused on image-to-image or text-to-image tasks.
Conditional GAN Architecture: Class label y is concatenated with the random noise vector z as input to the network
Paper:
Conditional Generative Adversarial Nets
Mehdi Mirza, Simon Osindero
https://arxiv.org/abs/1411.1784
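The conditioning mechanism in the figure above is simple enough to sketch directly. Below is a minimal NumPy illustration (the dimensions are my assumptions, not the paper's) of concatenating a one-hot class label y with the noise vector z before feeding the generator:

```python
import numpy as np

def conditional_input(z, y, num_classes=10):
    # concatenate the noise vector z with a one-hot encoding of label y,
    # as in the Conditional GAN architecture
    onehot = np.eye(num_classes)[y]
    return np.concatenate([z, onehot], axis=1)

z = np.random.randn(16, 100)             # batch of 16 noise vectors
y = np.random.randint(0, 10, size=16)    # one class label per sample
x = conditional_input(z, y)
print(x.shape)  # (16, 110): 100 noise dims + 10 label dims
```

The discriminator receives the same label information alongside its image input, so both networks learn class-conditional behavior.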
Progressively Growing GAN (PG-GAN) has impressive results and a creative approach to GAN problems, making it a must-read paper.
This GAN paper from NVIDIA Research proposes training GANs in a progressively growing manner: using a progressively grown network (called PG-GAN) and a carefully curated CelebA-HQ dataset, it achieves stunning generated images. The authors state that this approach not only stabilizes training but also produces the highest-quality images to date.
The key idea is to progressively increase the generator and discriminator: starting from low resolution, new layers are added as training progresses to model increasingly fine details. “Progressive Growing” refers to first training a 4×4 network, then an 8×8, continuously increasing until reaching 1024×1024. This both speeds up training and greatly stabilizes the training process while producing images of very high quality.
The multi-scale architecture of Progressively Growing GAN, where the model increases from 4×4 to 1024×1024
Paper:
Progressive Growing of GANs for Improved Quality, Stability, and Variation
Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen
https://arxiv.org/abs/1710.10196
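Two pieces of the method are easy to sketch: the doubling resolution schedule and the "fade-in" used when a new resolution block is added, where the new layer's output is blended with an upsampled version of the old output. A simplified illustration (not NVIDIA's code):

```python
def resolutions(start=4, end=1024):
    # the doubling schedule used by progressive growing: 4, 8, ..., 1024
    res = [start]
    while res[-1] < end:
        res.append(res[-1] * 2)
    return res

def fade_in(old, new, alpha):
    # blend the upsampled output of the old (lower-resolution) path with the
    # new layer's output; alpha ramps from 0 to 1 as training progresses
    return (1 - alpha) * old + alpha * new

print(resolutions())  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

Ramping alpha slowly gives the freshly added layers time to learn before they fully take over, which is a big part of why training stays stable.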
Related Reading:
The most realistic GAN to date: NVIDIA’s progressive growing method trains GANs to generate unprecedented high-definition images
The BigGAN model produces some of the highest-quality images of any ImageNet-trained generator. It is difficult to reproduce on a local machine, and BigGAN combines many components, such as self-attention, spectral normalization, and the cGAN projection discriminator, that are better explained in their own papers. However, this paper provides a good overview of the foundational ideas behind the current state of the art, making it very worthwhile to read.
Images generated by BigGAN
Paper:
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Andrew Brock, Jeff Donahue, Karen Simonyan
https://arxiv.org/abs/1809.11096
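Of those components, spectral normalization is the easiest to illustrate: a layer's weight matrix is divided by its largest singular value, estimated cheaply by power iteration, which constrains the Lipschitz constant of the discriminator. A NumPy sketch (my illustration, not the authors' implementation, which keeps a running estimate across training steps):

```python
import numpy as np

def spectral_norm(W, iters=200):
    # estimate the largest singular value of W by power iteration,
    # then return W rescaled so its spectral norm is ~1
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # converged estimate of the top singular value
    return W / sigma

W = np.random.default_rng(1).standard_normal((8, 8))
Wn = spectral_norm(W)
print(np.linalg.svd(Wn, compute_uv=False)[0])  # close to 1.0
```

In practice one power-iteration step per training update is enough, since the weights change slowly between steps.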
The StyleGAN model can be considered state-of-the-art, particularly for its control over the latent space. It borrows a mechanism called Adaptive Instance Normalization (AdaIN) from neural style transfer: a mapping network first transforms the latent vector z into an intermediate style representation, which then conditions the generator through AdaIN layers. The combination of the mapping network and the AdaIN conditions distributed throughout the generator makes StyleGAN challenging to implement on your own, but it remains a great paper containing many interesting ideas.
StyleGAN architecture allowing latent space control
Paper:
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras, Samuli Laine, Timo Aila
https://arxiv.org/abs/1812.04948
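The AdaIN operation itself is compact: normalize each feature map per sample, then rescale and shift it with style-derived parameters. A NumPy sketch (shapes are my assumptions for illustration):

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-5):
    # Adaptive Instance Normalization: normalize each (sample, channel)
    # feature map over its spatial dims, then apply a style-specific
    # scale and bias
    mean = x.mean(axis=(2, 3), keepdims=True)
    std = x.std(axis=(2, 3), keepdims=True)
    return style_scale * (x - mean) / (std + eps) + style_bias

x = np.random.randn(2, 8, 4, 4)        # (batch, channels, height, width)
ys = np.random.randn(2, 8, 1, 1)       # per-channel style scale
yb = np.random.randn(2, 8, 1, 1)       # per-channel style bias
out = adain(x, ys, yb)
print(out.shape)  # (2, 8, 4, 4)
```

Because the normalization wipes out the incoming statistics, each AdaIN layer lets the style vector fully re-specify the per-channel mean and variance at that depth, which is what gives StyleGAN its scale-by-scale control.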
The CycleGAN paper differs from the previous six papers listed because it addresses image-to-image translation rather than image synthesis from a random vector. CycleGAN specifically handles the case where no paired training samples are available. Thanks to the elegance of the cycle-consistency loss formulation and the insights it offers into stabilizing GAN training, this is a great paper. CycleGAN has many cool applications, such as super-resolution and style transfer, for example converting images of horses to zebras.
The main idea behind the cycle-consistency loss: a sentence translated from French to English and then back to French should match the original sentence
Paper:
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
https://arxiv.org/abs/1703.10593
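The cycle-consistency idea translates directly into a loss term: after mapping an image through G (X→Y) and back through F (Y→X), the round trip should reproduce the input. The paper uses an L1 penalty, which can be sketched in one line (my illustration, not the authors' code):

```python
import numpy as np

def cycle_consistency_loss(x, x_reconstructed):
    # L1 distance between an image x and its round-trip translation F(G(x));
    # the full CycleGAN objective adds the symmetric term G(F(y)) vs y
    return np.mean(np.abs(x - x_reconstructed))

x = np.random.rand(64, 64, 3)           # a stand-in "image"
perfect_round_trip = x.copy()
print(cycle_consistency_loss(x, perfect_round_trip))  # 0.0
```

This term is what keeps the two unpaired translators honest: without it, G could map every horse to the same plausible zebra and still fool the discriminator.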
Pix2Pix is another image-to-image translation GAN model. Unlike CycleGAN, this framework uses paired training samples, and it employs several interesting configurations within the GAN model. While reading this paper, I found the discussion of PatchGAN to be the most interesting part. Rather than judging the entire image at once, PatchGAN classifies 70×70 regions of the image as real or fake. The model also showcases an interesting U-Net-style generator architecture, with skip connections between the encoder and decoder layers. Pix2Pix has many cool applications, such as converting sketches into realistic photos.
Image-to-Image translation using paired training samples
Paper:
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
https://arxiv.org/abs/1611.07004
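The "70×70" figure is just the receptive field of one output unit of the discriminator, and you can verify it by walking backwards through the layers. The sketch below (my illustration) assumes the 70×70 PatchGAN stack described in the paper: three stride-2 convolutions followed by two stride-1 convolutions, all with 4×4 kernels:

```python
def receptive_field(layers):
    # walk backwards from a single output unit:
    # rf_in = (rf_out - 1) * stride + kernel
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

# (kernel, stride) per layer of the assumed 70x70 PatchGAN discriminator
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan))  # 70
```

So each scalar in the discriminator's output map scores one 70×70 patch of the input, and the patch scores are averaged into the final loss.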
The StackGAN paper reads quite differently from the earlier papers in this list; of those, it is most similar to Conditional GANs and Progressively Growing GANs. Like Progressively Growing GANs, StackGAN operates at multiple scales: it first outputs an image at 64×64 resolution, then uses that output as prior information to generate a 256×256 image.
StackGAN generates images from natural language text by conditioning on text embeddings that capture visual features. This is a very interesting paper, and combining the latent-space control shown in StyleGAN with the natural-language interface defined in StackGAN would surely be exciting.
The idea behind the multi-scale architecture of StackGAN based on text embeddings
Paper:
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
https://arxiv.org/abs/1612.03242
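The two-stage data flow can be sketched in terms of shapes. The snippet below is only a stand-in to show the resolutions involved: where it uses nearest-neighbor upsampling, the real Stage-II is a full generator conditioned on both the coarse image and the text embedding:

```python
import numpy as np

def stage2_refine(coarse):
    # stand-in for Stage-II: here just 4x nearest-neighbor upsampling,
    # whereas the real model is a text-conditioned generator that adds detail
    return coarse.repeat(4, axis=0).repeat(4, axis=1)

coarse = np.zeros((64, 64, 3))   # Stage-I output resolution
fine = stage2_refine(coarse)
print(fine.shape)  # (256, 256, 3): Stage-II output resolution
```

The key point is that Stage-II never starts from scratch; it receives a rough 64×64 layout and only has to correct defects and add fine texture.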
Ian Goodfellow’s original GAN paper is a must-read for anyone researching GANs. This paper defines the GAN framework and discusses the “non-saturating” loss function. It also derives the optimal discriminator, a derivation that reappears frequently in recent GAN papers. The paper experimentally validates the effectiveness of GANs on the MNIST, TFD, and CIFAR-10 image datasets.
Paper:
Generative Adversarial Networks
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
https://arxiv.org/abs/1406.2661
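Both of those contributions fit in a few lines. The sketch below (my illustration, with p denoting the discriminator's output probability on generated samples) contrasts the saturating minimax generator loss with the paper's non-saturating alternative, and writes out the derived optimal discriminator:

```python
import numpy as np

def g_loss_saturating(fake_p):
    # minimax generator loss log(1 - D(G(z))): its gradient vanishes
    # when D confidently rejects the fakes (fake_p near 0)
    return np.mean(np.log(1 - fake_p))

def g_loss_non_saturating(fake_p):
    # the paper's practical alternative: maximize log D(G(z)) instead,
    # which gives strong gradients exactly when the generator is poor
    return -np.mean(np.log(fake_p))

def optimal_discriminator(p_data, p_g):
    # the paper's derived optimum: D*(x) = p_data(x) / (p_data(x) + p_g(x))
    return p_data / (p_data + p_g)

print(optimal_discriminator(0.5, 0.5))  # 0.5: D is maximally uncertain
```

When the generator matches the data distribution (p_g = p_data), the optimal discriminator outputs 0.5 everywhere, which is exactly the equilibrium the minimax game aims for.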
Original link:
https://towardsdatascience.com/must-read-papers-on-gans-b665bbae3317