
Five years ago, Generative Adversarial Networks (GANs) revolutionized deep learning and set off a wave of technological breakthroughs. Ian Goodfellow and his co-authors introduced them in the paper “Generative Adversarial Networks,” and both academia and industry quickly embraced them. In hindsight, the rise of GANs was inevitable, for three reasons.
Firstly, the most impressive feature of GANs is their unsupervised nature: they do not require labeled data, which makes them powerful because labeling data is a tedious and expensive task.
Secondly, the breadth of potential use cases keeps GANs at the center of the conversation. They can generate high-quality images, enhance images, synthesize images from text, translate images from one domain to another, age or de-age faces, and much more. This list is far from exhaustive; we will introduce some popular GAN architectures later in this article.
Thirdly, the continuous stream of research around GANs is so fascinating that it has drawn attention from industries well beyond machine learning. We will discuss the major technological breakthroughs in the latter part of this article.
Birth
Generative Adversarial Networks (GANs) consist of two networks: a generator and a discriminator. Either can be almost any kind of neural network, from convolutional networks to recurrent networks and autoencoders. The two networks play a competitive game, each trying to outdo the other, and in doing so each pushes the other to improve at its own task. After thousands of iterations, if all goes well, the generator learns to produce convincingly realistic fake images, and the discriminator becomes an effective judge of whether an image is real or fake. Put differently, the generator learns to transform random noise vectors drawn from a latent space (though not every GAN samples from a latent space) into samples that resemble the real dataset. The idea behind GAN training is intuitive, even if training GANs in practice is notoriously finicky.
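The adversarial game described above can be sketched in a few lines of PyTorch. This is a minimal toy example, not any paper's implementation: the networks, the 1-D "real" data distribution, and all hyperparameters are illustrative choices.

```python
import torch
import torch.nn as nn

# Toy setup: learn to mimic samples from a 1-D Gaussian.
# All sizes and hyperparameters below are illustrative.
latent_dim, data_dim = 8, 1

G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 16), nn.ReLU(), nn.Linear(16, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, data_dim) * 0.5 + 2.0  # samples from the "real" dataset
    z = torch.randn(64, latent_dim)               # noise vectors from latent space
    fake = G(z)

    # Discriminator step: push real -> 1, fake -> 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to fool D into outputting 1 for fakes.
    g_loss = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

The key structural point is the alternation: the discriminator is updated on a batch of real and (detached) fake samples, then the generator is updated through the discriminator's judgment of fresh fakes.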
GANs have numerous practical use cases: image generation, artwork generation, music generation, and video generation. They can also improve image quality, stylize or colorize images, generate faces, and perform many other interesting tasks.

The image above shows the general architecture of a GAN. First, a D-dimensional noise vector is sampled from the latent space and fed to the generator, which transforms it into an image. The generated image is then passed to the discriminator for classification. The discriminator continually receives both images from the real dataset and images produced by the generator; its job is to tell real from fake. Most GAN architectures follow this general design.
Adolescence
In their adolescence, GANs spawned many popular architectures, including DCGAN, StyleGAN, BigGAN, StackGAN, Pix2pix, Age-cGAN, and CycleGAN, all of which have produced very satisfying results. Below we discuss these architectures in more detail.
DCGAN
DCGAN was the first architecture to bring convolutional neural networks into GANs with real success. CNNs had already achieved unprecedented results in computer vision, but until then they had not been applied effectively inside GANs. Alec Radford, Luke Metz, and Soumith Chintala proposed DCGAN in “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.” It was a milestone in GAN research because it introduced architectural guidelines that address issues such as training instability, mode collapse, and internal covariate shift. Many subsequent GAN architectures have been based on DCGAN.
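A DCGAN-style generator upsamples the noise vector with strided transposed convolutions, using batch normalization and ReLU between layers and a Tanh output. The sketch below follows common public implementations; the exact channel counts and the 64×64 output size are illustrative, not prescribed by the paper's text.

```python
import torch
import torch.nn as nn

# DCGAN-style generator: 100-d noise -> 3x64x64 image.
# Each ConvTranspose2d doubles the spatial size (except the first,
# which expands the 1x1 noise "image" to 4x4).
gen = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, 1, 0, bias=False),  # -> 512 x 4 x 4
    nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),  # -> 256 x 8 x 8
    nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),  # -> 128 x 16 x 16
    nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),   # -> 64 x 32 x 32
    nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),     # -> 3 x 64 x 64
    nn.Tanh(),  # pixel values in [-1, 1]
)
```

The discriminator mirrors this structure with ordinary strided convolutions and LeakyReLU, replacing the pooling layers of classic CNNs, which is one of the architectural changes DCGAN advocated.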

BigGAN
BigGAN represents some of the most advanced image generation GANs have yet produced. It was introduced in the paper “Large Scale GAN Training for High Fidelity Natural Image Synthesis,” a collaboration between Andrew Brock of Heriot-Watt University (then an intern at DeepMind) and DeepMind researchers Jeff Donahue and Karen Simonyan.

These images were generated by BigGAN, and as you can see, their quality is high enough to pass for real photographs. It was the first time a GAN generated images with both high fidelity and a low variety gap between real and generated data. The previous best Inception Score (IS) was 52.52; BigGAN achieved an IS of 166.3, more than tripling the state of the art (SOTA). It also improved the Fréchet Inception Distance (FID) from 18.65 to 9.6. These are very impressive results. Among its key changes is applying orthogonal regularization to the generator.
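The variant of orthogonal regularization used in BigGAN penalizes only the off-diagonal entries of WᵀW, nudging the columns of a weight matrix toward orthogonality without forcing unit norms. Here is a minimal NumPy sketch; the function name and the default coefficient are illustrative.

```python
import numpy as np

def ortho_penalty(W, beta=1e-4):
    """BigGAN-style orthogonal regularization:
    beta * || W^T W  *  (1 - I) ||_F^2.
    Only off-diagonal entries of W^T W are penalized, so column
    norms are left free while columns are pushed toward orthogonality."""
    WtW = W.T @ W
    off_diag = WtW * (1.0 - np.eye(WtW.shape[0]))
    return beta * np.sum(off_diag ** 2)
```

For an orthogonal matrix the penalty is exactly zero; for a matrix with correlated columns it grows with the correlation, so adding it to the generator loss discourages redundant directions in weight space.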

StyleGAN
StyleGAN is another significant breakthrough in the field of GAN research. It was introduced by Nvidia in a paper titled “A Style-Based Generator Architecture for Generative Adversarial Networks.”
Source: https://medium.com/syncedreview/gan-2-0-nvidias-hyperrealistic-face-generator-e3439d33ebaf
StyleGAN set new records on face generation tasks. At the core of the algorithm are style transfer techniques, also known as style mixing. Beyond faces, it can generate high-quality images of cars, bedrooms, and more. It is another major advance in the GAN field and an inspiration for deep learning researchers.
StackGAN
StackGANs were proposed by Han Zhang, Tao Xu, Hongsheng Li, and others in a paper titled “StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks.” They used StackGAN to explore text-to-image synthesis and achieved very good results. A StackGAN consists of a pair of stacked networks: a Stage-I GAN sketches a low-resolution image from the text description, and a Stage-II GAN refines it into a realistic, high-resolution image.

As seen in the above image, StackGAN generated realistic bird images when provided with a text description. Most importantly, the generated images closely resemble the provided text. Text-to-image synthesis has many practical applications, such as generating images from a text description, converting text-based stories into comics, and creating internal representations of text descriptions.
CycleGAN
CycleGAN has some very interesting use cases, such as converting photos to paintings, turning summer photos into winter ones, or turning horse photos into zebra photos, and vice versa. It was proposed by Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros in a paper titled “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.” CycleGAN performs unpaired image-to-image translation: it learns mappings between two image domains without requiring matched before-and-after training pairs.
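What makes unpaired training possible is the cycle-consistency loss: translating an image to the other domain and back should recover the original. A minimal NumPy sketch of that loss follows; the function names `G` (domain X to Y) and `F` (domain Y to X) and the weight `lam` mirror the paper's notation, but the implementation here is illustrative.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F, lam=10.0):
    """L1 penalty on round trips: F(G(x)) should reconstruct x
    (e.g. horse -> zebra -> horse) and G(F(y)) should reconstruct y."""
    forward = np.mean(np.abs(F(G(x)) - x))   # X -> Y -> X
    backward = np.mean(np.abs(G(F(y)) - y))  # Y -> X -> Y
    return lam * (forward + backward)
```

In the full CycleGAN objective this term is added to two ordinary adversarial losses (one per direction); the cycle term is what prevents the mappings from collapsing onto arbitrary images of the target domain.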

Pix2pix
Pix2pix has also shown impressive results on image-to-image translation tasks: converting night images to daytime images, colorizing black-and-white photos, or turning sketches into realistic images. The Pix2pix network was proposed by Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros in their paper titled “Image-to-Image Translation with Conditional Adversarial Networks.” Unlike CycleGAN, Pix2pix trains on paired examples, where each input image has a matching ground-truth output.
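Because paired ground truth is available, the Pix2pix generator is trained with a conditional adversarial loss plus an L1 distance to the target image, weighted by λ = 100 in the paper. The NumPy sketch below shows just that generator-side objective; the function name and array shapes are illustrative.

```python
import numpy as np

def pix2pix_g_loss(d_logits_fake, fake, target, lam=100.0):
    """Generator objective: fool the discriminator (BCE with labels = 1)
    plus lam * L1 distance to the paired ground-truth image."""
    # -log(sigmoid(x)) = log(1 + exp(-x)), computed stably.
    adv = np.mean(np.logaddexp(0.0, -d_logits_fake))
    l1 = np.mean(np.abs(fake - target))
    return adv + lam * l1
```

The L1 term keeps the output close to the paired target at low frequencies, while the adversarial term pushes it toward sharp, realistic detail that plain L1 regression tends to blur away.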

This is an interactive demonstration capable of generating realistic images from sketches.
Age-cGAN (Age Conditional Generative Adversarial Networks)
Facial aging has many industry use cases, including cross-age face recognition, finding missing children, or entertainment purposes. Grigory Antipov, Moez Baccouche, and Jean-Luc Dugelay proposed using conditional GANs for facial aging in their paper titled “Face Aging with Conditional Generative Adversarial Networks.”

The image shows how Age-cGAN transforms a face from its original age to a target age.
These are some of the most popular GAN architectures, but beyond them, hundreds of other GAN variants have been published; which one fits depends on your needs.
Rise
As the famous theoretical physicist Richard Feynman said:
“What I can’t create, I don’t understand.”
The idea behind GANs follows the same spirit: by training a network on known data, the network begins to “understand” that data, and through this understanding it learns to create realistic new images.
Edmond de Belamy
Edmond de Belamy, a portrait created by a GAN, sold for $432,500 at a Christie’s auction. This was an important moment in the development of GANs: it was one of the first times the general public witnessed GANs and their potential. Before this, GANs had largely been confined to research labs and machine learning engineers. The auction turned this work into the public’s entry point to GANs.

This Person Does Not Exist
You may be familiar with the website https://thispersondoesnotexist.com, created by Uber software engineer Philip Wang. He built it with the StyleGAN code released by NVIDIA. Every time you refresh the page, it generates a new face of a person who does not exist, and it is genuinely hard to tell that the face is fake. This technology has the potential to create entirely virtual worlds.

It’s truly amazing!
Deep Fakes
DeepFakes are another alarming and potentially destructive technology. Built on deep generative networks, they can paste one person’s face onto a target individual in a video. The public mostly sees the dangers of this technology, but for AI researchers it is also a significant breakthrough. It could save the film industry millions of dollars, since changing a stunt double’s face currently requires hours of manual editing.
This technology is frightening, but we can also use it for the betterment of society.

Future Development
StyleGAN is currently the sixth most popular Python project on GitHub, and hundreds of GAN variants have been proposed to date. This GitHub repository maintains a popular list of GANs and their papers: https://github.com/hindupuravinash/the-gan-zoo
Present
GANs are also being used to enhance game graphics, a use case I am especially excited about. NVIDIA recently released a video showing how GANs can recreate a game-like environment from real-world video footage.
Conclusion
In this article, we traced how GANs have developed and grown into a global phenomenon: we started with their birth, surveyed some popular architectures, and finally watched their rise. I hope GANs become more democratized in the coming years. When I see negative news about GANs, I feel conflicted. I believe we have a responsibility to educate people about the impact of GANs and, as far as possible, how to use them ethically.