Introduction to GAN: Understanding Generative Adversarial Networks

Table of Contents

  • What is GAN?

  • What Can GAN Do?

  • Framework and Training of GAN

  • Similarities and Differences Between GAN and Other Generative Models

  • Existing Issues with GAN Models

Introduction: GAN has gained significant popularity in the image domain over the past year, and it has recently begun making inroads into text as well. Beyond the hype that surrounds any new model, I was curious why it has attracted so much attention since its proposal in 2014. Let’s take a closer look at GAN and lift its veil of mystery! This article is a beginner-friendly introduction based on Goodfellow’s NIPS 2016 GAN tutorial.

1 What is GAN?

GAN stands for Generative Adversarial Network. Within the framework of machine learning, GAN belongs to the family of generative models, but what makes it unique is the word "adversarial": alongside the generative model, GAN also includes a discriminative model. The two are like players in a game. The generative model simulates the data-generating process to produce model samples, while the discriminative model must distinguish model samples from a mixture of real and model samples. Throughout the game the two compete: the generator must master imitation, while the discriminator must develop a keen eye. This adversarial setup feeds the discriminator's judgments back into the generation process, allowing GAN to fit the true data-generating process more accurately than earlier generative models.
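The two-player game described above can be made concrete through the losses each player minimizes. Below is a minimal sketch in plain Python (the probability values passed in are purely illustrative; the generator loss shown is the commonly used "non-saturating" variant):

```python
import math

def discriminator_loss(d_real, d_fake):
    """D's loss: the negation of log D(x) + log(1 - D(G(z))).

    d_real: D's estimated probability that a real sample is real.
    d_fake: D's estimated probability that a generated sample is real.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """G's (non-saturating) loss: maximize log D(G(z)), i.e. fool D."""
    return -math.log(d_fake)

# A confident, correct discriminator drives its own loss toward 0 ...
sharp_d = discriminator_loss(d_real=0.99, d_fake=0.01)
# ... while a generator that fully fools D drives its loss toward 0.
fooled_g = generator_loss(d_fake=0.99)
```

In training, the two losses are minimized in alternation: one or more gradient steps on D, then one on G, which is exactly the back-and-forth "game" in the paragraph above.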

Extension: Generative models describe the process by which observed data is generated, such as stock prices, weather changes, image and text creation, and information dissemination. To describe a generative process more precisely, we often introduce additional assumptions in the form of latent variables: for example, that stock price movements depend on the current political environment and financial market, or that text creation depends on its topic. Graphical models let us conveniently encode more such assumptions and richer relationships among factors. Alternatively, instead of specifying each assumption by hand, we can fit a K-dimensional vector that is correlated with the observations; relationships among the data can then be read off the fitted vectors, which is the origin of representation learning.
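As a toy illustration of fitting a K-dimensional vector to observations: one simple (non-GAN) way to do this is PCA via the SVD. Everything below is synthetic — 10-dimensional data that secretly lives near a 2-dimensional subspace, from which we recover a 2-D code per sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy observations: 200 samples in 10-D that actually lie near a 2-D subspace
latent = rng.normal(size=(200, 2))            # hidden K-dim factors (K = 2)
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

# recover a K-dimensional representation with PCA (SVD of centered data)
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
codes = Xc @ Vt[:2].T                          # each row: a 2-D code per sample

# the top 2 singular values dominate: 2 factors explain nearly all variance
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
```

GANs take the same idea further: the K-dimensional noise vector fed to the generator plays the role of the code, and the learned mapping from code to data is nonlinear rather than a linear projection.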

2 What Can GAN Do?

To help everyone gain a clearer understanding of GAN, I will introduce some applications built with it. Most GAN-related work to date is concentrated in the image domain, and the following applications have achieved strong results:

1) Restoration of Ultra-High-Definition Images

Restoration of ultra-high-definition images (super-resolution) is useful for image encoding/decoding and related applications; online video, for example, has a strong demand for it. Figure 1 shows GAN's restoration results on this task, which place it among the best current models for super-resolution.


Figure 1 Low-resolution image converted to ultra-high-definition image

2) Interactive Drawing

This is an interesting application. As shown in Figure 2, GAN draws a corresponding image from a rough outline sketched by a human (left). A video demonstration is available on YouTube (readers in mainland China may need a VPN to access it):

https://www.youtube.com/watch?v=9c4z6YsBGQ0.


Figure 2 Given outline (left) to create specific image (right)

3) Image Translation

This line of work is somewhat similar to the previous application, but with clearer, more direct uses: drawing maps from satellite images, street-scene restoration, and texture filling. Figure 3 shows GAN's performance on this type of application.


Figure 3 Image generated from given image via translation

4) Drawing by Text

Text can be seen as an abstract description of an image. Imagine how difficult it is to draw an image from a textual description alone. If a GAN does the drawing, what does the generated image look like? Figure 4 shows GAN's performance on drawing by text, and the results are quite impressive!


Figure 4 Drawing based on text

5) Representation Learning for Images

There is a logical operation: Man with glasses − Man without glasses + Woman without glasses = ? After some simple reasoning, we can deduce that the answer should be: Woman with glasses. How would a computer achieve this? (See Figure 5)


Figure 5 Representation learning for images
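The arithmetic in Figure 5 is performed not on pixels but on the latent codes fed to the generator (as in DCGAN-style experiments). Below is a minimal sketch; the latent vectors here are random placeholders, whereas in practice each one would be the average code of several images sharing the attribute:

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 100  # an illustrative latent dimensionality

# hypothetical averaged latent codes for each attribute combination
man_with_glasses = rng.normal(size=dim)
man_without_glasses = rng.normal(size=dim)
woman_without_glasses = rng.normal(size=dim)

# the "glasses" direction is man_with_glasses - man_without_glasses;
# adding it to a woman's code should yield "woman with glasses"
result = man_with_glasses - man_without_glasses + woman_without_glasses
# decoding `result` through the trained generator would produce the image
```

The fact that such directions exist and compose linearly in the latent space is what makes the result count as representation learning: the code captures semantic attributes, not just pixels.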

It can be seen that in these areas, GAN seems to have achieved good results. So, how does GAN actually work? Next, we will delve into GAN to understand its basic framework.
