Understanding the Core Concepts of Generative Adversarial Networks (GANs)

The following content is sourced from Machine Learning Algorithms and Natural Language Processing, authored by Yi Zhen.

I am currently studying GANs in my spare time. Since that time is limited, I am recording what I learn here. Don’t expect too much: these are simply my notes from Professor Li Hongyi’s course. As a beginner, I welcome discussion and corrections.

Introduction

The GAN idea has spread into many other areas of machine learning and has produced many impressive results. Yann LeCun has described it as the most interesting idea of the past decade in machine learning, so for students in related research directions, GANs have become a concept and method that must be learned.


Basic Idea of GAN (Generator)

A GAN consists of two fundamental components, one of which is the Generator. For image generation, you give it a vector and it outputs an image.

Input: vector

Output: image

As shown below:


For sentence generation, you give it a vector, and it outputs a sentence.

Input: vector

Output: sentence

As shown below:

We will explain the principles in detail, using image generation as the running example.

In fact, the Generator is a neural network. Its input is a vector, and its output is a high-dimensional vector. For example, if the image is 16×16 with one value per pixel, the output is a 256-dimensional vector.
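
To make the "vector in, 256-dimensional vector out" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the input length of 8, the single linear layer with tanh, and the names `init_generator`, `generate`, and `to_image` are hypothetical, and the weights are random rather than trained, so the output is noise and not a meaningful image.

```python
import math
import random

LATENT_DIM = 8                          # assumed length of the input vector
IMAGE_SIDE = 16                         # 16x16 image, as in the text
OUTPUT_DIM = IMAGE_SIDE * IMAGE_SIDE    # 256 output values

def init_generator(latent_dim, output_dim, seed=0):
    """Return random (untrained) weights and biases for one linear layer."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.5) for _ in range(latent_dim)]
               for _ in range(output_dim)]
    biases = [0.0] * output_dim
    return weights, biases

def generate(z, params):
    """Map an input vector z to a flat vector of OUTPUT_DIM pixel values."""
    weights, biases = params
    return [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
            for row, b in zip(weights, biases)]

def to_image(flat, side=IMAGE_SIDE):
    """Reshape the flat vector into `side` rows of `side` pixels."""
    return [flat[r * side:(r + 1) * side] for r in range(side)]

rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]   # random input vector
params = init_generator(LATENT_DIM, OUTPUT_DIM)
flat = generate(z, params)      # 256-dimensional output vector
image = to_image(flat)          # the same values viewed as a 16x16 grid
```

A real Generator would use several layers and trained weights, but the shape of the mapping is the same.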

As shown in the figure:

Each dimension of the input vector can represent certain features of the output image. For example, the value of the first dimension represents the length of the hair of the generated image.

When we increase the value of the first dimension of the input vector, we can see that the generated image’s hair becomes longer, as shown in the figure below:

Similarly, if a certain dimension of the input vector represents how blue the hair is, increasing its value makes the hair in the generated image bluer:

Or if the value of a certain dimension of the input vector represents the size of the mouth opening, increasing it results in the generated image’s mouth appearing larger:
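
The experiment described above (change one dimension of the input vector, keep the rest fixed, and watch the output change) can be sketched as follows. This is only an illustration of the mechanics: the generator here is a hypothetical, untrained single layer, so dimension 0 does not mean "hair length" or anything else. The helper names `sweep` and `distance` are made up for this example.

```python
import math
import random

rng = random.Random(0)
LATENT_DIM, OUTPUT_DIM = 4, 256

# Hypothetical untrained generator: one linear layer followed by tanh.
W = [[rng.gauss(0.0, 0.3) for _ in range(LATENT_DIM)]
     for _ in range(OUTPUT_DIM)]

def generate(z):
    return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in W]

def sweep(z, dim, values):
    """Generate one output per value, changing only z[dim]."""
    outputs = []
    for v in values:
        z_mod = list(z)     # copy, so the other dimensions stay fixed
        z_mod[dim] = v
        outputs.append(generate(z_mod))
    return outputs

def distance(a, b):
    """Euclidean distance between two flat outputs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
images = sweep(z, dim=0, values=[-2.0, -1.0, 0.0, 1.0, 2.0])

# How far each output has moved from the first one in the sweep.
drift = [distance(images[0], img) for img in images]
print([round(d, 3) for d in drift])
```

With a trained Generator, the same sweep is what produces the gradual changes in hair length, hair color, or mouth size described in the text.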

In summary, the Generator in a GAN is a neural network that takes an input vector and outputs another vector.

What the output represents varies from task to task, and each dimension of the input vector can correspond to a specific feature of the output image.

Basic Idea of GAN (Discriminator)

After discussing the Generator, let’s look at the other component of GAN, the Discriminator. The Discriminator is also a neural network. Its input is either the output from the Generator or a real image, and its output is a scalar representing the quality of the input, where a larger value indicates better quality and a smaller value indicates worse quality.

As shown in the figure below:

For example, if the input is a real image, then the output from the Discriminator should be a large scalar value, indicating that the quality of the input image is high.

If the input is a poor-quality image, then the output from the Discriminator should be a small scalar value, indicating that the quality of the input image is low.
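
As a rough sketch of the interface described above, the Discriminator below takes a flattened 16×16 image and returns one number. The design is an assumption made for illustration: a single linear layer followed by a sigmoid, so the score lies between 0 and 1, with hypothetical names `init_discriminator` and `discriminate`. The weights are random, so the score it gives is not yet meaningful.

```python
import math
import random

IMAGE_DIM = 256   # flattened 16x16 image, as in the text's example

def init_discriminator(input_dim, seed=0):
    """Return random (untrained) weights and a bias for one linear layer."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 0.05) for _ in range(input_dim)]
    bias = 0.0
    return weights, bias

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminate(image, params):
    """Return a single score in (0, 1); higher means 'looks more real'."""
    weights, bias = params
    logit = sum(w * p for w, p in zip(weights, image)) + bias
    return sigmoid(logit)

rng = random.Random(1)
some_image = [rng.uniform(-1.0, 1.0) for _ in range(IMAGE_DIM)]
params = init_discriminator(IMAGE_DIM)
score = discriminate(some_image, params)   # one scalar per input image
```

Training is what turns this scalar into a useful quality score: high for real images and low for poor ones.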

Algorithm of GAN

Next, we will introduce the basic training algorithm of GAN, which may not be rigorous but is easy to understand.

First, as with any network training, we need to initialize the parameters of the Generator G and the Discriminator D.

The formal equation is as follows:

1. In each round, first fix G and train D. How do we train it?

We randomly sample some vectors and feed them to G to produce generated images, and we also pick some real images from the database. The Discriminator is then trained to give high scores to the real images and low scores to the images that G generated from the random vectors. This is how we train the Discriminator:

The formal equation is as follows:
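
One common way to write this step, following the standard GAN objective, is sketched here. The notation is an assumption and is not taken from the text: $x^i$ are $m$ real images sampled from the database, $z^i$ are $m$ random input vectors, $\tilde{x}^i = G(z^i)$ are the generated images, $\theta_d$ are the Discriminator’s parameters, and $\eta$ is the learning rate. It also assumes D outputs a value between 0 and 1.

```latex
\tilde{V} = \frac{1}{m}\sum_{i=1}^{m} \log D(x^i)
          + \frac{1}{m}\sum_{i=1}^{m} \log\bigl(1 - D(\tilde{x}^i)\bigr),
\qquad
\theta_d \leftarrow \theta_d + \eta \, \nabla_{\theta_d} \tilde{V}
```

The first sum grows when real images receive high scores, and the second grows when generated images receive low scores. Only $\theta_d$ is updated; G stays fixed.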

To explain the formula briefly: training the Discriminator aims to score real images high and generated images low. The goal is to maximize the objective, which matches the description above: maximize the score for real images and minimize the score for generated images.

2. The second step is to fix the Discriminator D and train the Generator G. We still randomly give some vectors to G, which generates some images that are then fed into the Discriminator for evaluation.

Our goal is for the Generator to produce very realistic images. The Discriminator gives high scores to truly real images, so we train the Generator to produce images that the Discriminator also scores high. In effect, the Generator learns to fool the Discriminator, and its images end up looking very similar to real ones.

The formal equation is as follows:
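
A common way to write this step is sketched here. The notation is an assumption and is not taken from the text: $z^i$ are $m$ random input vectors, $\theta_g$ are the Generator’s parameters, $\eta$ is the learning rate, and D is assumed to output a value between 0 and 1.

```latex
\tilde{V} = \frac{1}{m}\sum_{i=1}^{m} \log D\bigl(G(z^i)\bigr),
\qquad
\theta_g \leftarrow \theta_g + \eta \, \nabla_{\theta_g} \tilde{V}
```

Only $\theta_g$ is updated here; the Discriminator’s parameters are held fixed.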

In words, the formula says: maximize the score that the Discriminator assigns to the images produced by the Generator.

Note that when training the Generator, the Discriminator’s parameters must be fixed. In an actual implementation, the Generator and the Discriminator are chained into one large network: the earlier layers are the Generator and the last layers are the Discriminator. The training objective is a high final score. If the Discriminator’s parameters were not fixed, the network could reach a very high scalar output just by adjusting its last layer, without improving the generated images at all. Clearly, this is not what we want. By fixing the last layers (the Discriminator), we force training to update only the earlier layers (the Generator).
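
The two alternating steps, including the point about keeping D fixed while G is trained, can be sketched on a toy one-dimensional problem. Everything here is an assumption chosen to keep the example small: "images" are single numbers, the real data is drawn from a Gaussian centered at 4, G and D each have two parameters, the gradients are derived by hand, and all function names are hypothetical. It illustrates the update pattern only and makes no claim about how well or how fast such training converges.

```python
import math
import random

def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def G(z, g):
    """Toy Generator: g = [a, b], output a * z + b."""
    return g[0] * z + g[1]

def D(x, d):
    """Toy Discriminator: d = [w, c], score sigmoid(w * x + c) in (0, 1)."""
    return sigmoid(d[0] * x + d[1])

def train_discriminator(d, g, real, zs, lr):
    """Step 1: G is fixed. One gradient-ascent step on
    mean log D(x) + mean log(1 - D(G(z))), with respect to d only."""
    fake = [G(z, g) for z in zs]
    grad_w = (sum((1 - D(x, d)) * x for x in real) / len(real)
              - sum(D(x, d) * x for x in fake) / len(fake))
    grad_c = (sum(1 - D(x, d) for x in real) / len(real)
              - sum(D(x, d) for x in fake) / len(fake))
    return [d[0] + lr * grad_w, d[1] + lr * grad_c]

def train_generator(d, g, zs, lr):
    """Step 2: D is fixed. One gradient-ascent step on
    mean log D(G(z)), with respect to g only."""
    # d/dx log D(x) = (1 - D(x)) * w; chain rule: dx/da = z, dx/db = 1.
    slopes = [(1 - D(G(z, g), d)) * d[0] for z in zs]
    grad_a = sum(s * z for s, z in zip(slopes, zs)) / len(zs)
    grad_b = sum(slopes) / len(zs)
    return [g[0] + lr * grad_a, g[1] + lr * grad_b]

rng = random.Random(0)
d = [0.1, 0.0]   # Discriminator parameters (w, c)
g = [1.0, 0.0]   # Generator parameters (a, b)
for _ in range(200):
    real = [rng.gauss(4.0, 0.5) for _ in range(32)]   # "database" samples
    zs = [rng.gauss(0.0, 1.0) for _ in range(32)]     # random input vectors
    d = train_discriminator(d, g, real, zs, lr=0.05)  # fix G, train D
    zs = [rng.gauss(0.0, 1.0) for _ in range(32)]
    g = train_generator(d, g, zs, lr=0.05)            # fix D, train G
print("D params:", d, "G params:", g)
```

Note that `train_generator` reads `d` but returns only new Generator parameters, which is the toy equivalent of freezing the last layers of the combined network. In a deep learning framework, the same effect is usually achieved by handing only the Generator’s parameters to the optimizer used in this step.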

Example

Finally, Professor Li Hongyi provided an example of generating anime avatars using GANs. Below are the generated results after different numbers of training rounds, to give an intuitive sense of the progress:

The Charm of GAN

Many people may wonder what the point is in generating impressive images. No matter how powerful the model is, it is still easier to take a photo with a camera. Here’s the explanation: if it can only generate images that have been seen before, it indeed lacks any remarkable features. However, if it can “reasonably” generate images that have never been seen before, isn’t that fascinating? As shown in the figure below, the image in the middle has never appeared in the training set:

Remarkably, at intermediate values of the input vector the model produces intermediate head orientations, which suggests it has learned features such as head orientation. This is truly amazing.

Video material can be found at:

https://www.bilibili.com/video/av23316535from=search&seid=13825323076277645807
