This article is the eighteenth in the “Kids Can Understand” series. The series features short content that can be read in fragmented time, but the effort I put into it is substantial. If you like it, that’s enough!
- Neural Networks That Kids Can Understand
- Recommendation Systems That Kids Can Understand
- Incremental Learning That Kids Can Understand
- Clustering That Kids Can Understand
- Principal Component Analysis That Kids Can Understand
- Recurrent Neural Networks That Kids Can Understand
- Embedding That Kids Can Understand
- Entropy, Cross Entropy, and KL Divergence That Kids Can Understand
- p-value That Kids Can Understand
- Hypothesis Testing That Kids Can Understand
- Gini Impurity That Kids Can Understand
- ROC That Kids Can Understand
- SVD That Kids Can Understand
- SVD 2 That Kids Can Understand
- GMM That Kids Can Understand
- Beta Distribution That Kids Can Understand
- Multi-Armed Bandit That Kids Can Understand
- GAN That Kids Can Understand
1. What is GAN
The full name of GAN is Generative Adversarial Network, which in Chinese is 生成对抗网络.
In short, a GAN consists of two neural networks, a generator and a discriminator, that continuously compete with each other: the generator produces increasingly realistic outputs, while the discriminator becomes increasingly good at spotting fakes.
2. Counterfeiting and Appraisal
The relationship between the generator and the discriminator is similar to that of the counterfeiter and the appraiser.
- The counterfeiter continuously produces counterfeit goods with the aim of deceiving the appraiser, and in this process, their counterfeiting skills improve.
- The appraiser continuously examines the goods with the aim of spotting the fakes, and in this process, their appraisal skills improve.
A GAN is both the counterfeiter and the appraiser, but ultimately it is about the counterfeiter. The ultimate goal of a GAN is to train a “perfect” counterfeiter, one whose output can fool even the appraiser.
A picture is worth a thousand words; the following image shows how the counterfeiter gradually generates a realistic Mona Lisa painting and ultimately deceives the appraiser.
In this process, whenever the counterfeiter generates an image, the appraiser provides feedback, and the counterfeiter learns how to improve to create a realistic image.
3. Counterfeit Appraisal Network?
Returning to neural networks: the counterfeiter is modeled by the generator, while the appraiser is modeled by the discriminator.
According to the above animation, the discriminator’s task is to distinguish which images are real and which images are produced by the generator.
Next, we will create a minimal GAN using Python.
First, let’s set a story background.
4. Story Background
On Slanted Island, everyone is slanted, probably about 45 degrees to the left.
The island owner wants to create a face generator, and since the islanders’ facial features are very simple, blurry 2*2-pixel face images are enough.
Due to technical limitations, the island owner only used a single-layer neural network.
However, even in this extremely simple setup, a single-layer GAN can still generate “slanted faces”.
5. Distinguishing Faces
The following image shows what four faces look like.
A face is represented with 2*2 pixels: dark pixels mark where the face is, and light pixels mark where it is not.
What if it is not a face? Then the elements of its 2*2 pixel image are random, as shown below.
Let’s review:
- Face: dark on the diagonal, light off the diagonal
- Non-face: any position can be dark or light

Pixels can be represented by values from 0 to 1:
- Face: large values on the diagonal, small values off the diagonal
- Non-face: any position can take any value between 0 and 1
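For example, written as value matrices (the specific numbers here are made up for illustration):

import numpy as np

face = np.array([[0.9, 0.1],
                 [0.2, 0.8]])     # large values on the diagonal -> a face

non_face = np.array([[0.3, 0.7],
                     [0.6, 0.2]])  # any values anywhere -> not a face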
Now that we understand how face and non-face images are represented by 2*2 value matrices with different characteristics, let’s look at how to construct the discriminator and the generator in the next two sections.
First, let’s analyze the discriminator.
6. Discriminator
The discriminator’s job is to identify faces. When it sees the pixel values of an image, how does it decide?
Simple! We already analyzed this in the previous section:
- Face: large values on the diagonal, small values off the diagonal
- Non-face: any position can take any value between 0 and 1
What operation should be used to represent faces and non-faces with a single value? It’s simple, as shown in the figure below: add the element at position (1,1), subtract the element at (1,2), subtract the element at (2,1), and add the element at (2,2) to get a single value.
The score for a face is 2 (higher), while the score for a non-face is -0.5 (lower).
Set a threshold of 1; scores greater than 1 indicate a face, while scores less than 1 indicate a non-face.
Using the above content represented in a neural network results in a minimal discriminator. Note that besides the “add-subtract-subtract-add” of the four matrix elements, a bias is also added to get the final score.
The discriminator ultimately has to decide whether the image is a face, so its output is a probability: the score of 1 is converted with the sigmoid function into a probability of 0.73. Given a probability threshold of 0.5, since 0.73 > 0.5, the discriminator judges that the image is a face.
For another, non-face image, the same operation gives a final score of -0.5. After the sigmoid function this becomes 0.37, and since 0.37 < 0.5, the discriminator judges that the image is not a face.
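We can check these numbers directly (a quick sketch; the bias of -1 and the non-face pixel values are assumptions chosen to reproduce the scores above):

import numpy as np

def sigmoid(x):
    return np.exp(x) / (1.0 + np.exp(x))

weights = np.array([1, -1, -1, 1])   # the "add-subtract-subtract-add" pattern
bias = -1                            # assumed so the face's final score is 1

face = np.array([1, 0, 0, 1])        # dark diagonal, light elsewhere
score = np.dot(weights, face) + bias # 2 + (-1) = 1
print(sigmoid(score))                # ~0.73 > 0.5 -> face

non_face = np.array([0.5, 0.3, 0.4, 0.7])  # hypothetical random pixels
score = np.dot(weights, non_face) + bias   # 0.5 + (-1) = -0.5
print(sigmoid(score))                      # ~0.38 < 0.5 -> not a face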
7. Generator
The discriminator aims to identify faces, while the generator aims to generate them. So what kind of pixel matrix resembles a face? Simple! We have analyzed the rules several times already:
- Face: large values on the diagonal, small values off the diagonal
- Non-face: any position can take any value between 0 and 1
Now let’s look at the generation process. The first step is to randomly select a number between 0-1, for example, 0.7.
Recall that the generator’s goal is to generate faces, meaning that the final 2*2 matrix must have large pixel values on the diagonal (indicated by thick lines) and small pixel values off the diagonal (indicated by thin lines).
For example, generating the value at matrix position (1,1), with w = 1, b = 1, the calculation gives wz + b = 1.7.
Similarly, calculate the scores for the other three positions in the matrix.
Finally, apply the sigmoid function to convert the scores, ensuring the pixel values are between 0-1.
Note that with weights [1, -1, -1, 1] and biases of matching sign (positive on the diagonal, negative off it), and since z is always a positive number between 0 and 1, such a neural network (the generator) can always generate a 2*2 pixel matrix that resembles a face.
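We can check the whole forward pass numerically (a sketch; taking the off-diagonal biases to be -1 is an assumption, consistent with the signs of the trained biases shown at the end of the article):

import numpy as np

def sigmoid(x):
    return np.exp(x) / (1.0 + np.exp(x))

z = 0.7                              # the randomly chosen input
weights = np.array([1, -1, -1, 1])   # positive on the diagonal, negative off it
biases = np.array([1, -1, -1, 1])    # assumed signs matching the weights

pixels = sigmoid(weights * z + biases)
print(pixels.reshape(2, 2))
# diagonal: sigmoid(1.7) ~ 0.85 (dark); off-diagonal: sigmoid(-1.7) ~ 0.15 (light)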
From this section and the previous one, we now know what kind of discriminator can identify faces and what kind of generator can generate good faces, i.e., what makes a good GAN. Both are determined by their weights and biases; next, let’s see how these are trained. First, let’s review the error function.
8. Error Function
Typically, positive classes are represented by 1 and negative classes by 0. In this case, faces are positive, represented by 1; non-faces are negative, represented by 0.
When the label is 1 (face), -ln(x) serves as a good error function, because:
- When the prediction is inaccurate (predicting a non-face, say 0.1), the error should be large, and -ln(0.1) is large.
- When the prediction is accurate (predicting a face, say 0.9), the error should be small, and -ln(0.9) is small.
When the label is 0 (non-face), -ln(1-x) serves as a good error function:
- When the prediction is accurate (predicting a non-face, say 0.1), the error should be small, and -ln(1-0.1) is small.
- When the prediction is inaccurate (predicting a face, say 0.9), the error should be large, and -ln(1-0.9) is large.
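A quick numeric check of both cases (a sketch):

import numpy as np

# label 1 (face): error is -ln(prediction)
print(-np.log(0.9))      # accurate   -> small error, ~0.105
print(-np.log(0.1))      # inaccurate -> large error, ~2.303

# label 0 (non-face): error is -ln(1 - prediction)
print(-np.log(1 - 0.1))  # accurate   -> small error, ~0.105
print(-np.log(1 - 0.9))  # inaccurate -> large error, ~2.303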
Next comes the game inside the GAN: we put the generator and the discriminator together and see what happens.
9. Putting the Generator and Discriminator Together
Let’s review the structure of both:
- Generator: input is a random number between 0 and 1; output is the pixel matrix of an image
- Discriminator: input is the pixel matrix of an image; output is a probability value
The following animation shows the process from generator to discriminator.
Since the image is generated from the generator and is not a real image, a good discriminator will judge that this is not a face, thus using the error function corresponding to the label of 0, -ln(1-prediction).
Conversely, a good generator aims to deceive the discriminator, i.e., it wants the discriminator to judge that this is a face, so it uses the error function corresponding to the label of 1, -ln(prediction).
Here comes the interesting part: let G represent the generator and D represent the discriminator; then
- G(z) is the output of the generator, i.e., the pixel matrix, which is also the input to the discriminator
- D(G(z)) is the output of the discriminator, i.e., the probability, which is also the prediction in the error functions above
To make both the generator and the discriminator stronger, we want to minimize the error function
-ln(D(G(z))) - ln(1 - D(G(z)))
where D(G(z)) is the prediction of the discriminator.
Comparing the error function we obtained with the objective function in the GAN paper (shown below), we find some differences:
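For reference, the objective function from the original GAN paper is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\ln D(x)] + \mathbb{E}_{z \sim p_z(z)}[\ln(1 - D(G(z)))]$$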
The explanation is as follows:
The discriminator receives not only images produced by the generator, G(z), but also real images x. For a real image, a good discriminator should judge it to be a face, so the error function for label 1, -ln(prediction), applies. Therefore, the total error the discriminator minimizes is
-ln(D(x)) - ln(1 - D(G(z)))
Removing the negative sign is equivalent to maximizing
ln(D(x)) + ln(1 - D(G(z)))
This is exactly V(D, G), right? This step fixes the generator G and optimizes the discriminator D, i.e., maximizes V(D, G) over D.
After maximizing V(D, G) over D, we then fix the discriminator D and minimize V(D, G) over G, making the generator better at fooling it.
Finally, both terms in V(D, G) carry expectation symbols; in actual optimization, we replace the expectations with averages over n samples. The x in the first term’s expectation comes from the real data distribution p_data(x), and the z in the second term’s expectation comes from a chosen probability distribution p_z(z).
In summary, first maximize V(D,G) through D, then minimize V(D, G) through G.
10. Training GAN
During training, when an image comes from the generator, minimizing the error function pushes the discriminator’s output probability toward 0.
When the image is a real face, minimizing the error function pushes the discriminator’s output probability toward 1.
As with neural networks in general, training is carried out with gradient descent.
OK, the following content is admittedly not for ordinary kids, but kids with a strong interest in mathematics and programming can keep reading.
11. Mathematical Derivation
Discriminator: From pixel matrix to probability
Generator: From random number z to pixel matrix
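For this single-layer setup, the derivation is short. Write the discriminator as $D(x) = \sigma(w_D \cdot x + b_D)$ and the generator as $G(z)_i = \sigma(w_{G,i}\,z + b_{G,i})$, where $\sigma$ is the sigmoid; the symbols $w_D, b_D, w_{G,i}, b_{G,i}$ are notation assumed here. A sketch of the gradients:

For a real image $x$ with error $E = -\ln D(x)$:
$$\frac{\partial E}{\partial w_D} = -(1 - D(x))\,x, \qquad \frac{\partial E}{\partial b_D} = -(1 - D(x))$$

For a generated image $G(z)$ with error $E = -\ln(1 - D(G(z)))$:
$$\frac{\partial E}{\partial w_D} = D(G(z))\,G(z), \qquad \frac{\partial E}{\partial b_D} = D(G(z))$$

For the generator with error $E = -\ln D(G(z))$, the chain rule through the discriminator gives:
$$\frac{\partial E}{\partial w_{G,i}} = -(1 - D(G(z)))\,w_{D,i}\,G(z)_i\,(1 - G(z)_i)\,z$$
$$\frac{\partial E}{\partial b_{G,i}} = -(1 - D(G(z)))\,w_{D,i}\,G(z)_i\,(1 - G(z)_i)$$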
After obtaining the partial derivatives of the error function with respect to the weights and biases in the generator and discriminator, we can write the code to implement it.
12. Python Implementation – Preparation
Import numpy and matplotlib.
import numpy as np
from numpy import random
from matplotlib import pyplot as plt
%matplotlib inline
Write a function to draw facial pixels.
def view_samples(samples, m, n):
    fig, axes = plt.subplots(figsize=(10, 10), nrows=m, ncols=n,
                             sharey=True, sharex=True)
    for ax, img in zip(axes.flatten(), samples):
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        im = ax.imshow(1 - img.reshape((2, 2)), cmap='Greys_r')
    return fig, axes
Draw four faces, noting that the pixel matrix has large values on the diagonal and small values off the diagonal.
faces = [np.array([1, 0, 0, 1]),
         np.array([0.9, 0.1, 0.2, 0.8]),
         np.array([0.9, 0.2, 0.1, 0.8]),
         np.array([0.8, 0.1, 0.2, 0.9]),
         np.array([0.8, 0.2, 0.1, 0.9])]

_ = view_samples(faces, 1, 4)
Draw twenty non-faces, noting that the pixel matrix elements are all random.
noise = [np.random.randn(2, 2) for i in range(20)]

def generate_random_image():
    return [np.random.random(), np.random.random(),
            np.random.random(), np.random.random()]

_ = view_samples(noise, 4, 5)
13. Python Implementation – Building the Discriminator
First, implement the sigmoid function.
def sigmoid(x):
    return np.exp(x) / (1.0 + np.exp(x))
Using object-oriented programming (OOP), we write the discriminator as a class; its methods are listed below, with a code sketch after the list.
- __init__() is the constructor
- forward() flattens the pixel matrix into a vector x, multiplies it by the weights w, adds the bias b to get a score, and then converts the score into a probability with sigmoid()
- error_from_image() computes the error function when the input is a real image
- error_from_noise() computes the error function when the input comes from the generator
- derivatives_from_image() computes the partial derivatives of the error function with respect to the weights w and bias b when the input is a real image
- derivatives_from_noise() computes the same partial derivatives when the input comes from the generator
- update_from_image() performs a gradient descent update when the input is a real image
- update_from_noise() performs a gradient descent update when the input comes from the generator
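Here is a minimal sketch of such a class, consistent with the method list above and the gradients from the derivation section (the random initialization and the learning rate lr are assumptions):

class Discriminator:
    def __init__(self):
        # random initial weights for the 4 pixels, plus a bias
        self.weights = np.random.normal(size=4)
        self.bias = np.random.normal()

    def forward(self, x):
        # weighted sum of the 4 pixel values plus the bias, squashed to a probability
        return sigmoid(np.dot(x, self.weights) + self.bias)

    def error_from_image(self, image):
        prediction = self.forward(image)
        # a real image has label 1: error = -ln(prediction)
        return -np.log(prediction)

    def derivatives_from_image(self, image):
        prediction = self.forward(image)
        dw = -image * (1 - prediction)
        db = -(1 - prediction)
        return dw, db

    def update_from_image(self, image, lr=0.01):
        dw, db = self.derivatives_from_image(image)
        self.weights -= lr * dw
        self.bias -= lr * db

    def error_from_noise(self, noise):
        prediction = self.forward(noise)
        # a generated image has label 0: error = -ln(1 - prediction)
        return -np.log(1 - prediction)

    def derivatives_from_noise(self, noise):
        prediction = self.forward(noise)
        dw = noise * prediction
        db = prediction
        return dw, db

    def update_from_noise(self, noise, lr=0.01):
        dw, db = self.derivatives_from_noise(noise)
        self.weights -= lr * dw
        self.bias -= lr * db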
14. Python Implementation – Building the Generator
Using object-oriented programming (OOP), we write the generator as a class; its methods are listed below, with a code sketch after the list.
- __init__() is the constructor
- forward() multiplies the random number z by the weights w, adds the biases b to get scores, and then converts the scores into pixel values with sigmoid()
- error() computes the error function with the discriminator fixed, in two steps: the generator’s forward() produces the pixels, then the discriminator’s forward() produces the probability
- derivatives() computes the partial derivatives of the error function with respect to the weights w and biases b with the discriminator fixed, following the formulas in the previous section
- update() performs a gradient descent update with the discriminator fixed
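Again, a minimal sketch consistent with the method list (same assumptions as the discriminator sketch):

class Generator:
    def __init__(self):
        # one weight and one bias per output pixel
        self.weights = np.random.normal(size=4)
        self.biases = np.random.normal(size=4)

    def forward(self, z):
        # map the random number z to 4 pixel values in (0, 1)
        return sigmoid(z * self.weights + self.biases)

    def error(self, z, discriminator):
        x = self.forward(z)           # step 1: generate the pixels
        y = discriminator.forward(x)  # step 2: get the probability
        # the generator wants the discriminator to say "face" (label 1)
        return -np.log(y)

    def derivatives(self, z, discriminator):
        x = self.forward(z)
        y = discriminator.forward(x)
        # chain rule through the discriminator and the generator's sigmoid
        factor = -(1 - y) * discriminator.weights * x * (1 - x)
        dw = factor * z
        db = factor
        return dw, db

    def update(self, z, discriminator, lr=0.01):
        dw, db = self.derivatives(z, discriminator)
        self.weights -= lr * dw
        self.biases -= lr * db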
15. Python Implementation – Training GAN
Set 1000 epochs, meaning the data will be traversed 1000 times; train while recording the errors of both the generator and the discriminator, as in the sketch below.
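A minimal sketch of the training loop (the learning rate, the random seed, and the exact error bookkeeping are assumptions):

np.random.seed(42)   # assumed, for reproducibility
epochs = 1000
lr = 0.01            # assumed learning rate

D = Discriminator()
G = Generator()

errors_discriminator = []
errors_generator = []

for epoch in range(epochs):
    for face in faces:
        # update the discriminator on a real face
        D.update_from_image(face, lr=lr)

        # draw a random z and generate a fake face
        z = random.random()
        fake = G.forward(z)

        # record both errors for the plots below
        errors_discriminator.append(
            D.error_from_image(face) + D.error_from_noise(fake))
        errors_generator.append(G.error(z, D))

        # update the discriminator on the fake, then the generator
        D.update_from_noise(fake, lr=lr)
        G.update(z, D, lr=lr)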
Plot the error curves for the generator and the discriminator; we find that the generator gradually stabilizes.
plt.plot(errors_generator)
plt.title("Generator error function")
plt.legend(["gen"])
plt.show()

plt.plot(errors_discriminator)
plt.legend(["disc"])
plt.title("Discriminator error function")
16. Python Implementation – Result Display
Generate images.
generated_images = []
for i in range(4):
    z = random.random()
    generated_image = G.forward(z)
    generated_images.append(generated_image)

_ = view_samples(generated_images, 1, 4)

for i in generated_images:
    print(i)
[0.94688171 0.03401213 0.04080795 0.96308679]
[0.95653992 0.03437852 0.03579494 0.97063836]
[0.95056667 0.03414339 0.03893305 0.96599501]
[0.94228203 0.03386046 0.04309146 0.95941292]
Print the final parameters of the GAN, i.e., the weights and biases of the generator and discriminator.
print("Generator weights", G.weights)print("Generator biases", G.biases)print("Discriminator weights", D.weights)print("Discriminator bias", D.bias)
Generator weights [ 0.70702123  0.03720449 -0.45703394  0.79375751]
Generator biases [ 2.48490157 -3.36725912 -2.90139211  2.8172726 ]
Discriminator weights [ 0.60175083 -0.29127513 -0.40093314  0.37759987]
Discriminator bias -0.8955103005797729
Here is the GAN with weights and biases shown.
The bold lines in the figure correspond to large weights, while the thin lines correspond to small or negative weights. Comparing this with the generator’s earlier goal of producing realistic faces (i.e., large values on the diagonal of the 2*2 matrix), don’t these weights look reasonable?
Friends, have you understood GANs?
If you want to learn Python content, you can refer to my “Three Sets of Python Premium Courses”.