Understanding GANs for Kids: A Simple Guide

The full text has 6327 words, 55 images.
Estimated reading time 32 minutes.

This article is the eighteenth in the “Kids Can Understand” series. The series features short pieces that can be read in spare moments, but the effort I put into them is substantial. If you like it, that’s enough!

  1. Neural Networks That Kids Can Understand

  2. Recommendation Systems That Kids Can Understand

  3. Incremental Learning That Kids Can Understand

  4. Clustering That Kids Can Understand

  5. Principal Component Analysis That Kids Can Understand

  6. Recurrent Neural Networks That Kids Can Understand

  7. Embedding That Kids Can Understand

  8. Entropy, Cross Entropy, and KL Divergence That Kids Can Understand

  9. p-value That Kids Can Understand

  10. Hypothesis Testing That Kids Can Understand

  11. Gini Impurity That Kids Can Understand

  12. ROC That Kids Can Understand

  13. SVD That Kids Can Understand

  14. SVD 2 That Kids Can Understand

  15. GMM That Kids Can Understand

  16. Beta Distribution That Kids Can Understand

  17. Multi-Armed Bandit That Kids Can Understand

  18. GAN That Kids Can Understand

1

What is GAN

The full name of GAN is Generative Adversarial Network, which in Chinese is 生成对抗网络.

In short, GAN consists of two neural networks: the generator and the discriminator, which continuously compete with each other, with the generator producing increasingly realistic outputs and the discriminator’s recognition ability becoming more powerful.

2

Counterfeiting and Appraisal

The relationship between the generator and the discriminator is similar to that of the counterfeiter and the appraiser.

  • The counterfeiter continuously produces counterfeit goods with the aim of deceiving the appraiser, and in this process, their counterfeiting skills improve.

  • The appraiser continuously examines the goods with the aim of spotting the counterfeits, and in this process, their appraisal skills improve.

GAN is both the counterfeiter and the appraiser, but ultimately, it is still the counterfeiter. The ultimate goal of GAN is to train a “perfect” counterfeiter, which can produce outputs that confuse even the appraiser.

A picture is worth a thousand words; the following image shows how the counterfeiter gradually generates a realistic Mona Lisa painting and ultimately deceives the appraiser.

In this process, whenever the counterfeiter generates an image, the appraiser provides feedback, and the counterfeiter learns how to improve to create a realistic image.

3

Counterfeit Appraisal Network?

Returning to neural networks, the counterfeiter uses the generator for modeling, while the appraiser uses the discriminator for modeling.

According to the above animation, the discriminator’s task is to distinguish which images are real and which images are produced by the generator.

Next, we will create a minimal GAN using Python.

First, let’s set a story background.

4

Story Background

On Slanted Island, everyone is slanted, probably about 45 degrees to the left.

The island owner wants to create a face generator, and since the facial features of the people on the island are very simple, they use 2 * 2 pixel blurry face images.

Due to technical limitations, the island owner only used a single-layer neural network.

However, even in this extremely simple setup, a single-layer GAN can still generate “slanted faces”.

5

Distinguishing Faces

The following image shows what four faces look like.

Faces are represented with 2*2 pixels: dark colors indicate the presence of a face, while light colors indicate the absence of a face.

If it’s not a face? Then the elements in its 2*2 pixel image are random, as shown below.

Let’s review:

  • Face: Dark on the diagonal, light off the diagonal

  • Non-Face: Any area can be dark or light

Pixels can be represented by values from 0 to 1:

  • Face: Large values on the diagonal, small values off the diagonal

  • Non-Face: Any value between 0-1 can be anywhere

Having understood how facial and non-facial images are represented by different characteristic 2*2 value matrices, let’s look at how to construct the discriminator and generator in the next two sections.

First, let’s analyze the discriminator.

6

Discriminator

The discriminator is used to identify faces, so how does it distinguish when it sees the pixel values of a photo?

Simple! We have already analyzed it in the previous section:

  • Face: Large values on the diagonal, small values off the diagonal

  • Non-Face: Any value between 0-1 can be anywhere

What operation should be used to represent faces and non-faces with a single value? It’s simple, as shown in the figure below: add the element at position (1,1), subtract the element at (1,2), subtract the element at (2,1), and add the element at (2,2) to get a single value.

The score for a face is 2 (higher), while the score for a non-face is -0.5 (lower).

Set a threshold of 1; scores greater than 1 indicate a face, while scores less than 1 indicate a non-face.

Using the above content represented in a neural network results in a minimal discriminator. Note that besides the “add-subtract-subtract-add” of the four matrix elements, a bias is also added to get the final score.

The discriminator ultimately needs to determine whether it is a face, so the output is a probability that needs to be converted from the score of 1 using the sigmoid function to a probability of 0.73. Given a probability threshold of 0.5, since 0.73 > 0.5, the discriminator judges that the image is a face.
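
The scoring rule can be sketched in a few lines of Python. This is only an illustration: the names `score` and `discriminate` are made up here, and the bias of -1 is an assumption, chosen so that a face with a raw sum of 2 ends up with the score of 1 mentioned above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "add-subtract-subtract-add" weights for pixels (1,1), (1,2), (2,1), (2,2)
WEIGHTS = np.array([1.0, -1.0, -1.0, 1.0])
BIAS = -1.0  # assumed value, not stated explicitly in the text

def score(image):
    """Weighted sum of the four pixels plus the bias."""
    return float(np.dot(WEIGHTS, np.ravel(image)) + BIAS)

def discriminate(image):
    """Probability that the image is a face."""
    return float(sigmoid(score(image)))

face = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
print(score(face))                   # 1.0
print(round(discriminate(face), 2))  # 0.73
```

Any image whose probability comes out above 0.5 is judged to be a face.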

For another, non-face image, the same operation gives a final score of -0.5, which the sigmoid function converts to a probability of 0.37. Given a probability threshold of 0.5, since 0.37 < 0.5, the discriminator judges that the image is not a face.

7

Generator

The discriminator aims to identify faces, while the generator aims to generate faces. So, what kind of matrix pixels resemble a face? Simple! The rules have been analyzed multiple times:

  • Face: Large values on the diagonal, small values off the diagonal

  • Non-Face: Any value between 0-1 can be anywhere

Now let’s look at the generation process. The first step is to randomly select a number between 0-1, for example, 0.7.

Recall that the generator’s goal is to generate faces, meaning that the final 2*2 matrix must have large pixel values on the diagonal (indicated by thick lines) and small pixel values off the diagonal (indicated by thin lines).

For example, generating the value at matrix position (1,1), with w = 1, b = 1, the calculation gives wz + b = 1.7.

Similarly, calculate the scores for the other three positions in the matrix.

Finally, apply the sigmoid function to convert the scores, ensuring the pixel values are between 0-1.

Note that by giving weights [1, -1, -1, 1] and a bias of 1, since z is always a positive number between 0 and 1, such a neural network (the generator) can always generate a 2*2 pixel matrix that resembles a face.
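
Here is a minimal sketch of this hand-built generator, using the weights [1, -1, -1, 1] and the bias of 1 from the text. The function name `generate` is hypothetical and only serves this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Values from the text: weights [1, -1, -1, 1] and a bias of 1 at every position
WEIGHTS = np.array([1.0, -1.0, -1.0, 1.0])
BIAS = 1.0

def generate(z):
    """Map one random number z in (0, 1) to four pixel values."""
    scores = WEIGHTS * z + BIAS   # e.g. 1 * 0.7 + 1 = 1.7 at position (1,1)
    return sigmoid(scores)        # squash every score into (0, 1)

pixels = generate(0.7)
print(np.round(pixels, 2))  # [0.85 0.57 0.57 0.85]
```

The diagonal pixels come out larger than the off-diagonal ones, which is the face pattern. With these hand-picked values the off-diagonal pixels still stay above 0.5; the trained generator shown at the end of the article ends up with negative biases at those positions, which pushes them close to 0.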

From the previous and this section, we now know what kind of discriminator can identify faces and what kind of generator can generate good faces, meaning what kind of GAN is a good GAN. These are determined by weights and biases; next, let’s see how they are trained. First, let’s review the error function.

8

Error Function

Typically, positive classes are represented by 1 and negative classes by 0. In this case, faces are positive, represented by 1; non-faces are negative, represented by 0.

When the label is 1 (face), -ln(x) serves as a good error function because

  • When the prediction is inaccurate (predicting a non-face, say 0.1), the error should be large, -ln(0.1) is large.

  • When the prediction is accurate (predicting a face, say 0.9), the error should be small, -ln(0.9) is small.

When the label is 0 (non-face), -ln(1-x) serves as a good error function.

  • When the prediction is accurate (predicting a non-face, say 0.1), the error should be small, and -ln(1-0.1) is small.

  • When the prediction is inaccurate (predicting a face, say 0.9), the error should be large, and -ln(1-0.9) is large.
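
The two error functions are easy to check numerically. The short sketch below uses hypothetical function names and the same example predictions of 0.1 and 0.9.

```python
import math

def error_for_face(prediction):
    """Error when the true label is 1 (face): -ln(x)."""
    return -math.log(prediction)

def error_for_non_face(prediction):
    """Error when the true label is 0 (non-face): -ln(1 - x)."""
    return -math.log(1.0 - prediction)

for p in (0.1, 0.9):
    print(p, round(error_for_face(p), 3), round(error_for_non_face(p), 3))
# 0.1 2.303 0.105
# 0.9 0.105 2.303
```

In both cases the confident wrong answer costs about 2.3, while the confident right answer costs only about 0.1.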

Next comes the game inside the GAN: we put the generator and discriminator together and see what happens.

9

Putting the Generator and Discriminator Together

Let’s review the structure of both:

  • Generator: Input is a random number between 0-1, output is a pixel matrix of the image

  • Discriminator: Input is a pixel matrix of the image, output is a probability value

The following animation shows the process from generator to discriminator.

Since the image is generated from the generator and is not a real image, a good discriminator will judge that this is not a face, thus using the error function corresponding to the label of 0, -ln(1-prediction).

Conversely, a good generator aims to deceive the discriminator, i.e., it wants the discriminator to judge that this is a face, so it uses the error function corresponding to the label of 1, -ln(prediction).

Here comes the interesting part: let G represent the generator and D represent the discriminator, then

  • G(z) is the output of the generator, i.e., the pixel matrix, which is also the input of the discriminator

  • D(G(z)) is the output of the discriminator, i.e., the probability, which is also the prediction in the error function above

To make both the generator and discriminator stronger, we want to minimize the error function

-ln(D(G(z))) - ln(1-D(G(z)))

where D(G(z)) is the prediction of the discriminator.

Comparing the error function we obtained with the objective function in the GAN paper (shown below), we find some differences:

min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p_z(z)}[log(1 - D(G(z)))]

Explanation as follows:

The discriminator receives not only the images G(z) produced by the generator but also real images x. For a real image, a good discriminator will judge that it is a face, so it uses the error function corresponding to the label of 1, -ln(prediction). Therefore, for the discriminator, the error function to minimize is

-ln(D(x)) - ln(1-D(G(z)))

Removing the negative sign is equivalent to maximizing

ln(D(x)) + ln(1-D(G(z)))

This is exactly V(D, G), right? In this step the generator is held fixed while the discriminator is optimized to distinguish fake images.

After maximizing V(D, G), we fix the discriminator and optimize the generator to produce images that are indistinguishable from real ones. But isn’t the generator’s error function -ln(D(G(z)))? How does it relate to V(D, G)? Minimizing -ln(D(G(z))) has the same aim here as minimizing ln(1-D(G(z))), since both push D(G(z)) toward 1. The latter is the second term of V(D, G), while the first term ln(D(x)) is a constant for G, so it doesn’t matter whether it’s included or not.

Finally, both terms in V(D, G) are expectations, and in actual optimization we estimate them by averaging over n samples. The x in the first term’s expectation comes from the real data distribution p_data(x), and the z in the second term’s expectation comes from a specific probability distribution p_z(z).

In summary, first maximize V(D,G) through D, then minimize V(D, G) through G.
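
To see the tug of war in numbers, here is a small sketch that estimates V(D, G) by a sample average. The function `value_estimate` and the discriminator outputs fed into it are made up for illustration; they do not come from a trained model.

```python
import numpy as np

def value_estimate(d_real, d_fake):
    """Sample average of ln(D(x)) + ln(1 - D(G(z)))."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Hypothetical discriminator outputs on two real and two generated images
sharp_d = value_estimate(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])   # D tells them apart
fooled_d = value_estimate(d_real=[0.9, 0.8], d_fake=[0.7, 0.8])  # G fools D

print(sharp_d > fooled_d)  # True
```

A discriminator that separates real from fake pushes the value up, and a generator that fools the discriminator pulls it down, which matches "maximize through D, minimize through G".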

10

Training GAN

During training, when the face comes from the generator, the discriminator outputs a probability value close to 0 by minimizing the error function.

When the face comes from a real image, the discriminator outputs a probability value close to 1 by minimizing the error function.

Of course, as with most neural network training, the minimization is done with gradient descent.

OK, the following content is indeed not suitable for ordinary kids, but kids with a strong interest in mathematics and programming can continue reading.

11

Mathematical Derivation

Discriminator: From pixel matrix to probability

[Image: derivation of the discriminator's partial derivatives]

Generator: From random number z to pixel matrix

[Image: derivation of the generator's partial derivatives]

After obtaining the partial derivatives of the error function with respect to the weights and biases in the generator and discriminator, we can write the code to implement it.
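
For readers who want the formulas in text form, here is a sketch of those partial derivatives. It assumes the single-layer models described earlier: the discriminator is D(x) = σ(Σ w_i x_i + b), and the generator is G_i(z) = σ(v_i z + c_i). The symbols v_i and c_i for the generator's weights and biases are introduced only for this sketch.

```latex
% Uses \sigma'(s) = \sigma(s)\,(1 - \sigma(s)) throughout.

% Discriminator, real image x (label 1)
\begin{align*}
E &= -\ln D(x), &
\frac{\partial E}{\partial w_i} &= -\bigl(1 - D(x)\bigr)\,x_i, &
\frac{\partial E}{\partial b} &= -\bigl(1 - D(x)\bigr)
\end{align*}

% Discriminator, generated image x = G(z) (label 0)
\begin{align*}
E &= -\ln\bigl(1 - D(x)\bigr), &
\frac{\partial E}{\partial w_i} &= D(x)\,x_i, &
\frac{\partial E}{\partial b} &= D(x)
\end{align*}

% Generator, with the discriminator held fixed
\begin{align*}
E &= -\ln D(G(z)), \\
\frac{\partial E}{\partial v_i} &= -\bigl(1 - D(G(z))\bigr)\,w_i\,G_i(z)\bigl(1 - G_i(z)\bigr)\,z, \\
\frac{\partial E}{\partial c_i} &= -\bigl(1 - D(G(z))\bigr)\,w_i\,G_i(z)\bigl(1 - G_i(z)\bigr)
\end{align*}
```

Each gradient descent step then subtracts the learning rate times the corresponding derivative from the parameter.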

12

Python Implementation – Preparation

Import numpy and matplotlib.

import numpy as np
from numpy import random
from matplotlib import pyplot as plt
%matplotlib inline

Write a function to draw facial pixels.

def view_samples(samples, m, n):
    fig, axes = plt.subplots(figsize=(10, 10),
                             nrows=m, ncols=n,
                             sharey=True, sharex=True)
    for ax, img in zip(axes.flatten(), samples):
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        im = ax.imshow(1-img.reshape((2,2)), cmap='Greys_r')
    return fig, axes

Draw four faces, noting that the pixel matrix has large values on the diagonal and small values off the diagonal.

faces = [np.array([1,0,0,1]),
         np.array([0.9,0.1,0.2,0.8]),
         np.array([0.9,0.2,0.1,0.8]),
         np.array([0.8,0.1,0.2,0.9]),
         np.array([0.8,0.2,0.1,0.9])]

_ = view_samples(faces, 1, 4)

Draw twenty non-faces, noting that the pixel matrix elements are all random.

# Note: randn draws from a standard normal, so these values are not limited to [0, 1]
noise = [np.random.randn(2,2) for i in range(20)]

def generate_random_image():
    return [np.random.random(), np.random.random(), np.random.random(), np.random.random()]

_ = view_samples(noise, 4,5)

13

Python Implementation – Building the Discriminator

First, implement the sigmoid function.

def sigmoid(x):
    return np.exp(x)/(1.0+np.exp(x))

Using object-oriented programming (OOP) to write the discriminator, the code is as follows:

[Image: source code of the Discriminator class]

Where

  • __init__() is the constructor

  • forward() function flattens the pixel matrix into a vector x, multiplies it by weight w, adds bias b to get a score, and then converts it to probability using the sigmoid() function

  • error_from_image() calculates the error function when receiving a real image as input

  • error_from_noise() calculates the error function when receiving the generator's output as input

  • derivatives_from_image() calculates the partial derivatives of the error function with respect to weights w and bias b when receiving a real image as input

  • derivatives_from_noise() calculates the partial derivatives of the error function with respect to weights w and bias b when receiving the generator's output as input

  • update_from_image() performs a gradient descent step when receiving a real image as input

  • update_from_noise() performs a gradient descent step when receiving the generator's output as input
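
A discriminator with these methods could look roughly like the sketch below. It is a reconstruction from the method descriptions, not the article's original code: the method names are written as `..._from_...`, and the learning rate of 0.01, the `seed` argument and the random initialization are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Discriminator:
    def __init__(self, learning_rate=0.01, seed=None):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=4)   # one weight per pixel
        self.bias = rng.normal()
        self.learning_rate = learning_rate  # assumed value

    def forward(self, x):
        # flatten the pixels, take the weighted sum plus bias, squash to a probability
        return sigmoid(np.dot(np.ravel(x), self.weights) + self.bias)

    def error_from_image(self, image):
        # real image: label 1, error -ln(prediction)
        return -np.log(self.forward(image))

    def derivatives_from_image(self, image):
        prediction = self.forward(image)
        return -np.ravel(image) * (1 - prediction), -(1 - prediction)

    def update_from_image(self, image):
        d_weights, d_bias = self.derivatives_from_image(image)
        self.weights -= self.learning_rate * d_weights
        self.bias -= self.learning_rate * d_bias

    def error_from_noise(self, noise):
        # generated image: label 0, error -ln(1 - prediction)
        return -np.log(1 - self.forward(noise))

    def derivatives_from_noise(self, noise):
        prediction = self.forward(noise)
        return np.ravel(noise) * prediction, prediction

    def update_from_noise(self, noise):
        d_weights, d_bias = self.derivatives_from_noise(noise)
        self.weights -= self.learning_rate * d_weights
        self.bias -= self.learning_rate * d_bias

D = Discriminator(seed=0)
face = np.array([1, 0, 0, 1])
before = D.error_from_image(face)
D.update_from_image(face)
print(D.error_from_image(face) < before)  # True
```

One update on a real face nudges the weights so that the error on that face drops a little.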

14

Python Implementation – Building the Generator

Using object-oriented programming (OOP) to write the generator, the code is as follows:

[Image: source code of the Generator class]

Where

  • __init__() is the constructor

  • forward() function multiplies the random number z by weight w, adds bias b to get a score, and then converts it to pixel using the sigmoid() function

  • error() calculates the error function when fixing the discriminator as input, in two steps:

    • The generator’s forward() function gets the pixels

    • The discriminator’s forward() function gets the probability

  • derivatives() calculates the partial derivatives of the error function with respect to weights w and bias b when fixing the discriminator as input, referring to the mathematical formulas in the previous section

  • update() performs a gradient descent step with the discriminator held fixed
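
A generator with these methods could be sketched as follows. Again this is a reconstruction from the descriptions rather than the original code. The learning rate, the `seed` argument and the `FixedDiscriminator` stand-in with hand-set parameters are assumptions made only for this demo.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Generator:
    def __init__(self, learning_rate=0.01, seed=None):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=4)   # one weight per output pixel
        self.biases = rng.normal(size=4)    # one bias per output pixel
        self.learning_rate = learning_rate  # assumed value

    def forward(self, z):
        # scale the random number z, add the biases, squash to pixel values
        return sigmoid(z * self.weights + self.biases)

    def error(self, z, discriminator):
        # step 1: the generator produces pixels; step 2: the discriminator scores them
        y = discriminator.forward(self.forward(z))
        return -np.log(y)

    def derivatives(self, z, discriminator):
        x = self.forward(z)
        y = discriminator.forward(x)
        # chain rule through the discriminator's sigmoid and the generator's sigmoid
        factor = -(1 - y) * discriminator.weights * x * (1 - x)
        return factor * z, factor

    def update(self, z, discriminator):
        d_weights, d_biases = self.derivatives(z, discriminator)
        self.weights -= self.learning_rate * d_weights
        self.biases -= self.learning_rate * d_biases


class FixedDiscriminator:
    """Hypothetical stand-in with hand-set parameters, used only for this demo."""
    weights = np.array([1.0, -1.0, -1.0, 1.0])
    bias = -1.0

    def forward(self, x):
        return sigmoid(np.dot(x, self.weights) + self.bias)


G = Generator(seed=0)
D = FixedDiscriminator()
z = 0.7
before = G.error(z, D)
for _ in range(100):
    G.update(z, D)
print(G.error(z, D) < before)  # True
```

With the discriminator frozen, repeated updates make the generated image look more face-like to it, so the generator's error goes down.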

15

Python Implementation – Training GAN

Train for 1000 epochs, meaning the data is traversed 1000 times, and record the errors of both the generator and the discriminator for each epoch.

[Image: source code of the training loop]
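
The training procedure can be sketched compactly without classes. This is a functional variant written for illustration, not the article's class-based code. The learning rate of 0.01, the random seed and the exact order of the three updates are assumptions, and the errors are recorded at every step rather than once per epoch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)   # seed chosen arbitrarily
learning_rate = 0.01              # assumed value
epochs = 1000

faces = [np.array([1.0, 0.0, 0.0, 1.0]),
         np.array([0.9, 0.1, 0.2, 0.8]),
         np.array([0.9, 0.2, 0.1, 0.8]),
         np.array([0.8, 0.1, 0.2, 0.9])]

d_w, d_b = rng.normal(size=4), rng.normal()        # discriminator parameters
g_w, g_b = rng.normal(size=4), rng.normal(size=4)  # generator parameters

errors_discriminator, errors_generator = [], []

for epoch in range(epochs):
    for face in faces:
        # 1) discriminator step on a real face (label 1)
        p_real = sigmoid(face @ d_w + d_b)
        d_w -= learning_rate * (-(1 - p_real) * face)
        d_b -= learning_rate * (-(1 - p_real))

        # 2) discriminator step on a generated image (label 0)
        z = rng.random()
        fake = sigmoid(z * g_w + g_b)
        p_fake = sigmoid(fake @ d_w + d_b)
        d_w -= learning_rate * (p_fake * fake)
        d_b -= learning_rate * p_fake

        # 3) generator step with the discriminator held fixed
        p_fake = sigmoid(fake @ d_w + d_b)
        factor = -(1 - p_fake) * d_w * fake * (1 - fake)
        g_w -= learning_rate * factor * z
        g_b -= learning_rate * factor

        # record the errors for this step
        errors_discriminator.append(-np.log(p_real) - np.log(1 - p_fake))
        errors_generator.append(-np.log(p_fake))

print(len(errors_generator))  # 4000
```

The two error lists can be plotted in the same way as the article's `errors_generator` and `errors_discriminator`.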

Plot the error function graph for the generator and discriminator, finding that the generator gradually stabilizes.

plt.plot(errors_generator)
plt.title("Generator error function")
plt.legend(["gen"])   # pass a list; a bare string would be split into characters
plt.show()
plt.plot(errors_discriminator)
plt.legend(["disc"])
plt.title("Discriminator error function")

16

Python Implementation – Result Display

Generate images.

generated_images = []
for i in range(4):
    z = random.random()
    generated_image = G.forward(z)
    generated_images.append(generated_image)
_ = view_samples(generated_images, 1, 4)
for i in generated_images:
    print(i)

[0.94688171 0.03401213 0.04080795 0.96308679]
[0.95653992 0.03437852 0.03579494 0.97063836]
[0.95056667 0.03414339 0.03893305 0.96599501]
[0.94228203 0.03386046 0.04309146 0.95941292]

Print the final parameters of the GAN, i.e., the weights and biases of the generator and discriminator.

print("Generator weights", G.weights)
print("Generator biases", G.biases)
print("Discriminator weights", D.weights)
print("Discriminator bias", D.bias)

Generator weights [ 0.70702123 0.03720449 -0.45703394 0.79375751]
Generator biases [ 2.48490157 -3.36725912 -2.90139211 2.8172726 ]
Discriminator weights [ 0.60175083 -0.29127513 -0.40093314 0.37759987]
Discriminator bias -0.8955103005797729

Here is the GAN with weights and biases shown.

The bold lines in the figure correspond to large weights, while the thin lines correspond to small or negative weights. Compared with the generator's earlier goal of generating realistic faces (i.e., large values on the diagonal of the 2*2 matrix), aren’t these weights reasonable?

Friends, have you understood GANs?
