Mathematical Derivation of GAN
Author: Sherlock
Source: Machine Learning Algorithms and Natural Language Processing
Previously, we discussed the basic idea of GAN. Recently I reviewed some GAN papers and happened to watch a course by Professor Li Hongyi, and I found the mathematical derivation quite interesting, so I am writing it down for future reference.
First, we need some preliminary knowledge: the KL divergence, a concept from statistics that measures the similarity between two probability distributions; the smaller it is, the closer the two distributions are. For discrete probability distributions, it is defined as follows:

$$D_{KL}(P\,\|\,Q)=\sum_i P(i)\log\frac{P(i)}{Q(i)}$$

For continuous probability distributions, it is defined as follows:

$$D_{KL}(P\,\|\,Q)=\int_{-\infty}^{\infty} p(x)\log\frac{p(x)}{q(x)}\,dx$$
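For a concrete feel for the discrete definition, here is a minimal numeric sketch in Python (the two distributions are arbitrary examples, not from the derivation); `scipy.stats.entropy` computes exactly this relative entropy when given two distributions:

```python
import numpy as np
from scipy.stats import entropy

# Two arbitrary discrete distributions over the same four outcomes.
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.25, 0.25, 0.25, 0.25])

# KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)), straight from the definition.
kl_manual = np.sum(p * np.log(p / q))

# scipy.stats.entropy(p, q) computes the same quantity.
kl_scipy = entropy(p, q)

print(kl_manual, kl_scipy)   # ~0.106 for both; it is 0 iff p == q
```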
According to what we discussed earlier, the setup is the following (shown as a figure in the original post): we want to pass random Gaussian noise $z$ through a generator network $G$ to obtain a generated distribution $P_G(x;\theta)$ that is similar to the real data distribution $P_{data}(x)$. The parameters $\theta$ are the network's parameters, and we hope to find a $\theta^*$ such that $P_G(x;\theta^*)$ and $P_{data}(x)$ are as close as possible.

We sample $m$ points $\{x^1, x^2, \dots, x^m\}$ from the real data distribution $P_{data}(x)$, and for a given parameter $\theta$ we can compute the probability $P_G(x^i;\theta)$. The likelihood of generating these $m$ samples is then:

$$L=\prod_{i=1}^{m} P_G(x^i;\theta)$$

What we want to do is find the $\theta^*$ that maximizes this likelihood.
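It is worth spelling out why this is the right goal, a step the text glosses over: maximizing the likelihood is, up to a term independent of $\theta$, equivalent to minimizing the KL divergence between $P_{data}$ and $P_G$:

$$\begin{aligned}
\theta^* &= \arg\max_\theta \prod_{i=1}^m P_G(x^i;\theta) = \arg\max_\theta \sum_{i=1}^m \log P_G(x^i;\theta)\\
&\approx \arg\max_\theta \mathbb{E}_{x\sim P_{data}}\left[\log P_G(x;\theta)\right]\\
&= \arg\max_\theta \left[\int_x P_{data}(x)\log P_G(x;\theta)\,dx-\int_x P_{data}(x)\log P_{data}(x)\,dx\right]\\
&= \arg\min_\theta KL\left(P_{data}\,\|\,P_G\right)
\end{aligned}$$

The subtracted integral is the entropy term of $P_{data}$, a constant in $\theta$, so it does not change the argmax; this is why the KL divergence above is the natural yardstick.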
And how do we calculate $P_G(x;\theta)$? It is given by:

$$P_G(x)=\int_z P_{prior}(z)\,I_{[G(z)=x]}\,dz$$

The $I$ inside represents the indicator function:

$$I_{[G(z)=x]}=\begin{cases}1, & G(z)=x\\ 0, & G(z)\neq x\end{cases}$$

Because $G$ is a deep network, this integral has no closed form, so we actually cannot compute this $P_G(x)$; this intractability is the basic problem that generative models face.
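A quick way to see the difficulty: even for a toy one-dimensional "generator", drawing samples is trivial while evaluating the induced density is not. A minimal NumPy sketch (the cubic warp is an arbitrary stand-in for a network):

```python
import numpy as np

rng = np.random.default_rng(0)

def G(z):
    # A toy nonlinear "generator": an arbitrary warp standing in for a network.
    return z ** 3 + z

z = rng.standard_normal(100_000)   # z ~ P_prior = N(0, 1)
x = G(z)                           # drawing samples from P_G is trivial

# Evaluating P_G at a point, however, means integrating the indicator over z
# (equivalently, inverting G and applying a change of variables), which has
# no closed form for a real deep network G. Directly from samples, the best
# we can do is estimate the density, e.g. with a histogram:
density, edges = np.histogram(x, bins=100, density=True)
```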

GAN therefore introduces two components:

- Generator G: given a prior distribution $P_{prior}(z)$, we hope to obtain the generated distribution $P_G(x)$; as argued above, this is difficult to achieve through maximum likelihood estimation.
- Discriminator D: a function that measures the difference between $P_G(x)$ and $P_{data}(x)$, and it takes the place of maximum likelihood estimation (a minimal code sketch of both networks follows this list).
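To make the two roles concrete, here is a minimal sketch of such a pair of networks in PyTorch. The MLP architecture, layer sizes, and the 100-dimensional noise are illustrative assumptions, not anything prescribed by the derivation:

```python
import torch
import torch.nn as nn

NOISE_DIM = 100   # dimension of the prior z ~ N(0, I); an arbitrary choice
DATA_DIM = 784    # e.g. flattened 28x28 images; also an assumption

# G: maps noise z drawn from P_prior(z) to a sample x, inducing P_G(x).
G = nn.Sequential(
    nn.Linear(NOISE_DIM, 256), nn.ReLU(),
    nn.Linear(256, DATA_DIM), nn.Tanh(),   # outputs scaled to [-1, 1]
)

# D: maps a sample x to the estimated probability that x came from P_data(x).
D = nn.Sequential(
    nn.Linear(DATA_DIM, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Sigmoid(),       # D(x) in (0, 1)
)

z = torch.randn(16, NOISE_DIM)   # a batch of Gaussian noise
fake = G(z)                      # a batch of samples from P_G
print(D(fake).shape)             # torch.Size([16, 1])
```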
First, we define the function $V(G,D)$ as follows:

$$V(G,D)=\mathbb{E}_{x\sim P_{data}}\left[\log D(x)\right]+\mathbb{E}_{x\sim P_G}\left[\log\left(1-D(x)\right)\right]$$

We can derive the optimal generative model using the following expression:

$$G^*=\arg\min_G \max_D V(G,D)$$
Doesn’t it feel confusing? Why does defining a V(G, D) and then taking max and min yield the optimal generative model?
First, we consider only $\max_D V(G,D)$ and see what it means.
Given $G$, we need to choose a suitable $D$ such that $V(G,D)$ achieves its maximum value; this is simple calculus. Writing the expectations as an integral:

$$V=\int_x \left[P_{data}(x)\log D(x)+P_G(x)\log\left(1-D(x)\right)\right]dx$$

To maximize this integral, it suffices to maximize the term inside the integral for each given $x$, meaning we want an optimal $D^*$ that maximizes the following expression:

$$P_{data}(x)\log D(x)+P_G(x)\log\left(1-D(x)\right)$$
Given the data and $G$, $P_{data}(x)$ and $P_G(x)$ can be seen as constants, which we denote $a$ and $b$, so we are maximizing:

$$f(D)=a\log D+b\log(1-D)$$

Setting the derivative to zero gives:

$$f'(D)=\frac{a}{D}-\frac{b}{1-D}=0 \;\Longrightarrow\; D^*=\frac{a}{a+b}=\frac{P_{data}(x)}{P_{data}(x)+P_G(x)}$$
This determines the $D$ that maximizes $V(G,D)$ for a given $G$, and substituting $D^*$ back into the original $V(G,D)$ yields:

$$\begin{aligned}
\max_D V(G,D) &= \int_x \left[P_{data}(x)\log\frac{P_{data}(x)}{P_{data}(x)+P_G(x)}+P_G(x)\log\frac{P_G(x)}{P_{data}(x)+P_G(x)}\right]dx\\
&= -2\log 2 + KL\left(P_{data}\,\Big\|\,\frac{P_{data}+P_G}{2}\right)+KL\left(P_G\,\Big\|\,\frac{P_{data}+P_G}{2}\right)
\end{aligned}$$
At this point we have derived why this measurement is meaningful: taking the $D$ that maximizes $V(G,D)$, the maximum value is composed of two KL divergences, which is exactly (up to the constant) twice the Jensen-Shannon divergence, $-2\log 2 + 2\,JSD\left(P_{data}\,\|\,P_G\right)$. This maximum therefore measures the degree of difference between $P_G(x)$ and $P_{data}(x)$, so we take:

$$G^*=\arg\min_G \max_D V(G,D)$$

This gives us the $G$ for which the difference between the two distributions is smallest, so the generated distribution naturally comes to closely resemble the original distribution.
At the same time, this avoids the computation of maximum likelihood estimation altogether, so GAN fundamentally changes how generative models are trained.
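In practice the min-max is solved by alternating gradient steps rather than in closed form. Below is a minimal, illustrative training loop reusing the `G`, `D`, `NOISE_DIM`, and `DATA_DIM` sketched above; the optimizer settings are common defaults and the generator uses the usual non-saturating variant, so treat this as a sketch rather than the canonical algorithm:

```python
import torch
import torch.nn as nn

# Reuses G, D, NOISE_DIM, DATA_DIM from the earlier sketch.
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):
    """One alternating min-max step; `real` is a batch drawn from P_data."""
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator step: maximize V(G, D) over D, i.e. minimize
    # -[log D(x_real) + log(1 - D(G(z)))] with G held fixed.
    fake = G(torch.randn(n, NOISE_DIM)).detach()   # detach: freeze G
    loss_D = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: the non-saturating variant maximizes log D(G(z))
    # instead of minimizing log(1 - D(G(z))).
    loss_G = bce(D(G(torch.randn(n, NOISE_DIM))), ones)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()

# Usage with stand-in "real" data:
# train_step(torch.randn(16, DATA_DIM))
```

The non-saturating generator loss has the same fixed point as the minimax objective but gives stronger gradients early in training, when $D$ easily rejects generated samples, which is why it is the form usually implemented.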
Blog address: https://sherlockliao.github.io/2017/06/20/gan_math/
☞ Halmos: How to Conduct Mathematical Research
☞ Mark Zuckerberg’s 2017 Harvard Commencement Speech
☞ Applications of Linear Algebra in Combinatorial Mathematics
☞ Have You Seen the Real Phillips Curve?
☞ The Story of Support Vector Machines (SVM)
☞ Is the Mathematics in Deep Neural Networks Too Difficult for You?
☞ How Much Mathematics Do You Need to Know for Programming?
☞ Shiing-Shen Chern – What is Geometry?
☞ A Review and Outlook on Pattern Recognition Research
☞ The Theory of Surfaces
☞ What is the Significance of the Natural Base e?
☞ How to Explain Support Vector Machines (SVM) to a 5-Year-Old?
☞ The Autobiography of Terence Tao, Genius Mathematician of Chinese Descent
☞ Algebra, Analysis, Geometry, and Topology: The Three Major Methodologies of Modern Mathematics
We welcome submissions to the WeChat public account: The Beauty of Mathematical Algorithms
Submissions should involve mathematics, physics, algorithms, computer science, programming, and related fields.
Upon acceptance, we will offer remuneration.
Submission email: [email protected]