Mathematical Principles of GAN

Mathematical Derivation of GAN

Author: Sherlock

Source: Machine Learning Algorithms and Natural Language Processing

Previously, we discussed the basic idea of GAN. Recently, I reviewed some GAN papers and happened to watch a course by Professor Li Hongyi, finding the mathematical derivation quite interesting, so I decided to write it down for future reference.

First, we need some preliminary knowledge: KL divergence, which is a concept in statistics that measures the similarity between two probability distributions. The smaller it is, the closer the two distributions are. For discrete probability distributions, it is defined as follows:

$$\mathrm{KL}(P\,\|\,Q)=\sum_i P(i)\log\frac{P(i)}{Q(i)}$$

For continuous probability distributions, it is defined as follows:

$$\mathrm{KL}(P\,\|\,Q)=\int_{-\infty}^{\infty}p(x)\log\frac{p(x)}{q(x)}\,dx$$
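As a concrete illustration of the discrete definition, KL divergence is straightforward to compute directly. This is a minimal sketch; the function name and the example distributions are my own, not from the original post:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence KL(P || Q) = sum_i P(i) * log(P(i) / Q(i))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms with P(i) == 0 contribute 0 by convention.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))  # identical distributions -> 0.0
print(kl_divergence(p, q))  # positive when the distributions differ
```

Note that KL divergence is not symmetric: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, which is why the GAN derivation below ends up with a symmetrized combination of two KL terms.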

According to what we discussed earlier, what we need to do is as shown in the following figure:

[Figure: random Gaussian noise z is passed through the generator network G to produce the generated distribution P_G(x), which is compared against the real data distribution P_data(x).]

We want to transform random Gaussian noise z through a generator network G to obtain a generated distribution P_G(x) that is similar to the real data distribution P_data(x).

The generated distribution P_G(x; θ) is determined by the network parameters θ, and we hope to find a θ* such that P_G(x; θ*) is as close as possible to P_data(x).

Maximum Likelihood Estimation

We sample m points {x¹, x², …, xᵐ} from the real data distribution P_data(x). Given the parameters θ, we can compute the probability P_G(xⁱ; θ) of each sample. The likelihood of generating these m samples is thus:

$$L=\prod_{i=1}^{m}P_G(x^i;\theta)$$

What we want to do is find θ∗ to maximize this likelihood estimate.

$$\begin{aligned}
\theta^*&=\arg\max_\theta\prod_{i=1}^{m}P_G(x^i;\theta)
=\arg\max_\theta\sum_{i=1}^{m}\log P_G(x^i;\theta)\\
&\approx\arg\max_\theta\,\mathbb{E}_{x\sim P_{data}}\!\left[\log P_G(x;\theta)\right]\\
&=\arg\min_\theta\,\mathrm{KL}\!\left(P_{data}(x)\,\|\,P_G(x;\theta)\right)
\end{aligned}$$

(The last step follows because adding the θ-independent term $-\int_x P_{data}(x)\log P_{data}(x)\,dx$ does not change the maximizer.)
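To make the link between maximizing likelihood and matching the data distribution concrete, here is a small numerical sketch. The Gaussian model, sample size, and grid search are all illustrative choices of mine, not part of the original derivation:

```python
import numpy as np

rng = np.random.default_rng(0)

# m samples drawn from the "real" distribution P_data (here a Gaussian with mean 3).
samples = rng.normal(loc=3.0, scale=1.0, size=10_000)

def avg_log_likelihood(theta, x, sigma=1.0):
    """Average log-likelihood of x under the model P_G = N(theta, sigma^2)."""
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (x - theta) ** 2 / (2 * sigma**2))

# Grid search for theta* = argmax of the average log-likelihood.
thetas = np.linspace(0.0, 6.0, 601)
lls = [avg_log_likelihood(t, samples) for t in thetas]
theta_star = thetas[int(np.argmax(lls))]
print(theta_star)  # close to the true mean 3.0
```

The maximizer lands on the parameter whose model distribution is closest (in KL) to the empirical data, which is exactly what the derivation above says.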

And how do we calculate P_G(x; θ)?

$$P_G(x;\theta)=\int_z P_{prior}(z)\,I[G(z)=x]\,dz$$

Here, I denotes the indicator function:

$$I[G(z)=x]=\begin{cases}1, & G(z)=x\\ 0, & G(z)\neq x\end{cases}$$

Thus, in practice we cannot compute P_G(x) directly; this intractable likelihood is the fundamental difficulty of generative models, and working around it is the basic idea behind GAN.
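This asymmetry — easy to sample from, hard to evaluate — can be seen in a toy sketch. The generator map below is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy generator: a fixed nonlinear map applied to Gaussian prior noise.
def G(z):
    return np.tanh(2 * z) + 0.1 * z

# Sampling from P_G is easy: draw z ~ P_prior and push it through G.
z = rng.normal(size=5)
x = G(z)
print(x)

# Evaluating P_G(x) at a given x, however, would require integrating
# P_prior(z) over the set {z : G(z) = x}, which is intractable in general
# for a deep network G.
```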

Basic Idea of GAN
  • Generator G

    • G is a generator, given a prior distribution Pprior(z), we hope to obtain the generated distribution PG(x). This is difficult to achieve through maximum likelihood estimation.

  • Discriminator D

    • D is a function that measures the difference between PG(x) and Pdata(x), which replaces maximum likelihood estimation.

First, we define the function V(G, D) as follows:

$$V(G,D)=\mathbb{E}_{x\sim P_{data}}[\log D(x)]+\mathbb{E}_{x\sim P_G}[\log(1-D(x))]$$

We can derive the optimal generative model using the following expression:

$$G^*=\arg\min_G\max_D V(G,D)$$

Doesn't it feel confusing? Why does defining a V(G, D) and then taking the max and min yield the optimal generative model?

First, let us consider only max_D V(G, D) and see what it means.

Given G, we need to choose a suitable D such that V(G, D) achieves its maximum value; this is simple calculus. Writing the expectations as an integral:

$$V=\int_x\left[P_{data}(x)\log D(x)+P_G(x)\log(1-D(x))\right]dx$$

For this integral to be maximized, the term inside should be maximized for each given x; that is, we want an optimal D* that maximizes the following expression:

$$f(D)=P_{data}(x)\log D(x)+P_G(x)\log(1-D(x))$$

Given the data and G, P_data(x) and P_G(x) can be treated as constants, which we denote a and b. Setting the derivative to zero:

$$f(D)=a\log D+b\log(1-D),\qquad \frac{df(D)}{dD}=\frac{a}{D}-\frac{b}{1-D}=0$$

$$\Longrightarrow\quad D^*(x)=\frac{a}{a+b}=\frac{P_{data}(x)}{P_{data}(x)+P_G(x)}$$

This gives the D that maximizes V(G, D) for a given G, and substituting D* back into the original V(G, D) yields the following result:

$$\begin{aligned}
\max_D V(G,D)&=V(G,D^*)\\
&=\int_x P_{data}(x)\log\frac{P_{data}(x)}{P_{data}(x)+P_G(x)}+P_G(x)\log\frac{P_G(x)}{P_{data}(x)+P_G(x)}\,dx\\
&=-2\log 2+\mathrm{KL}\!\left(P_{data}\,\middle\|\,\frac{P_{data}+P_G}{2}\right)+\mathrm{KL}\!\left(P_G\,\middle\|\,\frac{P_{data}+P_G}{2}\right)
\end{aligned}$$

At this point, we can see why this quantity is a meaningful measure: by taking the D that maximizes V(G, D), the maximum value consists of a constant plus two KL divergences (together, twice the Jensen–Shannon divergence), so it measures the degree of difference between P_G(x) and P_data(x). Therefore, we take:

$$G^*=\arg\min_G\max_D V(G,D)$$

This yields the G for which the difference between the two distributions is minimized, so the generated distribution naturally comes to resemble the real one.

At the same time, this avoids the intractable maximum likelihood computation, so GAN fundamentally changes the training process.
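The two key steps of this derivation can be checked numerically. In this minimal sketch (the constants a, b and the toy discrete distributions are illustrative choices of mine), part 1 verifies that a log D + b log(1 − D) is maximized at D* = a/(a + b), and part 2 verifies that substituting D* into V(G, D) reproduces −2 log 2 plus the two KL terms:

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence; terms with p == 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# 1) For fixed a = P_data(x), b = P_G(x), the maximizer of
#    f(D) = a*log(D) + b*log(1-D) should be D* = a/(a+b).
a, b = 0.7, 0.3
D_grid = np.linspace(1e-4, 1 - 1e-4, 100_000)
f = a * np.log(D_grid) + b * np.log(1 - D_grid)
D_best = D_grid[int(np.argmax(f))]
print(D_best, a / (a + b))  # grid maximizer matches the closed form

# 2) Substituting D*(x) = P_data(x) / (P_data(x) + P_G(x)) into V(G, D)
#    should give -2*log(2) + KL(P_data || M) + KL(P_G || M),
#    where M is the mixture (P_data + P_G) / 2.
p_data = np.array([0.6, 0.4])   # toy discrete "real" distribution
p_g = np.array([0.2, 0.8])      # toy discrete generated distribution
m = (p_data + p_g) / 2
d_star = p_data / (p_data + p_g)
v_max = float(np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star)))
rhs = -2 * np.log(2) + kl(p_data, m) + kl(p_g, m)
print(v_max, rhs)  # the two values agree
```

The second identity is exact (not just approximate), because p log(p/(p+q)) = p log(p/m) − p log 2 with m = (p+q)/2, summed over both distributions.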

Blog address: https://sherlockliao.github.io/2017/06/20/gan_math/
