Understanding GANs Through Boxing

Selected from KDnuggets

Translated by Machine Heart

Author: Michael Dietz

Contributors: Jane W, Yan Qi, Wu Pan

Generative Adversarial Networks (GANs) have gained significant attention in the research community recently. In this article, Michael Dietz, founder of Waya.ai, explains why GANs hold such potential and illustrates how GANs work through a vivid comparison with boxing matches.

Generative Adversarial Networks (GANs) consist of two independent networks: the generator and the discriminator. GANs cast the unsupervised learning problem as a game between these two networks. Below, we first look at why GANs have such great potential, and then build an intuition for how they work through a comparison with boxing matches.


There is no difference between Generative Adversarial Networks and boxing matches.

The Principles Behind Deep Learning

Deep learning is inspired by biology, so many of its core concepts are intuitive and grounded in reality. The fundamental principle of deep learning is a hierarchical architecture—hierarchy refers not only to the layers in the network but also to the learned representations built upon each other. In fact, our real world is the same: electrons, protons, neutrons → atoms → molecules → … It is logical to model a hierarchical world in a hierarchical manner, which is why deep learning can successfully address very difficult problems using simple, elegant, and universal approaches.

Visualizing the hierarchical structure and representations learned by deep convolutional neural networks.

Incentivizing Unsupervised Learning

“Adversarial training is the coolest thing since sliced bread.”

— Yann LeCun, Director of AI Research at Facebook, Professor at NYU

Now let’s apply this biological inspiration to how networks are currently trained. Supervised learning is the standard approach in machine learning: each data sample must come with a ground-truth annotation/label during training. However, most learning in the real world happens through unsupervised learning; just think about how we learn to walk, talk, and so on. While supervised learning performs well on many tasks, unsupervised learning seems to be the key to true artificial intelligence.

Accurate data labeling is often impractical or prohibitively expensive. Ideally, a model can be trained on unlabeled data in an unsupervised manner and then fine-tuned on a much smaller labeled dataset. Returning to the hierarchical perspective: it should be possible to train an AI to understand the fundamental building blocks of the world, let it build on that knowledge base, and then fine-tune it for specific use cases in a more supervised manner.

Unsupervised Learning—A Specific Example

Suppose we train a convolutional neural network on millions of unlabeled skin images. Some of these images show healthy skin, others show diseased skin, and many are somewhere in between. Through this training, the network gains a deep understanding of skin and its complexities. It can then be adapted to specific use cases, like accurately diagnosing skin cancer in real time.

Since the model has learned general, effective representations of the most important information contained in skin images, it should be able to quickly learn new tasks, like diagnosing skin cancer, using only a smaller labeled dataset compared to training with only supervised methods. This is the essence of transfer learning and fine-tuning.
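A rough sketch of what this transfer-learning step could look like, assuming a Keras-style API (which the GAN Sandbox linked at the end also uses). The model and file names here are hypothetical, and the pretrained base stands in for the network trained on unlabeled skin images:

from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical: a feature extractor pretrained on millions of
# unlabeled skin images (e.g. a GAN discriminator minus its head);
# assumes the base outputs a flat feature vector.
pretrained_base = keras.models.load_model('skin_feature_extractor.h5')
pretrained_base.trainable = False  # freeze the learned representations

# Add a small task-specific head and fine-tune it on a much smaller
# labeled dataset (e.g. benign vs. malignant lesions).
model = keras.Sequential([
    pretrained_base,
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # probability of skin cancer
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
# model.fit(labeled_images, labels, epochs=10)  # small labeled set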

GANs are one of the most promising research areas in unsupervised learning, and we will see that they are a simple and effective way to learn representations from big data.

Understanding GANs

Let’s break down the basic components of GANs:

  • Data: Mathematically, we consider a dataset as samples drawn from a real data distribution. This data can be images, speech, sensor readings, etc.

  • Generator: Takes some code (i.e., random noise) as input, transforms it, and outputs a data sample. The goal of the generator is to ultimately output different data samples that conform to the real data distribution.

  • Discriminator: Takes data samples as input and classifies them as real (from the real data distribution) or fake (from the generator). The purpose of the discriminator is to accurately distinguish between real and generated images.

The overall goal of a standard GAN is to train a generator that produces diverse data samples conforming to the real data distribution, such that the discriminator can do no better than chance (50% accuracy) at telling real samples from generated ones. During training, both the generator and the discriminator learn powerful hierarchical representations of the source data, which can then be transferred to various specific tasks (such as classification, segmentation, etc.).
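To make these components concrete, here is a minimal sketch of a generator and discriminator for 28×28 grayscale images. The architecture choices are illustrative assumptions, not from the article; a Keras-style API is assumed:

from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 100  # size of the random-noise "code" fed to the generator

# Generator: random noise in, a data sample (a 28x28 image) out.
generator = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(latent_dim,)),
    layers.Dense(28 * 28, activation='tanh'),
    layers.Reshape((28, 28, 1)),
])

# Discriminator: a data sample in, a real-vs-fake probability out.
discriminator = keras.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(128),
    layers.LeakyReLU(0.2),
    layers.Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')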

Understanding the GAN Training Process

The pseudocode below may seem confusing at first, so we will follow it with a simple example of a familiar, real-world adversarial learning process.

while equilibrium_not_reached:

    # Step 1: train the discriminator to classify a batch of images
    # from our dataset as real
    discriminator.train_on_batch(x=real_image_batch, y=real_labels)

    # Step 2: train the discriminator to classify a batch of images
    # produced by the current generator as fake
    discriminator.train_on_batch(x=generated_image_batch, y=fake_labels)

    # Step 3: train the generator to trick the discriminator into
    # classifying a batch of generated images as real. The key here
    # is that the discriminator is frozen (not trainable) in this
    # step, but the gradients of its loss function are still
    # back-propagated through the combined network to the generator,
    # so the generator updates its weights in the best way possible
    # given those gradients.
    combined.train_on_batch(x=batch_of_noise, y=real_labels)

    # `combined` is a model that chains the generator and the
    # discriminator together, such that: input => generator =>
    # generator_output => discriminator => classification
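As a concrete sketch of how that combined model could be wired up (continuing the hypothetical Keras-style models sketched earlier; the freezing trick is the key detail):

from tensorflow import keras

# Freeze the discriminator *inside* the combined model: because the
# discriminator was compiled on its own before this flag was flipped,
# it still learns in Steps 1 and 2, but within `combined` only the
# generator's weights are updated in Step 3. The discriminator's loss
# gradients still flow back through it to the generator.
discriminator.trainable = False
combined = keras.Sequential([generator, discriminator])
combined.compile(optimizer='adam', loss='binary_crossentropy')
# input noise => generator => generated image => discriminator => real/fake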

Whether we realize it or not, we are very familiar with the general concepts of GANs and adversarial learning. For example, imagine learning to play a song on the guitar:

1. Listen to the song—figure out how to map the song onto the guitar (Step 1 in the training process above).

2. Try playing the song—listen to what you’re playing and note the differences from the actual song (Step 2).

3. Play the song again—attempt to resolve those differences (Step 3).

With some variation we repeat this process, Steps 2 and 3 blending into each other, and Step 1 recurring only when our memory of the song needs a refresh, until we can happily play something close enough to the real song.

As you become a more skilled guitarist, you can learn new songs with just a little practice, even if you’ve never heard or played them before (i.e., transfer learning/fine-tuning).

In this example, the song is the data, our ears/brain are the discriminator, and our hands/brain are the generator. The same may hold for how we learn to move, speak, and so on. Going further, imagine a deaf person trying to learn to speak: it is an interesting case because they lack a discriminator to drive adversarial learning (although they may be able to use other cues, such as people’s reactions, as a weak discriminator).

Now that we have established some intuitive understanding of GANs, let’s see how to implement them in software. We need to consider the similarities and differences between GANs in reality and in software. One example of a difference is that the adversarial learning process occurring in reality appears collaborative between the generator and discriminator, while the software implementation of GANs appears adversarial (… just like a boxing match).

Training GANs—A Boxing Match Between Generator and Discriminator

Creed is the discriminator, Rocky is the generator. Ding… Ding… Let’s fight!

At first glance, the discriminator seems like the coach, and the generator is the boxer. But in fact, they are both boxers, while the real data is actually the coach. The only difference is that only the discriminator has direct access to the data.

The discriminator learns from its coach (the larger the real dataset, the more experienced the coach), while the generator learns only from its opponent (the discriminator).

In Step 1 of the training process above, the discriminator works the pads and the heavy bag with its coach, who points out its technical shortcomings and pushes it to adapt. In Step 2, the discriminator watches the generator box a round, studying it and preparing for their upcoming bout.

Leaked boxing footage means the opponent has more material to learn and prepare.

Now, Step 3: the fight! The generator is a tough boxer from Philadelphia, staying calm and focused in the ring, studying the discriminator’s every move and mistake and adapting after each round. The discriminator hates fighting; it gets nervous and scared in the ring and never learns anything from its bouts. The discriminator may have more raw talent than the generator (classifying data as real or fake is easier than actually generating data), but the generator’s mindset serves it well in the ring. Even without a coach (no access to the real dataset), the generator learns a great deal from the discriminator, because the discriminator passes along the fundamentals its coach taught it.

This process continues round after round until, eventually, both the discriminator and the generator become well-rounded fighters. The coach has passed on all the important lessons it knows, and the generator and discriminator have learned a great deal from each other in the ring. Ideally, they end training as equals, each with a 50/50 chance of beating the other.

Challenges

As you delve deeper into GANs, you will encounter one of the main difficulties currently facing the field: getting these networks to converge well. We want the generator and discriminator to reach the equilibrium described above, but this often does not happen. For a discussion of the ways convergence can fail, along with related research, see https://www.quora.com/Do-generative-adversarial-networks-always-converge. For a growing collection of tricks for addressing these issues, see https://github.com/soumith/ganhacks.
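As one example of the kind of trick collected in ganhacks, one-sided label smoothing ("use soft labels") keeps the discriminator from becoming pathologically confident. A minimal sketch, with array names matching the pseudocode above (assumed, not prescribed):

import numpy as np

batch_size = 64

# One-sided label smoothing: target 0.9 instead of 1.0 for real
# images, so the discriminator never gets perfectly confident and
# the generator keeps receiving a useful gradient.
real_labels = np.full((batch_size, 1), 0.9)
fake_labels = np.zeros((batch_size, 1))

# Used exactly as in the training loop above, e.g.:
# discriminator.train_on_batch(real_image_batch, real_labels)
# discriminator.train_on_batch(generated_image_batch, fake_labels)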

The following highlights several of the most common failures of GANs:

1. The discriminator becomes too strong too quickly, and the generator ends up learning nothing. In our boxing analogy, this is like the discriminator becoming so dominant that the generator is completely outmatched: the discriminator makes no mistakes and leaves the generator no openings to learn from. In practice, this means that in Step 3 above the discriminator classifies the generated data as fake so accurately and confidently that its back-propagated loss gradients are nearly zero, giving the generator no learning signal (a formula-level sketch of this appears after this list).

2. The generator only learns very specific weaknesses of the discriminator and exploits them to get generated data classified as real, rather than learning to represent the real data distribution. A theoretical discussion can be found here: http://www.kdnuggets.com/2015/07/deep-learning-adversarial-examples-misconceptions.html. In our boxing analogy, this is like the generator learning only a few narrow weaknesses of the discriminator and exploiting them, instead of learning the fundamentals and techniques of boxing. Against an opponent without those exact weaknesses, the generator would be useless! Likewise, anything the discriminator learned from such a generator would also be useless, because real opponents would not share the generator’s idiosyncrasies.

3. The generator learns only a very small subset of the real data distribution. In our boxing analogy, this is like our generator learning a single punch and a single dodge, never developing any other tools or techniques. The discriminator then learns very little from the generator and ends up over-fitted to this small subset of the data distribution. A practical example is a generator that produces the same data sample for every possible input, with no variation in its output (known as mode collapse).
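To ground failure 1 in the underlying math (a standard result from the original GAN paper by Goodfellow et al., added here for context), the standard minimax objective is:

\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

When the discriminator confidently outputs D(G(z)) ≈ 0 for generated samples, the gradient of log(1 − D(G(z))) with respect to the generator’s parameters is nearly zero, so the generator receives almost no learning signal. The common fix is the non-saturating loss: train the generator to maximize log D(G(z)) instead, which yields strong gradients precisely when the generator is losing.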

The above analogy is an ongoing experiment, and we will add more relevant information in the future.

Conclusion

Now that we have a foundational understanding of GANs, let’s re-examine their goal: to learn powerful representations from unlabeled data (i.e., taking data from its raw form to learned representations of its most important features in a much lower-dimensional space → achieving strong performance with much less labeled data).

After training a GAN, most current approaches use the discriminator as a base model for transfer learning and the fine-tuning of a production model, or use the generator as a source of data for training a production model. In our boxing analogy, this means the discriminator gets its professional boxing license while the generator that pushed it through training does not. Unfortunately, the generator may have the potential to become the better boxer; it is either let go or kept on only as a sparring partner for production models.

“What I cannot create, I do not understand.”

— Richard Feynman

A well-trained generator has learned the real data distribution so well that it can generate samples belonging to it from a much lower-dimensional input space. This suggests it has developed extremely powerful representations of the data. It would be great to use what the generator has learned directly in production models, but there does not currently seem to be a good way to do this. If there is, please comment and let us know.

For a clear and simple implementation of standard GANs (and other types of GANs, such as InfoGAN and ACGAN), refer to:

  • GAN Sandbox: Vanilla GAN implementation based on Keras/TensorFlow for quick experimentation and research: https://github.com/wayaai/GAN-Sandbox

Here are some classes of GANs whose generators are extremely valuable, even though they are still treated as “sparring partners”:

  • SimGAN: A game-changer in unsupervised learning and autonomous driving: https://medium.com/intuitionmachine/simgans-applied-to-autonomous-driving-5a8c6676e36b

Original link: http://www.kdnuggets.com/2017/03/deep-learning-gans-boxing-fundamental-understanding.html


This article was translated by Machine Heart. Please contact this official account for reprint authorization.

✄————————————————

Join Machine Heart (Full-time Reporter/Intern): [email protected]

Submissions or inquiries: [email protected]

Advertising & Business Cooperation: [email protected]
