Understanding GANs

Introduction

A GAN, short for Generative Adversarial Network, is a type of generative model. Personally, I like to call it the “involution” network, because its two halves improve only by constantly trying to outdo each other. Why do I say this? Let’s start with a story!

The Story of Cops and Robbers

On a distant planet, there was a newly founded city whose institutions were still under construction, so public security was chaotic. Soon, many thieves appeared in the city. Their skills varied, of course: some were expert burglars, while others were just bumbling novices.

As theft became rampant, citizens complained, and the city began to crack down on crime, launching a campaign to improve public security. The police started patrolling, and soon many unskilled thieves were caught.

Of course, the police could only catch the less skilled thieves because their own abilities were also limited. With the novices off the streets, the thieves who remained were, on average, far more skilled, so public security was still far from assured.

So the police chief ordered further training in investigative techniques to capture the more cunning thieves. Gradually, professional criminals were caught one by one, and the police became much faster at identifying suspicious individuals.

The thieves realized that the police had greatly improved and that they could no longer operate as carelessly as before without getting caught. They began to practice their skills diligently and eventually came close to their goal of becoming master thieves. Once again, it became difficult for the police to catch them.

The police chief was particularly angry and called a meeting of the officers, urging them to train rigorously. Thus, the officers worked day and night, and eventually, their hard work paid off. Many thieves were caught.

As the saying goes, “As the law rises, so does the devil.” The hardworking police and the diligent thieves are locked in a constant struggle that ultimately reaches a Nash equilibrium, leaving a city filled with both “expert detectives” and “master thieves”.

A GAN works like the police and thieves in the story: its two parts are engaged in mutual involution and confrontation.

Simple Understanding of GAN

In face detection, image recognition, and speech recognition, machines always describe and judge things that already exist. Can a machine create something that does not yet exist in this world? GANs exist for this purpose. A GAN involves three parts: generation, discrimination, and adversarial training. Among them, the generator and the discriminator are the key modules.

The generator and discriminator are two independent modules. The generator produces content from random vectors; the content can be images, text, or music, depending on what you want to create. The discriminator judges whether the content it receives is real, usually by outputting a probability that the content is authentic. Neither module requires a particular type of network: CNNs, which are typical for image processing, or ordinary fully connected networks can be used, as long as each fulfills its function.
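To make the two roles concrete, here is a minimal sketch in plain Python. It is only an illustration under simplifying assumptions: the "content" is a single number rather than an image, the generator is a hypothetical linear function, and the discriminator is a hypothetical logistic function. The weights and biases are arbitrary values chosen for the example, not anything learned.

```python
import math
import random


def generator(z, weight=2.0, bias=1.0):
    """Toy generator: maps a random input z to a generated sample."""
    return weight * z + bias


def discriminator(x, weight=0.5, bias=0.0):
    """Toy discriminator: returns a probability in (0, 1) that x is real."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


random.seed(0)
z = random.gauss(0.0, 1.0)   # the random vector (here a single number)
fake = generator(z)          # content produced from the noise
score = discriminator(fake)  # the discriminator's "realness" probability
print(f"z={z:.3f}  fake={fake:.3f}  score={score:.3f}")
```

In a real GAN both functions would be neural networks with many trainable parameters, but the interface is the same: noise goes into the generator, and the discriminator turns a sample into a probability.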

Next comes adversarial training, which is the alternating training process of a GAN. Taking image generation as an example, the generator first produces some fake images. These are given to the discriminator together with real images from the dataset, so that it learns to distinguish between the two, giving real images high scores and fake images low scores. Once the discriminator has become proficient at judging the existing data, the generator is trained to earn high scores from it, producing better and better fake images until it can deceive the discriminator. This process repeats until the discriminator’s predicted probability for any image approaches 0.5, meaning it can no longer tell real images from fake ones, at which point training can stop.
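For readers who want the 0.5 threshold in symbols, the standard formulation from the original GAN paper (Goodfellow et al., 2014) can be written as follows. The notation is not used elsewhere in this article and is introduced here only for illustration: D(x) is the discriminator's probability that x is real, G(z) is the generator's output for noise z, p_data is the real data distribution, p_z is the noise distribution, and p_g is the distribution of generated samples.

```latex
% Minimax objective: D tries to maximize V, G tries to minimize it.
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

% For a fixed generator, the best possible discriminator is
D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

When the generator matches the real data exactly, p_g equals p_data and the best discriminator outputs 1/2 for every input, which is where the 0.5 comes from.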

The generator and discriminator are like the police and thieves mentioned earlier: they are opponents, yet also something like friends. Initially, both are unskilled and unknown, but through constant competition, involution, and upgrades, they ultimately grow together into an expert detective and a master thief.

The ultimate goal of training a GAN is to obtain a sufficiently good generator that can produce content that is nearly indistinguishable from real content. Other models that can perform similar functions include Boltzmann Machines, Variational Autoencoders, etc., which are all classified as generative models.

Summary of GAN

(1) Generator and Discriminator

Generator: a network that generates data (images, text, or music) with the goal of “deceiving” the discriminator.

Discriminator: a network that determines whether content is real or machine-generated, with the goal of identifying the “fake data” produced by the generator.

(2) Training Steps

Step 1: Fix the discriminator and train the generator;

Step 2: Fix the generator and train the discriminator;

Step 3: Repeat steps 1 and 2 until the two reach a Nash equilibrium.
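The alternating steps can be sketched on a deliberately tiny problem. Everything in this example is an assumption made for illustration: the real data are numbers drawn from a normal distribution with mean 2.0, the generator G(z) = z + b learns only a shift b, the discriminator is a logistic regression D(x) = sigmoid(w * x + c), the gradients are derived by hand, and the generator uses the common "non-saturating" objective (maximize log D(G(z))). Real GANs use deep networks and a framework with automatic differentiation.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=3000, batch=64, lr=0.05, real_mean=2.0, seed=0):
    rng = random.Random(seed)
    b = 0.0          # generator parameter: G(z) = z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        # Step 1: discriminator fixed, update the generator.
        # Gradient ascent on log D(G(z)); d/db log D(z + b) = (1 - D) * w.
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_b = sum((1.0 - sigmoid(w * x + c)) * w for x in fake) / batch
        b += lr * grad_b

        # Step 2: generator fixed, update the discriminator.
        # Gradient ascent on log D(real) + log(1 - D(fake)).
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (1.0 - d) * x
            grad_c += 1.0 - d
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w -= d * x
            grad_c -= d
        w += lr * grad_w / batch
        c += lr * grad_c / batch
        # Step 3 is the loop itself: keep alternating.
    return b, w, c


b, w, c = train_toy_gan()
print(f"generator shift b = {b:.2f} (real mean is 2.0)")
print(f"D at the real mean = {sigmoid(w * 2.0 + c):.2f}")
```

If training behaves as intended, b drifts toward the real mean and the discriminator's output near the data moves toward 0.5. Note that "achieving Nash equilibrium" is not a separate operation; it is the state the alternation is hoped to settle into, and on real problems it is often reached only approximately.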

(3) Sample Image Generation Process

Step 1: The generator takes a random noise image A as input;

Step 2: The generator’s convolutional neural network extracts edge features of the hub defect and generates a sample image;

Step 3: The discriminator evaluates the real and generated samples; if the discrimination probability is close to 0.5, the sample is output; otherwise, training continues;

Step 4: Expand the sample library.
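The check in Step 3 can be sketched as a small helper. In practice the discriminator's output is almost never exactly 0.5, so this sketch assumes a tolerance is used. The function name `is_balanced`, the parameter `tol`, and the score values are hypothetical and chosen only for the example.

```python
def is_balanced(real_scores, fake_scores, tol=0.05):
    """Return True when the discriminator's average score on both the
    real and the generated samples is within `tol` of 0.5."""
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return abs(mean_real - 0.5) <= tol and abs(mean_fake - 0.5) <= tol


# Early in training: the discriminator separates the two sets easily.
early = is_balanced([0.95, 0.90, 0.97], [0.05, 0.10, 0.02])
# Later: scores hover around 0.5 for both sets.
late = is_balanced([0.52, 0.48, 0.51], [0.49, 0.50, 0.47])
print(early, late)  # False True
```

Only when the check passes would the generated samples be added to the sample library; otherwise training continues.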

