Development Series of GANs: PGGAN and SinGAN


In previous articles, we introduced the basics of GANs (Generative Adversarial Networks) and some of the classic GAN variants. In this article, we continue the series with two more classic GANs.

Introduction to GANs

Development Series of GANs: CGAN, DCGAN, WGAN, WGAN-GP, LSGAN, BEGAN

1. PGGAN: Progressive Growing GAN

Paper: Progressive Growing of GANs for Improved Quality, Stability, and Variation

Code: tkarras/progressive_growing_of_gans

When generating images at high resolution, the discriminator can easily tell that the generator's outputs are fake, which makes the generator hard to train. The DCGAN and WGAN models covered in earlier articles could only generate 64×64 images, and details are badly lost at larger sizes. PGGAN introduces a new training method: it starts by generating images at 4×4 resolution and works its way up to 1024×1024 face images, letting the generator and discriminator grow gradually. Training begins on low-resolution images, and new layers are added progressively, making the model increasingly complex so that it learns finer details. This method not only speeds up training but also stabilizes it.

The paper also created a higher-quality version of the CelebA dataset, allowing output resolutions of up to 1024×1024 pixels. The progressive scheme lets training first discover the large-scale structure of the image distribution and then gradually shift attention to finer-scale details, rather than having to learn all scales simultaneously. Moreover, the competition between the generator and the discriminator drives the training of both, and most iterations are completed at lower resolutions, which reduces training time.

[Figure: progressive growing of the generator and discriminator from 4×4 to 1024×1024]
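To make the schedule concrete, here is a minimal sketch of the progressive resolution schedule. It is my own illustration, not the official code, and it assumes a fixed budget of 800k real images per stage (the next paragraph mentions this figure for the 4×4 stage).

```python
# Minimal sketch of the progressive-growing schedule (not the official implementation).

IMAGES_PER_STAGE = 800_000  # real images shown before growing (assumed the same at every stage)

def progressive_schedule(start_res=4, final_res=1024):
    """Yield (stage_index, resolution, image_budget) for each growth stage."""
    res, stage = start_res, 0
    while res <= final_res:
        yield stage, res, IMAGES_PER_STAGE
        res *= 2        # double the output resolution when a new layer pair is added
        stage += 1

for stage, res, budget in progressive_schedule():
    print(f"stage {stage}: train G and D at {res}x{res} on {budget:,} real images")
```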

Here, 4×4 means operating on images of that size. The generator first produces a 4×4 image, and 'Reals' refers to real face images downscaled to 4×4; the discriminator's structure mirrors the generator's. Training at this resolution stops after 800k real images have been fed in, and the parameters are saved. Then the next layer is added. Since the generator's output channels are not necessarily 3, a toRGB operation, a 1×1 convolution, converts the feature maps to three RGB channels, and fromRGB does the opposite. To prevent newly added layers from disrupting the already-trained network, new layers are faded in smoothly each time the resolution of the generator and discriminator is doubled. The figure below illustrates the transition from 16×16 to 32×32 images.

[Figure: smooth fade-in of new layers when growing from 16×16 to 32×32]
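As a minimal PyTorch sketch of this fade-in step (my own illustration with made-up channel sizes, not the official tkarras implementation): the new 32×32 block's output, mapped to RGB by its 1×1 toRGB convolution, is blended with the upsampled RGB output of the old 16×16 head using a weight alpha that ramps from 0 to 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Grow16To32(nn.Module):
    """Fade in a newly added 32x32 generator block (illustrative channel sizes)."""

    def __init__(self, in_ch=256, new_ch=128):
        super().__init__()
        self.to_rgb_16 = nn.Conv2d(in_ch, 3, kernel_size=1)   # existing 1x1 toRGB head
        self.block_32 = nn.Sequential(                         # newly added conv block
            nn.Conv2d(in_ch, new_ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(new_ch, new_ch, 3, padding=1), nn.LeakyReLU(0.2),
        )
        self.to_rgb_32 = nn.Conv2d(new_ch, 3, kernel_size=1)   # new 1x1 toRGB head

    def forward(self, feat_16, alpha):
        """feat_16: features from the trained 16x16 stack; alpha ramps from 0 to 1."""
        # old path: 16x16 RGB output, upsampled to 32x32
        rgb_old = F.interpolate(self.to_rgb_16(feat_16), scale_factor=2, mode="nearest")
        # new path: upsampled features through the new block and its toRGB head
        up = F.interpolate(feat_16, scale_factor=2, mode="nearest")
        rgb_new = self.to_rgb_32(self.block_32(up))
        # smooth blend: alpha=0 keeps the old network, alpha=1 uses only the new block
        return (1.0 - alpha) * rgb_old + alpha * rgb_new


g = Grow16To32()
fake_32 = g(torch.randn(1, 256, 16, 16), alpha=0.3)
print(fake_32.shape)   # torch.Size([1, 3, 32, 32])
```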

The detailed network structure is as follows:

[Figure: detailed PGGAN network architecture]

The main advantage of PGGAN is its ability to generate high-quality samples.

[Figure: samples generated by PGGAN]

PS: I have applied the progressive growing training method to the U-Net model in some preliminary experiments. Interested readers can refer to:

Paper: PGU-net+: Progressive Growing of U-net+ for Automated Cervical Nuclei Segmentation

Paper Address: https://arxiv.org/abs/1911.01062

Code Address: https://github.com/Minerva-J/PGU-net-Model

2. SinGAN

Paper: SinGAN: Learning a Generative Model from a Single Natural Image

Code: https://github.com/tamarott/SinGAN

More Demonstrations: https://youtu.be/xk8bWLZk4DU

SinGAN trains on only one image, which serves as both the training and test sample. SinGAN is an unconditional generative adversarial model (driven by random noise) that uses a multi-scale pyramid of fully convolutional GANs to learn the internal patch distribution of the image, generating samples that share its visual content while being of high quality and diverse. Each GAN in the pyramid is responsible for capturing the image distribution at a different scale. SinGAN can be applied to a variety of image manipulation tasks, such as paint-to-image, editing, harmonization (blending), super-resolution, and animation.

[Figure: SinGAN multi-scale generation pipeline]
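The coarse-to-fine sampling process can be sketched roughly as follows (a simplification under my own assumptions, not the released SinGAN code): each scale's generator receives a fresh noise map plus the upsampled output of the coarser scale, and adds residual detail on top of it.

```python
import torch
import torch.nn.functional as F

def singan_sample(generators, scale_sizes, noise_amps):
    """Coarse-to-fine sampling sketch.

    generators:  per-scale generator modules, coarsest scale first
    scale_sizes: (H, W) of the image at each scale, coarsest first
    noise_amps:  noise amplitude used at each scale
    """
    x = torch.zeros(1, 3, *scale_sizes[0])    # nothing to upsample at the coarsest scale
    for G, (h, w), amp in zip(generators, scale_sizes, noise_amps):
        x = F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)  # upsample previous output
        z = amp * torch.randn(1, 3, h, w)     # fresh noise map at this scale
        x = G(x + z) + x                      # generator adds residual detail to the upsampled image
    return x


# Tiny usage example with plain conv layers standing in for the trained generators.
sizes = [(25, 34), (33, 45), (44, 60)]        # three scales, each roughly 4/3 larger than the previous
gens = [torch.nn.Conv2d(3, 3, 3, padding=1) for _ in sizes]
samples = singan_sample(gens, sizes, noise_amps=[1.0, 0.5, 0.25])
print(samples.shape)   # torch.Size([1, 3, 44, 60])
```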

To capture global properties, such as the arrangement and shape of large objects in the image (e.g., sky at the top, ground at the bottom), as well as fine details and texture, SinGAN uses a hierarchy of patch-GANs (Markovian discriminators), each responsible for capturing the patch distribution at a different scale.

[Figure: SinGAN architecture at a single scale]
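For reference, a Markovian (patch) discriminator is simply a fully convolutional network with no dense layers, so it outputs a grid of real/fake scores, one per overlapping patch. The sketch below uses illustrative layer sizes, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional patch discriminator sketch: no dense layers, so the
    output is a grid of scores, each covering one receptive-field patch."""

    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, 1, 3, padding=1),   # 1-channel score map, one score per patch
        )

    def forward(self, x):
        return self.net(x)


d = PatchDiscriminator()
scores = d(torch.randn(1, 3, 64, 64))
print(scores.shape)   # torch.Size([1, 1, 64, 64]) -- a score per spatial location
```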

The loss function consists of an adversarial loss and a reconstruction loss; the adversarial loss drives the competition between the generator and the discriminator:

[Equation: adversarial loss]

The reconstruction loss is

[Equation: reconstruction loss]
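For completeness, the per-scale objective from the SinGAN paper can be written as below (reproduced from memory of the paper, so treat the exact notation as approximate). Here $N$ is the coarsest scale, $\uparrow^{r}$ denotes upsampling by the scale factor $r$, $\tilde{x}^{\mathrm{rec}}_{n+1}$ is the reconstruction produced at the previous (coarser) scale, $z^{*}$ is a fixed noise map used only at scale $N$, and the adversarial term uses the WGAN-GP loss:

$$
\min_{G_n}\max_{D_n}\;\mathcal{L}_{\mathrm{adv}}(G_n, D_n) + \alpha\,\mathcal{L}_{\mathrm{rec}}(G_n),
\qquad
\mathcal{L}_{\mathrm{rec}} =
\begin{cases}
\left\lVert G_n\!\bigl(0,\,(\tilde{x}^{\mathrm{rec}}_{n+1})\uparrow^{r}\bigr) - x_n \right\rVert^{2}, & n < N,\\[4pt]
\left\lVert G_N(z^{*}) - x_N \right\rVert^{2}, & n = N.
\end{cases}
$$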

Generated Images: 1. Effect of the starting scale at test time: when generation starts from the coarsest scale N, the zebra produced can end up with many legs; starting from scale N−1 instead, the generated samples become very realistic. The large-tree example behaves similarly, retaining more details.

[Figure: samples generated starting from different scales]

2. Effect of the number of scales during training: when more scales are used, the model captures global structures better.

[Figure: effect of the number of training scales]
