GAN Network Image Translator: Image Restoration and Enhancement

1 New Intelligence Column

Author: Liu Shiqiang

[New Intelligence Guide] This article introduces the application of deep learning methods in the field of image translation by implementing an encoding-decoding “image translator” for image enhancement, showcasing the effectiveness of deep learning applications in image translation.

In recent years, deep learning has achieved remarkable results in image processing, audio processing, and NLP fields, especially in image processing, where deep learning has become the mainstream method. This article introduces the application of deep learning methods in the field of image translation by implementing an encoding-decoding “image translator” for image enhancement, showcasing the effectiveness of deep learning applications in image translation. Furthermore, since neural networks can automatically perform feature engineering, the same model can adapt to different scenarios if trained with data from different contexts, truly achieving adaptability. In image processing scenarios with dual relationships, it is worth trying the “image translator” method, as the results should not be disappointing. Finally, we will introduce the latest advancements of the popular Generative Adversarial Networks (GAN) in image translation.

Image translation, similar to language translation, is the process of converting one image into another, such as image restoration, converting a 2D map into a 3D map, turning a blurry image into a clear one, or transforming a sketch into a color image, etc. Before the popularity of deep learning, image translation was considered a state-of-the-art task. For instance, in image restoration, filtering methods were commonly used, and different types of degraded images required different filtering schemes. Now, using deep learning methods, as long as there is enough training data, the approach is simple yet effective. The following examples can all be classified under image translation:

Here, let us construct a simple encoding-decoding “image translator” for enhancing blurry images. The project address is: https://github.com/lsqpku/img2img. Note: Some of the code in this project comes from the DCGAN project (https://github.com/Newmu/dcgan_code).

Building the “Image Translator”

The “image translator” constructed in this article is based on the idea of autoencoding neural networks. Autoencoding neural networks are a type of unsupervised learning algorithm that aims to make the output as close to the input as possible through the encoding-decoding process, thus can be viewed as an identity function. For more information on autoencoding neural networks, refer to Stanford University’s teaching materials at http://deeplearning.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity. We reassemble the discriminator and generator network structures from GAN, transforming the discriminator into an encoder and the generator into a decoder, thus establishing a simple image translation network. The network structure is as follows:

(As an example project, to reduce computation and overfitting, no fully connected bottleneck layer is added at the center of the network structure.) The encoder network structure’s TensorFlow code is as follows:

After the convolutional layer, a batch normalization layer is attached for regularization, and the activation function of the convolutional layer is leaky ReLU.

The generator network structure’s TensorFlow code is as follows:

After the convolutional layer, a batch normalization layer is attached for regularization, the activation function is ReLU, and finally, tanh activation is performed.

This structure is similar to autoencoding neural networks, except that when training this translation network, the input and output images belong to different domains. For example, when training a neural network for enhancing blurry images, the input training samples are blurry images, and the output is the enhanced clear images, which should be as close as possible to the corresponding clear images. Therefore, the training samples consist of blurry images paired with their corresponding clear images.

Illustration: Input blurry image, output clear image:

For the loss function, we simply used the L2 loss between the output image and the original image as the loss function.

self.loss = tf.nn.l2_loss(self.G – inputs)

Preparing Training and Testing Data

We used the Wiki face dataset (https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/) for training, removing black-and-white images from the original dataset and separating faces by gender. This project used male face data for training. Additionally, to obtain blurry images, we applied Gaussian blur, random Gaussian blur, and resizing (first reducing and then enlarging to the original size) to the original images, resulting in a large set of paired blurry-clear image training samples (over 20,000 images). Theoretically, we can use any image dataset for training and testing, but it is important to note the sample size of the training data. For the above model, when the number of images does not exceed 1000, it leads to significant overfitting; it is recommended that the training samples exceed tens of thousands. For testing data, we used male faces, female faces, and even buildings for testing.

Testing Results

The first image and the following images are the results of using this neural network to enhance male test samples. From the processing results, although not perfect, the clarity has greatly improved (from left to right: original image, blurred image, and image enhanced by the model).

How does the model trained on male faces perform on female test samples? The following image shows the processing results for female test samples (also subjected to Gaussian blur), which are similar to the male processing results (from left to right: original image, blurred image, and image enhanced by the model).

Let’s think outside the box; how does it perform on non-face samples? The following image shows the testing results for some non-face images (also subjected to Gaussian blur), and the results are also quite good, but the color tone changes from cool tones to facial tones (from left to right: original image, blurred image, and image enhanced by the model).

Turning the above image into grayscale shows a clearer enhancement effect (from left to right: original image, blurred image, and image enhanced by the model):

However, Gaussian blur is just one method of blurring. Does the model trained on Gaussian blurred images apply to other blurring methods (e.g., first resizing the original image to 1/16, then resizing back to the original size)? The following image shows the testing results for samples processed with the resize blur method (the left image is the result of enhancing the Gaussian blurred image, and the right image is the result of enhancing the resize blurred image).

It can be seen that the model trained on Gaussian blurred images performs worse on resize blurred images. However, this is explainable; deep learning is fundamentally a form of pattern recognition. When using Gaussian blurred training samples, the model will identify the pattern of Gaussian blur. To enable the model to also handle resize blurred images, we can use both types of samples as training data. Experiments show that the enhancement works well for both cases, which highlights the power of deep neural networks. By increasing the training samples and model scale, a single model can handle more complex situations.

Latest Advances in Image Translation

The above model is just a simple application of neural networks. Due to the model’s loss function being a simple L2-loss, it results in a blurring effect in images. To make images more realistic and reduce dependency on training sample quantity, some have used GAN for image translation. Here are a few notable cases:

1. pix2pix

Article: https://arxiv.org/abs/1611.07004　

Repo: Torch version https://github.com/phillipi/pix2pix;

TensorFlow version: https://github.com/affinelayer/pix2pix-tensorflow）　

This article’s innovation lies in two aspects: first, the generator’s loss function adds L1 loss in addition to distinguishing real from fake; another technique is that instead of judging the entire image, the judgment is made on patches of the image, which the authors call patchGAN. Comparative tests show that on face data, this model’s performance is not significantly different from the basic model, but on datasets like facades and cityscapes, the results appear more realistic (due to the small size of the facades and cityscapes datasets, our model experienced overfitting).

2. CycleGAN

Article: https://arxiv.org/pdf/1703.10593.pdf Repo: https://github.com/hardikbansal/CycleGAN

The basic model and pix2pix model require paired training samples, but in reality, it is often difficult to find a large number of such samples. The authors of CycleGAN proposed another GAN variant, mainly contributing to the use of unsupervised learning, requiring only two types of datasets without strict pairing (e.g., converting ordinary horses to zebras). The model is more complex (requiring two discriminators and two generators), and those interested can refer to https://hardikbansal.github.io/CycleGANBlog/

3. DualGAN

Article: https://arxiv.org/abs/1704.02510

This combines the idea of dual learning with GAN, similar to CycleGAN, also used to address the issue of insufficient training samples, utilizing two sets of GANs, where the images generated by one GAN serve as the input for the other GAN’s network structure.

Conclusion

In summary, using neural networks for image translation is simple and efficient. Combined with GAN networks, it allows for generating highly realistic images with fewer training samples. Everyone can try applying the concept of image translation in more image pairing scenarios to leverage the power of deep learning.

(The author of this article is Liu Shiqiang from the Data Information Center of Taikang Insurance Group)

[Breaking News] New Intelligence is conducting a new round of recruitment, flying to the most beautiful spaceship in the intelligent universe, with several seats available.

Click to read the original text for job details, looking forward to your joining~

Leave a Comment Cancel reply