The Magician Who Creates Data From Nothing: Generative Adversarial Networks

Authors: Li Jing, Liu Wen, Gao Shenghua

Although the wise Yu Gong knows that “sons have sons, and grandsons have grandsons,” he certainly could not have imagined what his descendants would look like. When you are moved by the speaker’s wonderful speech in a video, have you ever thought that this is just a synthetic work? When an outstanding actor unfortunately passes away, how people wish for his image to return to the screen. Today, the latest artificial intelligence technology is making all of this possible.

In August 2021, media reported that during an online summit held in April by NVIDIA, the global graphics card giant, the speaker was not CEO Jensen Huang himself but a “fake person” synthesized through digital technology. The news immediately attracted attention. Although it was later confirmed that the digital person appeared for only 14 seconds rather than for the entire event, this was enough to show that generative technology has advanced to the point where real and fake are hard to tell apart. The news was surprising, yet hardly unreasonable: with the rapid development of generative technology, similar events may soon become commonplace. Lifelike images of this kind can be generated using a technique called Generative Adversarial Networks (GANs).

The 2018 Turing Award was jointly awarded to the “three giants” of deep learning: Geoffrey Hinton, Yann LeCun, and Yoshua Bengio. Bengio is a professor at the University of Montreal in Canada and the founder of the Montreal Institute for Learning Algorithms, and research on GANs is one of his significant contributions. There is also an interesting story behind the invention of GANs. In 2014, Ian Goodfellow, one of the inventors, was still a doctoral student under Professor Bengio, researching generative models. One day, while drinking with friends at a bar and discussing generative models, he suddenly came up with the idea of the GAN. He told a friend how it should work and bet that he could build it, but the friend was skeptical. So Goodfellow left the bar, ran experiments, and wrote the GAN paper in one night; it was later published at a top artificial intelligence conference. Today, GANs are widely used for generating images and videos, automatically generating text, and even developing new drugs.

[Figure: Facial images. Can you tell which of these images are real faces and which are computer-generated? In fact, all of them were generated using Generative Adversarial Network technology.]

Principle of Generative Adversarial Networks

Before introducing GANs, let us first explain what a generative model is. You may share this curiosity: if, as Yu Gong said, “sons have sons, and grandsons have grandsons, the descendants are endless,” what will these future people look like? In fact, the various kinds of data we observe, such as images of human faces, follow a certain distribution in a high-dimensional data space. A single data point is usually called a sample. If we can use the observed samples to fit the distribution of the real data, such as the distribution of facial data, then faces we have never seen can be sampled from this estimated distribution. This is the essence of generative models.
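The fit-then-sample idea can be illustrated with a deliberately simple sketch. It assumes the data are one-dimensional and roughly Gaussian, which real face images are not; the "heights" data set and the function names are hypothetical and serve only to show the two steps.

```python
import random
import statistics


def fit_gaussian(samples):
    """Estimate the mean and standard deviation of the observed samples."""
    return statistics.fmean(samples), statistics.stdev(samples)


def sample_new(mean, std, n, rng):
    """Draw n new, previously unseen samples from the fitted distribution."""
    return [rng.gauss(mean, std) for _ in range(n)]


rng = random.Random(0)

# Hypothetical "observed" data: 1,000 heights in centimetres.
observed = [rng.gauss(170.0, 8.0) for _ in range(1000)]

# Step 1: fit the data distribution from the observed samples.
mean, std = fit_gaussian(observed)

# Step 2: sample unseen data points from the estimated distribution.
new_samples = sample_new(mean, std, 5, rng)

print(f"fitted mean={mean:.1f}, std={std:.1f}")
print("new samples:", [round(s, 1) for s in new_samples])
```

A GAN follows the same two steps, but replaces the hand-picked Gaussian with a neural network that can represent far more complicated distributions.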

A GAN is a special type of generative model consisting of two parts: a generator and a discriminator. The input to the generator is random noise sampled from a prior distribution, which can be seen as an encoding of a sample. The generator is trained so that its output follows the same distribution as the observed data (usually called the training data). The discriminator's role is to distinguish whether an input sample is a fake produced by the generator or a real sample from the training data. For image generation, both the generator and the discriminator are typically implemented as convolutional neural networks.
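The roles of the two parts can be sketched as two plain functions. This is only an illustration of the interfaces: the linear maps, the dimensions `NOISE_DIM` and `SAMPLE_DIM`, and the helper names are assumptions made for the example, and both networks are untrained. Real image GANs use convolutional networks instead.

```python
import math
import random

NOISE_DIM = 4   # assumed size of the noise vector z
SAMPLE_DIM = 2  # assumed size of one data sample


def make_generator(rng):
    """Return G: noise vector -> sample (here an untrained linear map)."""
    weights = [[rng.uniform(-1, 1) for _ in range(NOISE_DIM)]
               for _ in range(SAMPLE_DIM)]

    def generator(z):
        return [sum(w * zi for w, zi in zip(row, z)) for row in weights]

    return generator


def make_discriminator(rng):
    """Return D: sample -> probability that the sample is real (untrained)."""
    weights = [rng.uniform(-1, 1) for _ in range(SAMPLE_DIM)]

    def discriminator(x):
        logit = sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid squashes to (0, 1)

    return discriminator


rng = random.Random(0)
G = make_generator(rng)
D = make_discriminator(rng)

z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]  # noise from the prior
fake = G(z)       # the generator turns noise into a sample
p_real = D(fake)  # the discriminator scores how "real" the sample looks

print("generated sample:", fake)
print("D(generated sample) =", p_real)
```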

[Figure: Illustration of the principle of Generative Adversarial Networks]

The classic optimization objective function of GAN [1] is as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Here $x$ is an observed training sample, whose distribution is $p_{\mathrm{data}}$; $z$ is noise sampled from the prior distribution $p_z(z)$; and $D(x)$ and $D(G(z))$ are the probabilities that the discriminator classifies a real sample and a generated sample, respectively, as real. The goal of the generator is to make its samples indistinguishable from real samples in the eyes of the discriminator, while the goal of the discriminator is to classify accurately whether a sample is generated or real. When the generator and discriminator reach a stable state through adversarial training, the generator can produce a very realistic sample from random noise, thus achieving the goal of sample generation. Since there is no limit to the random noise that can be drawn, countless realistic samples can be generated.
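The tug-of-war in this objective can be made concrete with a small numerical sketch. It estimates the two terms of the objective in [1] by averaging over samples, for hand-written one-dimensional generators and discriminators. The data distribution N(4, 1) and every function name below are assumptions chosen for illustration; nothing here is trained.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def value_function(D, G, real_samples, noise_samples):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(D(x)) for x in real_samples) / len(real_samples)
    fake_term = sum(math.log(1.0 - D(G(z))) for z in noise_samples) / len(noise_samples)
    return real_term + fake_term


def poor_generator(z):
    return z  # outputs cluster around 0, far from the real data


def good_generator(z):
    return z + 4.0  # outputs match the assumed real distribution N(4, 1)


def threshold_discriminator(x):
    return sigmoid(2.0 * (x - 2.0))  # calls a sample "real" when x is above 2


def blind_discriminator(x):
    return 0.5  # cannot tell real from generated


rng = random.Random(0)
real = [rng.gauss(4.0, 1.0) for _ in range(2000)]   # hypothetical real data
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # z drawn from the prior

v_poor = value_function(threshold_discriminator, poor_generator, real, noise)
v_good = value_function(threshold_discriminator, good_generator, real, noise)
v_blind = value_function(blind_discriminator, good_generator, real, noise)

print(f"V(threshold D, poor G)  = {v_poor:.3f}")
print(f"V(threshold D, good G)  = {v_good:.3f}")
print(f"V(blind D, good G)      = {v_blind:.3f}")
```

Against the same discriminator, the better generator drives the value down, which is what the generator's minimization asks for. When the discriminator can only output 0.5, the value is exactly -log 4, the value at the theoretical equilibrium described in [1].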

Current Status of Generative Adversarial Networks

After Goodfellow, Bengio, and their colleagues proposed the concept, GANs developed rapidly. The original GAN could only sample from noise and could not be directed to generate images of a specific category. Conditional GANs address this by feeding a category label to the network as an additional input, which makes controllable image generation possible [2]. GAN training also frequently suffers from mode collapse, in which the generator maps all noise inputs to one or a few images, so the generated images lack diversity. In response, the Wasserstein GAN replaces the original loss function with one based on the Wasserstein distance, which alleviates mode collapse and training instability and increases the diversity of generated samples [3].

The images generated by the original GAN were often blurry. To improve image quality, one line of work replaced the fully connected layers of the original GAN with convolutional layers that upsample (in the generator) and downsample (in the discriminator) [4]. Another controls image generation through style [5]: by modulating the mean and variance used in adaptive instance normalization layers, it generates high-resolution, high-quality images.

For image translation, that is, mapping an image from one style to another, conditional GANs have been trained on paired data [6]. Where paired images are unavailable, translation can still be learned by translating an image from the source domain to the target domain and then back to the source domain, and requiring the result to be consistent with the original [7]. Furthermore, to address the difficulty of training GANs when training samples are scarce, some studies have examined image generation in few-shot scenarios [8,9]. To make the generation process more controllable, other work has studied the interpretability of GANs. At present, GANs produce good results for images of faces, vehicles, and natural scenes, but much work remains for complex scenes and videos.
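Two of the ideas above reduce to short formulas, sketched here on plain lists of numbers rather than on images. The first function applies adaptive instance normalization to a single feature channel, in the spirit of [5]; the second computes a cycle-consistency penalty as a mean absolute error, in the spirit of [7]. The function names, the toy "translation" functions, and the small constant `eps` are assumptions for this illustration, not the authors' implementations.

```python
import statistics


def adain(features, style_mean, style_std, eps=1e-5):
    """Adaptive instance normalization for one feature channel.

    The channel is normalized to zero mean and unit variance, then rescaled
    by a style-dependent standard deviation and shifted by a style mean.
    """
    mu = statistics.fmean(features)
    sigma = statistics.pstdev(features)
    return [style_std * (f - mu) / (sigma + eps) + style_mean for f in features]


def cycle_consistency_loss(x, forward, backward):
    """Mean absolute error between x and backward(forward(x))."""
    reconstructed = backward(forward(x))
    return sum(abs(a - b) for a, b in zip(x, reconstructed)) / len(x)


# AdaIN: the output takes on the mean and spread dictated by the style.
channel = [1.0, 2.0, 3.0, 4.0]
styled = adain(channel, style_mean=10.0, style_std=2.0)
print("styled channel:", [round(v, 3) for v in styled])


# Cycle consistency: source -> target -> source should return the original.
def to_target(img):          # hypothetical source-to-target translator
    return [2.0 * v + 1.0 for v in img]


def to_source_exact(img):    # a perfect inverse of to_target
    return [(v - 1.0) / 2.0 for v in img]


def to_source_biased(img):   # an imperfect inverse
    return [v / 2.0 for v in img]


loss_exact = cycle_consistency_loss(channel, to_target, to_source_exact)
loss_biased = cycle_consistency_loss(channel, to_target, to_source_biased)
print("cycle loss, exact inverse: ", loss_exact)
print("cycle loss, biased inverse:", loss_biased)
```

In training, the style mean and standard deviation would be produced by a learned network, and the cycle-consistency loss would be added to the adversarial loss so that unpaired translation keeps the content of the input image.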

Applications of Generative Adversarial Networks

As GAN technology matures, it has been applied in various aspects of our lives. For example, in the field of digital humans, GANs can be used for generating faces, editing facial attributes, image completion, and human action transfer; in the text domain, GANs can be used for automatic generation of news and ancient poetry; in the pharmaceutical field, GANs can also be used for new drug development.

Digital Humans

GANs can generate faces that do not exist in the real world. One potential application is for film companies to use GANs to create their own digital human IP which, like Donald Duck and Mickey Mouse, could become a cultural symbol. GAN technology can also edit facial attributes: given a face as input, a GAN can change its attributes, for example by adding a smile, blonde hair, or aging effects, which makes it convenient to beautify and post-process facial photos. In addition, GANs can make a target character perform specified actions, achieving character driving, and they can perform photo restoration. By combining these technologies, it is foreseeable that film works based on GAN technology will emerge.

[Figures: Image completion; Facial attribute editing; Human action transfer]

Generation of Images with Specific Styles

GANs can also perform image translation and generate images in specific styles. A real image can be translated by a GAN into an oil painting in the style of different artists, allowing artists such as Van Gogh and Monet, who can no longer paint, to “come back to life” and continue creating beautiful works of art. A portrait titled “Edmond de Belamy”, depicting an 18th-century gentleman, was generated with a GAN by Obvious, an art collective in Paris. The signature in the lower right corner of the painting is the GAN objective function. The artwork was sold to an anonymous buyer for $432,500 (approximately 3 million RMB) at Christie's auction house on October 25, 2018.

[Figure: Image translation]

[Figure: “Edmond de Belamy”, the world's first AI-created painting sold at auction]

Generation of Sequential Data

The generation of sequential signals, including text, music, and speech, has a wide range of practical applications, and AI-based sequence generation has achieved a series of successes in recent years. For example, Microsoft's Xiaobing (known in English as Xiaoice) produced “The Sun Lost the Glass Window”, the first poetry collection in human history written by AI, which was published by Zhanlu Culture in 2017. During the 2016 Rio Olympics, ByteDance launched an AI robot called Zhang Xiaoming. The robot used GAN technology to generate news: by interfacing with the Olympic Committee's database, it wrote real-time articles on table tennis, tennis, badminton, and women's soccer events and published them almost simultaneously with the TV broadcasts. According to statistics, Zhang Xiaoming published 456 Olympic news articles within 16 days.

In 2019, at its annual re:Invent technology conference, Amazon AWS released AWS DeepComposer, the world's first GAN-based music keyboard, which helps users easily create their own musical works. In the same year, the Google Brain team proposed GANSynth (audio synthesis based on generative adversarial networks) for the rapid generation of high-quality music. J. Engel, a researcher at Google Brain, commented on the performance of GANSynth: “It can generate instrument audio 50,000 times faster than standard WaveNet (a traditional music generation algorithm) with higher quality (both in quantitative tests and listener tests), and can independently control pitch and timbre, making the interpolation between instruments smoother.” Moreover, by combining GANs with deep reinforcement learning, near-realistic results have been achieved in text generation and in the generation of ancient Chinese poetry.

New Drug Generation

On average, a traditional pharmaceutical company screens 8,000 drug molecules for each one that eventually comes to market. This process requires researchers to spend weeks or even months in the laboratory testing the molecules one by one. In recent years, thanks to rapidly growing computational power, GAN technology, and deep reinforcement learning, researchers have begun to explore the use of AI for drug development and screening and have made significant progress. For instance, in 2019, Insilico Medicine used GANs and deep reinforcement learning to conceive and design new drug molecular structures, and successfully synthesized a leading candidate drug and tested it in mice. The AI-based molecular design took only 21 days, and the entire process, including design, synthesis, and validation, took only 46 days. GAN technology therefore has the potential to save the pharmaceutical industry enormous research costs.

The Future and Challenges of Generative Adversarial Networks

Although GANs have developed rapidly and can increasingly synthesize eye-catching images and videos from random noise or from user specifications (such as scene segmentation and layout), current algorithms still cannot model and control complex scenes at a fine-grained level. Traditional renderers, for their part, cannot handle complex, non-local three-dimensional interactions when the quality of the source material is low. In contrast, neural rendering is expected to combine the advantages of modern computer graphics and deep generative models, performing controllable, high-definition image (or video) synthesis and editing with images or videos as input. It is foreseeable that the combination of GANs and neural rendering will play an increasingly important role in the film industry, virtual and augmented reality, and smart cities, becoming a key technology for building the digital metaverse.

As the cost of synthesis gradually falls, forgery technology, typified by Deepfake, is becoming more widespread and is reaching ever deeper into social life. Public attitudes toward Deepfake tend toward two extremes: some people are drawn into the whirlpool of Deepfake entertainment, while others oppose Deepfake outright. As a powerful tool for fabricating false information, Deepfake content is spreading to many corners of the internet, where it infringes on privacy, disrupts political elections, smears public figures, and fuels the proliferation of non-consensual pornography, gradually eroding public trust and triggering a crisis of social trust. Technology itself is neutral, but its applications can be good or evil; future research needs to take a more active part in preventing the misuse of Deepfake. The most effective approach at present is to develop detection technologies, countering technology with technology. However, detecting Deepfake content is not easy and still faces considerable challenges. Therefore, many governments around the world are highly concerned about the risk of misuse and have begun to draft laws and regulations to restrict the spread of deepfakes in cyberspace. On June 12, 2019, the Deepfakes Accountability Act was introduced in the U.S. Congress; it would require anyone creating synthetic media files to use “indelible digital watermarks and text descriptions” to indicate that the media file has been altered or generated, and failure to do so would be treated as a criminal act. In the same year, China's National Internet Information Office (the Cyberspace Administration of China), the Ministry of Culture and Tourism, and the National Radio and Television Administration jointly issued the “Regulations on the Management of Online Audio and Video Information Services”, emphasizing that “providers and users of online audio and video information services must prominently mark any non-real audio and video information created, published, or disseminated using new technologies and applications based on deep learning and virtual reality, and must not use such technologies and applications to create, publish, or disseminate false news information” [10].

As a new technology, Generative Adversarial Networks are developing rapidly within artificial intelligence and are widely used in entertainment, film, pharmaceuticals, and many other fields. We believe that GANs, like many technologies, have two sides. However, as laws continue to improve and society exercises self-restraint, GANs will integrate further into our lives and better serve society.

Li Jing, Doctoral Candidate; Liu Wen, Doctoral Candidate; Gao Shenghua, Associate Professor: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210.

[1] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Advances in Neural Information Processing Systems, 2014, 27: 2672-2680.
[2] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv: 1411.1784, 2014.
[3] Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. Proceedings of the International Conference on Machine Learning. Proceedings of Machine Learning Research, 2017: 214-223.
[4] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv: 1511.06434, 2015.
[5] Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers, 2019: 4401-4410.
[6] Isola P, Zhu J Y, Zhou T, et al. Image-to-image translation with conditional adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers, 2017: 1125-1134.
[7] Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision. Institute of Electrical and Electronics Engineers, 2017: 2223-2232.
[8] Saito K, Saenko K, Liu M Y. COCO-FUNIT: Few-shot unsupervised image translation with a content conditioned style encoder. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III. Springer International Publishing, 2020: 382-398.
[9] Li Y, Zhang R, Lu J, et al. Few-shot image generation with elastic weight consolidation. arXiv: 2012.02780, 2020.
[10] Three departments issued the “Regulations on the Management of Online Audio and Video Information Services”. China Government Network, 2019-11-29 [2021-08-17]. http://www.gov.cn/xinwen/2019-11/29/content_5457064.htm.

Keywords: artificial intelligence, generative adversarial networks, image generation, sequential data generation, new drug design

This article was published in the journal “Science”, 2022, Volume 74, Issue 2, p. 49.

Some images are sourced from Pixabay, and the copyright of the article belongs to the Science magazine
