Introduction
On August 30, 2019, an AI face-swapping app developed by Beijing Momo Technology Co., Ltd. was launched in major mobile app stores. According to reports, a user only needs to upload a single frontal photo, and the app uses AI technology to replace a celebrity's face in a video with the user's, generating a realistic video clip of the user. The app went viral shortly after release, but it also sparked serious privacy controversy. On September 3, 2019, in response to problems such as the app's irregular user privacy agreement and the risk of data leakage, the Cybersecurity Bureau of the Ministry of Industry and Information Technology summoned representatives of Beijing Momo Technology Co., Ltd. for a discussion, requiring the company to conduct self-examination and rectification in strict accordance with national laws and regulations and to strengthen the protection of users' personal information.
In recent years, artificial intelligence technology has attracted significant attention, with various applications emerging that bring convenience to people's work and lives while also posing risks to privacy and data security. The recent "face-swapping" incident is not the first major controversy caused by artificial intelligence. Earlier incidents involving "DeepNude" and "DeepFakes" sparked widespread discussion among experts at home and abroad about how face forgery technology infringes on personal privacy and portrait rights. The social issues raised by these technologies have also drawn the attention of government departments. On June 13, 2019, the U.S. House Intelligence Committee held a hearing on AI-based deepfakes, discussing the risks such deep forgery technology poses to the nation, society, and individuals, along with preventive measures. This article surveys existing face image forgery and anti-forgery technologies from a technical perspective, aiming to inform the formulation of policies and standards for facial security.
Overview of Face Image Forgery Technology
Face image (and video) forgery has become an increasingly popular topic in computer vision in recent years, with related techniques improving steadily as new algorithms emerge. The earliest face-swapping techniques were based on feature-matching algorithms, which extracted feature information such as the eyebrows and eyes from one face and mapped it onto another. This approach requires neither training nor datasets, but the results are poor: the synthesized images look very unnatural, and expressions cannot be modified.
1. Face2Face
A significant improvement in the quality of forged (face-swapped) images came with the rise of deep generative models such as Generative Adversarial Networks (GAN): given a source image and a target image, such models can learn the transformation between the two. In 2016, Justus Thies of the University of Erlangen-Nuremberg published the Face2Face algorithm, which can transfer a person's facial features, expressions, and even the motion of the facial muscles while speaking from one video onto a character in another video in real time. It was not only the first algorithm capable of real-time facial reenactment, but it also achieved a level of realism that is difficult to distinguish from genuine footage. Thies' paper explains the "face-swapping" pipeline in detail: the algorithm first reconstructs the facial geometry of the source and target faces and tracks their expressions in real time, then uses a designed deformation-transfer function to re-render the target face with the source actor's expression, and finally composites the re-rendered face into the target video's background. Unlike earlier models, Face2Face preserves the target actor's own mouth appearance, so the final result is both real-time and highly realistic.
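The expression-transfer step above can be illustrated with a toy parametric face model (the dimensions and variable names here are illustrative assumptions, not taken from the Face2Face paper): each face is a neutral mesh plus a linear combination of expression blendshapes, and transfer amounts to reconstructing the target identity with the source's expression coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTS, N_EXPR = 100, 10  # toy mesh size and blendshape count (illustrative)

# Per-identity neutral geometry (flattened x/y/z) and a shared expression basis
neutral_source = rng.normal(size=3 * N_VERTS)
neutral_target = rng.normal(size=3 * N_VERTS)
expr_basis = rng.normal(size=(3 * N_VERTS, N_EXPR))

def reconstruct(neutral, coeffs):
    """Linear blendshape model: geometry = neutral + basis @ coefficients."""
    return neutral + expr_basis @ coeffs

# 1) Track the source actor: fit expression coefficients to the source frame
#    (here simply sampled; a real tracker would optimize them against video)
source_coeffs = rng.uniform(-1, 1, size=N_EXPR)
source_geometry = reconstruct(neutral_source, source_coeffs)

# 2) Transfer: reconstruct the *target* identity with the *source* expression
reenacted_target = reconstruct(neutral_target, source_coeffs)

# The reenacted mesh keeps the target's identity (its neutral shape) and
# deviates from it by exactly the source's expression offsets.
assert np.allclose(reenacted_target - neutral_target,
                   source_geometry - neutral_source)
print(reenacted_target.shape)  # (300,)
```

The re-rendering and compositing stages then rasterize this reenacted geometry with the target video's texture and lighting, which this sketch does not model.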
2. DeepFakes
In December 2017, a user named "DeepFakes" posted a "fake video" on Reddit in which a celebrity's face was added in post-production yet appeared almost flawless. The core of the eponymous DeepFakes algorithm is the "autoencoder," an early deep neural network model that compresses incoming data into a code and regenerates the original data from that code. The principle of the algorithm is to train a shared encoder together with one decoder per identity: the encoder compresses faces of both person A and person B into a common code that captures pose and expression, while each decoder learns to reconstruct the appearance of its own identity. At swap time, a frame of face B is encoded and then decoded with A's decoder, so that B's pose and expression are rendered with A's face. Korshunov and Marcel demonstrated in another article that forged faces generated by the DeepFakes algorithm can deceive most deep learning-based facial recognition and monitoring systems. Shaoanlu improved the DeepFakes algorithm by adding "adversarial loss" and "perceptual loss" functions to the original autoencoder architecture, proposing the "faceswap-GAN" model, which produces more natural and realistic forged face images. Subsequently, DeepFakes released an application that let users create face-swapping videos in just a few simple steps, but it was quickly banned by major websites amid widespread criticism.
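The shared-encoder, two-decoder structure can be sketched with plain numpy linear layers (random weights stand in for trained ones, and all sizes are illustrative assumptions; this shows only the data flow, not real training):

```python
import numpy as np

rng = np.random.default_rng(1)

DIM_IN, DIM_CODE = 64 * 64, 128  # illustrative flattened-image and code sizes

# One shared encoder, one decoder per identity. A real implementation would
# learn these weights by minimizing reconstruction loss on each identity.
W_enc = rng.normal(scale=0.01, size=(DIM_CODE, DIM_IN))
W_dec_a = rng.normal(scale=0.01, size=(DIM_IN, DIM_CODE))
W_dec_b = rng.normal(scale=0.01, size=(DIM_IN, DIM_CODE))

def encode(image):
    # Shared code: captures pose/expression common to both identities
    return np.tanh(W_enc @ image)

def decode(code, W_dec):
    # Identity-specific decoder: renders the code as that person's face
    return W_dec @ code

# Training phase (conceptually): encoder + decoder_A reconstructs faces of A,
# encoder + decoder_B reconstructs faces of B, forcing the shared code to be
# identity-agnostic while each decoder holds identity-specific appearance.

# Swap phase: encode a frame of B, decode with A's decoder -> B's pose and
# expression rendered with A's face.
frame_of_b = rng.normal(size=DIM_IN)
swapped = decode(encode(frame_of_b), W_dec_a)
print(swapped.shape)  # (4096,)
```

Shaoanlu's faceswap-GAN variant mentioned above keeps this structure but adds adversarial and perceptual loss terms during training to sharpen the decoded output.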
3. HeadOn
In June 2018, the "Face2Face" team published a paper presenting their newly developed "HeadOn" technology. HeadOn can be understood as an upgraded version of Face2Face that combines precise tracking of deformation proxies with view-dependent texture for video re-rendering. The researchers claim: "This system is the first real-time source-to-target re-enactment method for human portrait videos, achieving the transfer of torso movement, head movement, facial expressions, and gaze." Earlier techniques could forge faces in video in real time, but the head and body pose remained fixed and the gaze direction did not follow the body's movement, making the generated images or videos look unnatural. HeadOn addresses both of these shortcomings. As shown in Figure 1, videos generated with HeadOn incorporate real-time transfer of body motion, head motion, and gaze, making the characters in the generated videos more realistic.
Figure 1 Example of HeadOn Technology
4. paGAN
In August 2018, a paper presented at the SIGGRAPH 2018 conference by a team led by Chinese professor Li Hao received significant attention in the industry. They developed a GAN-based deep learning technique named "paGAN." As shown in Figure 2, paGAN can generate portrait videos from a single photo and track faces at up to 1000 frames per second. Compared with earlier video-to-video face-swapping techniques, paGAN further lowers the technical threshold for face video forgery. The method first builds a plausible 3D mesh, then performs shape matching and angle transformation between the input image and the 3D shape, and then uses the team's self-developed face tracker, VGPT, to track the position and fine-scale state of the face. Its most notable feature is its remarkable speed: on a personal computer with a GTX 1080-class GPU, the tracker can reach a frame rate of up to 1000 FPS, and even on mobile devices it reaches 60-90 FPS, meeting the frame-rate requirements of mobile-camera video capture and enabling real-time tracking and forgery of face video.
Figure 2 Example of Face Video Generated by paGAN Technology
As the example in Figure 2 shows, forged face images and videos are becoming increasingly realistic as research progresses, while the barrier to entry keeps falling; the attendant risks to user data security and privacy are becoming correspondingly more pronounced. In view of both personal and national security, relevant departments have paid close attention to the potential for malicious use of artificial intelligence, calling on academia, industry, and government support bodies to collaborate on forward-looking technical research and to prepare response strategies early.
Overview of Face Image Anti-Forgery Technology
On March 15, 2017, the CCTV 315 Gala exposed flaws in the Alipay facial recognition system, demonstrating on-site that a face-swapped photo could pass identity authentication and gain access to another person's account. The Alipay team later clarified that facial recognition is only one layer of Alipay's multi-layer protection system and that a face alone cannot log in to an account. Nevertheless, as algorithms like paGAN, which can generate facial video in real time from a single image, become open source, traditional video-based face anti-forgery checks such as "blink" or "read a random number aloud" have become even more vulnerable.
1. Based on Blink Detection
In May 2018, the U.S. Defense Advanced Research Projects Agency (DARPA) funded a research project on media content authentication by a team from the State University of New York. The team found that forged faces generated with GAN technology rarely blink, or do not blink at all, because the training sets used to build forged-face models rarely include images of closed eyes. In August 2018, they released the first "anti-face-swapping" AI forensic tool, fighting AI with AI. The tool detects the state of the eyes in a video and achieved an accuracy of 99% on a specific forged-face dataset. However, this is only a provisional result: a defining characteristic of today's data-driven AI algorithms is that they can be fine-tuned on new, targeted data to patch their deficiencies, so forgery models can simply be retrained on footage that includes blinking. The birth of this detection tool thus merely marks the beginning of an AI arms race between face forgery and anti-forgery technologies.
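A common way to operationalize blink detection from facial landmarks is the eye aspect ratio (EAR) heuristic; the sketch below uses that standard formulation as an assumption about how such a detector could work, not as the team's actual implementation (landmark ordering follows the common 6-point dlib convention, and the threshold values are illustrative):

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks p1..p6 around one eye (dlib-style ordering).
    EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|): high when open, near 0 when shut."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distance, inner pair
    v2 = np.linalg.norm(eye[2] - eye[4])   # vertical distance, outer pair
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal eye width
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_sequence, threshold=0.2, min_frames=2):
    """Count closed-eye runs of at least `min_frames` consecutive frames."""
    blinks, run = 0, 0
    for ear in ear_sequence:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks

# A wide-open eye: vertical gaps of 2 against a width of 4 gives EAR = 0.5
open_eye = np.array([[0, 0], [1, 1], [3, 1], [4, 0], [3, -1], [1, -1]], float)
print(eye_aspect_ratio(open_eye))  # 0.5

# A real clip should contain blinks; many forged clips contain none.
open_ear, closed_ear = 0.35, 0.05
real_video = [open_ear] * 20 + [closed_ear] * 3 + [open_ear] * 20
fake_video = [open_ear] * 43
print(count_blinks(real_video), count_blinks(fake_video))  # 1 0
```

The weakness noted above is visible in this sketch too: a forger who fine-tunes on closed-eye data produces EAR sequences indistinguishable from a real video's.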
2. Based on CNN and LSTM
Building on the excellent performance of Convolutional Neural Networks (CNN) in image recognition and classification tasks, researchers have proposed a series of CNN-based face anti-forgery algorithms with good results. Researchers from the Technical University of Munich and other universities used common face forgery algorithms, including the aforementioned Face2Face, DeepFakes, and FaceSwap, to generate FaceForensics++, a large face-forgery video dataset containing 510,207 labeled images derived from 1,000 real videos, to support effective supervised learning. Li Y et al. describe experiments using the XceptionNet algorithm to detect forged faces on this dataset, ultimately achieving a detection accuracy of 99.08%, far exceeding that of human observers. Li Y et al. also proposed a face anti-forgery method based on detecting facial warping artifacts, reasoning that current face forgery algorithms can only generate images of limited resolution, which must then be warped to match the faces in the source video; this leaves warping artifacts in the forged videos that a convolutional neural network can capture effectively, so a trained classifier can distinguish forged videos. Hsu C. C. et al. proposed a deep forgery detector (DeepFD) that uses a contrastive loss to find features typical of forged images produced by different GAN algorithms, followed by a cascaded classifier. Experiments show that DeepFD achieved a forgery detection accuracy of 94.7% on datasets of fake face images generated by several state-of-the-art GAN algorithms.
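The contrastive loss mentioned for DeepFD has a standard margin-based form; the sketch below uses that common formulation (an assumption about the paper's exact variant, with toy 2-D features): pairs from the same class are pulled together, and pairs from different classes are pushed at least a margin apart.

```python
import numpy as np

def contrastive_loss(f1, f2, same_class, margin=1.0):
    """Margin-based contrastive loss on a pair of feature vectors.
    same_class=1: penalize squared distance between the pair;
    same_class=0: penalize only pairs closer than `margin`."""
    d = np.linalg.norm(f1 - f2)
    return same_class * d ** 2 + (1 - same_class) * max(0.0, margin - d) ** 2

# Toy features: two fakes from the same GAN should map close together,
# while a fake and a real image should map far apart.
fake_a = np.array([0.10, 0.20])
fake_b = np.array([0.12, 0.18])
real = np.array([2.00, -1.00])

print(contrastive_loss(fake_a, fake_b, same_class=1))          # small
print(contrastive_loss(fake_a, real, same_class=0))            # 0.0 (beyond margin)
```

Training a feature extractor under this loss clusters the fingerprints of each GAN's output, after which the cascaded classifier only needs to separate those clusters from real images.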
Güera D et al. proposed a face anti-forgery method combining a CNN with a Long Short-Term Memory (LSTM) network. Because most current face forgery techniques are based on autoencoders or GAN algorithms, each frame of the generated video is synthesized independently of the previous one, causing continuity problems: important information from the previous frame is not carried into the current one. For example, when the lighting in a scene changes, the generated forged video can contain pixel anomalies that are difficult for the human eye to observe. Güera et al. used a CNN to extract per-frame image features and an LSTM for temporal analysis, exploiting changes along the video's time dimension for identification; the method achieved a top detection accuracy of 97.1% on a set of 600 videos. Using a similar CNN+LSTM approach, Li Y et al. judged the authenticity of videos by checking whether the subjects blink, naming the method Long-term Recurrent Convolutional Networks (LRCN). After face alignment, LRCN uses a CNN to extract features from six landmark points around the eyes and an LSTM to learn and predict the eye-state sequence, from which the final judgment is made. As shown in Figure 3, the subject in the real video sequence (first row) blinks, while the subject in the fake video (second row) does not.
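The CNN+LSTM pipeline above can be sketched with a single numpy LSTM cell run over per-frame features (random weights stand in for trained ones, and all sizes are illustrative assumptions; in the real systems the per-frame features come from a trained CNN):

```python
import numpy as np

rng = np.random.default_rng(2)

FEAT, HID = 8, 16  # illustrative per-frame feature size and LSTM width

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix holding all four gates (input, forget, output, candidate);
# shapes follow the standard LSTM cell. Random weights stand in for trained ones.
W = rng.normal(scale=0.1, size=(4 * HID, FEAT + HID))
b = np.zeros(4 * HID)
w_out = rng.normal(scale=0.1, size=HID)

def lstm_classify(frame_features):
    """Run an LSTM cell over per-frame CNN features; score the final state."""
    h, c = np.zeros(HID), np.zeros(HID)
    for x in frame_features:
        z = W @ np.concatenate([x, h]) + b
        i = sigmoid(z[:HID])               # input gate
        f = sigmoid(z[HID:2 * HID])        # forget gate
        o = sigmoid(z[2 * HID:3 * HID])    # output gate
        g = np.tanh(z[3 * HID:])           # candidate cell update
        c = f * c + i * g                  # cell state mixes memory and input
        h = o * np.tanh(c)                 # hidden state summarizes the clip
    return sigmoid(w_out @ h)              # probability the clip is forged

video_features = rng.normal(size=(30, FEAT))  # e.g. CNN features of 30 frames
p_fake = lstm_classify(video_features)
print(0.0 < p_fake < 1.0)  # True
```

The recurrence is what lets the detector exploit frame-to-frame continuity: an inconsistency between adjacent frames changes the cell state in a way a purely per-frame CNN classifier cannot see.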
Figure 3 Detecting Fake Videos by Checking if Characters Blink
Conclusion
The current wave of artificial intelligence has greatly advanced AI vertical industries, bringing disruptive innovation to many sectors. However, as industries embrace this wave, they must guard against the hidden risks to privacy and ethical security. It is hoped that researchers and practitioners will persist in constructive research to mitigate the harms and threats that artificial intelligence may pose to people, and that policy-support bodies will establish and improve standards for AI-related products, algorithms, and frameworks, promoting the healthy, regulated development of the industry and effectively reducing the social risks posed by face image forgery technology.
Author Profile
Zhang Yi, PhD, engineer at the Terminal Laboratory of the China Academy of Information and Communications Technology, mainly engaged in research on artificial intelligence.
Contact: zhangyi5@caict.ac.cn