In early December 2024, a promotional video for a food product circulated online. Surprisingly, it appeared to feature Dr. Zhang Wenhong, director of the National Center for Infectious Disease Medicine and head of the Department of Infectious Diseases at Huashan Hospital, affiliated with Fudan University.
Many netizens, trusting Dr. Zhang, not only bought the product themselves but also shared the video in family chat groups. Dr. Zhang, however, quickly clarified that he had not recorded the video and that it was likely generated by AI.
In recent years, with the rapid development of artificial intelligence technology, such incidents of AI-generated fake videos have frequently come to light. Criminals exploit AI technology to impersonate celebrities for fraud, publish false advertisements, and even create forged videos to deceive individuals out of substantial amounts of money, posing a serious threat to the social trust system and information security.
Today, we will analyze such events from a technical perspective.
Technology Behind AI Forged Videos:
From Generative Adversarial Networks to Deepfake
The technology used for these AI-forged videos is not new; research on facial replacement and image generation began in academia in the 1990s. In 2014, Ian Goodfellow proposed Generative Adversarial Networks (GAN), enabling computers to generate more realistic and high-quality images.
A GAN consists of two parts: a Generator, which creates content, and a Discriminator, which judges whether that content is real or generated and feeds the result back to the Generator. During training the two compete, each improving in response to the other, until the Generator can produce highly realistic images.
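To make this feedback loop concrete, here is a deliberately tiny, self-contained sketch. It shows only the Generator's half of the game: the Discriminator is frozen and hand-written (a bell-shaped "realism score" instead of a trained network), and the Generator is a single number that climbs that score until its output lands on the toy "real data". In an actual GAN both sides are neural networks trained jointly; every name and constant below is illustrative.

```python
import math

REAL_CENTER = 5.0  # toy "real data": genuine samples cluster around this value

def discriminator(x):
    """Frozen, hand-written stand-in for a trained Discriminator:
    returns a realism score near 1 for values close to the real data,
    falling toward 0 as x moves away."""
    return math.exp(-0.01 * (x - REAL_CENTER) ** 2)

def train_generator(steps=300, lr=5.0):
    """The Generator here is a single parameter theta (the value it emits).
    It performs gradient *ascent* on the Discriminator's score via a
    numeric gradient, i.e. it nudges its output until the Discriminator
    can no longer tell it apart from real data."""
    theta = 0.0
    eps = 1e-4
    for _ in range(steps):
        grad = (discriminator(theta + eps) - discriminator(theta - eps)) / (2 * eps)
        theta += lr * grad  # move in the direction that fools the Discriminator
    return theta

print(train_generator())  # converges toward REAL_CENTER
```

In a real GAN the Discriminator would be updated at the same time, which is exactly why the arms race discussed later in this article exists.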
In 2017, an account named “deepfakes” was created on the online community Reddit, where some celebrity face-swap videos made with GAN technology were posted. Since then, the term “Deepfake” has been widely used, typically referring to the technology behind such face-swapping videos.
In January 2018, a desktop application called FakeApp was released. Subsequently, open-source tools like Faceswap and DeepFaceLab emerged with similar functionalities. These tools significantly lowered the barrier to creating Deepfake videos, allowing ordinary users without specialized knowledge to easily generate face-swapped videos.
With continuous advancements in related technologies, today’s Deepfakes not only produce videos with higher resolution and more natural facial expression synchronization but also require less data and shorter training times. Early Deepfake training often required hundreds to thousands of images of the target person or several minutes to hours of video to capture facial features from different angles and expressions. However, with the development of GAN technology, techniques like Transfer Learning and Few-Shot Learning have emerged, allowing Deepfake videos to be generated with as few as a few dozen or even a single photo.
Additionally, early Deepfakes could only generate visuals, but now, combined with Voice Cloning and Voice Style Transfer technologies, they can also generate convincingly realistic voices of the target person, synchronizing the lip movements in the video with the audio.
In simple terms, now, with just one photo and a few seconds of voice, a Deepfake video can be generated. Of course, more photos and longer audio will result in even more realistic videos.
Positive Applications and Negative Impacts of Deepfake Technology
Although Deepfake is a form of “forgery,” it can have many positive applications when done with the consent of the person being “forged.” For example, in 2019, British football star David Beckham released a video calling for the eradication of malaria, in which he used nine languages, including Swahili and Yoruba. Except for English, the other eight languages were generated using Deepfake technology.
Moreover, the various technologies used in Deepfake have broad applications in fields such as digital humans, video streaming, film production, education and training, and psychological rehabilitation.
However, any technology can be misused. Before the fake Zhang Wenhong mentioned above, there were already fake videos of Jin Dong, Jack Ma, Lei Jun, and Yu Donglai. Impersonating celebrities to sell products is far from the worst abuse; Deepfake technology has been put to far more malicious uses, such as fraud.
On December 20, 2024, the BBC reported a story titled "Love Scammers Use Deepfake to Steal £17,000 from Me." The victim was 77-year-old Nikki MacLeod. The scammer claimed to work on an oil drilling platform and asked Nikki to buy Steam gift cards and transfer money via PayPal, supposedly to pay for internet access on the rig and for travel expenses to meet her in Scotland. Nikki was initially skeptical, but believed the scammer after seeing videos sent "from the drilling platform."
In January 2024, an employee of a Hong Kong company transferred $25 million from the company's account to scammers. The scammers staged a video call in which one of them posed as the company's chief financial officer and requested the transfer; on the call, the employee saw not only the "CFO" but also several other "colleagues."
According to a report published in May 2024 by the renowned accounting firm Deloitte, Deepfake fraud in the U.S. increased by 700% in 2023, causing losses of $12.3 billion, and this figure could reach $40 billion by 2027.
Besides fraud, Deepfake technology can also be used to deceive facial recognition systems. Many mobile applications use facial recognition to verify user identity. To prevent impersonation with static photos, modern facial recognition typically includes liveness detection. Liveness detection based on 3D structured light requires dedicated hardware, which only some phones provide.
Currently, many systems still rely on 2D facial recognition based on the front camera of smartphones. Liveness detection in 2D facial recognition generally requires users to perform actions like blinking or nodding, along with the screen flashing specific colored lights. Therefore, some individuals have used Deepfake technology combined with other methods to deceive facial recognition and steal online accounts.
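The screen-flash check described above is a form of challenge-response: the system issues a fresh, unpredictable stimulus and verifies the reaction. A minimal sketch of that idea follows; the function names are hypothetical, and the verification step is a toy (a real system would infer the colors from reflections in captured video frames rather than receive them directly).

```python
import secrets

COLORS = ["red", "green", "blue", "white"]

def issue_challenge(length=4):
    """Server side: pick a random, unpredictable sequence of screen colors.
    Because the sequence is fresh each time, a pre-recorded or pre-generated
    video cannot contain the matching reflections."""
    return [secrets.choice(COLORS) for _ in range(length)]

def verify_response(challenge, observed):
    """Toy check: the colors detected on the user's face must match the
    challenge in exact order. Real systems also analyse reflection timing
    and intensity, which is what attackers try to synthesize in real time."""
    return len(observed) == len(challenge) and all(
        c == o for c, o in zip(challenge, observed)
    )
```

The weakness exploited by Deepfake-assisted attacks is precisely the gap between this protocol and its implementation: if the attacker can generate matching frames fast enough, the challenge is answered "correctly" by synthetic video.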
How to Identify and Detect Deepfake Videos?
Poorly made Deepfake videos can often be spotted with the naked eye: unnatural facial expressions or eye movements, infrequent blinking, blurry facial edges or unnatural transitions into the background, and lighting on the face that is inconsistent with the surroundings. As the technology advances, however, these telltale flaws are becoming rarer.
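One of these cues, blink frequency, is easy to turn into a crude automated check. A sketch follows, assuming we already have a per-frame "eye openness" signal from some landmark detector; the function names and threshold values are illustrative, not taken from any real system.

```python
def count_blinks(eye_openness, closed_below=0.2):
    """Count open->closed transitions in a per-frame eye-openness signal
    (values assumed normalised to [0, 1], e.g. an eye-aspect-ratio)."""
    blinks, closed = 0, False
    for v in eye_openness:
        if v < closed_below:
            if not closed:  # falling edge: a new blink begins
                blinks += 1
            closed = True
        else:
            closed = False
    return blinks

def blink_rate_suspicious(eye_openness, fps=30.0, min_per_minute=2.0):
    """People blink roughly 10-20 times per minute; a long clip with almost
    no blinks is one classic deepfake tell. The cutoff here is deliberately
    loose and purely illustrative."""
    minutes = len(eye_openness) / fps / 60.0
    return minutes > 0 and count_blinks(eye_openness) / minutes < min_per_minute
```

As the article notes, this cue is unreliable on its own: modern generators have largely learned to blink.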
Currently, known Deepfake technologies can mimic facial expressions but cannot replicate the deformations that occur under external pressure. Therefore, during a video call, if there are suspicions of Deepfake fraud, one can ask the other party to press one side of their nose or cheek with their index finger.
In addition to visual detection, using AI to identify AI-generated content is also a popular research direction. For instance, synthetic videos may exhibit discontinuities between frames, and temporal consistency analysis could reveal anomalies. Additionally, the human heartbeat causes subtle color changes in the skin that align with the pulse rhythm, and Deepfake videos may lack this characteristic.
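The frame-to-frame consistency idea can be sketched in a few lines: measure how much each frame differs from the previous one, then flag transitions that are statistical outliers for that clip. Everything below is a toy stand-in (flat lists of grayscale values instead of decoded video, and a simple z-score instead of a learned model).

```python
def frame_diffs(frames):
    """Mean absolute pixel change between consecutive frames.
    Each frame is a flat list of grayscale values of equal length."""
    return [
        sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)
        for f1, f2 in zip(frames, frames[1:])
    ]

def flag_discontinuities(diffs, z_threshold=3.0):
    """Flag transitions whose change is an outlier relative to the clip's
    own statistics; frame-by-frame synthesis can leave such jumps."""
    n = len(diffs)
    mean = sum(diffs) / n
    std = (sum((d - mean) ** 2 for d in diffs) / n) ** 0.5
    if std == 0:
        return []
    return [i for i, d in enumerate(diffs) if (d - mean) / std > z_threshold]
```

Production detectors work on real decoded frames and usually learn temporal features with a neural network, but the underlying question is the same: does the motion between frames stay physically plausible?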
However, we must also recognize that the generative adversarial setup behind Deepfake already contains both a Generator and a Discriminator. Any method for detecting Deepfakes can itself be folded into the Discriminator during training, yielding content that evades that very detector.
In recent years, China has implemented regulations such as the Provisions on the Administration of Deep Synthesis of Internet Information Services and the Interim Measures for the Administration of Generative Artificial Intelligence Services, but these rules mainly constrain the platforms that provide such services. With advances in software and hardware, Deepfakes can now be produced with models running on a personal computer.
Therefore, to address the various issues posed by Deepfake, a comprehensive governance mechanism is needed, requiring multi-faceted collaboration among technology, platforms, and laws in the future. Impersonating celebrities using Deepfake to deceive others into purchasing products may not only violate Article 1019 and Article 1024 of the Civil Code, infringing on others’ rights of portrait and reputation but may also involve crimes such as fraud under Article 266 and false advertising under Article 222 of the Criminal Law. Such unlawful or criminal acts involving Deepfake must be addressed in accordance with the law.
Editor: Yang Xiaodong Editor-in-Chief: Li Xiang
Reviewed by: Hu Songgui Final Review: Shi Lin
Source: Science Popularization China
Editing and Operation: Nujiang Jiuxu Media Communication Co., Ltd.
Disclaimer: Some materials and images come from publicly available online channels, and their sources cannot be identified. They are intended only as promotional materials for science popularization and are not for commercial use. If there is any infringement, please contact us for deletion.
Official WeChat account of Nujiang Science Association
Nujiang Science Popularization Platform