Deepfake Technology: Near Reality But Not Quite Real

Have you ever mistaken an AI-generated user avatar for a real person on social media? As “deepfake” technology grows more realistic and even escapes the uncanny valley effect, the line between “true” and “false” is being redrawn.
As one of the most significant manifestations of “deepfake” technology, AI synthesis technology has been widely applied in the entertainment industry, appearing in movie dubbing, game characters, and live television broadcasts. Meanwhile, the public is increasingly using this technology to create content.
However, the application of this technology has also raised privacy and security issues. On January 28, the Cyberspace Administration of China issued a draft for public consultation on deep synthesis technology and content, stating that deep synthesis service providers should prominently label the deep synthesis information they generate. If they offer significant editing functions for biometric information such as facial features and voice, they must also obtain separate consent from the individuals whose information is edited.[1]
What changes and constants have emerged in the development of “deepfake” technology? How is it being applied? This issue of Quanmei Pai (ID: quanmeipai) takes you into the indistinguishable virtual world to explore how the development of this technology affects content creation and experience.
Face Swapping and Synthesized Voices Closer to Reality
First, let’s revisit what “deepfake” means. At its core, it is AI synthesis technology: generative synthesis algorithms, built on deep learning and virtual reality, that create text, images, audio, video, or virtual scenes. Its main forms are face replacement in images, voice replacement in audio, and synchronized replacement of faces and voices in footage.
Since 2018, AI synthesis technology has iterated faster than expected and moved ever closer to reality, most visibly in AI face swapping and AI voice synthesis.
AI Face Swapping
Ample evidence suggests that AI-synthesized faces have become almost indistinguishable from real ones. For instance, the well-known website This Person Does Not Exist can generate an astonishingly realistic portrait in just one second. Generated Photos, meanwhile, has automatically generated 100,000 facial photos through deep learning on a vast database of real portraits, and users can use them as profile pictures on social networking sites and for other purposes.[2]
Research has also shown that AI-generated faces can appear more natural than real faces. In a recent study at the University of California, Berkeley, 315 randomly selected participants were asked to tell AI-generated faces from real ones, drawn from a pool of 800 images: 400 AI-generated and 400 real. Each of the two groups included 100 faces each of White, Black, East Asian, and South Asian individuals.
The participants’ overall accuracy was 48.2%, slightly below the 50% expected from random guessing. The researchers then recruited another 219 participants and trained them to recognize AI-generated faces. This group’s accuracy rose to 59%, but the researchers noted that “the difference is not significant”.[3]

A group of AI-generated faces. Can you tell which are real? Image source: New Scientist

AI Voice Synthesis
Riding the fast development of machine learning technology, AI voice synthesis technology has also made rapid progress in recent years. In the past, creating realistic voices usually required recording a segment of real human voice, cutting it into individual voice units, and then reassembling it like a puzzle to create new audio segments.
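The older cut-and-reassemble approach described above can be illustrated with a toy sketch. It is only a rough illustration under stated assumptions, not how any particular product works: the unit names, the sine tones standing in for recorded voice units, and the cross-fade length are all hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary value for this toy example


def make_unit(freq_hz, n_samples=400):
    """Stand-in for a recorded voice unit: a short sine tone (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]


# Hypothetical unit inventory; a real system would hold clips cut from recordings.
UNIT_BANK = {
    "ha": make_unit(220.0),
    "lo": make_unit(330.0),
    "wo": make_unit(440.0),
}


def concatenate_units(unit_names, bank, overlap=40):
    """Join stored units in order, cross-fading `overlap` samples at each seam."""
    output = []
    for name in unit_names:
        unit = bank[name]
        if not output:
            output.extend(unit)
            continue
        k = min(overlap, len(output), len(unit))
        # Blend the tail of the audio so far with the head of the next unit.
        for i in range(k):
            w = (i + 1) / (k + 1)
            idx = len(output) - k + i
            output[idx] = (1 - w) * output[idx] + w * unit[i]
        # Append the part of the unit that was not blended.
        output.extend(unit[k:])
    return output


audio = concatenate_units(["ha", "lo", "wo"], UNIT_BANK)
```

Because the output can only be stitched together from units that were already recorded, this approach is limited by the inventory of clips, which is the constraint that learned voice models remove.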
Now, artificial intelligence can automatically generate entirely different audio content by learning from the target person’s voice data, making it faster, more convenient, and more natural.
Related applications can even convey emotions and tones. For example, in voice software designed by Sonantic, users can type the text they want turned into audio and choose the emotion they want to convey, such as anger, fear, sadness, joy, or excitement.
Users can further adjust the pitch and strength of the voice through a “director mode,” and even insert non-verbal sounds like laughter or breathing, akin to a “voice version of Photoshop.” The company has already collaborated with AAA game companies and entertainment firms, and earlier this year its technology was adopted for the Mercedes-Benz in-car voice assistant.[4]

Sonantic’s user interface. Image source: The Verge

In the coming years, the most widespread application scenario for AI voice synthesis technology may be celebrity voice cloning, which allows celebrities to authorize their voices to be synthesized for commercial use. An American company named Veritone launched a service at the beginning of the year under which celebrities and influencers only need to give simple authorization to have their voices appear in advertisements or television broadcasts.[5]
The Gradual Impact of “Deepfake” on Daily Content Experience
Steadily falling technical barriers, the strong entertainment value of synthetic audio and video, and the way content spreads on social networks have led to an explosion of user-generated “deepfake” content on the internet.
The “Top Ten Trends in Deep Synthesis (2022) Report” shows that the creation and dissemination of deep synthesis content on the internet have grown exponentially, with deep synthesis videos on the top ten audio and video platforms (Tencent Video, iQIYI, Youku, Bilibili, Douyin, Kuaishou, Weibo, YouTube, Twitter, TikTok) receiving as many as 316 million likes.[6]

Image source: “Top Ten Trends in Deep Synthesis (2022) Report”

“Deepfake” technology is no longer exclusive to tech companies, nor is it confined to film and advertising. A growing number of related applications and platforms let the general public experience AI generation in broader contexts.
AI Podcast Editing
Descript is a podcast editing software developed by an American company using AI voice synthesis technology. Users can edit or even create their own audio content using AI voice cloning technology. After cloning and generating their AI voice using the software’s “overdub” feature, users can adjust the audio simply by editing the transcribed text, making it user-friendly and easy to operate.[7]
Creating Video Memes
AI face-swapping technology is also becoming popular among the general public through tools like FakeApp and DeepFaceLab. The Tom Cruise face-swapping videos that went viral on short video platforms last year are a strong testament to this. In the videos, a “person” resembling Tom Cruise in appearance and voice performs a magic trick or casually strolls around a men’s clothing store, doing things that starkly contrast with the identity of a Hollywood superstar.
The videos were created by visual effects artist Chris Umé, who collaborated with a top Tom Cruise impersonator and used “deepfake” technology to produce them. They show that, with social media so pervasive and AI synthesis technology so advanced, “deepfake” content can become a “meme” anytime and anywhere.

Screenshot from the “deepfake” video of Tom Cruise on TikTok. Image source: tiktok@deeptomcruise

“Recreating Loved Ones”
Deep Nostalgia, a video technology platform created by the Israeli company D-ID, has caused quite a stir. Using the platform’s AI face generation service, users can upload pictures of deceased relatives to generate moving images. The “relatives” in the video can blink, smile, nod, and perform a series of other actions, giving users a visual effect akin to face-to-face interaction. Bringing the deceased “back to life” in cyberspace through AI synthesis technology is no longer a fantasy.
The Boundaries and Future of Deepfake + Content Creation
As “deepfake” technology gradually becomes more accessible, internet users will have more diverse choices and innovations in content creation. It is not hard to imagine that in the near future, there may be a game where players can generate and use their AI voices on the character creation page, experiencing a more immersive world within the game.
However, as the line between “true” and “false” is repeatedly blurred by technological iterations, how should the boundaries of content creation and dissemination be redefined? Legal, moral, and ethical issues are always unavoidable topics.
The Wall Street Journal reported a fraud case dating back to March 2019: criminals used AI voice synthesis technology to mimic the voice of a German boss, deceiving the head of a British subsidiary into transferring approximately 1.73 million yuan.[8] Aside from fraud and other criminal activities, the biggest threat posed by this technology remains the unauthorized creation or dissemination of pornographic videos.
For social networks, the most significant risk of “deepfake” technology currently lies in the dissemination of misinformation, impacting public opinion. When “seeing” does not necessarily mean “believing,” the credibility of real information will significantly decrease. Coupled with the internet’s dissemination capacity, it will become increasingly difficult to distinguish between true and false information.
Because synthetic audio and video can be indistinguishable from reality, anyone can dismiss genuine content as a product of synthesis technology in order to invert the truth, evade responsibility, or even trigger widespread public skepticism about real information. This is known as the “liar’s dividend.”[9] It could be one of the most severe tests that “deepfake” technology poses to the real world.
Image source: Visual China
Deborah Johnson, an emeritus professor of applied ethics at the University of Virginia, mentioned in her recently published book “How Should We Respond to Deepfake Technology” that “deepfake” technology can severely damage the credibility of official media and the trustworthiness of online information.
“We will no longer believe everything we see and hear online.” However, at the same time, labeling may be the simplest and quickest way to address such threats. “If viewers know in advance that a video is AI-generated, they may be less likely to be deceived.”[10]
But how can synthetic content be labeled properly without undermining the creator’s intent? How should the ethical issues related to gender, age, race, and so on be defined? These questions undoubtedly present a long list of challenges for content managers.
Professor Lilian Edwards from Newcastle University Law School raised the question of ownership and usage rights after AI-generated content is commercialized. For example, when using a video generated with the face of deceased actor Steve McQueen for commercial purposes, whether his family can own the video and profit from it remains debatable. “Currently, the relevant regulations vary slightly from country to country.”[11]
These complex issues do not diminish a trend that is becoming reality: in the near future, “deepfake” technology will enter thousands of households, allowing everyone to clone their faces and voices through AI synthesis for use in short videos, live broadcasts, interactive media, and other fields. Yet the technology is infinitely close to reality while still not real, and the hidden dangers and new challenges that may lurk behind this grand blueprint remain to be charted.
[Today’s Interaction]

What AI-generated content have you consumed? Or which AI synthesis applications have you used? What are your thoughts on the popularization of such technology? Feel free to share with us in the comments!

References:

[1]https://m.mp.oeeee.com/a/BAAFRD000020220129648488.html

[2]AI-Generated Faces: What Makes a Human a Human?

https://www.shutterstock.com/blog/ai-generated-faces-what-makes-a-human

[3]Fake faces created by AI look more trustworthy than real people

https://www.newscientist.com/article/2308312-fake-faces-created-by-ai-look-more-trustworthy-than-real-people/

[4] Listen to an AI voice actor try and flirt with you

https://www.theverge.com/2022/2/17/22936978/ai-voice-speech-synthesis-audio-deepfake-sonantic-flirtation

[5][7] Everyone will be able to clone their voice in the future

https://www.theverge.com/22672123/ai-voice-clone-synthesis-deepfake-applications-vergecast

[6] Top Ten Trends in Deep Synthesis (2022) Report

https://www.iczhiku.com/hotspotDetail/uvDLklAJE6mJZK6uCmFzRA==

[8]https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402

[9]The impact of deepfakes: How do you know when a video is real?

https://www.cbsnews.com/news/deepfakes-real-fake-videos-60-minutes-2021-10-10/

[10][11]

https://edition.cnn.com/videos/business/2021/03/02/tom-cruise-tiktok-deepfake-orig.cnn-business
