
Source: DeepHub IMBA
After reading this article, you too will be able to create a DeepFake video.
The classic approach works like this:

- First, an AI algorithm called an encoder is trained on thousands of facial photos of the two people.
- The encoder discovers and learns the similarities between the two faces, reducing them to shared common features and compressing the images in the process.
- A second AI algorithm, called a decoder, is then trained to recover the faces from the compressed images.
- Because the faces are different, you train one decoder to recover the first person’s face and another decoder to recover the second person’s.
- To perform the face swap, you simply feed an encoded image into the “wrong” decoder: for example, the compressed image of person A goes into the decoder trained on person B.
- The decoder then reconstructs person B’s face with the expressions and head orientation of person A. Doing this for every frame yields a convincing video; a minimal sketch of the setup follows.
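To make the shared-encoder, two-decoder idea concrete, here is a minimal PyTorch sketch. The layer sizes, the 64×64 input resolution, and the plain MSE reconstruction loss are illustrative assumptions, not the exact architecture of any particular DeepFake tool:

```python
# Sketch of the shared-encoder / two-decoder setup described above.
# Sizes and losses are illustrative assumptions, not a specific tool's design.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compresses a face image into a low-dimensional shared representation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),    # 64x64 -> 32x32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(), # 16x16 -> 8x8
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Reconstructs one specific person's face from the shared representation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, z):
        return self.net(z)

encoder = Encoder()    # one encoder shared by both identities
decoder_a = Decoder()  # trained only on person A's faces
decoder_b = Decoder()  # trained only on person B's faces

# Training step (run separately on batches of A's faces and B's faces):
faces_a = torch.rand(8, 3, 64, 64)  # stand-in batch of person A's photos
recon_a = decoder_a(encoder(faces_a))
loss_a = nn.functional.mse_loss(recon_a, faces_a)  # reconstruction error

# The swap: encode A's face, decode it with B's decoder, yielding
# person B's identity wearing person A's expression and pose.
swapped = decoder_b(encoder(faces_a))
```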

A newer approach, the First Order Motion Model (paper, code, and a runnable demo are linked below), needs only a single driving video and a single target photo:

- Extract motion and facial expressions from the driving video and the target photo using a facial keypoint detector.
- Match the facial keypoints between the video and the photo.
- For each frame of the video, transform the target photo accordingly.
- Pass these frames to another model, the Dense Motion network, to extract the motion and lighting of the source photo. In other words, the Dense Motion model generates an optical flow field and an occlusion map; a sketch of how these are applied follows the list.
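Mechanically, the flow field tells the model where in the source photo to sample each output pixel from, and the occlusion map marks regions the photo never showed, which the generator must fill in. Here is a minimal sketch of that warping step; the flow and occlusion tensors are random stand-ins, whereas in the real model they are predicted from the matched keypoints:

```python
# Sketch of applying a dense optical flow and an occlusion map to a source
# image, in the spirit of the Dense Motion module. Flow and occlusion values
# here are random stand-ins for the model's predictions.
import torch
import torch.nn.functional as F

B, C, H, W = 1, 3, 256, 256
source = torch.rand(B, C, H, W)  # the single target photo

# Dense optical flow: for every output pixel, the (x, y) location in the
# source image to sample from, in grid_sample's [-1, 1] coordinate range.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
)
identity_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)  # (B, H, W, 2)
flow = identity_grid + 0.01 * torch.randn(B, H, W, 2)       # stand-in motion

# Occlusion map in [0, 1]: 1 where the warped source is trustworthy,
# 0 where the driving pose reveals pixels the photo never showed
# (those regions must be hallucinated by the generator instead).
occlusion = torch.rand(B, 1, H, W)

warped = F.grid_sample(source, flow, align_corners=True)  # move pixels per flow
visible = warped * occlusion  # masked result handed on to the generator
```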
Paper: https://arxiv.org/abs/2003.00196
Code: https://github.com/AliaksandrSiarohin/first-order-model
Colab demo: https://colab.research.google.com/github/AliaksandrSiarohin/first-order-model/blob/master/demo.ipynb
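If you want to try it yourself, the Colab notebook above wraps the repository's demo helpers. Below is a sketch of the same flow, assuming the load_checkpoints and make_animation functions from the repo's demo.py and its pretrained vox-cpk.pth.tar checkpoint; file names and paths are illustrative and may change between versions:

```python
# Sketch of running the pretrained First Order Motion Model, following the
# repo's Colab demo. Assumes you cloned first-order-model, downloaded the
# checkpoint, and run this from the repo root; paths are illustrative.
import imageio
from skimage import img_as_ubyte
from skimage.transform import resize
from demo import load_checkpoints, make_animation  # helpers from the repo

# The model expects 256x256 RGB inputs.
source_image = resize(imageio.imread("source.png"), (256, 256))[..., :3]
driving_video = [
    resize(frame, (256, 256))[..., :3]
    for frame in imageio.mimread("driving.mp4", memtest=False)
]

generator, kp_detector = load_checkpoints(
    config_path="config/vox-256.yaml",
    checkpoint_path="vox-cpk.pth.tar",
)
predictions = make_animation(
    source_image, driving_video, generator, kp_detector, relative=True
)
imageio.mimsave("result.mp4", [img_as_ubyte(frame) for frame in predictions])
```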
Editor: Wenjing