Selected from arXiv
Authors: Umur Aybars Ciftci et al.
Translated by: Machine Heart
Can DeepFake truly produce indistinguishable results? Not necessarily. Researchers from Binghamton University and Intel use heartbeat signals to discern the authenticity of videos, and can even “uncover” the underlying generative model.
The technology for generating fake portrait videos poses new threats to society, such as using realistic fake images and videos for political propaganda, celebrity impersonation, evidence falsification, and other identity-related manipulation. As these generative technologies have developed, several effective DeepFake detection methods with high classification accuracy have emerged. However, almost no work has focused on the source of DeepFake videos, i.e., the model that generated a given DeepFake video.
Researchers from Binghamton University and Intel proposed a method that uses biological signals in videos to detect whether a video is fake. The method not only distinguishes real videos from fake ones but also identifies the specific generative model behind a DeepFake video (where the generative model is one of DeepFakes, Face2Face, FaceSwap, and NeuralTextures).
Some purely deep learning-based methods attempt to classify fake videos with CNNs, which in effect learn the residuals of the generator. This study argues that these residuals carry additional information and can reveal forgery details when separated from biological signals. Observations indicate that the spatiotemporal patterns in biological signals can be viewed as a representative projection of these residuals. To validate this observation, the researchers extracted PPG units from real and fake videos and fed them into a state-of-the-art classification network to detect the generative model behind each video.
Experimental results show that the method achieves a detection accuracy of 97.29% for fake videos and an identification accuracy of 93.39% for the generative models behind the fake videos.

https://arxiv.org/pdf/2008.11363.pdf
The contributions of this paper are as follows:
- Proposed a novel method for detecting the source of DeepFake videos, opening a new perspective for DeepFake detection research;
- Introduced a new finding: projecting generative noise into the biological signal space creates a unique identifier for each model;
- Proposed an advanced general-purpose DeepFake detector that outperforms existing methods at classifying real and fake videos while also predicting the generative model behind a fake video, i.e., the source generative model.
Detecting Fake Videos and Their Generative Models Using Biological Signals
Biological signals have been shown to serve as authenticity markers for real videos and are used as important biomarkers for DeepFake detection: the synthetic faces in fake videos do not exhibit heartbeat patterns consistent with those in real videos. The key finding of this study is that these biological signals can be interpreted as containing a fake heartbeat that carries a residual, identifying transformation of each generative model. This opens a new use for biological signals: not only determining whether a video is authentic, but also classifying the source model that generated it.
Thus, the study proposed a system that can detect DeepFake videos and identify the source generative model, as shown in Figure 1:

To continuously capture the features of biological signals, the researchers defined a new spatiotemporal block, the PPG unit, which combines multiple raw PPG signals extracted from a fixed window with their power spectra. Generating a PPG unit first requires running a face detector to find the face in each frame.
The second step is to extract a region of interest (ROI) with stable PPG signals from the detected face (Figure 1d). For effective extraction, the researchers used the facial area between the eyes and the mouth to maximize skin exposure.
Since there is correlation among PPG signals from different regions of the face, locating the ROI and measuring its correlation becomes a crucial step in the detection process.
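As a rough illustration of the ROI step, the sketch below computes a padded bounding box from a set of landmark coordinates. The function name, the landmark format, and the padding margin are all hypothetical; the paper's actual ROI is a nonlinear region defined by facial landmarks rather than a plain rectangle.

```python
import numpy as np

def roi_between_eyes_and_mouth(landmarks, margin=0.05):
    """Hypothetical helper: padded bounding box of the facial region
    between the eyes and the mouth, given (N, 2) landmark coordinates.

    A real pipeline would index a specific landmark scheme (e.g. the
    68-point layout produced by OpenFace) to pick the eye and mouth
    points; here any landmark array is accepted for illustration.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)
    # Pad slightly so the skin region is not clipped at the border.
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)
    return (x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y)
```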
The third step aligns the nonlinear ROI into a rectangular image. The study used Delaunay triangulation [26] and then applied an affine transformation to each resulting triangle, warping it into the rectified image.
In the fourth step, the researchers divided each rectified image into 32 equal-sized squares and calculated the raw Chrom-PPG signal of each square within a fixed window of ω frames, chosen so that it does not interfere with face detection (Figure 1e). The Chrom-PPG is computed on the rectified image because it yields more reliable PPG signals. Each window thus yields ω × 32 raw PPG values.
These are then reassembled into a matrix of 32 rows and ω columns, forming the basis of the PPG unit, as shown in the upper half of the last row of Figure 1f and Figure 2.
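The Chrom-PPG of one square can be sketched from its per-frame mean RGB values. The snippet below follows one standard formulation of the chrominance method (de Haan & Jeanne, 2013); the paper's exact pre-processing (e.g. band-pass filtering) is not specified here, so treat this as an illustrative approximation rather than the authors' implementation.

```python
import numpy as np

def chrom_ppg(rgb_trace):
    """Simplified Chrom-based PPG signal from a per-frame mean RGB trace.

    `rgb_trace` has shape (w, 3): one spatially averaged RGB value per
    frame for one of the 32 squares. Band-pass filtering and other
    refinements of the original chrominance method are omitted.
    """
    rgb = np.asarray(rgb_trace, dtype=float)
    # Normalize each channel by its temporal mean to reduce illumination bias.
    rgb = rgb / rgb.mean(axis=0)
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    x = 3.0 * r - 2.0 * g          # first chrominance component
    y = 1.5 * r + g - 1.5 * b      # second chrominance component
    alpha = x.std() / (y.std() + 1e-8)
    return x - alpha * y           # pulse signal for this square
```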

The final step adds frequency domain information to the PPG unit. The power spectral density of each raw PPG value within the window is calculated and scaled to size ω.
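Based on the description above, assembling one PPG unit from the 32 raw traces could look like the following sketch. The plain FFT periodogram for the power spectral density and linear interpolation for rescaling to length ω are assumptions, since the estimator is not pinned down here.

```python
import numpy as np

def build_ppg_unit(raw_ppg, w):
    """Assemble one PPG unit from the raw signals of the 32 squares.

    `raw_ppg` has shape (32, w): one raw PPG trace per square over a
    window of w frames. The top half of the unit holds the raw signals;
    the bottom half holds each trace's power spectral density, rescaled
    to length w.
    """
    raw_ppg = np.asarray(raw_ppg, dtype=float)
    psd = np.abs(np.fft.rfft(raw_ppg, axis=1)) ** 2  # shape (32, w//2 + 1)
    # Linearly interpolate each PSD row back to length w.
    old_x = np.linspace(0.0, 1.0, psd.shape[1])
    new_x = np.linspace(0.0, 1.0, w)
    psd_scaled = np.stack([np.interp(new_x, old_x, row) for row in psd])
    return np.vstack([raw_ppg, psd_scaled])           # shape (64, w)
```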
The bottom row of Figure 2 shows examples of DeepFake PPG units generated from the same window, with the first row being example frames from each window.
After defining the PPG unit, the researchers presented their main hypothesis: projecting the residuals of the DeepFake generator into the biological signal space can create a unique pattern for detecting the source generative model behind DeepFake.
The system proposed in this study is implemented in Python, using the OpenFace library for face detection, OpenCV for image processing, and Keras for neural networks.
Table 1 lists the classification results of the PPG units on the test set, where VGG19 achieved the highest accuracy in distinguishing four different generative models and detecting real videos from FaceForensics++ (FF) (Figure 1f). Complex networks like DenseNet and MobileNet, despite achieving very high training accuracy, performed poorly on the test set due to overfitting.

For video-level classification, Table 2 compares different voting schemes. The researchers set ω = 128 and aggregated VGG19's per-unit predictions using majority voting, highest average probability, the two highest average probabilities, and log-odds averaging.
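Two of these schemes, majority voting and highest average probability, can be sketched as follows; the function and its interface are illustrative, not the paper's implementation.

```python
import numpy as np

def video_label(unit_probs, scheme="majority"):
    """Aggregate per-unit class probabilities into one video-level label.

    `unit_probs` has shape (n_units, n_classes): softmax outputs of the
    unit classifier for every PPG unit of a video.
    """
    probs = np.asarray(unit_probs, dtype=float)
    if scheme == "majority":
        # Each unit casts one vote for its most probable class.
        votes = np.bincount(probs.argmax(axis=1), minlength=probs.shape[1])
        return int(votes.argmax())
    if scheme == "mean":
        # Pick the class with the highest average probability.
        return int(probs.mean(axis=0).argmax())
    raise ValueError(f"unknown scheme: {scheme}")
```

Note that the two schemes can disagree: a class winning many narrow per-unit votes beats one winning a few decisive ones under majority voting, but not under average probability.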

As shown in Figure 3, the method achieved a 97.3% detection rate for real videos across the five FF classes (1 real class and 4 fake classes), and at least 81.9% accuracy in detecting the generative models.

The researchers trained and tested under different settings: 1) no real videos in the training set; 2) no power spectra in the PPG units; 3) no biological signals; 4) full frames instead of face ROIs. Here ω = 64, and the FF dataset split was held constant. The results are shown in Table 3:

Using the aforementioned settings, the method proposed in the paper was tested with different window sizes ω = {64, 128, 256, 512} frames. The results are shown in Table 4:

To demonstrate that the proposed method generalizes to new models, the researchers combined the FF setting with the single-generator dataset Celeb-DF and repeated the analysis. The method achieved a detection accuracy of 93.69% on the combined dataset and 92.17% on Celeb-DF, indicating that it can generalize to new models (see Table 5).

Table 6 lists the accuracies of different models on the test set. The results show that the proposed method even outperformed Xception, the most complex network, by roughly 10 percentage points in accuracy.

