Detection and Attribution Methods for Deepfake Reverse Engineering

Author | Facebook AI
Translator | Wang Qiang
Planner | Ling Min

In recent years, Deepfake images have become increasingly realistic, and in some cases it is difficult for humans to tell them apart from real photos. Although detecting Deepfake images remains a significant challenge for the industry, the motivation to do so grows stronger as the technology advances: what if Deepfake images were used for large-scale fraud rather than just for entertainment and technical demonstrations?

Today, in collaboration with Michigan State University (MSU), we present a research method for detecting and attributing Deepfake images that works by reverse engineering the generative model behind a single AI-generated image. Our approach will advance research on Deepfake detection and tracing in real-world settings, where the Deepfake image itself is often the only information detectors have to work with.


Why Reverse Engineer?

Current discussions on Deepfake images focus on determining whether an image is real or a Deepfake (detection), or on identifying whether an image was generated by a model seen during training ("closed-set" image attribution). However, to keep pace with the rising volume of Deepfake images, attribution research needs to extend beyond the limited set of models present during training: in practice, a Deepfake may well be created by a generative model the detector has never encountered.

Reverse engineering is another way to tackle the Deepfake problem, and it is not a new concept in machine learning. Previous work has reverse engineered ML models by examining their input/output pairs, treating the model itself as a black box. Another line of work assumes that hardware information (such as CPU and memory usage) is available during model inference. Both approaches rely on prior knowledge of the model, which limits their practicality in the real world, where such information is often unavailable.

Our reverse engineering method is based on identifying the unique patterns behind the AI model used to generate a single Deepfake image. We start with image attribution and then work backward to the attributes of the model that generated the image. By generalizing image attribution to open-set recognition, we can infer more about the generative model used to create a Deepfake, rather than simply concluding that the model has never been seen before.

By tracking the similarities between a set of Deepfake image patterns, we can also determine whether a series of images comes from a single source. This ability to detect which Deepfake images are generated by the same AI model can be used to uncover instances of misinformation or other malicious attacks initiated using large numbers of Deepfake images.
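To make this concrete, below is a minimal sketch of how fingerprint similarity could be used to group Deepfake images by source. It assumes the fingerprints have already been extracted as arrays (the Fingerprint Estimation Network is introduced in the next section); the cosine-similarity measure, the greedy grouping, and the threshold are illustrative assumptions rather than the exact procedure used in the research.

import numpy as np

def cosine_similarity(a, b):
    # Flatten the fingerprints and compare their directions.
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def group_by_source(fingerprints, threshold=0.8):
    # Greedy grouping: an image whose fingerprint is similar enough to the
    # first member of an existing group is assumed to share a source model.
    groups = []
    for idx, fp in enumerate(fingerprints):
        for group in groups:
            if cosine_similarity(fp, fingerprints[group[0]]) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups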

How It Works

We first run a Deepfake image through a Fingerprint Estimation Network (FEN) to estimate the fingerprint details left by the generative model.

Device fingerprints are subtle but unique patterns left on every image a device produces, caused by imperfections in the manufacturing process. In digital photography, such fingerprints are used to identify the camera that took a photo. Similarly, image fingerprints are unique patterns that a generative model leaves on the images it produces, and they can likewise be used to identify that model.


Before the deep learning era, people generally created photos with a small set of well-known tools, and fingerprints were estimated from handcrafted features. Deep learning has made the toolkit for generating images virtually limitless, so researchers can no longer identify specific "signals" or fingerprint attributes from handcrafted features alone.

To cope with this open-ended space, we use the properties of a fingerprint as constraints for unsupervised training. In other words, we estimate fingerprints using constraints derived from their common properties, including the fingerprint magnitude, its repetitive nature, its frequency range, and its symmetric frequency response. We apply these constraints to the FEN through different loss functions, forcing the estimated fingerprints to have the desired properties. Once estimated, a fingerprint can then be used as input for model analysis.
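As a rough illustration of how such property constraints can be expressed as loss terms, the sketch below uses PyTorch. The specific loss formulations, shift size, frequency cutoff, and weighting are assumptions made for illustration only, not the exact constraints used in our training.

import torch
import torch.nn.functional as F

def fingerprint_losses(fingerprint):
    # fingerprint: (B, C, H, W) tensor estimated by the FEN.
    spectrum = torch.abs(torch.fft.fft2(fingerprint))   # magnitude spectrum

    # 1) Magnitude: the fingerprint should be a low-magnitude residual.
    magnitude_loss = fingerprint.pow(2).mean()

    # 2) Repetitive nature: the pattern should change little under small shifts.
    shifted = torch.roll(fingerprint, shifts=(4, 4), dims=(-2, -1))
    repetition_loss = F.mse_loss(fingerprint, shifted)

    # 3) Frequency range: suppress very low frequencies (image content),
    #    pushing fingerprint energy toward middle and high frequencies.
    frequency_loss = spectrum[..., :4, :4].pow(2).mean()

    # 4) Symmetric frequency response: the spectrum should be roughly symmetric.
    symmetry_loss = F.mse_loss(spectrum, torch.flip(spectrum, dims=(-2, -1)))

    return magnitude_loss, repetition_loss, frequency_loss, symmetry_loss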

Model analysis is a new problem: it uses the estimated fingerprint to predict the hyperparameters of the generative model, that is, the attributes that make up its architecture, such as the number of layers, the number of blocks, and the types of operations used in each block. The hyperparameters also include the training loss function, which guides how the model is trained and therefore affects the kind of Deepfake images it generates.

The model’s architecture and the type of training loss function both influence its weights, thus affecting how it generates images. To better understand hyperparameters, we can think of the generative model as a car, with its hyperparameters as various specific engine components. Different cars may look very similar, but under the hood, they may have very different engines and components. Our reverse engineering technique is somewhat like identifying a car’s components by its sound, even if it is a new car whose sound we have never heard before.

Through our model analysis method, we simultaneously estimate the network architecture and training loss function of the model used to create a Deepfake image. To facilitate training, we normalize some of the continuous parameters in the network architecture, and we use hierarchical learning to classify the type of loss function. Since generative models differ mostly in their network architecture and training loss function, mapping a generated image into this hyperparameter space gives us key information about the characteristics of the model that created it.
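A minimal sketch of what such a model-analysis head might look like in PyTorch is shown below. The backbone, the 15-dimensional normalized architecture vector, and the 8 loss-function classes are illustrative assumptions; the actual network and hyperparameter encoding in the paper differ.

import torch
import torch.nn as nn

class ModelParser(nn.Module):
    def __init__(self, feature_dim=512, num_arch_params=15, num_loss_types=8):
        super().__init__()
        # Summarize the estimated fingerprint into a feature vector.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feature_dim), nn.ReLU(),
        )
        # Continuous architecture hyperparameters, normalized to [0, 1].
        self.arch_head = nn.Sequential(nn.Linear(feature_dim, num_arch_params), nn.Sigmoid())
        # Loss-function type, treated as a classification problem.
        self.loss_head = nn.Linear(feature_dim, num_loss_types)

    def forward(self, fingerprint):
        feats = self.backbone(fingerprint)
        return self.arch_head(feats), self.loss_head(feats)

# Training combines a regression loss on the normalized architecture vector
# with a classification loss on the loss-function type.
parser = ModelParser()
fp = torch.randn(4, 3, 128, 128)                 # dummy estimated fingerprints
arch_pred, loss_logits = parser(fp)
loss = nn.functional.l1_loss(arch_pred, torch.rand(4, 15)) + \
       nn.functional.cross_entropy(loss_logits, torch.randint(0, 8, (4,)))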

To test this method, the MSU research team used a fake-image dataset of 100,000 synthetic images generated by 100 publicly available generative models. Each of the 100 generative models corresponds to an open-source project developed and shared by researchers across the scientific community. Some of these projects had already released synthetic images, in which case the MSU team randomly selected 1,000 of them.

In cases where no synthetic images were available from a project, the team ran its released code to generate 1,000 synthetic images. Because a test image in the real world may come from a previously unseen generative model, the team simulated real-world deployment through cross-validation, training and evaluating the model on different splits of the dataset.
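The sketch below illustrates the split logic under the assumption that cross-validation is done at the level of generative models rather than individual images, so test images always come from models unseen during training; the fold count and identifiers are invented for illustration.

import random

def model_level_folds(model_ids, num_folds=5, seed=0):
    # Split the generative models (not the images) into folds.
    ids = model_ids[:]
    random.Random(seed).shuffle(ids)
    folds = [ids[i::num_folds] for i in range(num_folds)]
    for k in range(num_folds):
        test_models = set(folds[k])
        train_models = [m for m in ids if m not in test_models]
        yield train_models, sorted(test_models)

model_ids = ["gm_%03d" % i for i in range(100)]   # the 100 generative models
for train_models, test_models in model_level_folds(model_ids):
    # Train on images produced by train_models; evaluate on test_models.
    pass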

Our Results

As we are the first to tackle model analysis, there is no existing baseline to compare against. We therefore formed a baseline, called random ground truth (random gt), by randomly shuffling each hyperparameter in the ground-truth set; the resulting random gt vectors preserve the original distribution of each hyperparameter.
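A minimal sketch of how such a baseline can be constructed, assuming the ground truth is stored as a models-by-hyperparameters array; the shapes and helper name are illustrative.

import numpy as np

def random_gt_baseline(ground_truth, seed=0):
    # ground_truth: (num_models, num_hyperparameters) array.
    # Permute each hyperparameter column independently: every baseline vector
    # keeps the original per-hyperparameter distribution but loses any
    # correspondence with the images.
    rng = np.random.default_rng(seed)
    shuffled = ground_truth.copy()
    for col in range(shuffled.shape[1]):
        shuffled[:, col] = rng.permutation(shuffled[:, col])
    return shuffled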

The results show that our method significantly outperforms the random gt baseline. This suggests that there is indeed a meaningful correlation between generated images and the embedding space of architecture hyperparameters and loss-function types, stronger than what random vectors of the same length and distribution can capture. We also conducted ablation studies to demonstrate the effectiveness of fingerprint estimation and hierarchical learning.


Estimated fingerprints of images generated by the 100 generative models (left) and their corresponding frequency spectra (right). Many spectra show distinct high-frequency signals, while some appear similar to one another.

In addition to model analysis, our FEN can be used for Deepfake detection and image attribution. For both tasks, we added a shallow network that takes the estimated fingerprint as input and performs binary classification (Deepfake detection) or multi-class classification (image attribution). Although our fingerprint estimation is not specifically tailored to these tasks, we still achieved results competitive with the state of the art, indicating that our fingerprint estimation generalizes well.
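For illustration, a shallow head of this kind might look like the PyTorch sketch below; the layer sizes are assumptions, and only the overall structure (fingerprint in, class scores out) reflects the description above.

import torch.nn as nn

class ShallowHead(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, fingerprint):
        return self.net(fingerprint)

detector = ShallowHead(num_classes=2)      # real vs. Deepfake (detection)
attributor = ShallowHead(num_classes=100)  # one class per known model (attribution)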

Developing socially responsible AI has always been a priority for us, so we adopt a human-centered research approach wherever possible. The diversity of Deepfake images drawn from 100 generative models means our model is built on a representative selection and can better generalize across both human and non-human content.

Although some of the original images used to generate the Deepfakes are photos of real individuals from publicly available face datasets, the MSU research team performed its forensic analysis on the Deepfake images rather than on the originals. Because the approach deconstructs a Deepfake image into a fingerprint, the team also analyzed whether the model could map a fingerprint back to the original image content. The results showed that it could not, confirming that the fingerprints mainly contain traces left by the generative model rather than the content of the original images.

All fake facial images used in this study were generated by MSU, and the reverse engineering experiments were also conducted at MSU. MSU will open-source the dataset, code, and trained models to the broader research community to facilitate research in areas including Deepfake detection, image attribution, and reverse engineering of generative models.

Significance of the Research

Our research pushes past the existing limits of Deepfake detection by introducing model analysis, a formulation better suited to real-world deployment. This work gives researchers and practitioners tools to better investigate large-scale misinformation campaigns built on Deepfake images, and it opens new directions for future research.

MSU’s code, dataset, and trained models: https://github.com/vishal3477/Reverse_Engineering_GMs

Model analysis was developed in collaboration with Vishal Asnani and Xiaoming Liu from Michigan State University.

Original Link:

https://ai.facebook.com/blog/reverse-engineering-generative-model-from-a-single-deepfake-image

