Is the Diffusion Model Making GANs Obsolete?


Jin Lei and Alex, from Aofeisi; reprinted from QbitAI.

The once-popular GAN is now outdated.

Recently, a tweet from Tom Goldstein, an associate professor at the University of Maryland, has stirred up a lot of discussion.

Even big names in the tech circle have come to pay attention.


The keyword at the center of the topic is the Diffusion Model, which, in Tom’s words, was:

In 2021, it could even be said to be unheard of.

However, the algorithm is hardly unfamiliar: it is the core of the AI drawing tool DALL·E.

Moreover, the authors of DALL·E never favored GANs in the first place and abandoned the approach outright.

Coincidentally, the same topic has also sparked considerable discussion in China.


So why is this wave of “new generation pushing the old” happening in the field of image generation?

Let’s explore this.

What is the Diffusion Model?

The spotlight on the Diffusion Model this time can be attributed to the popularity of various “AI one-sentence image generation” tools.

For example, OpenAI’s DALL·E 2 and Google’s Imagen.

It is not hard to see that these recently popular image generation tools, whether in terms of realism or of imagination and comprehension, live up to human expectations.

Thus, they have become the “new favorites” among netizens (just as GAN was when it debuted).

The key behind such capabilities is the Diffusion Model.

Its research can be traced back to 2015, when researchers from Stanford and Berkeley published a paper titled Deep Unsupervised Learning using Nonequilibrium Thermodynamics.

However, that research is quite different from the current Diffusion Model; the work that truly made it practical came in 2020, with Denoising Diffusion Probabilistic Models (DDPM).

Let’s first compare the Diffusion Model with the other common generative models.

The clearest difference is that the Diffusion Model’s latent code z has the same size as the original image.

To summarize the Diffusion Model briefly: it applies a chain of T Gaussian noising steps that gradually transform the input image x0 into pure Gaussian noise xT.
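
Written out in the standard DDPM notation, with a variance schedule β1, …, βT, this forward chain takes the form:

```latex
q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}),
\qquad
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)
```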

Breaking it down further, the Diffusion Model first defines a forward process (forward diffusion process).

The purpose of this process is simply to add noise to the image, step by step; image generation itself does not happen at this stage.
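
For illustration, here is a minimal PyTorch sketch of the forward noising step, using the well-known closed form that lets x_t be sampled directly from x0. The linear beta schedule, the tensor shapes, and the names below are assumptions made for this example, not any particular paper’s implementation.

```python
import torch

T = 1000                                  # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)     # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)  # cumulative product alpha_bar_t

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    if noise is None:
        noise = torch.randn_like(x0)
    a = alpha_bar[t].sqrt().view(-1, 1, 1, 1)          # broadcast over (B, C, H, W)
    s = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * noise

# usage: noise a batch of "images" at random timesteps
x0 = torch.randn(8, 3, 32, 32)            # stand-in for a batch of real images
t = torch.randint(0, T, (8,))
xt = q_sample(x0, t)
```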


Next comes a reverse process (reverse diffusion process), which can be understood as Diffusion’s denoising inference.
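
And a minimal sketch of one reverse (denoising) step in the same style, assuming a trained network eps_model(x_t, t) that predicts the noise that was added. This is the ancestral-sampling form popularized by DDPM with the common choice sigma_t^2 = beta_t, and it reuses T, betas, alphas, and alpha_bar from the sketch above.

```python
import torch

@torch.no_grad()
def p_sample(eps_model, xt, t):
    """One reverse step: sample x_{t-1} from p_theta(x_{t-1} | x_t)."""
    beta_t, a_t, ab_t = betas[t], alphas[t], alpha_bar[t]
    eps = eps_model(xt, t)                                  # predicted noise (assumed interface)
    mean = (xt - beta_t / (1.0 - ab_t).sqrt() * eps) / a_t.sqrt()
    if t == 0:
        return mean                                         # no noise added at the last step
    return mean + beta_t.sqrt() * torch.randn_like(xt)      # sigma_t^2 = beta_t

@torch.no_grad()
def sample(eps_model, shape):
    """Start from pure Gaussian noise x_T and denoise step by step down to x_0."""
    x = torch.randn(shape)
    for step in reversed(range(T)):
        x = p_sample(eps_model, x, torch.tensor(step))
    return x
```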


Finally, in the training phase, the model is fit by maximizing the log-likelihood of the real data under the model’s predicted distribution (in practice, a variational lower bound on it).
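
Under the standard noise-prediction parameterization, this objective reduces in DDPM to a simple mean-squared error between the true and predicted noise. A minimal training-loss sketch, reusing q_sample from the forward-process example above (eps_model is again an assumed placeholder for a U-Net-style network):

```python
import torch
import torch.nn.functional as F

def training_loss(eps_model, x0):
    """Simplified DDPM objective: E_{t, eps} [ || eps - eps_theta(x_t, t) ||^2 ]."""
    t = torch.randint(0, T, (x0.shape[0],))     # a random timestep for each image
    noise = torch.randn_like(x0)
    xt = q_sample(x0, t, noise)                 # forward-noise x0 to x_t
    return F.mse_loss(eps_model(xt, t), noise)  # predict the noise that was added
```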

The above process is based on the DDPM research.

However, a Zhihu user “I Want to Sing High C” (Dr. TSAIL) believes:

When DDPM was proposed, researchers in the field did not yet fully understand the mathematics behind the model, so the paper’s exposition did not dig into its more essential principles.

In his view, it was not until Yang Song (Stanford) and co-authors revealed the mathematical foundation of the continuous-time diffusion model in Score-Based Generative Modeling through Stochastic Differential Equations that denoising score matching from statistical machine learning and the denoising training in DDPM were finally unified.


More detailed derivations can be found in the papers linked at the end.

Next, a question that needs to be discussed is:

Why is GAN Being Replaced So Quickly?

According to a paper from OpenAI (Diffusion Models Beat GANs on Image Synthesis), the image quality achieved by the Diffusion Model is significantly better than that of GAN models.

DALL·E is a multimodal pre-trained large model, and the terms “multimodal” and “large” indicate that the dataset used to train this model is very large and diverse.

Professor Tom Goldstein, who posted the tweet, pointed out that a core challenge in training a GAN is finding the optimal network weights at the saddle point of its loss function, which is a genuinely hard mathematical problem.
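
For reference, the original GAN objective (Goodfellow et al.) is exactly such a saddle-point, min-max problem over the generator G and discriminator D:

```latex
\min_{G} \max_{D} \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Training has to land on a saddle point of this objective rather than a minimum, which is precisely what makes the optimization fragile.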


Training a multilayer deep learning model requires many rounds of feedback before it converges.

In practice, however, the loss often fails to converge reliably to the saddle point, which makes training unstable. Even though researchers have proposed techniques to stabilize the saddle point, they are still not enough to resolve the problem.

Especially when faced with more complex and diverse data, handling the saddle point becomes increasingly difficult.

Unlike a GAN, the Diffusion Model used by DALL·E does not have to wrestle with saddle points; it only needs to minimize a standard, convex cross-entropy loss that practitioners already know how to stabilize.

This greatly reduces the difficulty of handling data during training. Put more simply, a new mathematical formulation sidesteps the old obstacle.

Additionally, during training a GAN requires not only a “generator” that maps sampled Gaussian noise to the data distribution, but also an extra discriminator, which complicates the training process.

Compared with a GAN, the Diffusion Model only has to train the “generator”; the objective is simple, and there is no need to train any other network (a discriminator, a posterior distribution, and so on), which simplifies things considerably.

With current training techniques, the Diffusion Model skips the painstaking tuning phase that GANs go through and can be applied directly to downstream tasks.

(Figure: a visual representation of the Diffusion Model.)

In theory, the Diffusion Model’s success lies in the fact that the network being trained only needs to “mimic” the reverse of a simple, fixed forward process, rather than having to search for a model as a “black box” the way other generative models do.

Moreover, each small step of this reverse process is very simple: it only needs to fit a simple Gaussian distribution, q(x_{t-1} | x_t, x_0).
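
In DDPM notation, with α_t = 1 − β_t and ᾱ_t = ∏_{s≤t} α_s, this per-step Gaussian has a known closed form:

```latex
q(x_{t-1} \mid x_t, x_0)
= \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\right),
\quad
\tilde{\mu}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1 - \bar{\alpha}_t}\, x_0
              + \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, x_t,
\quad
\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,\beta_t
```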

This brings many conveniences to the optimization of the Diffusion Model, which is one of the reasons for its excellent empirical performance.

Is the Diffusion Model Perfect?

Not necessarily.

From a trend perspective, the Diffusion Model field is indeed flourishing, but as “I Want to Sing High C” stated:

There are still some core theoretical problems in this field waiting to be studied, which gives those of us doing theory valuable material to work on. Moreover, even for those not interested in theory, since the model already works so well, its combination with downstream tasks has only just begun, and there are many directions that can be claimed quickly.

I believe the problem of accelerating sampling in the Diffusion Model will certainly be solved in the near future, and the Diffusion Model will come to dominate deep generative modeling.

Regarding the effectiveness of the Diffusion Model and its rapid replacement of GAN, Professor Ma Yi believes it fully illustrates a principle:

A few simple and correct mathematical derivations can be much more effective than nearly a decade of large-scale hyperparameter tuning and network structure adjustments.

However, regarding this “new generation pushing the old” fervor, Professor Ma Yi has a different perspective:

I hope young researchers correct their research purposes and attitudes and not be misled by currently popular topics.

Even the diffusion process itself is an old idea that has been around for hundreds of years; it is just finding new applications.

“I Want to Sing High C” Zhihu Answer:

https://www.zhihu.com/question/536012286/answer/2533146567

Reference Links:

[1] https://twitter.com/tomgoldsteincs/status/1560334207578161152?s=21&t=QE8OFIwufZSTNi5bQhs0hQ
[2] https://www.zhihu.com/question/536012286
[3] https://arxiv.org/pdf/2105.05233.pdf
[4] https://arxiv.org/abs/1503.03585
[5] https://arxiv.org/abs/2006.11239
[6] https://arxiv.org/abs/2011.13456
[7] https://weibo.com/u/3235040884?topnav=1&wvr=6&topsug=1&is_all=1
