Jin Lei, Alex | From Aofeisi, QbitAI (WeChat Official Account: QbitAI)
The once-popular GAN is outdated.
Tom Goldstein, an associate professor at the University of Maryland, recently published a tweet that stirred up a lot of discussion.
Even big names in the tech world have come to pay attention:
The keyword at the center of the discussion is the Diffusion Model, which, in Tom's words:
In 2021, it could even be said to be unheard of.
But the algorithm itself is hardly unfamiliar: it is the core of the AI painting tool DALL·E.
What's more, the DALL·E authors never really "warmed up" to GAN and abandoned it outright.
Coincidentally, the same topic has sparked considerable discussion domestically:
So what exactly is driving this round of "the new wave sweeping away the old" in image generation?
Let’s take a look.
What is the Diffusion Model?
The recent spotlight on the Diffusion Model is largely thanks to the popularity of various “AI one-sentence image generation” tools.
For example, OpenAI’s DALL·E 2:
Google’s Imagen:
It is easy to see that these recently trending image generation tools, in terms of both realism and of imagination and comprehension, live up to human expectations quite well.
They have therefore become the "new favorite" of this generation of internet users (much as GAN was played with endlessly when it first debuted).
The key behind such capabilities is the Diffusion Model.
Its research can be traced back to 2015, when researchers from Stanford and Berkeley published a paper titled Deep Unsupervised Learning using Nonequilibrium Thermodynamics:
However, that study is quite different from today's Diffusion Model; the work that truly made it practical came in 2020, with Denoising Diffusion Probabilistic Models (DDPM):
Let’s first take a look at the comparison between various generative models:
It is clear that what sets the Diffusion Model apart from the other models is that its latent code (z) has the same size as the original image.
Simply put, the Diffusion Model applies a series of T rounds of Gaussian noise that gradually transform the input image x0 into pure Gaussian noise xT.
Breaking it down further, the Diffusion Model first involves a forward process (forward diffusion process).
Its purpose is to add noise to the image; this stage alone cannot yet generate images.
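In standard DDPM notation (a textbook formulation, not taken from this article), each forward step adds a small amount of Gaussian noise according to a schedule β_t, and the noisy image at any step t has a closed form:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right), \qquad q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right), \qquad \bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$$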
Next comes a reverse process (reverse diffusion process), which can be understood as Diffusion's denoising inference process.
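Each reverse step is modeled as a Gaussian whose mean (and, optionally, variance) is predicted by a neural network; again in standard DDPM notation:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$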
Finally, in the training phase, the model maximizes the log-likelihood of its predicted distribution under the real data distribution (in practice, by optimizing a variational lower bound on it).
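After reparameterization, this objective reduces in DDPM to a simple noise-prediction loss; the simplified form given in the DDPM paper is:

$$L_{\text{simple}}(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon}\!\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\right)\right\rVert^2\right], \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I})$$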
The above process is based on the DDPM study.
However, a Zhihu user, "I Want to Sing High C" (a PhD student at TSAIL), believes:
When DDPM was proposed, researchers in the field did not fully understand the mathematical principles behind this model, so the descriptions in the paper did not explore the more essential mathematical principles.
In his view, it was not until Stanford's Yang Song et al. published Score-Based Generative Modeling through Stochastic Differential Equations, which revealed the mathematical framework behind the continuous-time version of the diffusion model, that the picture became clear.
That work also unified denoising score matching from statistical machine learning with the denoising training used in DDPM.
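In that formulation (standard notation from the Score-SDE paper, summarized here rather than quoted), noising is a forward stochastic differential equation and sampling runs its time reversal, which depends on the score ∇_x log p_t(x):

$$\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w, \qquad \mathrm{d}x = \left[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{w}$$

A network s_θ(x, t) trained by denoising score matching approximates this score, and DDPM's noise-prediction objective turns out to be the same thing up to a per-timestep scaling.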
More detailed processes can be found in the paper details linked at the end.
Next, we need to explore a question:
Why is GAN Being Replaced So Quickly?
According to a paper from OpenAI (Diffusion Models Beat GANs on Image Synthesis, linked at the end), images generated by the Diffusion Model are of significantly higher quality than those from GAN models.
DALL·E is a multimodal pre-trained large model; "multimodal" and "large" mean that the dataset used to train it is both vast and diverse.
Professor Tom Goldstein, who posted the tweet, noted that one challenge in training a GAN is finding the optimal weights at a saddle point of the loss function, which is a genuinely hard mathematical problem.
Training a deep, multi-layer model requires many rounds of gradient feedback before it converges.
In practice, however, the loss often fails to converge reliably to a saddle point, so training is unstable. Researchers have proposed techniques to improve saddle-point stability, but they are still not enough to solve the problem.
The more complex and diverse the data, the harder the saddle point becomes to handle.
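For reference, the standard GAN objective (Goodfellow et al.'s formulation, quoted here to illustrate the point rather than taken from the tweet) is a min-max problem, i.e. a search for a saddle point rather than a plain minimum:

$$\min_{G}\,\max_{D}\ \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\!\left[\log\left(1 - D(G(z))\right)\right]$$

Gradient descent-ascent on this objective can oscillate or diverge, which is exactly the instability described above.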
Unlike a GAN, DALL·E uses the Diffusion Model, which does not have to wrestle with saddle points at all: it only needs to minimize a standard convex cross-entropy loss, and practitioners already know how to do that stably.
This greatly reduces the difficulty of training. Put simply, it steps around the obstacle with a different mathematical formulation.
In addition, training a GAN requires not only a "generator" that maps sampled Gaussian noise to the data distribution, but also a separately trained discriminator, which complicates the training process.
The Diffusion Model, by contrast, only has to train the "generator": the training objective is simple, and no extra networks (a discriminator, a posterior distribution, and so on) are needed, which simplifies things considerably.
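To make the "one network, one simple loss" point concrete, here is a minimal PyTorch-style sketch of a single DDPM training step. It is a generic illustration under standard DDPM assumptions, not code from DALL·E or from this article; eps_model is a toy placeholder for what would normally be a U-Net.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product, \bar{alpha}_t

# Toy noise-prediction network on flattened 32x32 RGB images.
eps_model = nn.Sequential(
    nn.Linear(3 * 32 * 32 + 1, 256), nn.ReLU(),
    nn.Linear(256, 3 * 32 * 32),
)
opt = torch.optim.Adam(eps_model.parameters(), lr=2e-4)

def train_step(x0):
    """One DDPM training step. x0: (B, 3*32*32) images scaled to [-1, 1]."""
    t = torch.randint(0, T, (x0.shape[0],))              # random timestep per sample
    a_bar = alphas_bar[t].unsqueeze(1)
    eps = torch.randn_like(x0)                           # the noise to be predicted
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps   # closed-form forward process
    t_feat = t.float().unsqueeze(1) / T                  # crude timestep conditioning
    pred = eps_model(torch.cat([x_t, t_feat], dim=1))
    loss = ((pred - eps) ** 2).mean()                    # the only loss: plain MSE, no discriminator
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example usage with random tensors standing in for real images:
print(train_step(torch.randn(8, 3 * 32 * 32)))
```

Note that the whole loop touches exactly one network and one regression-style loss; there is no second model to balance against.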
With current training techniques, the Diffusion Model largely bypasses the painstaking tuning stage the GAN field went through and can be applied directly to downstream tasks.

△Visual Representation of the Diffusion Model
From a theoretical perspective, the Diffusion Model succeeds because the network being trained only has to "imitate" the reverse of a simple forward process, rather than having to search for a model "black-box" style as many other generative models do.
Moreover, each small step of this reverse process is very easy to fit, requiring only a simple Gaussian distribution q(x_{t-1} | x_t).
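Concretely, once conditioned on the original image, the per-step posterior is an exact Gaussian with a closed-form mean and variance (the standard DDPM result, reproduced here for illustration):

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\right), \qquad \tilde{\mu}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t, \qquad \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t$$

with α_t = 1 − β_t. This is the target that the learned Gaussian p_θ(x_{t-1} | x_t) has to match, which is why each individual step is so tractable.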
This brings many conveniences to the optimization of the Diffusion Model, which is one of the reasons for its excellent empirical performance.
Is the Diffusion Model Perfect?
Not necessarily.
From a trend perspective, the Diffusion Model field is indeed flourishing, but as “I Want to Sing High C” stated:
There are still core theoretical problems in this field waiting to be studied, which gives those of us working on theory plenty of valuable material. And even for people who are not interested in theory, since the model already works well, its combination with downstream tasks has only just begun, and there are many opportunities to grab quickly.
I believe the accelerated sampling of the Diffusion Model will certainly be completely resolved in the near future, allowing the Diffusion Model to dominate the field of deep generative models.
Regarding the effectiveness of the Diffusion Model and its quick replacement of GAN, Professor Ma Yi believes it sufficiently illustrates a principle:
A few simple and correct mathematical derivations can be far more effective than nearly a decade of large-scale hyperparameter tuning and network structure debugging.
However, regarding this hot topic of “new waves pushing old waves”, Professor Ma Yi also has a different perspective:
I hope young researchers maintain the right purpose and attitude towards research and not be misled by currently popular topics.
Even the diffusion process itself is an old idea that has been around for hundreds of years; it is just an old tree sprouting new shoots and finding new applications.
“I Want to Sing High C” Zhihu Answer:
https://www.zhihu.com/question/536012286/answer/2533146567
Reference Links:
[1] https://twitter.com/tomgoldsteincs/status/1560334207578161152?s=21&t=QE8OFIwufZSTNi5bQhs0hQ
[2] https://www.zhihu.com/question/536012286
[3] https://arxiv.org/pdf/2105.05233.pdf
[4] https://arxiv.org/abs/1503.03585
[5] https://arxiv.org/abs/2006.11239
[6] https://arxiv.org/abs/2011.13456
[7] https://weibo.com/u/3235040884?topnav=1&wvr=6&topsug=1&is_all=1
— The End —