Professor Zhu Jun of Tsinghua University's Computer Science Department, leading the TSAIL team, proposed DPM-Solver (NeurIPS 2022 Oral, roughly the top 1.7% of submissions) and DPM-Solver++, which push fast sampling of diffusion models to the extreme: high-quality samples in only 10 to 25 steps, with no additional training required.
This year, one of the most influential advancements in the AI field is undoubtedly the explosive rise of AI image generation. Designers only need to input a textual description of an image, and AI can generate a high-resolution image of exceptional quality. Currently, the most widely used model is StabilityAI’s open-source model, Stable Diffusion, which sparked extensive discussion in the community upon its release.
However, the biggest issue with diffusion models is their extremely slow sampling. Sampling starts from a pure-noise image and denoises it step by step until a clear image emerges. During this process, the model must be evaluated serially for at least 50 to 100 steps to obtain a high-quality image, making generation 50 to 100 times slower than other deep generative models and severely limiting deployment and practical application.
To accelerate the sampling of diffusion models, many researchers have started from hardware optimization. For example, Google uses JAX to compile and run the model on TPUs, while the OneFlow team [1] achieved "one-second image generation" for Stable Diffusion with their in-house compiler. These approaches are all built on the 50-step sampling algorithm PNDM [2], whose sampling quality drops significantly when the number of steps is reduced.
Just a few days ago, this record was broken again! The official demo of Stable Diffusion[3] showed that the time to sample 8 images has been reduced from 8 seconds to just 4 seconds! That’s a full doubling of speed!

Moreover, the OneFlow team, based on their self-developed deep learning compiler technology, has successfully reduced the previous “one-second image generation” to “half-second image generation” without compromising sampling quality! It now takes less than 0.5 seconds on a GPU to obtain a high-definition image! Related work has been published in [1].
In fact, the core driving force behind these advances is DPM-Solver, an efficient solver designed specifically for diffusion models by Professor Zhu Jun's TSAIL team at Tsinghua University: the algorithm requires no additional training, applies to both discrete-time and continuous-time diffusion models, achieves near-convergence within 20 to 25 steps, and produces very high-quality samples in as few as 10 to 15 steps. On Stable Diffusion, the 25-step DPM-Solver yields better sampling quality than the 50-step PNDM, effectively doubling the sampling speed!
Project links:

- DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps (NeurIPS 2022 Oral): https://arxiv.org/abs/2206.00927
- DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models: https://arxiv.org/abs/2211.01095
- Open-source code: https://github.com/LuChengTHU/dpm-solver
- Online demo: https://huggingface.co/spaces/LuChengTHU/dpmsolver_sdm
Diffusion models define a forward process that gradually adds noise until an image becomes Gaussian noise, and then sample by defining a reverse process that progressively denoises Gaussian noise back into a clear image:
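In the standard notation (a sketch; $\alpha_t$ and $\sigma_t$ below denote the noise-schedule coefficients, as in the DPM-Solver paper), the forward process satisfies

$$
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \alpha_t\, x_0,\ \sigma_t^2 I\big), \qquad x_t = \alpha_t x_0 + \sigma_t \epsilon,\ \ \epsilon \sim \mathcal{N}(0, I).
$$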

During sampling, depending on whether additional noise is injected at each step, diffusion models fall into two categories: diffusion stochastic differential equations (Diffusion SDE) and diffusion ordinary differential equations (Diffusion ODE). Both share the same training objective: a "noise prediction network" is trained by minimizing the mean squared error against the injected noise:
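Written out (a sketch, possibly up to a time-dependent weighting in the paper), the objective is

$$
\min_\theta\ \mathbb{E}_{x_0,\, \epsilon,\, t}\Big[\, \big\| \epsilon_\theta(x_t, t) - \epsilon \big\|_2^2 \,\Big], \qquad x_t = \alpha_t x_0 + \sigma_t \epsilon .
$$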
The sampling process based on Diffusion SDE can be viewed as discretizing the following stochastic differential equation:
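In the notation of [4], writing the score function through the noise prediction network as $\nabla_x \log p_t(x_t) \approx -\epsilon_\theta(x_t, t)/\sigma_t$, this reverse-time SDE takes the form (a sketch, with $f(t) = \mathrm{d}\log\alpha_t/\mathrm{d}t$, $g^2(t) = \mathrm{d}\sigma_t^2/\mathrm{d}t - 2\sigma_t^2\, \mathrm{d}\log\alpha_t/\mathrm{d}t$, and $\bar{w}_t$ a reverse-time Wiener process):

$$
\mathrm{d}x_t = \Big[ f(t)\, x_t + \frac{g^2(t)}{\sigma_t}\, \epsilon_\theta(x_t, t) \Big]\, \mathrm{d}t + g(t)\, \mathrm{d}\bar{w}_t .
$$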
It has also been proven in [4] that DDPM[5] is a first-order discretization of the above SDE.
The sampling process based on Diffusion ODE can be viewed as discretizing the following ordinary differential equation:
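With the same $f(t)$ and $g(t)$ as above, this probability-flow ODE reads (sketch):

$$
\frac{\mathrm{d}x_t}{\mathrm{d}t} = f(t)\, x_t + \frac{g^2(t)}{2\sigma_t}\, \epsilon_\theta(x_t, t) .
$$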
It has also been proven in [6] that DDIM[7] is a first-order discretization of the above ODE.
However, these first-order discretizations converge very slowly, and sampling from diffusion models typically requires 100 to 1000 serial evaluations to obtain a high-quality image. To accelerate sampling, researchers usually turn to higher-order solvers for the Diffusion ODE, such as the classic Runge-Kutta methods (e.g., RK45): since the ODE introduces no additional randomness, it tolerates a larger discretization step. Given the solution at time s, a Runge-Kutta method is based on the following discretization:
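Abbreviating the right-hand side of the Diffusion ODE as $h_\theta(x_t, t)$, the exact solution from time $s$ to $t$ is

$$
x_t = x_s + \int_s^t h_\theta(x_\tau, \tau)\, \mathrm{d}\tau ,
$$

and a $k$-stage Runge-Kutta method approximates the integral by a weighted sum of $k$ intermediate evaluations of $h_\theta$ (a generic sketch; the weights $b_i$ and intermediate points $(\hat{x}_i, \tau_i)$ depend on the particular method):

$$
x_t \approx x_s + (t - s) \sum_{i=1}^{k} b_i\, h_\theta(\hat{x}_i, \tau_i).
$$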
This discretization treats the entire Diffusion ODE as a black box, losing the known information of the ODE, making it difficult to converge in less than 50 steps.
DPM-Solver: A Solver Designed for Diffusion Models
DPM-Solver is based on the semi-linear structure of the Diffusion ODE, which allows the linear term of the ODE to be computed exactly and analytically:
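By the variation-of-constants formula, the exact solution can be written so that the linear part is solved in closed form and only the neural-network term remains inside an integral (sketch):

$$
x_t = e^{\int_s^t f(\tau)\,\mathrm{d}\tau}\, x_s + \int_s^t e^{\int_\tau^t f(r)\,\mathrm{d}r}\, \frac{g^2(\tau)}{2\sigma_\tau}\, \epsilon_\theta(x_\tau, \tau)\, \mathrm{d}\tau .
$$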
The remaining term is a complicated integral over time. However, the authors of DPM-Solver discovered that it collapses to a very simple form after a change of variables to the log-SNR (log signal-to-noise ratio) $\lambda_t = \log(\alpha_t / \sigma_t)$:
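In this variable the exact solution becomes (a sketch in the paper's notation, where $\hat{\epsilon}_\theta(\hat{x}_\lambda, \lambda)$ is the noise prediction model reparameterized in $\lambda$):

$$
x_t = \frac{\alpha_t}{\alpha_s}\, x_s - \alpha_t \int_{\lambda_s}^{\lambda_t} e^{-\lambda}\, \hat{\epsilon}_\theta(\hat{x}_\lambda, \lambda)\, \mathrm{d}\lambda .
$$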
The remaining integral is thus an exponentially weighted integral of the noise prediction model. Taylor-expanding the noise prediction model yields an estimate of this integral:
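Concretely, expanding to order $k$ around $\lambda_s$ gives

$$
\hat{\epsilon}_\theta(\hat{x}_\lambda, \lambda) = \sum_{n=0}^{k-1} \frac{(\lambda - \lambda_s)^n}{n!}\, \hat{\epsilon}_\theta^{(n)}(\hat{x}_{\lambda_s}, \lambda_s) + \mathcal{O}\big((\lambda - \lambda_s)^k\big),
$$

which is then substituted into the weighted integral term by term.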
This estimate contains two parts: one is the full derivative part (vector), and the other is the coefficient part (scalar). Another core contribution of DPM-Solver is that this coefficient can be computed analytically through integration by parts:
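For example (with $h = \lambda_t - \lambda_s$), the $n = 0$ coefficient is

$$
\int_{\lambda_s}^{\lambda_t} e^{-\lambda}\, \mathrm{d}\lambda = e^{-\lambda_s} - e^{-\lambda_t} = \frac{\sigma_t}{\alpha_t}\big(e^{h} - 1\big),
$$

and the higher-order coefficients follow by repeated integration by parts.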
The remaining full derivative part can be approximated using numerical methods from traditional ODE solvers (without any differentiation operations):
Based on the above four points, DPM-Solver achieves the most accurate computation of all known terms while only approximating the neural network part, thus minimizing discretization errors to the greatest extent:
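As a concrete instance, here is a sketch of the second-order update (DPM-Solver-2) in the notation above: for a step from $t_{i-1}$ to $t_i$ with $h_i = \lambda_{t_i} - \lambda_{t_{i-1}}$ and $s_i$ the time whose log-SNR is the midpoint of $[\lambda_{t_{i-1}}, \lambda_{t_i}]$,

$$
u_i = \frac{\alpha_{s_i}}{\alpha_{t_{i-1}}}\, x_{t_{i-1}} - \sigma_{s_i}\big(e^{h_i/2} - 1\big)\, \epsilon_\theta(x_{t_{i-1}}, t_{i-1}),
$$

$$
x_{t_i} = \frac{\alpha_{t_i}}{\alpha_{t_{i-1}}}\, x_{t_{i-1}} - \sigma_{t_i}\big(e^{h_i} - 1\big)\, \epsilon_\theta(u_i, s_i).
$$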
Additionally, based on this derivation, we can see that DDIM is essentially a first-order form of DPM-Solver, which also explains why DDIM can still achieve good acceleration effects with fewer steps:
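Indeed, keeping only the $n = 0$ term of the Taylor expansion and using the analytic coefficient above gives the first-order update

$$
x_t = \frac{\alpha_t}{\alpha_s}\, x_s - \sigma_t\big(e^{h} - 1\big)\, \epsilon_\theta(x_s, s), \qquad h = \lambda_t - \lambda_s,
$$

which, after rewriting, coincides exactly with the DDIM update.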
In experiments, DPM-Solver achieved acceleration effects far exceeding other sampling algorithms, converging almost completely in just 15-20 steps:
Moreover, quantitative results in the paper show that the extra computation introduced by DPM-Solver is negligible, so a reduction in the number of steps translates directly into a proportional reduction in wall-clock time: with the 25-step DPM-Solver, the sampling speed of Stable Diffusion is effectively doubled! For example, the figure below compares different sampling algorithms on Stable Diffusion as the number of steps varies, showing that DPM-Solver achieves very high-quality samples in just 10 to 15 steps:
Using DPM-Solver
DPM-Solver is very easy to use, either through the authors' official code or through the mainstream Diffusers library. With the official code (https://github.com/LuChengTHU/dpm-solver), sampling requires only three lines:
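The original post shows this snippet as a screenshot; below is a minimal sketch following the repository's README, assuming `model`, `betas`, and `x_T` are your trained discrete-time noise prediction model, its noise-schedule betas, and the initial Gaussian noise:

```python
from dpm_solver_pytorch import NoiseScheduleVP, model_wrapper, DPM_Solver

# 1. Define the (discrete-time) noise schedule from the betas of the trained model.
noise_schedule = NoiseScheduleVP(schedule='discrete', betas=betas)

# 2. Wrap the model into a continuous-time noise prediction function.
model_fn = model_wrapper(model, noise_schedule, model_type="noise")

# 3. Define DPM-Solver and sample (e.g., 20 steps of the third-order solver).
dpm_solver = DPM_Solver(model_fn, noise_schedule)
x_sample = dpm_solver.sample(x_T, steps=20, order=3)
```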
The official code supports four types of diffusion model parameterizations, selected via the `model_type` argument of `model_wrapper`: noise prediction ("noise"), data prediction ("x_start"), velocity prediction ("v"), and score function ("score").
It also supports unconditional sampling, classifier guidance, and classifier-free guidance, selected via the `guidance_type` argument; a sketch of the classifier-free case follows:
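A sketch following the same README pattern; the `condition` and `unconditional_condition` tensors and the `guidance_scale` value here are placeholders for your own conditioning inputs:

```python
# Classifier-free guidance: the wrapper combines the conditional and
# unconditional noise predictions using the given guidance scale.
model_fn = model_wrapper(
    model,
    noise_schedule,
    model_type="noise",
    guidance_type="classifier-free",
    condition=condition,                              # e.g., text embedding
    unconditional_condition=unconditional_condition,  # e.g., empty-prompt embedding
    guidance_scale=7.5,
)
```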
Using DPM-Solver based on the Diffusers library is also straightforward, requiring only the definition of the scheduler:
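For example, with the Diffusers scheduler API (the model id, prompt, and step count here are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap the pipeline's default scheduler for the multistep DPM-Solver scheduler.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# 20-25 steps are typically enough for near-converged samples.
image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=20).images[0]
```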
In addition, the authors’ team has also provided an online demo: https://huggingface.co/spaces/LuChengTHU/dpmsolver_sdm
The following image is an example using 15 steps, showing that the image quality is already very high:
With DPM-Solver, sampling speed should no longer be a bottleneck for diffusion models.
About the Authors
The first author of the DPM-Solver paper is Lu Cheng, a doctoral student on Tsinghua University's TSAIL team, who also wrote an introductory Zhihu answer on the principles of diffusion models that has received over 2,000 upvotes: https://www.zhihu.com/question/536012286/answer/2533146567
The TSAIL team at Tsinghua University has long been dedicated to theoretical and algorithmic research in Bayesian machine learning and is among the earliest groups worldwide to study deep probabilistic generative models, with systematic and in-depth results on Bayesian models, efficient algorithms, and probabilistic programming libraries. Another doctoral student on the team, Bao Fan, proposed Analytic-DPM [8][9], which gives a simple and surprising analytic form for the optimal mean and variance of diffusion models and won an ICLR 2022 Outstanding Paper Award. In probabilistic programming, the team released the "ZhuSuan" deep probabilistic programming library (https://zhusuan.readthedocs.io/en/latest/) [10] in 2017, one of the earliest programming libraries aimed at deep probabilistic models. It is also worth mentioning that two core contributors to diffusion probabilistic models, Song Yang and Song Jiaming, both received research training under Professor Zhu Jun during their undergraduate studies and later pursued their PhDs at Stanford University. The paper's co-authors Zhou Yuhao, Chen Jianfei, and Li Chongxuan are likewise outstanding doctoral students trained in the TSAIL group: Zhou Yuhao is currently enrolled, while Chen Jianfei and Li Chongxuan now teach at Tsinghua University's Computer Science Department and at Renmin University's Gaoling School of Artificial Intelligence, respectively.
References:
[1] OneFlow version of Stable-Diffusion: https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion
[2] Luping Liu et al., 2022, Pseudo Numerical Methods for Diffusion Models on Manifolds, https://arxiv.org/abs/2202.09778
[3] Official demo of Stable-Diffusion: https://huggingface.co/spaces/runwayml/stable-diffusion-v1-5
[4] Yang Song et al., 2021, Score-Based Generative Modeling through Stochastic Differential Equations, https://arxiv.org/abs/2011.13456
[5] Jonathan Ho et al., 2020, Denoising Diffusion Probabilistic Models, https://arxiv.org/abs/2006.11239
[6] Tim Salimans & Jonathan Ho, 2022, Progressive Distillation for Fast Sampling of Diffusion Models, https://arxiv.org/abs/2202.00512
[7] Jiaming Song et al., 2020, Denoising Diffusion Implicit Models, https://arxiv.org/abs/2010.02502
[8] Fan Bao et al., 2022, Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models, https://arxiv.org/abs/2201.06503
[9] Fan Bao et al., 2022, Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models, https://arxiv.org/abs/2206.07309
[10] Jiaxin Shi et al., 2017, ZhuSuan: A Library for Bayesian Deep Learning, https://arxiv.org/abs/1709.05870