Reprinted from | Quantum Bits
Author | Alex
The free and open-source Stable Diffusion has been creatively used to compress images.
This time, it is used for image compression.
Stable Diffusion can not only reduce the size of the same original image, but it also performs visibly better than JPEG and WebP.
For the same original image, the image compressed by Stable Diffusion not only has more details but also has fewer compression artifacts.
However, software engineer Matthias Bühlmann (let’s call him MB) pointed out that this method has significant limitations.
Because it is not very good at handling faces and text, sometimes it even generates features that do not exist in the original image after decoding and expanding back.
For example, like this (the effect can be startling):
△ Left is the original image, right is the generated image after Stable Diffusion compression and expansion
However, that said—
1
How Does Stable Diffusion Compress Images?
To clarify how Stable Diffusion compresses images, let’s start with some important principles of Stable Diffusion.
Stable Diffusion is a special kind of diffusion model called Latent Diffusion.
Unlike Standard Diffusion, Latent Diffusion performs the diffusion process in a lower-dimensional latent space rather than using the actual pixel space.
That is to say, the representation results in the latent space are some lower-resolution compressed images, but these images have a higher accuracy.
Here, it is important to note that the resolution of an image and its accuracy are two different things. Resolution refers to the amount of data in an image, while accuracy reflects how close the result is to the true value.
Take this camel’s close-up photo as an example: the original image size is 768KB, with a resolution of 512×512, and an accuracy of 3×8 bits.
After compressing with Stable Diffusion to 4.98KB, the resolution is reduced to 64×64, while the accuracy is actually improved to 4×32 bits.
So it appears that the compressed image from Stable Diffusion does not differ much from the original image.
To be more specific, the Latent Diffusion model has three main components:
VAE (Variational Auto Encoder), U-Net, and Text-encoder.
However, in this image compression test, the text encoder is not very useful.
The main role is played by the VAE, which consists of an encoder and a decoder.
Therefore, the VAE can encode an image from the image space and decode it to obtain some latent space representation.
MB found that the decoding function of the VAE is very stable for quantizing latent representations.
By scaling, dragging, and remapping, the latent representation can be quantized from floating-point to 8-bit unsigned integers, resulting in a minimally distorted compressed image:
First, the latents are quantized to 8-bit unsigned integers, at this point, the image size is 64×64×4×8Bit=16 kB (original image size 512×512×3×8Bit=768 kB).
Then, using a palette and dithering, the data is further reduced to 5kB while improving the image fidelity.
As a meticulous programmer, MB not only observed with the naked eye but also conducted data analysis on image quality.
However, based on two important metrics for image quality assessment, PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity), the compression results of Stable Diffusion are not significantly better than JPG and WebP.
Additionally, when the latent representation is re-decoded and expanded to the original image resolution, although the main features of the image are still visible, the VAE will also assign high-resolution features to these pixel values.
In simpler terms, the reconstructed image often differs from the original image, containing many newly generated “grotesque” features.
Let’s review this image again:
Although using Stable Diffusion to compress images indeed has many issues, in MB’s words, its effects are still quite stunning and have great potential for development.
Now MB has put the relevant code on Google Colab, and interested friends can take a closer look~
Link: https://colab.research.google.com/drive/1Ci1VYHuFJK5eOX9TB0Mq4NsqkeDrMaaH?usp=sharingReference links: [1] https://arstechnica.com/information-technology/2022/09/better-than-jpeg-researcher-discovers-that-stable-diffusion-can-compress-images/[2] https://matthias-buehlmann.medium.com/stable-diffusion-based-image-compresssion-6f1f0a399202[3] https://huggingface.co/blog/stable_diffusion
Technical Group Invitation
Scan the QR code to add the assistant’s WeChat