Stable Diffusion Can Compress Images: Smaller Than JPEG, Clearer to the Eye

Stable Diffusion Can Compress Images: Smaller Than JPEG, Clearer to the Eye

MLNLP community is a well-known machine learning and natural language processing community both domestically and internationally, covering NLP postgraduate students, university teachers, and corporate researchers.
Community Vision is to promote communication and progress between the academic and industrial sectors of natural language processing and machine learning, especially for beginners.

Reprinted from | Quantum Bits

Author | Alex

The free and open-source Stable Diffusion has been creatively used to compress images.

This time, it is used for image compression.

Stable Diffusion can not only reduce the size of the same original image, but it also performs visibly better than JPEG and WebP.

Stable Diffusion Can Compress Images: Smaller Than JPEG, Clearer to the Eye

For the same original image, the image compressed by Stable Diffusion not only has more details but also has fewer compression artifacts.

However, software engineer Matthias Bühlmann (let’s call him MB) pointed out that this method has significant limitations.

Because it is not very good at handling faces and text, sometimes it even generates features that do not exist in the original image after decoding and expanding back.

For example, like this (the effect can be startling):

Stable Diffusion Can Compress Images: Smaller Than JPEG, Clearer to the Eye

△ Left is the original image, right is the generated image after Stable Diffusion compression and expansion

However, that said—

1

How Does Stable Diffusion Compress Images?

To clarify how Stable Diffusion compresses images, let’s start with some important principles of Stable Diffusion.

Stable Diffusion is a special kind of diffusion model called Latent Diffusion.

Unlike Standard Diffusion, Latent Diffusion performs the diffusion process in a lower-dimensional latent space rather than using the actual pixel space.

That is to say, the representation results in the latent space are some lower-resolution compressed images, but these images have a higher accuracy.

Here, it is important to note that the resolution of an image and its accuracy are two different things. Resolution refers to the amount of data in an image, while accuracy reflects how close the result is to the true value.

Take this camel’s close-up photo as an example: the original image size is 768KB, with a resolution of 512×512, and an accuracy of 3×8 bits.

After compressing with Stable Diffusion to 4.98KB, the resolution is reduced to 64×64, while the accuracy is actually improved to 4×32 bits.

So it appears that the compressed image from Stable Diffusion does not differ much from the original image.

Stable Diffusion Can Compress Images: Smaller Than JPEG, Clearer to the Eye

To be more specific, the Latent Diffusion model has three main components:

VAE (Variational Auto Encoder), U-Net, and Text-encoder.

However, in this image compression test, the text encoder is not very useful.

The main role is played by the VAE, which consists of an encoder and a decoder.

Therefore, the VAE can encode an image from the image space and decode it to obtain some latent space representation.

MB found that the decoding function of the VAE is very stable for quantizing latent representations.

By scaling, dragging, and remapping, the latent representation can be quantized from floating-point to 8-bit unsigned integers, resulting in a minimally distorted compressed image:

First, the latents are quantized to 8-bit unsigned integers, at this point, the image size is 64×64×4×8Bit=16 kB (original image size 512×512×3×8Bit=768 kB).

Then, using a palette and dithering, the data is further reduced to 5kB while improving the image fidelity.

Stable Diffusion Can Compress Images: Smaller Than JPEG, Clearer to the Eye

As a meticulous programmer, MB not only observed with the naked eye but also conducted data analysis on image quality.

However, based on two important metrics for image quality assessment, PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity), the compression results of Stable Diffusion are not significantly better than JPG and WebP.

Additionally, when the latent representation is re-decoded and expanded to the original image resolution, although the main features of the image are still visible, the VAE will also assign high-resolution features to these pixel values.

In simpler terms, the reconstructed image often differs from the original image, containing many newly generated “grotesque” features.

Let’s review this image again:

Stable Diffusion Can Compress Images: Smaller Than JPEG, Clearer to the Eye

Although using Stable Diffusion to compress images indeed has many issues, in MB’s words, its effects are still quite stunning and have great potential for development.

Now MB has put the relevant code on Google Colab, and interested friends can take a closer look~

Link: https://colab.research.google.com/drive/1Ci1VYHuFJK5eOX9TB0Mq4NsqkeDrMaaH?usp=sharingReference links: [1] https://arstechnica.com/information-technology/2022/09/better-than-jpeg-researcher-discovers-that-stable-diffusion-can-compress-images/[2] https://matthias-buehlmann.medium.com/stable-diffusion-based-image-compresssion-6f1f0a399202[3] https://huggingface.co/blog/stable_diffusion

Technical Group Invitation

Stable Diffusion Can Compress Images: Smaller Than JPEG, Clearer to the Eye

△ Long press to add assistant

Scan the QR code to add the assistant’s WeChat

Please note: Name-School/Company-Research Direction
(e.g., Xiao Zhang-Harbin Institute of Technology-Dialogue System)
to apply to join Natural Language Processing/Pytorch and other technical groups

About Us

MLNLP community is a grassroots academic community jointly built by machine learning and natural language processing scholars from both domestic and abroad. It has developed into a well-known machine learning and natural language processing community aimed at promoting progress between academia, industry, and enthusiasts in machine learning and natural language processing.
The community can provide an open communication platform for further study, employment, and research for related practitioners. Everyone is welcome to follow and join us.

Stable Diffusion Can Compress Images: Smaller Than JPEG, Clearer to the Eye

Leave a Comment