Author: Gabriel Guerin
Translated by: ronghuaiyang Source: AI Park
In some cases, collecting data covering a variety of scenes can be difficult. This article presents a method for generating such data automatically.
Deep learning models require a significant amount of data to achieve good results, and the same goes for object detection models.
To train a YOLOv5 model to automatically detect your favorite toys, you need to take thousands of pictures of your toys in different contexts, and for each image, you need to annotate the location of the toy in the image.
This is very time-consuming.
This article proposes a method to automatically generate object detection datasets using image segmentation and stable diffusion.
The pipeline for generating object detection datasets consists of four steps:
- Find a dataset containing instances of the same object class you want to recognize (e.g., a dog dataset).
- Use image segmentation to generate a mask for the dog.
- Fine-tune the Stable Diffusion image inpainting model.
- Run the Stable Diffusion image inpainting model with the generated mask to generate new data.
Image Segmentation: Generating Mask Images
The Stable Diffusion image inpainting pipeline requires an input prompt, an image, and a mask image. The model will generate new images only from the white pixel areas of the mask image.
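To make the mask convention concrete, here is a minimal illustrative sketch (not from the original article) that builds an all-black mask with a single white square; only the square region would be repainted by the pipeline:

import numpy as np
from PIL import Image

# Black pixels (0) are kept; white pixels (255) are regenerated by inpainting
mask = np.zeros((512, 512), dtype=np.uint8)
mask[128:384, 128:384] = 255  # white square: the region to repaint
Image.fromarray(mask).save("example_mask.png")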
The PixelLib library helps us perform image segmentation with just a few lines of code. In this example, we will use the PointRend model to detect dogs. Below is the code for image segmentation.
import pixellib
from pixellib.torchbackend.instance import instanceSegmentation

# Load the PointRend instance segmentation model
ins = instanceSegmentation()
ins.load_model("pointrend_resnet50.pkl")

# Restrict segmentation to the "dog" class
target_classes = ins.select_target_classes(dog=True)
results, output = ins.segmentImage(
    "dog.jpg",
    show_bboxes=True,
    segment_target_classes=target_classes,
    output_image_name="output_image.jpg",  # overlay image for visual inspection
)
The segmentImage function returns a tuple:
- results: a dictionary containing the fields 'boxes', 'class_ids', 'class_names', 'object_counts', 'scores', 'masks', and 'extracted_objects'.
- output: the original image blended with the mask image. If show_bboxes is set to True, bounding boxes are also drawn.
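As a quick check, the detections can be read straight from results. A small sketch, continuing from the code above (the exact box format is PixelLib's; it is assumed here to be pixel coordinates):

# Print one line per detected object: class name, confidence score, bounding box
for box, name, score in zip(results["boxes"], results["class_names"], results["scores"]):
    print(name, score, box)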
Generating Mask Images
The generated mask contains only white and black pixels. We make the mask slightly larger than the segmented dog to give Stable Diffusion enough room for inpainting; to achieve this, we add copies of the mask translated by 10 pixels to the left, right, up, and down.
from PIL import Image
import numpy as np

width, height = 512, 512
image = Image.open("dog.jpg")

# Accumulate the masks of the dogs found by the PointRend model
mask_image = np.zeros(image.size)
for idx, mask in enumerate(results["masks"].transpose()):
    if results["class_names"][idx] == "dog":
        mask_image += mask

# Make the mask slightly bigger than the segmented dog by adding
# copies of it shifted 10 pixels in each direction
mask_image += np.roll(mask_image, 10, axis=0)   # shift along the first axis
mask_image += np.roll(mask_image, -10, axis=0)  # shift the opposite way
mask_image += np.roll(mask_image, 10, axis=1)   # shift along the second axis
mask_image += np.roll(mask_image, -10, axis=1)  # shift the opposite way

# Set all non-black pixels to white
mask_image = np.clip(mask_image, 0, 1).transpose() * 255

# Save the mask image
mask_image = Image.fromarray(np.uint8(mask_image)).resize((width, height))
mask_image.save("mask_image.jpg")
Now, we have the original image of the dog and its corresponding mask.
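Before fine-tuning, it can be worth verifying that the mask really covers the dog. A minimal sketch (assuming the file names used above) that paints the masked region red on top of the photo:

from PIL import Image

image = Image.open("dog.jpg").convert("RGB").resize((512, 512))
mask = Image.open("mask_image.jpg").convert("L").resize((512, 512))

# Red wherever the mask is white, the original photo elsewhere
overlay = Image.composite(Image.new("RGB", image.size, "red"), image, mask)
overlay.save("mask_check.jpg")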
Fine-tuning the Stable Diffusion Image Inpainting Pipeline
Dreambooth is a technique for fine-tuning Stable Diffusion: with just a few photos, we can teach the model a new concept. We will use it here to fine-tune the image inpainting model. The script train_dreambooth_inpaint.py shows how to fine-tune the Stable Diffusion inpainting model on your own dataset.
Hardware Resources Required for Fine-tuning
With gradient_checkpointing and mixed_precision, you can fine-tune the model on a single 24GB GPU. For a larger batch_size and faster training, a GPU with at least 30GB of memory is required.
Installing Dependencies
Before running the script, ensure these dependencies are installed:
pip install git+https://github.com/huggingface/diffusers.git
pip install -U -r requirements.txt
And initialize the acceleration environment:
accelerate config
You need to register as a user on Hugging Face Hub, and you also need a token to use this code. Run the command below to authorize your token:
huggingface-cli login
Fine-tuning Example
Hyperparameter tuning is crucial for compute-intensive training runs like this one; experiment with different values on the machine you train on. The recommended parameters are as follows:
$ accelerate launch train_dreambooth_inpaint.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-inpainting" \
--instance_data_dir="dog_images" \
--output_dir="stable-diffusion-inpainting-toy-cat" \
--instance_prompt="a photo of a toy cat" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=400 \
--gradient_accumulation_steps=2 \
--gradient_checkpointing \
--train_text_encoder
Running the Stable Diffusion Image Inpainting Pipeline
Stable Diffusion image inpainting is a text-guided diffusion model that takes a masked image and a text prompt and generates a realistic completion. It is implemented in the diffusers library: https://github.com/huggingface/diffusers.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Image and mask, resized to the training resolution
image = Image.open("dog.jpg").resize((512, 512))
mask_image = Image.open("mask_image.jpg").resize((512, 512))

# Load the fine-tuned inpainting model (fp16 weights need a GPU)
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stable-diffusion-inpainting-toy-cat",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(prompt="a toy cat", image=image, mask_image=mask_image).images[0]
Running Stable Diffusion image inpainting with the fine-tuned model.
Conclusion
To summarize:
- Use PixelLib to segment the image and obtain its mask.
- Fine-tune the runwayml/stable-diffusion-inpainting model so it learns the new "toy cat" concept.
- Run the fine-tuned model with StableDiffusionInpaintPipeline on the dog image and the generated mask.
Final Results
After completing all the steps, we obtain a new image in which a toy cat takes the dog's place, so both images can share the same bounding box.
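Because the new object is generated inside the original mask, the original detection box can be reused as the label. A hypothetical sketch converting one PixelLib box (taking the first detection, assumed to be [x1, y1, x2, y2] in pixels of a 512×512 image; class id 0 stands for the toy cat) to a YOLO-format label line:

def to_yolo_line(box, img_w=512, img_h=512, class_id=0):
    # YOLO format: "class x_center y_center width height", normalized to [0, 1]
    x1, y1, x2, y2 = box
    x_c = (x1 + x2) / 2 / img_w
    y_c = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

with open("toy_cat_0001.txt", "w") as f:
    f.write(to_yolo_line(results["boxes"][0]))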

We can now generate new images for all images in the dataset.
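A rough sketch of that batch step, reusing the fine-tuned pipeline (the dog_images/ and masks/ layout is hypothetical; in practice each mask comes from the segmentation step above):

import torch
from pathlib import Path
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stable-diffusion-inpainting-toy-cat",
    torch_dtype=torch.float16,
).to("cuda")

out_dir = Path("generated")
out_dir.mkdir(exist_ok=True)

# Assumed layout: every photo in dog_images/ has a same-named mask in masks/
for image_path in sorted(Path("dog_images").glob("*.jpg")):
    image = Image.open(image_path).resize((512, 512))
    mask = Image.open(Path("masks") / image_path.name).resize((512, 512))
    result = pipe(prompt="a toy cat", image=image, mask_image=mask).images[0]
    result.save(out_dir / image_path.name)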
Limitations
Stable Diffusion does not always produce good results, so the generated dataset still needs cleaning.
The pipeline is also very computationally intensive: fine-tuning Stable Diffusion requires a GPU with 24GB of memory, and inference requires a GPU as well.
This way of constructing datasets is useful when real images are hard to obtain. If you need to detect forest fires, for example, generating images this way is preferable to starting fires in a forest. For ordinary scenes, however, manual data annotation remains the standard approach.

Original article in English: https://www.sicara.fr/blog-technique/dataset-generation-fine-tune-stable-diffusion-inpainting