YOLT: A YOLO-Based Framework for Large-Scale Image Recognition

This article introduces YOLT, a satellite image object detection framework built on a modified YOLO v2. It offers a practical approach for participants in the recent Amazon Web Services 【AI For Good – 2022】 competition who face the challenge of recognizing objects in very large satellite images. The YOLT framework has since been updated to v4 and is open-sourced on GitHub.

Open Source Address:

https://github.com/avanetten/yolt

Introduction

Object detection in satellite remote sensing imagery is currently a key application area for computer vision (CV). It differs from general image detection mainly in that the images are enormous (e.g., 10000 x 10000 pixels), while the targets are extremely small (e.g., 10 x 10 pixels) and often clustered together, which makes them hard for general-purpose object detection algorithms to recognize.

To address these issues, YOLT modifies the YOLO v2 framework into an approach suited to satellite image detection.

Core Concepts of YOLT

In the related paper “You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery”, the YOLT developers list common difficulties in satellite image recognition and corresponding solutions (as shown in the figure below).

▲ The left side shows common problems, and the right side shows corresponding solutions

★ For the issues of irregular object sizes and diverse orientations, YOLT applies data augmentation such as rescaling and rotation to the satellite image data.

★ For the problem of excessively small and clustered targets, the YOLT framework mainly adopts three methods:

1. Modifying the network architecture: the downsampling stride of the YOLO v2 framework is reduced from 32 to 16, which helps detect targets smaller than 32 x 32 pixels.
2. Upsampling the image as a kind of “decompression”, i.e., enlarging the original image so that small, densely packed objects are easier to detect.
3. Ensembling detection models that work at different scales. Because target sizes can differ greatly (for example, ports versus ships, or airports versus airplanes), an ensemble improves recognition accuracy when size differences are large.

★ For the problem of excessively large satellite images, YOLT adopts tiling: the original image is cut into smaller pieces before being fed into the model for training, and tiling is combined with methods 2 and 3 above (upsampling and ensembling).
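The rotation augmentation mentioned in the first item above requires the bounding-box labels to be rotated together with the pixels. The following is a minimal sketch, not YOLT's own code: it assumes boxes in (x1, y1, x2, y2) pixel format with the origin at the top-left corner, and it restricts rotation to multiples of 90 degrees for simplicity. The function names are hypothetical.

```python
def rotate_box_90_cw(box, img_h):
    """Map an (x1, y1, x2, y2) box to its position after the image is
    rotated 90 degrees clockwise. img_h is the image height before rotation."""
    x1, y1, x2, y2 = box
    # A point (x, y) moves to (img_h - y, x); the box corners are re-sorted
    # so that x1 <= x2 and y1 <= y2 still hold.
    return (img_h - y2, x1, img_h - y1, x2)


def rotate_boxes(boxes, img_w, img_h, quarter_turns):
    """Rotate all boxes clockwise by quarter_turns * 90 degrees.

    Returns the rotated boxes and the (width, height) of the rotated image.
    """
    for _ in range(quarter_turns % 4):
        boxes = [rotate_box_90_cw(b, img_h) for b in boxes]
        img_w, img_h = img_h, img_w  # width and height swap on each quarter turn
    return boxes, (img_w, img_h)


# Example: a box in the top-left corner of a 4 x 2 image (width x height).
rotated, new_size = rotate_boxes([(0, 0, 1, 1)], img_w=4, img_h=2, quarter_turns=1)
print(rotated, new_size)
```

In a training pipeline the number of quarter turns would be drawn at random per sample, and the image array would be rotated by the same amount (for example with `numpy.rot90(image, k=-quarter_turns)`, where the negative sign selects the clockwise direction).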

▲ The network structure of YOLT. The output feature map is 26 x 26 (a 416 x 416 input downsampled by a factor of 16), a finer grid that helps improve detection accuracy for small objects.
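The effect of the stride change can be checked with simple arithmetic. The sketch below assumes a 416 x 416 network input and the 10-pixel target size used as an example earlier; the upsampling factor of 2 is a hypothetical value chosen only for illustration.

```python
def grid_size(input_px, stride):
    """Number of prediction cells along one side of the output feature map."""
    return input_px // stride


def cells_spanned(object_px, stride, upsample=1):
    """How many grid cells an object covers along one axis,
    optionally after enlarging the image by an integer upsample factor."""
    return object_px * upsample / stride


for stride in (32, 16):
    print(
        f"stride {stride}: grid {grid_size(416, stride)} x {grid_size(416, stride)}, "
        f"10 px object spans {cells_spanned(10, stride)} cells"
    )

# Upsampling the image by 2 makes the same object span more than one cell.
print(cells_spanned(10, 16, upsample=2))
```

With a stride of 32 the grid is 13 x 13 and a 10-pixel object covers less than a third of one cell; a stride of 16 doubles the grid resolution to 26 x 26, and upsampling enlarges the object further relative to the cell size.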

Iterations and Detection Examples

The first version of YOLT was open-sourced in 2018. It has since been updated to YOLT v4, which offers higher recognition accuracy and faster recognition speed than the initial YOLT framework.

Open Source Address: https://github.com/avanetten/yoltv4

The detection examples below show how YOLT works:

First, the development team downsized an entire satellite image to 416 x 416 (shown on the left) and found that vehicles could not be detected. Cropping a 416 x 416 region (called a chip) from the original image at full resolution, however, allowed some of the vehicles to be detected.
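A quick calculation shows why downsizing the whole image fails while cropping works. The numbers below are assumptions taken from the example sizes in the introduction (a 10000-pixel-wide image and a 10-pixel target), not from the figure itself.

```python
def size_after_resize(object_px, image_px, target_px=416):
    """Object size in pixels after the whole image is resized to target_px."""
    return object_px * target_px / image_px


# Resizing a 10000 px image to 416 px shrinks a 10 px vehicle to under one pixel.
print(size_after_resize(10, 10000))

# A 416 px chip cropped at full resolution is not rescaled,
# so the vehicle keeps its original 10 px size.
print(size_after_resize(10, 416))
```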

Following this idea, the development team used a sliding window to cut the original image into many chips, with some overlap between adjacent chips (as shown in the image above) so that objects on chip borders are not missed. The detections from all chips are then merged, and the NMS (non-maximum suppression) algorithm filters out duplicate detections in the overlapping regions to produce the final result.
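The sliding-window and merge steps can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than YOLT's implementation: the function names, the 20% overlap, and the 0.5 IoU threshold are hypothetical choices, and boxes are (x1, y1, x2, y2) tuples in pixels.

```python
def chip_origins(image_size, chip_size=416, overlap=0.2):
    """Offsets of overlapping chips along one axis, covering the whole axis."""
    if image_size <= chip_size:
        return [0]
    step = max(1, int(chip_size * (1 - overlap)))
    origins = list(range(0, image_size - chip_size, step))
    origins.append(image_size - chip_size)  # last chip sits flush with the edge
    return origins


def to_global(box, x0, y0):
    """Shift a chip-local box by the chip's top-left offset (x0, y0)."""
    x1, y1, x2, y2 = box
    return (x1 + x0, y1 + y0, x2 + x0, y2 + y0)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thresh=0.5):
    """Greedy non-maximum suppression over (box, score) pairs."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap strongly with a better one.
        if all(iou(box, kept_box) < iou_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Example: the same vehicle is seen in two horizontally adjacent chips.
origins = chip_origins(1000)
detections = [
    (to_global((350, 100, 370, 110), origins[0], 0), 0.9),  # from the first chip
    (to_global((19, 100, 39, 110), origins[1], 0), 0.8),    # from the second chip
]
merged = nms(detections)
print(origins, len(merged))
```

Running NMS after converting every box to whole-image coordinates is what lets a detection from one chip suppress its duplicate from a neighbouring chip.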

▲ Detection Example: Using YOLT v4 to identify airplanes at the airport

Conclusion

The YOLT framework is an adaptation of the classic CV framework YOLO. It addresses the challenges of very large satellite remote sensing images and very small targets through practical techniques such as tiling, upsampling, and model ensembling. YOLT is also one of the few open-source frameworks dedicated to satellite image recognition. Although its accuracy and speed still lag behind large remote sensing image recognition software, its approach of preprocessing the raw data to make the task easier for a CNN model is a useful reference for large-image processing in general.

This idea can also be applied in the ongoing Amazon Web Services 【AI For Good – 2022】 competition. If you are interested in remote sensing image recognition or want to practice applying the YOLT framework, you can register through the original article.

The competition offers generous prizes and related benefits to outstanding teams, and it invites the Top 10 teams to give live presentations at the Amazon Web Services 2022 Summit, where they can exchange ideas face to face with experts in the CV field. Winning entries will also be promoted through Amazon’s official media channels.
