Research Progress on Small Target Detection Technology Based on Deep Learning

Author: Liu Genghuan

Paper Title: Research Progress on Small Target Detection Technology Based on Deep Learning (Invited)

Authors: Liu Genghuan^1,2,3, Zeng Xiangjin^1,2,3, Dou Jiazhen^1,2,3, Ren Zhenbo^4, Zhong Liyun^1,2,3, Di Jianglei^1,2,3, Qin Yuwen^1,2,3

Affiliation:

1. Guangdong University of Technology, School of Information Engineering, Advanced Photonic Technology Research Institute

2. Key Laboratory of Perceptual Fusion Photonic Technology, Ministry of Education

3. Key Laboratory of Information Photonic Technology, Guangdong Province

4. Northwestern Polytechnical University, School of Physics Science and Technology

Introduction

Small target detection plays an important role in fields such as autonomous driving, security monitoring, and medical image processing. However, due to factors such as the indistinct visual features of small targets, interference from complex backgrounds, and low signal-to-noise ratios, current detection technologies still face significant challenges. To address these issues, researchers are continuously exploring new methods to enhance the accuracy and robustness of small target detection.

Figure 1: Applications of Small Target Detection

In recent years, the rapid development of deep learning has brought new opportunities for small target detection. This article systematically reviews current deep-learning-based small target detection technologies, and categorizes, analyzes, and compares existing algorithms. It first defines small target detection and summarizes its main challenges. It then focuses on several mainstream detection networks and their optimization strategies: data augmentation to improve model generalization, super-resolution to enhance small target visibility, multi-scale information fusion to improve detection accuracy, contextual information learning and large kernel convolution to strengthen feature expression, anchor-free detection mechanisms, DETR, and multimodal small target detection for specific application scenarios. The advantages and disadvantages of these strategies are analyzed in detail, providing a basis for the further development of small target detection technology.

Research Background

Small targets are small in size or occupy only a limited number of pixels in an image, which leads to blurred edge information and a lack of semantic information. Downsampling steps in the backbone network further reduce the information content of small targets, so deep features are extracted ineffectively, small targets are insufficiently represented, and their features may even fail to reach the detector, causing detection to fail. This loss of information has little impact on medium and large targets, but it poses a significant challenge for small targets. In complex imaging backgrounds, small targets are additionally affected by cloud cover, lighting changes, and other targets, resulting in low signal-to-noise ratios, weak effective signals, and incomplete target textures, which makes detection even more difficult.

Figure 2: Low Signal-to-Noise Ratio and Low Detectability Caused by Complex Background

Moreover, bounding box localization is more challenging for small targets, because a pixel-level offset has a greater impact on a small target than on a target of conventional size. Existing detection algorithms also lack generality in cross-domain applications: a detector developed for one imaging system and target type is often difficult to transfer to other systems and targets. At the same time, dedicated datasets are limited in scale and distribution, and the lack of general small target detection datasets further restricts improvements in detection performance and the advancement of small target detection technology.

Figure 3: Low Tolerance of Small Targets to Bounding Box Perturbations (Top Left, Bottom Left, and Right Represent Small, Medium, and Large Targets, Respectively. Black Represents the Ground-Truth Box, While Blue and Red Represent Prediction Boxes Slightly Offset Along the Diagonal Directions)
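The sensitivity to offsets can be made concrete with a small calculation. The sketch below shifts a square prediction box diagonally by the same number of pixels for three target sizes and reports the resulting intersection over union (IoU). The side lengths (6, 36, and 192 pixels) and the 3-pixel shift are illustrative assumptions, not values taken from the figure.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def iou_after_diagonal_shift(side, shift):
    """IoU between a square ground-truth box and the same box moved
    `shift` pixels along the diagonal."""
    truth = (0.0, 0.0, float(side), float(side))
    pred = (float(shift), float(shift), float(side + shift), float(side + shift))
    return iou(truth, pred)


# Hypothetical sizes for a small, a medium, and a large target.
for side in (6, 36, 192):
    print(side, round(iou_after_diagonal_shift(side, shift=3), 3))
```

With these assumed sizes, the same 3-pixel offset leaves an IoU of about 0.14 for the 6-pixel target, about 0.72 for the 36-pixel target, and about 0.94 for the 192-pixel target, so a prediction that would count as a match for a large target can be rejected for a small one.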

Main Content

The core challenges of small target detection include the insufficient ability of detectors to represent small target features, interference from complex backgrounds, and difficulties in bounding box localization. To address these issues, existing research has proposed several key technical strategies:

Data Augmentation Techniques: When small target samples are limited, geometric transformations, random occlusion, and copy augmentation can effectively increase the number of small target samples and improve the model’s generalization ability. For example, CutOut enhances model robustness by occluding parts of an image, and MixUp does so by linearly combining two images.
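The two named techniques can be sketched in a few lines. The example below is a simplified illustration on grayscale images stored as nested lists, not a reference implementation: the function names `cutout` and `mixup` are chosen here for illustration, the occluding square is kept fully inside the image, the mixing weight `lam` is fixed rather than sampled, and label handling (which a detector would also need) is omitted.

```python
import random


def cutout(image, size, rng):
    """Return a copy of `image` with one size x size square set to zero.
    Simplification: the square always lies fully inside the image."""
    height, width = len(image), len(image[0])
    top = rng.randrange(0, height - size + 1)
    left = rng.randrange(0, width - size + 1)
    out = [row[:] for row in image]
    for y in range(top, top + size):
        for x in range(left, left + size):
            out[y][x] = 0.0
    return out


def mixup(image_a, image_b, lam):
    """Blend two same-sized images pixel by pixel: lam * a + (1 - lam) * b."""
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


rng = random.Random(0)                     # fixed seed for repeatability
img_a = [[1.0] * 8 for _ in range(8)]      # hypothetical 8x8 bright image
img_b = [[0.0] * 8 for _ in range(8)]      # hypothetical 8x8 dark image
occluded = cutout(img_a, size=3, rng=rng)  # 9 pixels are zeroed
blended = mixup(img_a, img_b, lam=0.7)     # every pixel becomes 0.7
```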

Super-Resolution Techniques: By enhancing image resolution, the visibility of small targets can be improved to some extent. Common methods include interpolation algorithms based on convolutional neural networks and Generative Adversarial Networks (GANs). Notably, GANs generate high-resolution images end-to-end and effectively mitigate issues such as the mosaic effect caused by traditional methods.

Multi-Scale Feature Perception and Fusion: Because small targets are small and carry limited information, multi-scale feature perception, which detects targets at different scales, has become an effective solution. Common multi-scale networks such as Feature Pyramid Networks (FPN) combine shallow features from the bottom-up pathway with deep features from the top-down pathway, so that features can be extracted effectively at all scales and the feature expression of small targets is enhanced.
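The top-down fusion step of an FPN-style network can be illustrated with a minimal sketch. It assumes single-channel feature maps stored as nested lists, each level half the size of the previous one, and it omits the lateral 1x1 convolutions and the smoothing convolutions that a real FPN applies. The names `upsample2x` and `top_down_fuse` are hypothetical.

```python
def upsample2x(feature):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(wide[:])
    return out


def add_maps(map_a, map_b):
    """Element-wise sum of two same-sized feature maps."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(map_a, map_b)]


def top_down_fuse(levels):
    """`levels` runs from shallow (large) to deep (small).
    Each shallower map receives the upsampled, already fused deeper map."""
    fused = [levels[-1]]
    for lateral in reversed(levels[:-1]):
        fused.append(add_maps(lateral, upsample2x(fused[-1])))
    return fused[::-1]  # back to shallow-to-deep order


# Hypothetical three-level pyramid: 4x4, 2x2, and 1x1 maps.
c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[10.0] * 2 for _ in range(2)]
c5 = [[100.0]]
p3, p4, p5 = top_down_fuse([c3, c4, c5])
```

In this toy pyramid the shallowest output `p3` keeps its 4x4 resolution while also carrying the contributions of both deeper levels, which is the property that benefits small targets.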

Contextual Information Learning: Visual targets often appear in specific environments and may coexist with other related targets, such as birds in the sky or ships on the water. Leveraging the relationships between targets and their environment, or between targets themselves, can therefore improve recognition. This prior knowledge, which draws on semantic and spatial relationships, is known as “contextual” information. It provides clues from the area surrounding a small target and thus offers additional feature information. In some cases, especially in complex backgrounds, rich contextual information can be even more critical than the features of the small targets themselves.

Application of Large Kernel Convolutions: Because small targets are small, they are often difficult to identify accurately from appearance alone, and successful identification typically relies on their context. For instance, detecting targets in remote sensing images often requires extensive contextual information. Large convolutional kernels have large receptive fields and can therefore introduce rich contextual information for small target recognition, improving detection accuracy.
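The link between kernel size and receptive field can be checked with the standard recurrence for stacked convolutions. The sketch below ignores dilation and padding effects, and the layer configurations are illustrative assumptions rather than any specific network from the reviewed literature.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of convolutions.
    `layers` is a list of (kernel_size, stride) pairs in forward order."""
    field, jump = 1, 1  # jump = spacing of adjacent outputs in input pixels
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


three_small = receptive_field([(3, 1)] * 3)     # three stacked 3x3 layers
one_large = receptive_field([(31, 1)])          # one hypothetical 31x31 layer
fifteen_small = receptive_field([(3, 1)] * 15)  # depth needed to match it

print(three_small, one_large, fifteen_small)
```

Under these assumptions, three stacked 3x3 layers see a 7x7 window, while a single 31x31 kernel sees a 31x31 window, which would otherwise take fifteen stacked 3x3 layers to reach.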

Anchor-Free Detection Mechanisms: Traditional anchor box methods struggle to handle the diversity of small targets. Anchor-free methods avoid predefined anchor boxes, for example by treating target detection as a keypoint estimation problem, which reduces hyperparameters and complex computations. Common methods include CornerNet, which locates targets by matching top-left and bottom-right corner points; CenterNet, which adds center point detection on top of CornerNet; ExtremeNet, which locates targets using four extreme points and a center point; and FCOS, which avoids the complexity of anchor box design through pixel-level predictions.

Figure 4: Four Anchor-Free Detection Mechanisms. (a) CornerNet; (b) CenterNet; (c) ExtremeNet; (d) FCOS
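As an illustration of pixel-level prediction, the sketch below computes the quantities an FCOS-style detector regresses at a single location: the distances to the left, top, right, and bottom sides of the ground-truth box, together with the center-ness score. The function name `fcos_targets`, the box, and the sample points are hypothetical, and the multi-level assignment rules of the full method are omitted.

```python
import math


def fcos_targets(point, box):
    """Regression targets for one location under an FCOS-style scheme.
    Returns ((l, t, r, b), centerness), or None if the point is not
    strictly inside the (x1, y1, x2, y2) box."""
    x, y = point
    x1, y1, x2, y2 = box
    left, top, right, bottom = x - x1, y - y1, x2 - x, y2 - y
    if min(left, top, right, bottom) <= 0:
        return None  # location is background for this box
    centerness = math.sqrt(
        (min(left, right) / max(left, right))
        * (min(top, bottom) / max(top, bottom))
    )
    return (left, top, right, bottom), centerness


box = (10, 10, 18, 16)                    # hypothetical 8x6 small target
at_center = fcos_targets((14, 13), box)   # distances (4, 3, 4, 3)
near_edge = fcos_targets((11, 13), box)   # distances (1, 3, 7, 3)
outside = fcos_targets((30, 30), box)     # not inside the box
```

A location at the exact center receives a center-ness of 1, while one near the border receives a much lower score, so predictions made from off-center pixels can be down-weighted without any anchor boxes.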

DETR: DETR is an end-to-end object detection model based on Transformers. By encoding global context, DETR avoids complex post-processing steps and generalizes well. However, because of the high computational complexity of the self-attention mechanism and issues with data imbalance, DETR converges slowly and has poor real-time performance.
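The cost of self-attention can be estimated with simple arithmetic: every feature-map location attends to every other location, so the number of attention pairs grows with the square of the token count. The sketch below assumes a hypothetical 1024x1024 input and compares a coarse stride-32 feature map with the finer stride-8 map that small targets would benefit from. The function name and the sizes are illustrative only.

```python
def attention_pairs(height, width, stride):
    """Token count and number of query-key pairs for full self-attention
    over a feature map downsampled by `stride`."""
    tokens = (height // stride) * (width // stride)
    return tokens, tokens * tokens


coarse_tokens, coarse_pairs = attention_pairs(1024, 1024, stride=32)
fine_tokens, fine_pairs = attention_pairs(1024, 1024, stride=8)

print(coarse_tokens, coarse_pairs)  # 1024 tokens, 1048576 pairs
print(fine_tokens, fine_pairs)      # 16384 tokens, 268435456 pairs
print(fine_pairs // coarse_pairs)   # 256
```

Under these assumptions, a feature map with 16 times more tokens requires 256 times more attention pairs per layer, which is one reason higher-resolution features are expensive for this family of detectors.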

Multimodal Small Target Detection: Combining visible light and infrared images for dual-modal detection can effectively enhance detection performance in complex scenarios. Under low light or adverse weather conditions in particular, infrared images compensate for the shortcomings of visible light images and provide richer target feature information, improving small target detection results. The core issue in multimodal target detection is how to extract and fuse image information from the two modalities. Depending on the stage at which fusion occurs, image fusion methods can be classified into early fusion, mid fusion, late fusion, and confidence fusion.

Figure 5: Four Image Fusion Strategies. (a) Early Fusion; (b) Mid Fusion; (c) Late Fusion; (d) Confidence Fusion

Overall, the above strategies effectively enhance the accuracy and robustness of small target detection through network structure optimization, data sample augmentation, and multimodal information fusion, providing new research directions for further improving small target detection performance.

Conclusion

Small target detection has always been an important and challenging problem in the field of object detection. Despite significant progress in recent years through methods such as multi-scale feature fusion, data augmentation, super-resolution techniques, anchor-free detection, large kernel convolution, and DETR, small target detection still faces many difficulties compared with medium and large target detection. The small proportion of the image that small targets occupy, their limited information, and interference from complex backgrounds leave considerable room for improvement in detection accuracy and robustness.

Based on this, small target detection algorithms can be further optimized in the following aspects. Firstly, feature fusion should enhance the key features of small targets while maintaining computational efficiency and reducing noise interference. Secondly, contextual learning should optimize the selection and completion of contextual information so that redundant information does not interfere with detection. Thirdly, large kernel convolution should reduce its computational burden through structural optimization to improve real-time performance. Finally, DETR faces issues such as complex training and high computational costs, and future efforts should focus on improving its real-time detection performance by introducing sparsity strategies and lightweight models.

With the continuous advancement of deep learning technology and hardware performance, accurate, efficient, and robust small target detection algorithms will demonstrate greater application potential and value in fields such as autonomous driving, security monitoring, and medical diagnostics.

This article is reproduced from the “Optoelectronic e+” WeChat public account. Copyright belongs to the “Optoelectronic e+” WeChat public account (Xi’an Institute of Applied Optics, Wang Xiaojing, Editor).
