Faster R-CNN Model and Deep Learning Environment Setup

1. Faster R-CNN Model

The R-CNN family is the classic line of networks in object detection, and the progression from one model to the next is easy to follow. The object detection process is divided into three stages: candidate box generation, feature extraction, and classification with bounding-box regression. R-CNN is a detection network assembled from several separate modules, and only the intermediate feature extraction stage uses a deep neural network. Its accuracy is good, but it is very slow. Fast R-CNN implements both feature extraction and classification/regression with a deep neural network, which significantly improves speed and slightly improves accuracy. Finally, Faster R-CNN also generates the candidate boxes with a neural network (the Region Proposal Network), so the entire detection process runs in deep neural networks and can be trained end to end, with a further substantial speed increase. The specific process is shown in the figure below.

After analyzing the data collected for the bauxite sorting task and comparing the advantages of the available networks, ResNet-50 + FPN was chosen as the backbone network, that is, the convolutional layers of the Faster R-CNN structure that perform feature extraction. The rest of the network structure and the data processing plan remain unchanged. This configuration is one of the existing variants of the Faster R-CNN model, and the reasons for choosing it are as follows:

ResNet, or residual neural network, is a convolutional neural network proposed by Microsoft Research. It took first place in several classification and detection tracks of the 2015 ImageNet and COCO competitions. Its advantages are that it is easy to optimize and that accuracy can keep improving as the number of convolutional layers increases. Common variants include ResNet-50, ResNet-101, and ResNeXt-101. As a network gets deeper, the semantic information in its features becomes richer, but positional information degrades. The bauxite sorting task only requires distinguishing four types of raw stone, so there are few categories and relatively few features, and the amount of semantic information required is limited. The less complex ResNet-50 was therefore chosen: it saves training time and, by preserving more positional information, helps detection accuracy.
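The layer counts in these names can be checked with a little arithmetic. The sketch below is illustrative and not part of the original experiment: it uses the per-stage bottleneck block counts from the ResNet paper, and the names STAGE_BLOCKS and weighted_layers are chosen here for the example.

```python
# Bottleneck blocks per stage (conv2_x .. conv5_x), as listed in the ResNet paper.
STAGE_BLOCKS = {
    "ResNet-50": (3, 4, 6, 3),
    "ResNet-101": (3, 4, 23, 3),
}

# Each bottleneck block stacks three convolutions: 1x1 -> 3x3 -> 1x1.
CONVS_PER_BOTTLENECK = 3


def weighted_layers(blocks):
    """Count weighted layers: the stem convolution, all bottleneck
    convolutions, and the final fully connected layer."""
    stem_conv = 1
    final_fc = 1
    return stem_conv + CONVS_PER_BOTTLENECK * sum(blocks) + final_fc


if __name__ == "__main__":
    for name, blocks in STAGE_BLOCKS.items():
        print(name, "has", sum(blocks), "bottleneck blocks and",
              weighted_layers(blocks), "weighted layers")
```

The only difference between the two variants is the third stage (6 blocks versus 23), which is where most of the extra depth and training cost of ResNet-101 comes from.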

FPN, or Feature Pyramid Network, is designed to address the weakness of object detectors when targets vary widely in scale. It builds a feature pyramid and makes predictions on every level, fusing low-resolution, semantically strong feature maps with high-resolution, semantically weak feature maps that carry rich spatial information, at little extra computational cost. Because the ore samples are relatively small and the images are captured from a distance of 1.5 meters at a resolution of 640×480, the ores all appear as small to medium-sized targets. FPN handles multi-scale targets effectively and significantly improves the model’s performance on small targets.
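To make the small-target argument concrete, the following sketch applies two rules from the literature: the COCO size buckets (small below 32×32 pixels in area, large above 96×96) and the FPN paper's rule for assigning a region of interest to a pyramid level, k = floor(k0 + log2(sqrt(w·h) / 224)) with k0 = 4. The box sizes are hypothetical, since the article does not report the pixel size of the ores, and clamping to levels P2 to P5 is an assumption that matches the usual Faster R-CNN + FPN setup.

```python
import math


def coco_size(width, height):
    """Bucket a box by pixel area using the COCO small/medium/large thresholds."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def fpn_level(width, height, k0=4, canonical=224, k_min=2, k_max=5):
    """Pyramid level for a box, following Eq. 1 of the FPN paper.

    A box of canonical size (224x224) maps to level k0; each halving of
    the box side moves one level down, toward higher-resolution maps.
    The result is clamped to [k_min, k_max] (assumed to be P2..P5).
    """
    if width <= 0 or height <= 0:
        raise ValueError("box width and height must be positive")
    k = math.floor(k0 + math.log2(math.sqrt(width * height) / canonical))
    return max(k_min, min(k_max, k))


if __name__ == "__main__":
    # Hypothetical ore box sizes in pixels within a 640x480 frame.
    for w, h in [(24, 24), (60, 60), (112, 112)]:
        print(f"{w}x{h}: {coco_size(w, h)}, assigned to P{fpn_level(w, h)}")
```

Under these assumptions, small and medium boxes fall on the lowest pyramid levels, which are the high-resolution maps that a backbone without FPN would not use for prediction.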

2. Deep Learning Environment Setup

Based on the scale of the training dataset and the requirements of the detection platforms Detectron2 and Darknet for training object detection networks, the hardware environment configuration for this experiment is as follows:

1）Processor: AMD Ryzen 5 3600X 6-Core Processor

2）GPU Model: GeForce RTX 2060

3）Memory: 16GB

4）SSD: 240GB

5）HDD: 1TB

Under such a hardware environment, the specific steps for configuring the deep learning environment are as follows:

（1）Configure the operating system.

The operating system is Ubuntu 20.10, a Linux distribution. Download the Ubuntu 20.10 ISO file from the official Ubuntu website and use Rufus to create a bootable USB drive for installation. Linux was chosen because related resources are abundant and the environment is convenient to configure. Among the many Linux distributions, Ubuntu is one of the most popular because of its rich community resources.

（2）Change to domestic sources.

By default, Ubuntu installs and updates software from servers abroad, which can easily lead to failed downloads or incorrect package versions and cause various problems later. Therefore, the first step after installing the system is to change the package download source to a domestic mirror.

The specific method is as follows:

1）Use the lsb_release -a command to look up the code name of your Ubuntu version.

2）Open the Aliyun mirror site (other mirrors also work; this article uses Aliyun) and check whether it carries packages for that code name. If it does, proceed to the next step; if not, find another domestic mirror.

3）Replace the contents of the /etc/apt/sources.list configuration file with the Aliyun entries for your version, then update the package cache and upgrade.
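The three steps above can be sketched as a small script. This is a minimal sketch under stated assumptions: the mirror URL http://mirrors.aliyun.com/ubuntu/ and the four suites are the usual Aliyun layout rather than something specified in the article, and render_sources and current_codename are names made up for this example. It only prints the entries, so they can be reviewed before anything under /etc/apt is touched.

```python
import shutil
import subprocess

ALIYUN_MIRROR = "http://mirrors.aliyun.com/ubuntu/"  # assumed mirror URL
COMPONENTS = "main restricted universe multiverse"


def current_codename():
    """Return the release code name via `lsb_release -cs`, or None if unavailable."""
    if shutil.which("lsb_release") is None:
        return None
    result = subprocess.run(["lsb_release", "-cs"],
                            capture_output=True, text=True, check=False)
    name = result.stdout.strip()
    return name if result.returncode == 0 and name else None


def render_sources(codename, mirror=ALIYUN_MIRROR):
    """Build sources.list entries for the release and its update suites."""
    suites = [codename] + [f"{codename}-{suffix}"
                           for suffix in ("updates", "security", "backports")]
    return "".join(f"deb {mirror} {suite} {COMPONENTS}\n" for suite in suites)


if __name__ == "__main__":
    # "groovy" is the code name of Ubuntu 20.10; it is used as a fallback
    # when the script is not running on Ubuntu.
    print(render_sources(current_codename() or "groovy"), end="")
```

After reviewing the output, back up the original /etc/apt/sources.list, replace its contents with the printed entries, and run sudo apt update followed by sudo apt upgrade.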

（3）GPU driver configuration.

This step installs the NVIDIA graphics card driver, CUDA, and cuDNN. Training networks makes heavy use of the computer’s hardware: the NVIDIA driver lets the system use the NVIDIA GPU; CUDA is NVIDIA’s GPU parallel computing framework, which runs only on NVIDIA GPUs and lets the GPU solve complex computational problems; cuDNN is NVIDIA’s GPU acceleration library for deep neural networks. Together, these three components let the device run deep neural network operations efficiently.

When configuring them, it is essential to understand how their versions correspond. A given GPU can generally work with four or five driver versions, each driver version is compatible with several CUDA versions, and each CUDA version is compatible with multiple cuDNN versions; the exact combinations must be checked on NVIDIA’s official site. Usually it suffices to install the latest driver, CUDA, and cuDNN, but if the framework or platform you want to use does not support the latest versions, choose versions that it does support. This article installs driver version 455.38, CUDA 11.1, and cuDNN 8.0.5.
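Once the three components are installed, their versions can be read back and compared with the ones listed above. The script below is a hedged sketch, not the article's own procedure: it assumes nvidia-smi and nvcc are on the PATH and that cuDNN 8 placed its cudnn_version.h header under /usr/local/cuda/include, which depends on how cuDNN was installed. The function names are chosen for this example.

```python
import re
import shutil
import subprocess

# Versions reported in this article.
EXPECTED = {"driver": "455.38", "cuda": "11.1", "cudnn": "8.0.5"}


def parse_nvcc_release(text):
    """Extract '11.1' from `nvcc --version` output such as 'release 11.1, V11.1.105'."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_cudnn_header(text):
    """Join the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros from cudnn_version.h."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + macro + r"\s+(\d+)", text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


def run(cmd):
    """Return the command's stdout, or None if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout if result.returncode == 0 else None


def installed_versions(cudnn_header="/usr/local/cuda/include/cudnn_version.h"):
    """Collect driver, CUDA and cuDNN versions; entries stay None when not found."""
    found = {"driver": None, "cuda": None, "cudnn": None}
    out = run(["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"])
    if out and out.strip():
        found["driver"] = out.strip().splitlines()[0].strip()
    out = run(["nvcc", "--version"])
    if out:
        found["cuda"] = parse_nvcc_release(out)
    try:
        with open(cudnn_header) as header:
            found["cudnn"] = parse_cudnn_header(header.read())
    except OSError:
        pass  # header path is an assumption; leave cuDNN as None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        status = "matches" if version == EXPECTED[name] else "differs from"
        print(f"{name}: found {version}, {status} article version {EXPECTED[name]}")
```

A mismatch is not necessarily an error; it only means the installed combination should be checked against NVIDIA's compatibility tables and the requirements of the chosen framework.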


WeChat public account: Artificial Intelligence Perception Information Processing Algorithm Research Institute

Zhihu homepage: https://www.zhihu.com/people/zhuimeng2080
