This article will cover the essence of ResNet, the principles of ResNet, and the applications of ResNet to help you understand Residual Neural Networks (ResNet).

Residual Neural Network (ResNet)
1. The essence of ResNet
ResNet's definition: ResNet (Residual Neural Network) is a deep convolutional network architecture that adds shortcut (skip) connections around stacks of layers so that the layers learn residual mappings. This makes very deep networks trainable and was introduced to address the degradation problem described below.

Degradation Problem: As the number of layers in a network increases, its performance does not improve indefinitely; it saturates and then degrades beyond a certain depth. This is counterintuitive, since more layers should, in principle, give the network a stronger learning capacity and let it learn more complex function mappings.
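ResNet's answer to degradation is residual learning: instead of asking a stack of layers to fit a desired mapping H(x) directly, the layers fit the residual F(x) = H(x) − x, and a shortcut connection adds the input back. In the notation of the original ResNet paper (He et al., 2015):

```latex
% Residual learning: instead of fitting the desired mapping H(x) directly,
% the stacked layers fit the residual F(x) = H(x) - x.
\[
y = F(x, \{W_i\}) + x
\]
% x: block input, carried unchanged by the identity shortcut
% F(x, {W_i}): residual computed by the stacked weight layers
% If the identity mapping is already optimal, the block only needs to drive
% F toward zero, which is easier than fitting H(x) = x through a stack of
% nonlinear layers.
```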



2. Principles of ResNet:
ResNet architecture: ResNet34 starts from a 34-layer plain network inspired by VGG-19 and then adds shortcut connections on top of it; these shortcut connections turn the plain network into a residual network.

Residual Block: In the ResNet architecture, multiple residual learning blocks are stacked in sequence.
(Figure: Residual Block)
In different versions of ResNet, such as ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, etc., the number of residual blocks is determined by the total number of layers in the network.

Taking ResNet34 as an example, it contains 34 layers, including a 7×7 convolutional layer (counted as one layer), followed by 16 residual blocks (each containing two convolutional layers), and finally connected to a fully connected layer (the last layer), making a total of 34 layers.
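To make the block structure concrete, here is a minimal PyTorch sketch of the two-convolution basic block used in ResNet18/34. The class name `BasicBlock` mirrors the common torchvision naming, but this is an illustrative sketch under standard assumptions, not the library's exact source:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 conv layers plus an identity (or 1x1 projection) shortcut."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # When the spatial size or channel count changes, the shortcut needs
        # a 1x1 projection so the addition shapes match.
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)  # the skip connection: input added to output
        return self.relu(out)
```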

(Figure: Architecture of ResNet34)
How ResNet works: ResNet mitigates the vanishing gradient problem with residual blocks and skip connections, uses downsampling and global pooling to distill key features, performs classification or regression through fully connected layers, and accelerates training and improves generalization with batch normalization and ReLU.

The workflow of ResNet34 can be briefly summarized as follows (a code sketch assembling these stages appears after this summary):
- Input Layer: First, image data is fed into the first convolutional layer of ResNet34, typically a 7×7 convolutional layer used for initial feature extraction.

- Residual Blocks: Subsequently, the data passes through a series of residual blocks, each containing two convolutional layers for further feature extraction. Importantly, each residual block uses a skip connection: the input of the block is added directly to its output.

- Downsampling: Between residual blocks, downsampling may be implemented by increasing the stride of a convolution or by using additional convolutional layers, reducing the size of the feature maps while preserving the spatial hierarchy.

- Fully Connected Layer: After all residual blocks, a global average pooling layer reduces the spatial dimensions of the feature maps to 1×1, producing a feature vector. This vector is then passed to one or more fully connected layers (also called dense or linear layers) for classification or regression.

- Output: Finally, ResNet34 outputs its predictions. For classification tasks, the output is usually a probability distribution over categories; for regression tasks, it may be specific numerical values.
Throughout the workflow, batch normalization and the ReLU activation function are typically used after each convolutional layer and fully connected layer to accelerate training and improve the model’s generalization ability.
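Putting the five steps together, here is a minimal sketch of the ResNet34 forward path in PyTorch, reusing the `BasicBlock` sketched earlier (`ResNet34Sketch` and `make_stage` are illustrative names, not a library API):

```python
import torch
import torch.nn as nn
# Assumes the BasicBlock class from the earlier sketch is in scope.

def make_stage(block, in_ch, out_ch, num_blocks, stride):
    """Stack residual blocks; the first block may downsample via its stride."""
    layers = [block(in_ch, out_ch, stride)]
    layers += [block(out_ch, out_ch, 1) for _ in range(num_blocks - 1)]
    return nn.Sequential(*layers)

class ResNet34Sketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        # Input stem: 7x7 conv (stride 2) + max pooling.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        # 16 basic blocks arranged as 3 + 4 + 6 + 3; each stage after the
        # first downsamples by giving its first block stride=2.
        self.layer1 = make_stage(BasicBlock, 64, 64, 3, stride=1)
        self.layer2 = make_stage(BasicBlock, 64, 128, 4, stride=2)
        self.layer3 = make_stage(BasicBlock, 128, 256, 6, stride=2)
        self.layer4 = make_stage(BasicBlock, 256, 512, 3, stride=2)
        # Global average pooling reduces each feature map to 1x1.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        x = self.pool(x).flatten(1)
        return self.fc(x)  # logits; apply softmax for class probabilities

# Quick shape check: a 224x224 RGB image yields a 1000-way logit vector.
logits = ResNet34Sketch()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```

Note how the 16 blocks are grouped 3 + 4 + 6 + 3 and how each stage after the first halves the feature-map size through the stride of its first block, matching the downsampling step described above.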

(Figure: Workflow of ResNet34)
3. Applications of ResNet:


The following are some of the main applications of ResNet in computer vision:

Object Detection: In object detection tasks, ResNet effectively extracts features of the objects in an image and, combined with methods such as Region Proposal Networks (RPN), can accurately detect and localize multiple objects.
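As a concrete example, torchvision's Faster R-CNN detector pairs a ResNet-50 + FPN backbone with an RPN. A minimal usage sketch (the confidence threshold is illustrative):

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Faster R-CNN combines a ResNet backbone (feature extraction) with an RPN
# (region proposals) and detection heads (classification + box regression).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # pretrained COCO weights
model.eval()

image = torch.rand(3, 480, 640)  # dummy RGB image with values in [0, 1]
with torch.no_grad():
    predictions = model([image])[0]  # dict of boxes, labels, scores

keep = predictions["scores"] > 0.5   # illustrative confidence threshold
print(predictions["boxes"][keep], predictions["labels"][keep])
```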

Image Segmentation: For image segmentation tasks, ResNet's powerful feature extraction enables fine-grained segmentation of the different regions in an image, providing important support for applications such as medical image analysis and autonomous driving.
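Likewise, torchvision's semantic segmentation models use ResNet backbones. A minimal sketch with DeepLabV3 on a ResNet-50 backbone (input size and usage are illustrative):

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# DeepLabV3 uses ResNet-50 as its feature extractor and predicts a class
# label for every pixel.
model = deeplabv3_resnet50(weights="DEFAULT")  # pretrained weights
model.eval()

batch = torch.rand(1, 3, 520, 520)  # dummy image batch
with torch.no_grad():
    out = model(batch)["out"]        # shape: (1, num_classes, 520, 520)

mask = out.argmax(dim=1)             # per-pixel class indices
print(mask.shape)                    # torch.Size([1, 520, 520])
```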
