Evolution of CNN Architecture: From AlexNet to ResNet
Hello everyone, I am Sister Liu. Today we will delve into the evolution of Convolutional Neural Networks (CNNs), one of the most important technological developments in the field of computer vision.
Background Knowledge
Before the rise of deep learning, traditional image recognition methods relied on manually designed feature extraction techniques. These methods often had significant limitations and struggled with complex visual tasks. The emergence of CNNs completely changed this situation.
Basic Concepts
- Convolutional Layer: Extracts local features by sliding convolution kernels over the input data.
- Pooling Layer: Reduces the spatial size of feature maps and decreases computational cost.
- Fully Connected Layer: Maps the extracted features to the final classification results.
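To make these building blocks concrete, here is a minimal sketch in PyTorch (the same framework used in the implementation section below) that chains a convolutional layer, a pooling layer, and a fully connected layer into a hypothetical toy classifier:

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Toy model illustrating the three basic layer types."""
    def __init__(self, num_classes=10):
        super(TinyCNN, self).__init__()
        # Convolutional layer: slides 3x3 kernels over the input to extract local features
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        # Pooling layer: halves the spatial resolution, reducing computation
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Fully connected layer: maps the pooled features to class scores
        self.fc = nn.Linear(16 * 16 * 16, num_classes)  # assumes 32x32 input images

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        x = torch.flatten(x, 1)
        return self.fc(x)

# Example: a batch of four 32x32 RGB images
logits = TinyCNN()(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 10])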
Technical Evolution
AlexNet (2012)
AlexNet is a milestone in deep learning image recognition. It first demonstrated the immense potential of deep convolutional neural networks in large-scale image recognition tasks.
Key Innovations
- Utilized the ReLU activation function to mitigate the vanishing gradient problem and speed up training.
- Adopted the Dropout regularization method to reduce overfitting.
- Employed data augmentation techniques.
- Leveraged GPUs for parallel computing.
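As a rough illustration of these ideas (not the original AlexNet code), the sketch below shows what ReLU, Dropout, and simple data augmentation look like in PyTorch/torchvision; the .to(device) call moves computation onto a GPU when one is available:

import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation in the spirit of AlexNet: random crops and horizontal flips
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# A classifier head using ReLU activations and Dropout regularization
classifier = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),   # randomly zeroes activations during training
    nn.Linear(4096, 1000),
)

# Run on a GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
classifier = classifier.to(device)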
VGGNet (2014)
VGGNet demonstrated the impact of network depth on performance by stacking more convolutional layers.
Main Features
- Used small 3×3 convolution kernels throughout.
- Reached a network depth of 16-19 layers.
- Followed a simple, uniform network design philosophy.
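For intuition, a VGG-style stage can be sketched as a stack of 3×3 convolutions followed by pooling (a simplified illustration under assumed channel sizes, not the full VGG-16 definition):

import torch.nn as nn

def vgg_stage(in_channels, out_channels, num_convs):
    """Simplified VGG-style stage: repeated 3x3 convs, then 2x2 max pooling."""
    layers = []
    for _ in range(num_convs):
        layers += [
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        ]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Two stacked 3x3 convs cover the receptive field of one 5x5 conv with fewer parameters
stage = vgg_stage(64, 128, num_convs=2)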
ResNet (2015)
ResNet introduced the concept of shortcut connections, addressing the degradation problem of deep networks.
Core Breakthroughs
- Design of residual blocks.
- Extremely deep network structures (up to 152 layers).
- Better propagation of gradient information through shortcut connections.
Implementation Method: PyTorch Implementation of ResNet Residual Block
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super(BasicBlock, self).__init__()
        # First 3x3 convolution; stride > 1 downsamples the feature map
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Second 3x3 convolution keeps the resolution unchanged
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Identity shortcut; use a 1x1 convolution when shape or channel count changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != self.expansion * out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, self.expansion * out_channels,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * out_channels)
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Residual connection: add the shortcut before the final activation
        out += self.shortcut(x)
        out = self.relu(out)
        return out
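A quick shape check of this block (a usage sketch relying on the imports and BasicBlock class defined above):

# Stride-2 block: halves the spatial resolution and changes the channel count,
# so the 1x1 shortcut projection is used
block = BasicBlock(in_channels=64, out_channels=128, stride=2)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 128, 28, 28])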
Performance Analysis
Comparative Analysis
| Network Model | Top-1 Accuracy | Parameter Count | Computational Complexity |
|---|---|---|---|
| AlexNet | 57.1% | 60M | Medium |
| VGGNet | 71.3% | 138M | High |
| ResNet-50 | 76.2% | 25M | Low |
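Parameter counts like those in the table can be checked for any nn.Module with a small helper (a sketch; the torchvision model used here is an assumption for illustration):

import torch.nn as nn
from torchvision import models

def count_parameters(model: nn.Module) -> float:
    """Return the number of trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# torchvision's ResNet-50 reports roughly 25.6M parameters
print(f"ResNet-50: {count_parameters(models.resnet50()):.1f}M parameters")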
Limitations and Improvement Directions
- High computational resource demands.
- Risk of overfitting.
- Transfer learning capabilities (see the sketch after this list).
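On the last point, a common transfer learning pattern is to load a pretrained ResNet from torchvision and replace only its classification head (a sketch assuming a recent torchvision with the weights API; the 10-class task is hypothetical):

import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights and freeze the feature extractor
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for a hypothetical 10-class task
model.fc = nn.Linear(model.fc.in_features, 10)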
Extended Application Case: Object Detection
In detection frameworks such as Faster R-CNN, a ResNet commonly serves as the backbone feature extractor. The implementation below defines such a backbone, shown here with a classification head, which detection pipelines replace with detection-specific heads.
class ResNetBackbone(nn.Module):
    def __init__(self, block, num_blocks, num_classes=1000):
        super(ResNetBackbone, self).__init__()
        self.in_channels = 64
        # Stem: 7x7 convolution plus max pooling reduces the input resolution by 4x
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Four stages of residual blocks; each later stage halves resolution and doubles channels
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        # Classification head; detection pipelines replace this with task-specific heads
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, num_blocks, stride):
        # Only the first block of a stage downsamples; the remaining blocks keep stride 1
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
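For instance, the ResNet-18-style configuration uses two BasicBlocks per stage; in a detection pipeline one would typically take the outputs of layer1 through layer4 as multi-scale features rather than the final class scores (usage sketch based on the classes defined above):

# ResNet-18-style backbone: [2, 2, 2, 2] BasicBlocks per stage
backbone = ResNetBackbone(BasicBlock, [2, 2, 2, 2], num_classes=1000)
scores = backbone(torch.randn(1, 3, 224, 224))
print(scores.shape)  # torch.Size([1, 1000])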
Conclusion
The development of CNN demonstrates the revolutionary progress of deep learning in the field of computer vision. From AlexNet to ResNet, each generation of networks continues to break performance limits, making significant contributions to the development of artificial intelligence.
Sister Liu hopes everyone can deeply understand the design philosophies of these networks, not only to use them but also to comprehend the underlying principles. Continuous learning and innovation are the keys to success in the field of research!