Comprehensive Explanation of CNN Convolutional Neural Networks


Author: Yun Bu Jian
Link: https://blog.csdn.net/Walk_OnTheRoad/article/details/108048101
Editor: Wang Meng, City University of Macau (Deep Learning Community)
This note is based on the key knowledge points summarized from the book “Introduction to Deep Learning: Theory and Implementation Based on Python”.
Continuing from part (2) of the notes on that book, let’s keep documenting the learning notes.
Table of Contents:
1. Overview of CNN Framework
2. Convolution Layer
2.1 Convolution Operation
2.2 Padding
2.3 Stride
2.4 Convolution Operation for 3D Data
2.5 Batch Processing
3. Pooling Layer
4. Implementation of Conv and Pooling Layers
4.1 4D Arrays
4.2 Expansion Based on im2col
4.3 Implementation of Convolution Layer
4.4 Implementation of Pooling Layer
5. CNN Implementation
6. CNN Visualization
Key Knowledge Points of This Article:
  • Overview of CNN Framework
  • Convolution Layer
  • Pooling Layer
  • Implementation of Convolution and Pooling Layers
  • CNN Implementation
  • Introduction to CNN Visualization
If some details are not covered here, please continue reading “Introduction to Deep Learning: Theory and Implementation Based on Python”; it is a very good, easy-to-understand introductory book for deep learning beginners. (The examples in the book are mainly CV-based.)
1. Overview of CNN Framework
Neural networks are essentially a process of assembling layers.
CNN introduces new layers: Convolution Layer and Pooling Layer.
Q: How to assemble a CNN?
Fully connected layers are implemented with Affine layers: each block is Affine-ReLU (the Affine transformation is y = xW + b), as in a 5-layer neural network built from fully connected layers.
The ReLU layer can also be replaced with a Sigmoid layer. Here the network consists of 4 Affine-ReLU blocks, and the final Affine-Softmax layer outputs the result (probabilities).
Common CNN structure: Affine-ReLU becomes Conv-ReLU-(Pooling), as in a 5-layer neural network based on convolution layers.
Q: What problems does the fully connected layer have? Why improve to Conv layer?
  • The fully connected layer “ignores” the shape of the data, flattening 3D data into 1D; the shape contains important spatial information: ① Spatially adjacent pixels have similar values, while pixels far apart are unrelated; ② The RGB channels have close correlations; ③ The 3D shape may hide essential patterns worth extracting.

  • The convolution layer can maintain the shape. It can correctly interpret data such as images that have shapes.

Feature map: the input and output data of a convolution layer are called feature maps (input feature map and output feature map).
2. Convolution Layer
2.1 Convolution Operation

The convolution kernel slides over the input feature map with a certain stride; at each position the overlapping elements are multiplied and accumulated to produce the output feature map. A bias (a single 1×1 value) can also be added.


The convolution kernel (filter) is equivalent to the weights in the fully connected layer.
After the convolution, the same bias is added to every element of the output.
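As a rough sketch (my own illustration, not the book's code), the sliding multiply-accumulate operation for a single channel with stride 1 and no padding can be written directly in NumPy:

import numpy as np

def conv2d_single(x, w, b=0.0):
    # Naive single-channel convolution: slide w over x, multiply-accumulate, add a scalar bias
    H, W = x.shape
    FH, FW = w.shape
    out = np.empty((H - FH + 1, W - FW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+FH, j:j+FW] * w) + b
    return out

x = np.arange(16, dtype=float).reshape(4, 4)  # (4, 4) input feature map
w = np.ones((3, 3))                           # (3, 3) filter
print(conv2d_single(x, w, b=1.0).shape)       # (2, 2) output feature map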
2.2 Padding
Padding fills fixed values (such as 0) around the border of the input data.
The purpose of padding: to adjust the output size. Expanding the input feature map yields a larger output. The padding value is generally 0.
Why adjust the output size?
For example, with a (4 × 4) input and a (3 × 3) kernel, the output is (2 × 2). As the layers deepen, the output keeps shrinking until it reaches (1 × 1), after which convolution can no longer be applied. Padding avoids this by keeping the output from shrinking.
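For illustration (a sketch, not the book's code), zero padding of width 1 around a (4, 4) input can be added with np.pad:

import numpy as np

x = np.ones((4, 4))
# Pad one row/column of zeros on every side: (4, 4) -> (6, 6)
x_pad = np.pad(x, pad_width=1, mode='constant', constant_values=0)
print(x_pad.shape)  # (6, 6); a 3 × 3 convolution now produces a (4, 4) output again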
2.3 Stride
Stride: the interval at which the filter slides over the input.
Increasing the stride shrinks the output; increasing the padding enlarges it.
Calculating the output size:
OH = (H + 2P - FH) / S + 1
OW = (W + 2P - FW) / S + 1     (7.1)
Where:
  • (OH,OW) output size

  • (H,W) input size

  • (FH,FW) filter size

  • P padding

  • S stride

Note: the divisions in equation (7.1) should ideally yield integers. If they do not, the implementation can either report an error, or round to the nearest integer and continue running without an error.
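A small helper (my own sketch) that applies equation (7.1) and reports an error when the division is not exact:

def conv_output_size(H, W, FH, FW, stride=1, pad=0):
    # Output size of a convolution; raise if the stride does not divide evenly
    if (H + 2*pad - FH) % stride != 0 or (W + 2*pad - FW) % stride != 0:
        raise ValueError("filter, stride and padding do not tile the input exactly")
    OH = (H + 2*pad - FH) // stride + 1
    OW = (W + 2*pad - FW) // stride + 1
    return OH, OW

print(conv_output_size(4, 4, 3, 3, stride=1, pad=0))  # (2, 2)
print(conv_output_size(7, 7, 3, 3, stride=2, pad=1))  # (4, 4)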
2.4 Convolution Operation for 3D Data
3D data has one more dimension than 2D data: the channel, giving (channel, height, width).
Both the input data and the filter are written in the order (C, H, W).
Thinking of the data and filters as 3D blocks (cuboids) makes the operation easier to picture; the block view and the per-channel view express the same computation.
The number of channels in the input data and the filter must match.
Each channel of the input is convolved with the corresponding channel of the filter, and the per-channel results are summed to obtain the final output value.
How can we obtain an output with multiple channels?
  • By applying multiple filters (weights), e.g., FN filters, each of which produces one output channel.

The filters then become 4D data, with each filter producing one output feature map. A bias of shape (FN, 1, 1) can also be added.
As 4D data, the filters are written in the order of (output_channel, input_channel, height, width).
For example, if there are 20 filters with a size of 5 × 5 and a channel count of 3, it can be written as (20, 3, 5, 5).
Adding blocks of different shapes (the (FN, 1, 1) bias to the (FN, OH, OW) output) is easy to implement with NumPy’s broadcasting feature (section 1.5.5).
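The shape bookkeeping can be checked with a small sketch (my own example, not from the book): FN filters of shape (C, FH, FW) applied to a (C, H, W) input give an (FN, OH, OW) output, with each output channel produced by summing over the input channels:

import numpy as np

def conv3d(x, W, b):
    # x: (C, H, W), W: (FN, C, FH, FW), b: (FN, 1, 1) -> out: (FN, OH, OW)
    C, H, Wd = x.shape
    FN, _, FH, FW = W.shape
    OH, OW = H - FH + 1, Wd - FW + 1
    out = np.zeros((FN, OH, OW))
    for n in range(FN):
        for i in range(OH):
            for j in range(OW):
                # Multiply-accumulate over all channels, then sum the channel results
                out[n, i, j] = np.sum(x[:, i:i+FH, j:j+FW] * W[n])
    return out + b  # the (FN, 1, 1) bias broadcasts over (OH, OW)

x = np.random.rand(3, 7, 7)        # input (C, H, W)
W = np.random.rand(20, 3, 5, 5)    # 20 filters of size 5 × 5 with 3 channels: (20, 3, 5, 5)
b = np.zeros((20, 1, 1))
print(conv3d(x, W, b).shape)       # (20, 3, 3)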
2.5 Batch Processing
  • Purpose: to process data efficiently by packing N samples into one batch, combining N forward passes into a single pass.

    3D -> 4D, i.e., (C,H,W) -> (N,C,H,W)

3. Pooling Layer
  • Purpose: to reduce the spatial size in the H and W directions, for example by compressing a 2 × 2 area into a single element.

Max pooling: operation to get the maximum value.
Common setup: the pooling window size and stride are set to the same value. For example, both are 2.
In image recognition, Max pooling is primarily used.
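A tiny NumPy sketch (my own illustration) of 2 × 2 Max pooling with stride 2 on a single (4, 4) channel:

import numpy as np

x = np.array([[1, 2, 0, 1],
              [3, 0, 2, 4],
              [5, 1, 0, 2],
              [0, 6, 3, 1]], dtype=float)

# Split the (4, 4) map into non-overlapping 2 × 2 blocks and take the maximum of each block
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)
print(blocks.max(axis=2))
# [[3. 4.]
#  [6. 3.]]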
Features of the pooling layer:
  1. No parameters to learn;

  2. The number of channels does not change: if the input has 3 channels, the output also has 3 channels;

  3. Robust to small positional changes: when the input data shifts slightly, pooling can still return the same result, so it absorbs small deviations in the input.

4. Implementation of Conv and Pooling Layers
4.1 4D Arrays

For example: randomly generate 4D data of shape (10, 1, 28, 28), i.e., 10 samples, each with 1 channel and a height and width of 28.

>>> import numpy as np
>>> x = np.random.rand(10, 1, 28, 28) # Randomly generate 10 one-channel samples of height and width 28
>>> x.shape
(10, 1, 28, 28)
>>> x[0].shape # (1, 28, 28)   x[0]: first sample (first block)
>>> x[1].shape # (1, 28, 28)   x[1]: second sample
>>> x[0, 0] # or x[0][0]  first channel of the first sample
>>> x[1, 0] # or x[1][0]  first channel of the second sample
4.2 Expansion Based on im2col
  • Problem: a straightforward convolution implementation needs several nested for loops, which is cumbersome, and element-by-element access with for loops is slow in NumPy.

  • Solution: Use the im2col function (image to column): expand the input data to fit the filter (weights).

Convert 4D data to 2D data.

(N,C,H,W), i.e., (batch size, channel count, height, width).

Using im2col consumes more memory but allows for more efficient matrix operations, effectively utilizing linear algebra libraries.
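The book provides im2col in common/util.py; a simplified, loop-based sketch of the same idea (my own version, slower but producing the same row layout) looks like this:

import numpy as np

def im2col_naive(input_data, filter_h, filter_w, stride=1, pad=0):
    # Expand the (N, C, H, W) input into a 2D matrix: one row per receptive field
    N, C, H, W = input_data.shape
    out_h = (H + 2*pad - filter_h) // stride + 1
    out_w = (W + 2*pad - filter_w) // stride + 1
    # Pad only the spatial (H, W) dimensions
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    col = np.zeros((N, out_h, out_w, C, filter_h, filter_w))
    for y in range(out_h):
        for x in range(out_w):
            ys, xs = y * stride, x * stride
            col[:, y, x] = img[:, :, ys:ys+filter_h, xs:xs+filter_w]
    # (N*out_h*out_w, C*filter_h*filter_w): each row is one flattened receptive field
    return col.reshape(N * out_h * out_w, -1)

x = np.random.rand(1, 3, 7, 7)
print(im2col_naive(x, 5, 5).shape)  # (9, 75), matching the example in section 4.3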
4.3 Implementation of Convolution Layer

The im2col function: converts 4D input data to 2D.

x input 4D -> 2D matrix.

W filter 4D -> 2D matrix.

Matrix product X*W + b -> output 2D —(reshape)—> output 4D.

im2col (input_data, filter_h, filter_w, stride=1, pad=0)

Code implementation for convolution layer forward pass

import sys, os
sys.path.append(os.pardir)
import numpy as np
from common.util import im2col
x1 = np.random.rand(1, 3, 7, 7) # Batch size of 1, channel of 3 for 7 × 7 data
col1 = im2col(x1, 5, 5, stride=1, pad=0)
print(col1.shape) # (9, 75)
x2 = np.random.rand(10, 3, 7, 7) # Batch size of 10, channel of 3 for 7 × 7 data
col2 = im2col(x2, 5, 5, stride=1, pad=0)
print(col2.shape) # (90, 75)
# Convolution layer class
class Convolution:
    def __init__(self, W, b, stride=1, pad=0):
        self.W = W
        self.b = b
        self.stride = stride
        self.pad = pad
    # Forward pass
    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = int(1 + (H + 2*self.pad - FH) / self.stride)
        out_w = int(1 + (W + 2*self.pad - FW) / self.stride)
        col = im2col(x, FH, FW, self.stride, self.pad)
        col_W = self.W.reshape(FN, -1).T # Flatten the filter
        out = np.dot(col, col_W) + self.b
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)
        return out
    # The backward pass (see common/layers.py) must perform the inverse of im2col: col2im (column to image)

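A quick shape check of the forward pass (my own usage example, reusing the Convolution class and the im2col import above; the filter and input sizes are arbitrary):

W = np.random.rand(10, 3, 5, 5)   # 10 filters, 3 channels, 5 × 5
b = np.zeros(10)                  # one bias per filter
conv = Convolution(W, b, stride=1, pad=0)
x = np.random.rand(1, 3, 7, 7)    # batch of 1, 3 channels, 7 × 7
out = conv.forward(x)
print(out.shape)                  # (1, 10, 3, 3)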
4.4 Implementation of Pooling Layer

Using the im2col function.

Unlike the convolution layer, which sums the expanded values across channels, the pooling layer expands the input independently along the channel direction and pools each channel separately.


Code implementation for pooling layer forward pass

class Pooling:
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad
    def forward(self, x):
        N, C, H, W = x.shape
        # Calculate output size
        out_h = int(1 + (H - self.pool_h) / self.stride)
        out_w = int(1 + (W - self.pool_w) / self.stride)
        # Expand (1)    1. Expand input data
        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        col = col.reshape(-1, self.pool_h*self.pool_w)
        # Max value (2)  2. Get max value of each row
        out = np.max(col, axis=1)
        # Convert (3)    3. Convert to suitable output size
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)
        return out

The implementation of the pooling layer proceeds in three stages:

  1. Expand input data.

  2. Get the maximum value of each row.

  3. Convert to suitable output size.
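Continuing in the same session (my own usage sketch, reusing the Pooling class just defined), a 2 × 2 Max pooling layer with stride 2 halves H and W while keeping the channel count:

pool = Pooling(pool_h=2, pool_w=2, stride=2)
x = np.random.rand(1, 3, 4, 4)    # batch of 1, 3 channels, 4 × 4
out = pool.forward(x)
print(out.shape)                  # (1, 3, 2, 2): channels unchanged, H and W halved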

5. CNN Implementation
import numpy as np
from collections import OrderedDict
from common.layers import *          # the book's common package: Convolution, Pooling, Relu, Affine, SoftmaxWithLoss
from common.gradient import numerical_gradient

class SimpleConvNet:
    """Simple ConvNet
    conv - relu - pool - affine - relu - affine - softmax
    Parameters
    ----------
    input_dim : Dimensions of the input data ((1, 28, 28) for MNIST)
    conv_param : Hyperparameters of the convolution layer (filter_num, filter_size, pad, stride)
    hidden_size : Number of neurons in the fully connected hidden layer (e.g., 100)
    output_size : Output size (10 for MNIST, i.e., ten classes)
    weight_init_std : Standard deviation of the initial weights (e.g., 0.01)
    """
    def __init__(self, input_dim=(1, 28, 28),
                  conv_param={'filter_num':30, 'filter_size':5, 'pad':0, 'stride':1},
                 hidden_size=100, output_size=10, weight_init_std=0.01): # 30 filters of size 5*5
        # Extract filter parameters from conv_param dictionary for later use
        filter_num = conv_param['filter_num']
        filter_size = conv_param['filter_size']
        filter_pad = conv_param['pad']
        filter_stride = conv_param['stride']
        input_size = input_dim[1]  # 28
        conv_output_size = (input_size - filter_size + 2*filter_pad) / filter_stride + 1
        pool_output_size = int(filter_num * (conv_output_size/2) * (conv_output_size/2)) # Pooling layer output, H, W halved
        # Initialize weights for three layers
        self.params = {}
        # The filter is the weight W1 = (30,1,5,5) (4D data), i.e., 30 filters of height and width 5, channel count 1.
        self.params['W1'] = weight_init_std * \
                            np.random.randn(filter_num, input_dim[0], filter_size, filter_size)
        # Each filter has a bias b1, with 30 biases
        self.params['b1'] = np.zeros(filter_num)
        self.params['W2'] = weight_init_std * \
                            np.random.randn(pool_output_size, hidden_size)
        self.params['b2'] = np.zeros(hidden_size)
        self.params['W3'] = weight_init_std * \
                            np.random.randn(hidden_size, output_size)
        self.params['b3'] = np.zeros(output_size)
        # Create layers for calling
        self.layers = OrderedDict() # Ordered dictionary
        self.layers['Conv1'] = Convolution(self.params['W1'], self.params['b1'],
                                           conv_param['stride'], conv_param['pad'])
        self.layers['Relu1'] = Relu()
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        self.layers['Affine1'] = Affine(self.params['W2'], self.params['b2'])
        self.layers['Relu2'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W3'], self.params['b3'])
        self.last_layer = SoftmaxWithLoss()
    # Forward pass, sequentially calling layers and passing results to the next layer
    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x
    # In addition to using predict, also perform forward until reaching the last SoftmaxWithLoss layer
    def loss(self, x, t):
        """Calculate loss function
        Parameters
        ----------
        x : Input data
        t : Teacher labels
        """
        y = self.predict(x)
        return self.last_layer.forward(y, t)
    def accuracy(self, x, t, batch_size=100):
        if t.ndim != 1 : t = np.argmax(t, axis=1)
        acc = 0.0
        for i in range(int(x.shape[0] / batch_size)):
            tx = x[i*batch_size:(i+1)*batch_size]
            tt = t[i*batch_size:(i+1)*batch_size]
            y = self.predict(tx)
            y = np.argmax(y, axis=1)
            acc += np.sum(y == tt)
        return acc / x.shape[0]
    def numerical_gradient(self, x, t):
        """Calculate gradient (numerical differentiation)
        Parameters
        ----------
        x : Input data
        t : Teacher labels
        Returns
        -------
        A dictionary variable containing gradients for each layer
            grads['W1'], grads['W2'], ... are the weights for each layer
            grads['b1'], grads['b2'], ... are the biases for each layer
        """
        loss_w = lambda w: self.loss(x, t)
        grads = {}
        for idx in (1, 2, 3):
            grads['W' + str(idx)] = numerical_gradient(loss_w, self.params['W' + str(idx)])
            grads['b' + str(idx)] = numerical_gradient(loss_w, self.params['b' + str(idx)])
        return grads
    # Call backward for each layer and store gradient parameters in grads dictionary
    def gradient(self, x, t):
        """Calculate gradient (backpropagation) (choose one)
        Parameters
        ----------
        x : Input data
        t : Teacher labels
        Returns
        -------
        A dictionary variable containing gradients for each layer
            grads['W1'], grads['W2'], ... are the weights for each layer
            grads['b1'], grads['b2'], ... are the biases for each layer
        """
        # forward
        self.loss(x, t)
        # backward
        dout = 1
        dout = self.last_layer.backward(dout)
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)
        # Set
        grads = {}
        grads['W1'], grads['b1'] = self.layers['Conv1'].dW, self.layers['Conv1'].db
        grads['W2'], grads['b2'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W3'], grads['b3'] = self.layers['Affine2'].dW, self.layers['Affine2'].db
        return grads
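A minimal training-step sketch with random data (my own example; it assumes the SimpleConvNet class above and the SGD optimizer from the book's common/optimizer.py):

import numpy as np
from common.optimizer import SGD  # from the book's repository

network = SimpleConvNet(input_dim=(1, 28, 28),
                        conv_param={'filter_num': 30, 'filter_size': 5, 'pad': 0, 'stride': 1},
                        hidden_size=100, output_size=10, weight_init_std=0.01)
optimizer = SGD(lr=0.01)

x_batch = np.random.rand(100, 1, 28, 28)       # dummy images
t_batch = np.random.randint(0, 10, size=100)   # dummy labels

grads = network.gradient(x_batch, t_batch)     # gradients via backpropagation
optimizer.update(network.params, grads)        # one SGD update step
print(network.loss(x_batch, t_batch))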
6. CNN Visualization

The filters in the first layer extract primitive information such as edges or blobs. As the layers deepen, the extracted information becomes increasingly abstract and complex.


  • Extracted information becomes more complex:

    Edges -> Textures -> Object Parts -> Classification

In 1998, the ancestor of CNNs, LeNet, was proposed, for example for handwritten digit recognition.

In 2012, AlexNet marked the breakthrough of deep learning.

They both stack multiple convolution and pooling layers, finally outputting through fully connected layers.

LeNet:

  • Activation function uses sigmoid (currently mainly using ReLU).

  • Data is reduced through subsampling (currently mainly using Max pooling).

AlexNet:

  • Activation function uses ReLU.

  • Uses LRN (Local Response Normalization) layers for local normalization.

  • Uses dropout.

In terms of network structure, LeNet and AlexNet do not differ greatly; what has propelled deep learning forward is the rapid development of big data and GPUs.
Note: If there are details not covered here, please continue reading “Introduction to Deep Learning: Theory and Implementation Based on Python”; it is a very accessible introductory book for deep learning beginners. (The examples in the book are mainly CV-based.)