Selected from Medium
Author: Harsh Pokharna
Translated by: Machine Heart
Contributors: Duxiade
This is one article in the author’s series on neural networks published on Medium, in which he gives a detailed explanation of convolutional neural networks. Convolutional neural networks are widely applied in image recognition, video recognition, recommendation systems, and natural language processing. If you want to browse the whole series, you can follow the link to the original article.
Like ordinary neural networks, convolutional neural networks are composed of neurons with learnable weights and biases. Each neuron receives several inputs, computes a weighted sum of them, passes that sum through an activation function, and produces an output. The whole network has a loss function, and all the tips and tricks we have developed for neural networks still apply to convolutional neural networks.
So, how do convolutional neural networks differ from neural networks?
Convolutional Neural Networks Operate Over Volumes
What does this mean?
1. An example of an RGB image (let’s call it ‘input image’)
In a neural network, the input is a vector, but in a convolutional neural network, the input is a multi-channel image (the image in this example has 3 channels).
Convolution
2. Convoluting the image with a filter
We slide a 5×5×3 filter over the entire image, taking the dot product between the filter and each block of the input image it covers as it slides.
3. This is what it looks like
Each of these dot products produces a scalar.
So what happens when we convolve a complete image with this filter?
4. This is it!
You can think about where this ‘28’ comes from. (Hint: there are 28×28 unique positions where a 5×5 filter can be placed on a 32×32 image.)
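The sliding-dot-product operation described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a 32×32×3 input and a single 5×5×3 filter with random values (in a real network the filter weights would be learned):

```python
import numpy as np

# A toy 32x32 RGB image and one 5x5x3 filter (random values here;
# in a trained network the filter weights are learned).
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
filt = rng.standard_normal((5, 5, 3))

# Slide the filter over every valid position and take the dot product
# between the filter and the 5x5x3 block of the image underneath it.
out_h = image.shape[0] - filt.shape[0] + 1  # 32 - 5 + 1 = 28
out_w = image.shape[1] - filt.shape[1] + 1  # 28
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        block = image[i:i + 5, j:j + 5, :]
        feature_map[i, j] = np.sum(block * filt)  # one scalar per position

print(feature_map.shape)  # (28, 28)
```

The nested loop makes the mechanics explicit; each of the 28×28 positions yields one scalar, and together they form the output grid.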
Now, back to convolutional neural networks
The convolutional layer is the main building block of convolutional neural networks.
5. Convolutional Layer
The convolutional layer consists of a set of independent filters (6 in this example). Each filter is convolved with the image independently, yielding 6 feature maps of shape 28×28×1.
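Reusing the sliding dot product from before, a convolutional layer with 6 filters can be sketched as follows (again a toy example with random weights and an assumed 32×32×3 input):

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.standard_normal((32, 32, 3))
filters = rng.standard_normal((6, 5, 5, 3))  # 6 independent 5x5x3 filters

def convolve(image, filt):
    """Valid convolution of one multi-channel filter over the image."""
    h = image.shape[0] - filt.shape[0] + 1
    w = image.shape[1] - filt.shape[1] + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            block = image[i:i + filt.shape[0], j:j + filt.shape[1], :]
            out[i, j] = np.sum(block * filt)
    return out

# Each filter produces its own 28x28 feature map; stacking the maps
# along the channel axis gives the layer's 28x28x6 output volume.
maps = np.stack([convolve(image, f) for f in filters], axis=-1)
print(maps.shape)  # (28, 28, 6)
```

The key point is that the output of the layer is itself a volume, which the next convolutional layer can take as its input.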
Now suppose we stack several convolutional layers in sequence. What happens then?
6. Sequential Convolutional Layer
All these filters are randomly initialized and become our parameters, which will be learned by this network.
Here is an example of a trained network:
7. Filters in a trained network
Look at the filters in the topmost layer (these are our 5×5×3 filters). Through backpropagation, they have tuned themselves into blobs of color and edges. As we go deeper into the other convolutional layers, the filters take dot products with the outputs of the previous convolutional layers, so they pick up these smaller color patches and edges and combine them into larger patterns.
Look at Figure 4 and imagine the 28×28×1 grid as 28×28 neurons. For a given feature map (the output of convolving the image with one filter is called a feature map), each neuron is connected only to a small patch of the input image, and all the neurons share the same connection weights. So, once again, back to the differences between convolutional neural networks and ordinary neural networks.
Two Concepts of Convolutional Neural Networks: Parameter Sharing and Local Connectivity
Parameter sharing means sharing weights across all neurons in a specific feature map.
Local connectivity means that each neuron is connected only to a subset of the input image (unlike an ordinary neural network, where all the neurons are fully connected).
This helps reduce the number of parameters in the entire system, making computations more efficient.
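A quick back-of-the-envelope calculation shows how large the saving is. Assuming the same toy dimensions as above (a 32×32×3 input mapped to a 28×28×6 output), we can compare a fully connected layer against a convolutional layer with shared 5×5×3 filters:

```python
# Fully connected: every output neuron has its own weight
# for every input value (weights only, biases omitted).
fc_params = (32 * 32 * 3) * (28 * 28 * 6)

# Convolutional: 6 filters of 5x5x3 weights plus one bias each,
# shared across all 28x28 positions of each feature map.
conv_params = 6 * (5 * 5 * 3 + 1)

print(fc_params)    # 14450688
print(conv_params)  # 456
```

Parameter sharing and local connectivity together cut roughly fourteen million weights down to a few hundred for this one layer.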
For simplicity, we will not discuss the concept of zero padding here. Those interested can read related materials themselves.
Pooling Layer
A pooling layer is another building block of convolutional neural networks.
Pooling
Its function is to progressively reduce the spatial size of the representation, which cuts down the parameter count and the computation in the network. The pooling layer operates on each feature map independently.
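The most common form, max pooling, keeps only the largest value in each small window. Here is a minimal sketch of 2×2 max pooling with stride 2, applied independently per channel to the hypothetical 28×28×6 volume from the earlier example:

```python
import numpy as np

rng = np.random.default_rng(2)
maps = rng.standard_normal((28, 28, 6))  # toy feature-map volume

# 2x2 max pooling with stride 2: group each feature map into 2x2 blocks
# via reshape, then take the max over each block. Each channel is
# pooled independently, and the spatial dimensions are halved.
h, w, c = maps.shape
pooled = maps.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))
print(pooled.shape)  # (14, 14, 6)
```

The reshape trick only works when the spatial size divides evenly by the pool size; a general implementation would slide a window the same way the convolution loop does.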
Max Pooling
A Typical Architecture of a Convolutional Neural Network
Typical Architecture of a Convolutional Neural Network
We have discussed convolutional layers (denoted CONV) and pooling layers (denoted POOL).
RELU is simply a nonlinearity that is applied, just as in ordinary neural networks.
FC is the fully connected layer of neurons at the end of the convolutional neural network. Neurons in a fully connected layer are connected to all the activations of the previous layer, as in ordinary neural networks, and work the same way.
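Putting the pieces together, a single CONV → RELU → POOL → FC pass can be sketched end to end. This is a toy forward pass with random weights, assuming the same 32×32×3 input and 10 output classes (both assumptions for illustration, not from the article):

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(x):
    # Elementwise nonlinearity: negative values are clipped to zero.
    return np.maximum(x, 0)

def max_pool2x2(x):
    # 2x2 max pooling with stride 2, per channel.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def conv_layer(x, filters):
    # Valid convolution of a bank of filters over a multi-channel input.
    fh, fw = filters.shape[1], filters.shape[2]
    h = x.shape[0] - fh + 1
    w = x.shape[1] - fw + 1
    out = np.zeros((h, w, filters.shape[0]))
    for k, f in enumerate(filters):
        for i in range(h):
            for j in range(w):
                out[i, j, k] = np.sum(x[i:i + fh, j:j + fw, :] * f)
    return out

image = rng.standard_normal((32, 32, 3))
filters = rng.standard_normal((6, 5, 5, 3))
fc_weights = rng.standard_normal((14 * 14 * 6, 10))  # 10 class scores

x = conv_layer(image, filters)       # CONV -> (28, 28, 6)
x = relu(x)                          # RELU
x = max_pool2x2(x)                   # POOL -> (14, 14, 6)
scores = x.reshape(-1) @ fc_weights  # FC   -> (10,)
print(scores.shape)  # (10,)
```

Real architectures repeat the CONV/RELU/POOL block several times before the fully connected layers, but the data flow is exactly this.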
I hope you now understand this architecture of a convolutional neural network. There are many variations of this architecture, but as mentioned earlier, the basic concepts remain the same.