Convolutional Neural Networks (CNN) are a type of feedforward neural network that includes convolutional computations and has a deep structure. They are one of the representative algorithms of deep learning. This article aims to introduce the basic concepts and structures of CNN, as well as the fundamental ideas behind CNN architecture design.
This article is packed with valuable content, suitable for saving!
【1】Introduction
No more unnecessary talk, let’s get started.
When we get an image and need to recognize it, the simplest example is to ask: what is this image?
For instance, I want to train the simplest CNN to recognize whether a letter in an image is X or O.
When we look at it, it seems very simple; obviously, it is X. However, the computer does not know what X is. So we label this image, commonly known as a Label, with Label=X, which tells the computer that this image represents X. It remembers what X looks like.
But not all X’s look like this. For example…
data:image/s3,"s3://crabby-images/3af3e/3af3e735ea50235288bd13f046ca44b746e11915" alt="Understanding Convolutional Neural Networks in Machine Learning"
These four are all X, but they look obviously different from the previous X. The computer has never seen them and does not recognize them.
(Here we can mention a term that sounds very sophisticated in machine learning: “underfitting”)
What to do if it doesn’t recognize them? Of course, it should recall if it has seen something similar. At this point, what CNN needs to do is to extract the features of images that represent X.
We all know that images are stored in the computer in terms of pixel values, meaning that two X’s actually look like this to the computer.
data:image/s3,"s3://crabby-images/51ee7/51ee7dcb92660bc554ffe5f6b6f6db4bea794ae2" alt="Understanding Convolutional Neural Networks in Machine Learning"
data:image/s3,"s3://crabby-images/ce8d6/ce8d631d40ff7f3e29b76953985683d4642e09da" alt="Understanding Convolutional Neural Networks in Machine Learning"
Where 1 represents white, and -1 represents black.
If we compare each pixel one by one, it is definitely unscientific, the results will be wrong and the efficiency will be low, so other matching methods are proposed.
We call this patch matching.
Observing these two X images, we can see that despite the pixel values not corresponding one by one, there are still some common points.
As shown in the above image, the structure of the three same-colored areas in both images is completely consistent!
Therefore, we consider whether we can relate these two images. While we cannot correspond all pixels, can we match them locally?
The answer is certainly yes.
If I want to locate a face in a photo, but CNN does not know what a face is, I tell it: a face has three features, what the eyes, nose, and mouth look like, and then tell it what these look like. As long as CNN searches the entire image and finds where these three features are located, it has located the face.
Similarly, we extract three features from the standard X image.
data:image/s3,"s3://crabby-images/1c338/1c33887b1354ae6d19abfcac2a571ba729196aff" alt="Understanding Convolutional Neural Networks in Machine Learning"
data:image/s3,"s3://crabby-images/6feaf/6feafb40e1b1a92a5c5d9a8a815ff40967fb959d" alt="Understanding Convolutional Neural Networks in Machine Learning"
data:image/s3,"s3://crabby-images/b92c0/b92c0786c9e7053bc0d0db2d297e8958beb40977" alt="Understanding Convolutional Neural Networks in Machine Learning"
data:image/s3,"s3://crabby-images/e4501/e4501abd866cc791910247571a59cc44056e7182" alt="Understanding Convolutional Neural Networks in Machine Learning"
We find that using these three features can locate a certain part of X.
data:image/s3,"s3://crabby-images/bb4c6/bb4c646b0e7942c5bcdc87edec9f459c5a436ddb" alt="Understanding Convolutional Neural Networks in Machine Learning"
Features in CNN are also called convolution kernels (filters), usually sized 3X3 or 5X5.
【2】Convolution Operation
After talking for so long, we finally get to the word convolution!
But!! Friends! The convolution operation in convolutional neural networks has nothing to do with the convolution operation in signal processing! At one point, I even reviewed the convolution operation from higher mathematics! Ugh!
These!! have nothing to do with our CNN!!!
Alright, let’s continue discussing how to calculate. Four words: corresponding multiplication. Look at the diagram.
Take the (1,1) element value from the feature, and take the (1,1) element value from the blue box in the image, multiplying the two gives 1. Fill this result of 1 into the new image.
data:image/s3,"s3://crabby-images/b889a/b889a80481e4e27c4d303ba5c632d406c175a768" alt="Understanding Convolutional Neural Networks in Machine Learning"
data:image/s3,"s3://crabby-images/97475/97475332b45d8c3d02d066da9e5d5acc1f77d018" alt="Understanding Convolutional Neural Networks in Machine Learning"
Similarly, continue calculating the values at the other 8 coordinates
data:image/s3,"s3://crabby-images/2a4c8/2a4c88ed69657cc93bcc29a95b9c18a4040977ff" alt="Understanding Convolutional Neural Networks in Machine Learning"
Once all 9 are calculated, it will look like this.
data:image/s3,"s3://crabby-images/b03c6/b03c65fc62b2b1357dc4c6ae43422bc2270be4f7" alt="Understanding Convolutional Neural Networks in Machine Learning"
The next task is to average the nine values on the right, get a mean, and fill the mean into a new image.
This new image is called the feature map.
data:image/s3,"s3://crabby-images/a976d/a976d964cec30df57b132ce17bf0d1125de6e3c4" alt="Understanding Convolutional Neural Networks in Machine Learning"
Some children might want to ask, why does the blue box have to be placed in this position on the image?
This is just an example. This blue box is called a “window,” and its characteristic is that it can slide.
Actually, it should start at the initial position.
data:image/s3,"s3://crabby-images/fa233/fa2335d967104fd986df9658446b96682f615063" alt="Understanding Convolutional Neural Networks in Machine Learning"
After performing the convolution corresponding multiplication operation and calculating the mean, the sliding window starts to slide to the right. Depending on the stride, the sliding amplitude is selected.
For example, if the stride is 1, it moves right by one pixel.
If the stride is 2, it moves right by two pixels.
After moving to the far right, it returns to the left and starts the second row. Similarly, if the stride is 1, it moves down by one pixel; if the stride is 2, it moves down by two pixels.
Alright, after a series of convolution corresponding multiplication and averaging operations, we finally filled a complete feature map.
The feature map is the “feature” extracted from the original image by each feature. The closer the value is to 1, the more complete the match with the corresponding position and feature; the closer it is to -1, the more complete the match with the opposite of the feature, while values close to 0 indicate no match or little correlation.
One feature produces a feature map for the image, and for this X image, we used three features, so we ultimately produced three feature maps.
data:image/s3,"s3://crabby-images/aef61/aef6199e699abfffaf3fa9ff569c4bc88d7fb6e8" alt="Understanding Convolutional Neural Networks in Machine Learning"
Thus, the convolution operation part is complete!~
Finally, I have compiled a resource package on artificial intelligence learning, scan the code to obtain~
Huaqing Yuanshen
Helping many students realize their IT dreams
Achieve high salary dreams
Scan for surprises↑↑↑
Don’t forget to share good things~
data:image/s3,"s3://crabby-images/7c7ec/7c7ec1b7ddf80efce20c9491224f4a36f4fc28da" alt="Understanding Convolutional Neural Networks in Machine Learning"
data:image/s3,"s3://crabby-images/608bb/608bb607574de1c1bf7db238e0caa37d4e54e022" alt="Understanding Convolutional Neural Networks in Machine Learning"