Editor’s Note
In this article, we will implement a simple neural network from scratch using PyTorch. The article assumes you already understand how neural networks work.
Reprinted from: GuYueJu
Recently, I noticed some beginners around me working through various PyTorch guides, typing in the code as they read, and in the end becoming little more than typists.
After typing the code, when they run it and compare the results with the ones in the book, they find that, oh, it matches the book’s result, so they turn to the next chapter.
They can finish typing a whole book in half a day, but when they close the book, it seems they remember nothing. Some even read it two or three times but still can’t build a simple network, which is not a good way to learn.
If you happen to be in this situation, this article should help you. If you are already at an advanced level, feel free to close the page.
Building networks with PyTorch is much simpler than with TensorFlow, and the code is easy to understand.
If you want to create a network, you need to define a class that inherits from nn.Module (this is essential, so first import torch.nn as nn; nn is a very useful toolbox). Let’s name the class Net.
This class mainly contains two functions: the initialization function __init__ and the forward function. We can set it up like this:
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return x
Inside __init__ we define the layers, and of course we must first call super().__init__() to initialize the parent class nn.Module (basic Python knowledge).
Here we mainly define the convolutional layers. For example, the first layer, named conv1, takes 1 input channel and produces 6 output channels, with a 5×5 convolution kernel. conv2 is defined the same way, with 6 input channels and 16 output channels.
The essence of deep learning is learning parameters, such as the weights inside the convolution kernels, so __init__ is mainly the place for layers that have learnable parameters; operations with nothing to learn do not need to be defined there.
For instance, you can still define the activation function relu there if you want, and you can even give it a name such as myrelu. The forward function is where the data actually flows through the network.
For example, in the code above, the input x first goes through conv1 (self.conv1; the name is your choice), then through the activation function F.relu(). That name is not made up: F is torch.nn.functional, imported with import torch.nn.functional as F, and F.relu() is a function provided by the official library.
Of course, if you had defined relu in __init__ as self.myrelu = nn.ReLU(), the first line of forward would simply be x = F.max_pool2d(self.myrelu(self.conv1(x)), 2).
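A minimal sketch of that myrelu variant, assuming the activation is stored as the module nn.ReLU() (the module form of F.relu). The class name NetWithReLU is hypothetical. It checks that the result matches the F.relu version and that the ReLU module adds no learnable parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NetWithReLU(nn.Module):  # hypothetical name for this variant of Net
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.myrelu = nn.ReLU()  # nothing to learn, but allowed here

    def forward(self, x):
        x = F.max_pool2d(self.myrelu(self.conv1(x)), 2)
        x = F.max_pool2d(self.myrelu(self.conv2(x)), 2)
        return x


net = NetWithReLU()
x = torch.randn(1, 1, 32, 32)
out_module = net(x)

# The same computation written with F.relu, reusing the same conv layers
out_functional = F.max_pool2d(F.relu(net.conv1(x)), 2)
out_functional = F.max_pool2d(F.relu(net.conv2(out_functional)), 2)
print(torch.equal(out_module, out_functional))  # True

# Only the convolution weights and biases are learnable parameters
names = [name for name, _ in net.named_parameters()]
print(names)
```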
The F.max_pool2d pooling step in forward works the same way: it is another function from F, here with a 2×2 window, so I won’t elaborate. After passing through these layers, x is finally returned to the caller.
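To see what forward does to the data, here is a sketch that traces the tensor shape through the same layers. It assumes a single 1-channel 32×32 image, a size the article itself does not fix. Each 5×5 convolution without padding shrinks height and width by 4, and each 2×2 max pooling halves them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)  # (batch, channels, height, width)
shapes = [tuple(x.shape)]

x = F.relu(conv1(x))           # 5x5 conv, no padding: 32 -> 28
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)         # 2x2 pooling: 28 -> 14
shapes.append(tuple(x.shape))
x = F.relu(conv2(x))           # 5x5 conv: 14 -> 10
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)         # 2x2 pooling: 10 -> 5
shapes.append(tuple(x.shape))

for s in shapes:
    print(s)
```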
When defining the Net class, there are two main points to note.
First: the number of output channels of one layer must match the number of input channels of the next. For example, if the first convolutional layer outputs 4 channels, the second cannot be defined to take 6 input channels; otherwise PyTorch raises an error when data is passed through.
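A small sketch of that mismatch, assuming a random 1-channel 32×32 input: a second layer expecting 6 channels fails on the 4-channel output of the first, while one expecting 4 channels works.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)
conv1 = nn.Conv2d(1, 4, 5)        # outputs 4 channels
conv2_ok = nn.Conv2d(4, 16, 5)    # expects 4 channels: matches
conv2_bad = nn.Conv2d(6, 16, 5)   # expects 6 channels: mismatch

out = conv2_ok(conv1(x))
print(out.shape)  # torch.Size([1, 16, 24, 24])

try:
    conv2_bad(conv1(x))
    mismatch_raised = False
except RuntimeError as err:       # PyTorch reports the channel mismatch here
    mismatch_raised = True
    print("RuntimeError:", err)
```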
Second: did you notice that it is used a little differently from ordinary Python classes? How do we actually use this Net?
First, we need to create an instance of Net. Net is just a class, so output = Net(input) does not work: it would pass input to __init__, which takes no such argument.
net = Net()
Now we can feed data into it. Suppose you already have the network’s input data in a variable called input. It must be a tensor (how to create one is something you can look up in the book), and for this Net it should have the shape (batch, 1, height, width), since conv1 expects 1 input channel. Feeding it in looks like this:
output = net(input)
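If you don’t have real data yet, a random tensor works for a quick check. This sketch repeats the Net from above so it runs on its own; the batch of 4 single-channel 32×32 images is an assumed size, and the variable is named inputs rather than input to avoid shadowing Python’s built-in input().

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return x


net = Net()                         # create an instance first
inputs = torch.randn(4, 1, 32, 32)  # random stand-in for real data
output = net(inputs)                # one forward pass
print(output.shape)                 # torch.Size([4, 16, 5, 5])
```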
Looking at the previous definition:
def __init__(self):
    ...
def forward(self, x):
    ...
This looks a bit strange. In ordinary Python, data passed to a class goes in as an argument to its __init__ function, but in the definition above, x is an argument of the forward function.
Actually, there is no contradiction: when you create net with net = Net(), you pass no arguments at all. If you do want to pass parameters at initialization, you can.
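As an illustration, here is a sketch of a Net variant whose channel counts are passed at construction time. The class name ConfigurableNet and the parameters in_channels, mid_channels and out_channels are hypothetical, not part of the article’s Net. Configuration goes to __init__, while the data x still goes to forward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConfigurableNet(nn.Module):  # hypothetical variant of Net
    def __init__(self, in_channels=1, mid_channels=6, out_channels=16):
        super().__init__()
        # configuration is passed to __init__
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 5)
        self.conv2 = nn.Conv2d(mid_channels, out_channels, 5)

    def forward(self, x):
        # the data is passed to forward
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return x


rgb_net = ConfigurableNet(in_channels=3)  # e.g. for 3-channel images
output = rgb_net(torch.randn(2, 3, 32, 32))
print(output.shape)  # torch.Size([2, 16, 5, 5])
```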
x is the network’s input, but does creating the network require input data? Not necessarily.
When you pass data to the network instance, nn.Module automatically routes it to the forward function. In other words, after creating an instance with net = Net(), calling output = net(input) is effectively the same as calling output = net.forward(input). The one difference is that net(input) also runs any hooks registered on the module, so net(input) is the form you should normally use.
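A quick sketch of this equivalence, using a single nn.Conv2d layer as a stand-in network (every nn.Module behaves the same way). It also registers a forward hook with register_forward_hook to show the practical difference: calling the instance runs hooks, calling forward directly skips them.

```python
import torch
import torch.nn as nn

net = nn.Conv2d(1, 6, 5)  # stand-in for Net
x = torch.randn(1, 1, 32, 32)

hook_calls = []
net.register_forward_hook(lambda module, args, out: hook_calls.append("hook ran"))

out_call = net(x)             # goes through nn.Module.__call__, which calls forward
out_forward = net.forward(x)  # calls forward directly

print(torch.equal(out_call, out_forward))  # True: same computation
print(len(hook_calls))                     # 1: only net(x) triggered the hook
```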
After defining the network, what remains is feeding in data, computing the error (loss), backpropagating, and updating the weights. Admittedly, these formats and their order can be hard to remember.
We have already covered how to feed data in: output = net(input) performs one forward pass, computing x through all the layers in turn.
If you want the network’s output to be close to your expected ground truth, you need to keep reducing the difference between the two. How that difference is measured is up to you; the measure is called the objective function or the loss function.
If the loss function approaches 0, then naturally, the goal is achieved.
In practice the loss cannot realistically reach 0, but we want to make it as small as possible, which means pushing it downhill along the gradient.
The formula for gradient descent should be familiar to everyone; if not, I recommend looking up the relevant theory. Who enjoys looking at formulas? So I won’t elaborate here.
You choose what the input is, so what can the neural network itself learn and decide?
Naturally, it can only determine the weights of each layer. So the network can only keep adjusting its weights: think of y = wx + b, where x is provided by you and the network can only change w and b to bring the output y as close as possible to the y value you want, thereby minimizing the loss.
If the partial derivative of the loss with respect to the weights W approaches 0, doesn’t that mean we have reached an extremum?
Since you have already fixed how the loss is calculated, the loss can only be driven down, and its partial derivatives toward 0, by updating the layer parameters W (the network cannot decide anything else; everything else is input provided by you).
Thus, the update for W is implemented as follows (note these numbers; we will refer to them later):
[1] First, calculate the partial derivative of the loss with respect to W (the network has several layers, and backpropagation computes this for the weights of every layer; each layer’s result depends on that layer’s own input x, not just the initial input).
[2] Multiply the result of [1] by a step size, the learning rate (this gives the amount by which W will be changed).
[3] Subtract this modification amount from W to complete one update of parameter W.
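Written as a formula, with L the loss and η the step size (learning rate), one update of a weight W is the standard gradient descent step:

```latex
W \leftarrow W - \eta \, \frac{\partial L}{\partial W}
```

Here ∂L/∂W is step [1], multiplying by η is step [2], and the subtraction is step [3].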
Though not very rigorous, that’s the general idea. You can manually implement this process, but for large-scale neural networks, how can you do it manually? That’s impossible. So we need to utilize the PyTorch framework and the toolbox torch.nn.
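For a model as small as y = wx + b, doing it by hand is still feasible. This sketch fits made-up data generated from y = 2x + 1 with an MSE loss, working out the partial derivatives by hand (no autograd) and applying steps [1] to [3] directly. The data, the learning rate 0.05 and the 2000 iterations are assumptions chosen for illustration.

```python
import torch

# Made-up data following y = 2x + 1
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y_true = torch.tensor([3.0, 5.0, 7.0, 9.0])

w = torch.tensor(0.0)
b = torch.tensor(0.0)
lr = 0.05  # the step size used in [2]

for step in range(2000):
    y_pred = w * x + b
    error = y_pred - y_true
    loss = (error ** 2).mean()      # MSE loss
    # [1] partial derivatives of the loss w.r.t. w and b, derived by hand
    grad_w = (2 * error * x).mean()
    grad_b = (2 * error).mean()
    # [2] multiply by the step size, [3] subtract from the parameters
    w = w - lr * grad_w
    b = b - lr * grad_b

print(f"w = {w.item():.4f}, b = {b.item():.4f}")  # close to w = 2, b = 1
```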
Therefore, we need to define the loss function. Taking MSELoss as an example:
compute_loss = nn.MSELoss()
Clearly, it is also a class, so it cannot be passed the data directly: loss = nn.MSELoss(target, output) is incorrect. Instead, create an instance of it, here called compute_loss.
After that, pass your network’s output and the ground-truth target into it, in that order (PyTorch loss functions take the prediction first and the target second):
loss = compute_loss(output, target)
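For intuition, a quick sketch checking that nn.MSELoss with its default setting (reduction="mean") is just the average of the squared differences. The tensor values are made up.

```python
import torch
import torch.nn as nn

compute_loss = nn.MSELoss()              # default reduction="mean"
output = torch.tensor([1.0, 2.0, 3.0])   # made-up network output
target = torch.tensor([1.5, 2.0, 2.0])   # made-up ground truth

loss = compute_loss(output, target)      # prediction first, target second
manual = ((output - target) ** 2).mean() # (0.25 + 0 + 1) / 3

print(loss.item())                       # about 0.4167
print(torch.allclose(loss, manual))      # True
```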
Once you calculate the loss, the next step is backpropagation:
loss.backward()
This step completes [1]: PyTorch computes the partial derivative of the loss with respect to every parameter W and stores it in that parameter’s .grad attribute. This is one round of backpropagation.
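To see where backward() puts its result, here is a sketch with a single nn.Linear layer standing in for the network (an assumption to keep it small). Before backward() the .grad attributes are None; afterwards each holds a tensor shaped like its parameter, and the weights themselves are not yet changed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)            # stand-in for net: weight (1, 3), bias (1,)
compute_loss = nn.MSELoss()

x = torch.randn(5, 3)
target = torch.randn(5, 1)

grad_before = layer.weight.grad    # None: nothing computed yet
weight_before = layer.weight.detach().clone()

loss = compute_loss(layer(x), target)
loss.backward()                    # step [1]: dLoss/dW for every parameter

print(grad_before)                 # None
print(layer.weight.grad.shape)     # torch.Size([1, 3]), same as layer.weight
print(layer.bias.grad.shape)       # torch.Size([1])
# backward() only computes gradients; the weights are unchanged
print(torch.equal(layer.weight.detach(), weight_before))  # True
```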
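Note what loss.backward() requires. If you write your own loss as a plain function, for example def loss(x, y): return y - x, then calling backward() on its result raises an error. The reason is not that the function is custom: y - x is a tensor of many values, and backward() without arguments only works on a scalar (single-number) loss. The functions provided in nn, such as nn.MSELoss, return a scalar by default, which is why they work out of the box.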
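A sketch of what actually happens with a hand-written loss. The function names loss_diff and loss_sq are hypothetical, and the tensors are made up. backward() on y - x fails because the result is a vector; once the function reduces to a single number, autograd backpropagates through it without any nn class.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # stand-in for the network output
y = torch.tensor([1.5, 2.5, 2.0])                       # target


def loss_diff(x, y):  # hypothetical: returns a vector, not a scalar
    return y - x


def loss_sq(x, y):    # hypothetical: reduces to a single number
    return ((y - x) ** 2).mean()


try:
    loss_diff(x, y).backward()
    nonscalar_failed = False
except RuntimeError as err:  # "grad can be implicitly created only for scalar outputs"
    nonscalar_failed = True
    print("RuntimeError:", err)

loss_sq(x, y).backward()
print(x.grad)  # -2 * (y - x) / 3, i.e. [-1/3, -1/3, 2/3]
```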
Of course, in deep learning you can’t rely only on the officially provided loss functions, so sooner or later you will write your own. A tidy way is to define it the same way as Net above: inherit from nn.Module, leave __init__() empty apart from super().__init__(), take the inputs in forward, compute the specific loss there, and return it. (Note: I originally wrote that this is required for the loss to backpropagate, as it was in older versions; now it is generally no longer necessary, and a plain function built from tensor operations that returns a scalar works as well.)
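Following that description, a minimal sketch of a custom loss written as an nn.Module subclass. The class name MyMSELoss is hypothetical, and it simply re-implements mean squared error so its result can be compared with nn.MSELoss.

```python
import torch
import torch.nn as nn


class MyMSELoss(nn.Module):  # hypothetical custom loss
    def __init__(self):
        super().__init__()   # nothing else needed in __init__

    def forward(self, output, target):
        # compute the specific loss from the output and the target, then return it
        return ((output - target) ** 2).mean()


torch.manual_seed(0)
output = torch.randn(4, 3, requires_grad=True)  # stand-in for net(input)
target = torch.randn(4, 3)

compute_loss = MyMSELoss()                      # create an instance, like nn.MSELoss()
loss = compute_loss(output, target)
loss.backward()                                 # autograd handles the backward pass

print(torch.allclose(loss, nn.MSELoss()(output, target)))  # True
print(output.grad.shape)                                    # torch.Size([4, 3])
```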
After backpropagation, how are [2] and [3] carried out? Through the optimizer, which updates the network weights W automatically.
So after defining Net, you need to write a definition for the optimizer (taking the SGD method as an example):
from torch import optim
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Similarly, the optimizer is also a class; first, define an instance optimizer, which will be used later.
Note that when defining the optimizer, you need to pass net’s parameters to SGD, so that the optimizer can control the network parameters and modify them.
When passing them, also pass the learning rate lr, which is the step size from [2].
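To connect the optimizer back to steps [2] and [3], here is a sketch showing that one optimizer.step() of plain SGD (momentum left at its default of 0, unlike the example above) changes each parameter by exactly lr times its gradient. The single nn.Linear layer, the random data and lr=0.1 are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch import optim

torch.manual_seed(0)
net = nn.Linear(3, 1)                           # stand-in network
optimizer = optim.SGD(net.parameters(), lr=0.1) # plain SGD, no momentum

x = torch.randn(8, 3)
target = torch.randn(8, 1)

loss = nn.MSELoss()(net(x), target)
loss.backward()                                 # step [1]: fills .grad

w_before = net.weight.detach().clone()
grad = net.weight.grad.detach().clone()
optimizer.step()                                # steps [2] and [3]

expected = w_before - 0.1 * grad                # W - lr * dLoss/dW
print(torch.allclose(net.weight.detach(), expected))  # True
```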
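At the start of each iteration, first clear the gradients of the parameters the optimizer manages. PyTorch accumulates gradients, adding each backward() result to what is already stored, so the previous iteration’s gradient, which has already been used for its update, must not leak into the next one.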
optimizer.zero_grad()
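A small sketch of why this matters, using a single scalar parameter and a hypothetical loss toy_loss = (3w)², whose derivative is 18w. Running backward() twice without clearing doubles the stored gradient; after optimizer.zero_grad() it is back to the single value.

```python
import torch
from torch import optim

w = torch.tensor(1.0, requires_grad=True)  # a single scalar parameter
optimizer = optim.SGD([w], lr=0.1)


def toy_loss():                 # hypothetical loss: (3w)^2, so dLoss/dw = 18w
    return (3 * w) ** 2


toy_loss().backward()
first = w.grad.item()           # 18.0
toy_loss().backward()           # second backward() without zero_grad()
accumulated = w.grad.item()     # 36.0: the new gradient was added to the old one

optimizer.zero_grad()           # clear the stored gradient
toy_loss().backward()
after_clear = w.grad.item()     # 18.0 again

print(first, accumulated, after_clear)  # 18.0 36.0 18.0
```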
After loss.backward() backpropagation, update the parameters:
optimizer.step()
So our sequence is:
1. First define the network: write the Net class and create an instance, net = Net().
2. Define the optimizer:
optimizer = optim.xxx(net.parameters(), lr=xxx)
3. Then define the loss function (either write your own class or use an official one, e.g. compute_loss = nn.MSELoss()).
4. Once everything is defined, loop over the following steps:
① First clear the gradient information in the optimizer, optimizer.zero_grad();
② Then pass in the input, output = net(input), forward pass
③ Calculate the loss, loss = compute_loss(output, target), where target is the reference ground-truth value (GT) that you prepare yourself, matching the input passed in
④ Backpropagate the error, loss.backward()
⑤ Update parameters, optimizer.step()
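Putting steps 1 to 4 together, here is a minimal end-to-end sketch. The data are made up: random inputs shaped (8, 1, 32, 32) and random targets shaped like Net’s output, (8, 16, 5, 5), with values in [0, 1). The loop simply runs 200 iterations on that one batch; a real project would load batches from a dataset instead. The optimizer settings mirror the SGD example above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim


# 1. Define the network and create an instance
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return x


torch.manual_seed(0)
net = Net()

# 2. Define the optimizer
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# 3. Define the loss function
compute_loss = nn.MSELoss()

# Made-up data: random inputs and random ground-truth targets (GT)
inputs = torch.randn(8, 1, 32, 32)
target = torch.rand(8, 16, 5, 5)

# 4. Training loop
losses = []
for step in range(200):
    optimizer.zero_grad()                # ① clear old gradients
    output = net(inputs)                 # ② forward pass
    loss = compute_loss(output, target)  # ③ compute the loss
    loss.backward()                      # ④ backpropagate
    optimizer.step()                     # ⑤ update the parameters
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```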
This way, you have implemented a basic neural network. Most neural network training can be reduced to this process; in practice, only the input data, the network definition, and the loss function become more complex.
Thank you for pointing out any inaccuracies in what I’ve said!