Downloadable: 60-Minute Introduction to PyTorch (Full Translation)



Introduction

The original text is translated from: Deep Learning with PyTorch: A 60 Minute Blitz

Translated by: Lin Buqing (https://www.zhihu.com/people/lu-guo-92-42-88)

Lin Buqing (Lin Yanyan) is my student. I asked her to work through this tutorial when she started learning PyTorch, and she translated it into Chinese to share with everyone. I hope it helps you all.
Code download link:

https://github.com/fengdu78/machine_learning_beginner/tree/master/PyTorch_beginner

Table of Contents

60-Minute Introduction to PyTorch (1) — Tensors
60-Minute Introduction to PyTorch (2) — Autograd
60-Minute Introduction to PyTorch (3) — Neural Networks
60-Minute Introduction to PyTorch (4) — Training a Classifier

(1) Tensors

Tensors are a special data structure, very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs, outputs, and parameters of a model. Apart from the fact that tensors can run on GPUs or other specialized hardware to accelerate computation, they are used much like ndarrays in NumPy. If you are familiar with ndarrays, you will find the tensor API familiar; if not, follow this tutorial to quickly get acquainted with it.
%matplotlib inline
import torch
import numpy as np

Initializing Tensors

There are several ways to create a tensor, such as:

Creating Directly from Data

You can create a tensor directly from data, and the data type will be inferred automatically.
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

Creating from Numpy

A tensor can be created directly from a NumPy array (and vice versa; see the sections on converting between tensors and NumPy arrays below).
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

Creating from Other Tensors

A new tensor retains the properties (shape, data type) of the argument tensor unless explicitly overridden.
x_ones = torch.ones_like(x_data)  # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")
x_rand = torch.rand_like(x_data, dtype=torch.float)  # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")
Ones Tensor:
 tensor([[1, 1],
        [1, 1]])

Random Tensor:
 tensor([[0.6075, 0.4581],
        [0.5631, 0.1357]])

Creating from Constants or Random Numbers

The shape is a tuple of tensor dimensions; it determines the dimensionality of the output tensor in the functions below.
shape = (2,3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")
Random Tensor:
 tensor([[0.7488, 0.0891, 0.8417],
        [0.0783, 0.5984, 0.5709]])

Ones Tensor:
 tensor([[1., 1., 1.],
        [1., 1., 1.]])

Zeros Tensor:
 tensor([[0., 0., 0.],
        [0., 0., 0.]])

Properties of Tensors

The properties of a tensor include shape, data type, and the device it is stored on.
tensor = torch.rand(3,4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu

Operations on Tensors

Tensors support over 100 operations, including transposing, indexing, slicing, mathematical operations, linear algebra, random sampling, and more.
All of these can run on the GPU (usually much faster than on the CPU). If you are using Colab, you can allocate a GPU via Edit > Notebook Settings.
# We move our tensor to the GPU if available
if torch.cuda.is_available():
    tensor = tensor.to('cuda')
Try some operations from the list. If you are familiar with the NumPy API, you will find the tensor API easy to use.

Standard Numpy-like Indexing and Slicing:

tensor = torch.ones(4, 4)
tensor[:,1] = 0
print(tensor)
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

Concatenating Tensors

You can use torch.cat to concatenate a series of tensors along a specific dimension. torch.stack is another join operation that has subtle differences from torch.cat.
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
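For comparison, here is a minimal sketch of torch.stack, which joins tensors along a new dimension rather than an existing one (reusing the 4x4 tensor from above):
t2 = torch.stack([tensor, tensor, tensor], dim=0)
print(t1.shape)  # torch.Size([4, 12]) -- cat extends an existing dimension
print(t2.shape)  # torch.Size([3, 4, 4]) -- stack adds a new dimension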

Multiplying Tensors

# This computes the element-wise product
print(f"tensor.mul(tensor) \n {tensor.mul(tensor)} \n")
# Alternative syntax:
print(f"tensor * tensor \n {tensor * tensor}")
tensor.mul(tensor)
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

tensor * tensor
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
This computes the matrix multiplication between two tensors.
print(f"tensor.matmul(tensor.T) \n {tensor.matmul(tensor.T)} \n")
# Alternative syntax:
print(f"tensor @ tensor.T \n {tensor @ tensor.T}")
tensor.matmul(tensor.T)
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])

tensor @ tensor.T
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])

In-Place Operations

Operations with a _ suffix are in-place operations; for example, x.copy_(y) and x.t_() will change x.
print(tensor, "\n")
tensor.add_(5)
print(tensor)
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])

Note

In-place operations can save some memory, but because they immediately overwrite the history needed for computing derivatives, they can cause problems during backpropagation, so their use is discouraged.
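For example, here is a minimal sketch of the kind of failure the note describes (the exact error message may vary between PyTorch versions):
x = torch.tensor([1., 2.], requires_grad=True)
y = x * 2
z = y.sin()   # sin saves y so it can compute its backward pass later
y.add_(1)     # the in-place change overwrites the saved value of y
try:
    z.sum().backward()
except RuntimeError as e:
    # typically reports that a variable needed for gradient computation
    # was modified by an in-place operation
    print(e)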

Converting Tensors to NumPy Arrays

t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")
t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]
Changes to the tensor are reflected in the NumPy array.
t.add_(1)
print(f"t: {t}")
print(f"n: {n}")
t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]

Converting NumPy Arrays to Tensors

n = np.ones(5)
t = torch.from_numpy(n)
Changes to the NumPy array are reflected in the tensor.
np.add(n, 1, out=n)
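Printing both objects confirms that they share the same memory (the values follow from the ones created above):
print(f"t: {t}")  # expected: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
print(f"n: {n}")  # expected: [2. 2. 2. 2. 2.]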

(2) Autograd: Automatic Differentiation

torch.autograd is the automatic differentiation tool in PyTorch and is at the core of all neural networks. First, let’s briefly understand how this package trains neural networks.

Background Introduction
Neural networks (NNs) are a collection of nested functions that act on input data. They are defined by parameters (weights and biases), which in PyTorch are stored in tensors.
There are two steps in training a neural network:
Forward Propagation: In the forward pass, the neural network makes its best guess about the correct output by running the input data through each layer, applying that layer's weights and biases.
Backward Propagation: In the backward pass, the neural network adjusts its parameters in proportion to the error of its output. It does this by applying the chain rule of differentiation to collect the gradients of the error with respect to the parameters, and then updating the parameters in the direction of the negative gradient (gradient descent).
For a more detailed introduction, please refer to the following address:

[3Blue1Brown]:

https://www.youtube.com/watch?v=tIeHLnjs5U8

PyTorch Application
Let’s look at a simple example. We load a pre-trained resnet18 model from torchvision, then create a random data tensor representing a single image with 3 channels and a height and width of 64, with a corresponding label initialized to random values.
%matplotlib inline
import torch, torchvision
model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)
Next, we run the input data through each layer of the model to produce a prediction; this is forward propagation.
prediction = model(data)  # Forward propagation

We use the model’s prediction and the corresponding labels to calculate the error (loss), then backpropagate this error through the network. Calling .backward() on the error tensor starts this process; autograd then computes the gradients of every parameter and accumulates them in its .grad attribute.
loss = (prediction - labels).sum()
loss.backward()  # Backward propagation

Next, we create an optimizer; in this example, SGD with a learning rate of 0.01 and momentum of 0.9. We register all of the model’s parameters in the optimizer.
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

Finally, we call .step() to perform gradient descent, and the optimizer adjusts each parameter based on the gradients stored in .grad.
optim.step()  # Gradient descent

Now you have everything you need to train a neural network. The following sections describe in detail how autograd works; feel free to skip them.
Derivatives in Autograd
Let’s first see how autograd collects gradients. We create two tensors, a and b, with requires_grad=True to track their computations.
import torch
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Next, we create tensor Q based on a and b.


Q = 3*a**3 - b**2

Assume a and b are parameters of a neural network and Q is its error. During training, we need the gradients of the error with respect to the parameters, i.e.

∂Q/∂a = 9a²
∂Q/∂b = -2b

When we call Q.backward(), autograd computes these gradients and stores them in each tensor's .grad attribute. Because Q is a vector, we need to explicitly pass a gradient argument to Q.backward(): a tensor of the same shape as Q, representing the gradient of Q with respect to itself, i.e.

dQ/dQ = 1

Similarly, we can aggregate Q into a scalar and call backward implicitly, like Q.sum().backward().
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

Now the gradients are stored in a.grad and b.grad.
# Check if the stored gradients are correct
print(9*a**2 == a.grad)
print(-2*b == b.grad)
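For comparison, here is a minimal sketch of the scalar-aggregation variant mentioned above (using fresh copies of a and b so that gradients do not accumulate on top of the previous call):
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2
Q.sum().backward()  # scalar output, so no gradient argument is needed
print(a.grad)       # expected: tensor([36., 81.])  (= 9*a**2)
print(b.grad)       # expected: tensor([-12., -8.]) (= -2*b)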

Optional Reading — Vector Computation with Autograd
Mathematically, if you have a vector-valued function y⃗ = f(x⃗), then the gradient of y⃗ with respect to x⃗ is the Jacobian matrix J, whose entries are J_ij = ∂y_i/∂x_j.
In general, torch.autograd is an engine for computing vector-Jacobian products. That is, given any vector v = (v1, v2, …, vm)^T, it computes the product J^T · v. If v happens to be the gradient of a scalar function l = g(y⃗), i.e. v = (∂l/∂y1, …, ∂l/∂ym)^T, then by the chain rule the vector-Jacobian product J^T · v is exactly the gradient of l with respect to x⃗.
This property of vector-Jacobian products makes it very convenient to feed external gradients into a model with a non-scalar output. In the example above, external_grad plays the role of v.
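A minimal sketch of feeding such an external gradient v to a non-scalar output (x, y, and v here are made-up illustrative values):
x = torch.randn(3, requires_grad=True)
y = x * 2                             # vector-valued output; the Jacobian is 2*I
v = torch.tensor([0.1, 1.0, 0.0001])  # plays the role of the external gradient
y.backward(gradient=v)                # computes the vector-Jacobian product
print(x.grad)                         # expected: 2 * v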
Computational Graph
Conceptually, autograd records data (tensor) and all operations performed (and the new tensors produced) in a directed acyclic graph (DAG) composed of function objects. In this DAG, the leaf nodes are the input data, and the root nodes are the output data. By tracing this graph from the root nodes to the leaf nodes, you can automatically compute gradients using the chain rule.
During forward propagation, autograd does two things simultaneously:
  • Runs the requested operations to compute the resulting tensor
  • Maintains each operation’s gradient function (grad_fn) in the DAG
During backward propagation, when .backward() is called on the root node of the DAG, backward propagation starts, and autograd then completes:
  • Calculating the gradients of each .grad_fn
  • Accumulating them into the .grad attribute of each respective tensor
  • Using the chain rule to propagate all the way back to the leaf nodes
Below is an example of a visual representation of the DAG. In the graph, arrows indicate the direction of forward propagation, and nodes represent the backward functions of each operation in the forward pass. The blue-marked leaf nodes represent the leaf tensors a and b.
[Figure: the DAG for Q = 3*a**3 - b**2; arrows point in the direction of the forward pass, and the blue leaf nodes are the tensors a and b]

Note

The DAG in PyTorch is dynamic. It is worth noting that the graph is recreated from scratch: after each call to .backward(), autograd starts building a new graph. This is exactly what allows you to use control-flow statements in your model; you can change shapes, sizes, and operations at every iteration if needed.
torch.autograd tracks all operations related to tensors with requires_grad set to True. For tensors that do not require gradients, setting this attribute to False excludes them from the gradient computation DAG. The output tensor of an operation will require gradients even if only one input tensor has requires_grad=True.
x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)
a = x + y
print(f"Does `a` require gradients? : {a.requires_grad}")
b = x + z
print(f"Does `b` require gradients?: {b.requires_grad}")

In neural networks, parameters that do not compute gradients are often referred to as frozen parameters. If you know in advance that you will not need the gradients of certain parameters, it is useful to freeze that part of the model (this brings some performance benefit by reducing autograd computation). Another common use case is fine-tuning a pre-trained network: we freeze most of the model and usually only modify the classifier layer to make predictions for the new labels.
from torch import nn, optim
model = torchvision.models.resnet18(pretrained=True)
# Freeze all parameters in the network
for param in model.parameters():
    param.requires_grad = False

Assume we want to fine-tune the model on a new dataset with 10 labels. In resnet, the classifier is the last linear layer, model.fc. We can simply replace it with a new linear layer (unfrozen by default) to act as our classifier.
model.fc = nn.Linear(512, 10)

Now, except for the parameters of model.fc, all other parameters of the model are frozen, and the parameters that participate in the computation are the weights and biases of model.fc.
# Only optimize the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

Note that only the parameters of model.fc are registered in the optimizer and compute gradients, so only the classifier’s weights and biases are updated during gradient descent. The same exclusionary behavior is also available as a context manager in torch.no_grad().
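Here is a minimal sketch of the torch.no_grad() context manager mentioned above (x is just an illustrative tensor):
x = torch.ones(2, 2, requires_grad=True)
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False: operations inside the block are excluded from the DAG
z = x * 2
print(z.requires_grad)  # True: outside the block, tracking resumes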
Further Reading
  • [In-place Modification Operations and Multithreaded Autograd]:
    (https://pytorch.org/docs/stable/notes/autograd.html)
  • [Examples of Backward Mode Autodiff]:

    (https://colab.research.google.com/drive/1VpeE6UvEPRz9HmsHh1KS0XxXjYu533EC)

(3) Neural Networks

You can use the torch.nn package to build neural networks. You have already seen autograd; the nn package depends on autograd to define models and compute their gradients. An nn.Module contains layers and a forward(input) method that returns the output.
For example, let’s look at the following network that classifies digit images.
[Figure: a LeNet-style convolutional network for classifying digit images]
It is a simple feedforward neural network that receives an input and processes it layer by layer until it finally outputs a result.
The typical training process for a neural network is as follows:
  • Define a neural network model that has some learnable parameters (or weights);
  • Iterate over the dataset;
  • Process the input through the neural network;
  • Calculate the loss (the gap between the output and the correct value)
  • Backpropagate the gradients to the network’s parameters;
  • Update the network’s parameters, mainly using the following simple update rule: weight = weight - learning_rate * gradient

Defining the Network

Let’s first define a network:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
You only need to define the forward function; the backward function (which computes gradients) is automatically created for you when using autograd. You can use any operation of the Tensor in the forward function.
net.parameters() returns the parameters that the model needs to learn.
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
10
torch.Size([6, 1, 3, 3])
Construct a random input of size 32×32. Note: this network (LeNet) expects an input size of 32×32. If you are using the MNIST dataset to train this network, resize the images to 32*32.
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
tensor([[-0.0765,  0.0522,  0.0820,  0.0109,  0.0004,  0.0184,  0.1024,  0.0509,
          0.0917, -0.0164]], grad_fn=<AddmmBackward>)
Clear the gradient cache of all parameters and then perform backpropagation of random gradients.
net.zero_grad()
out.backward(torch.randn(1, 10))
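As mentioned above, training this network on MNIST would require resizing its 28x28 images to 32x32. Here is a minimal sketch of such a data pipeline (the root path, batch size, and absence of normalization are illustrative choices, not part of the original tutorial):
import torchvision
import torchvision.transforms as transforms
mnist_transform = transforms.Compose([
    transforms.Resize((32, 32)),  # this LeNet-style network expects 32x32 inputs
    transforms.ToTensor(),
])
mnist_train = torchvision.datasets.MNIST(root='./data', train=True,
                                         download=True, transform=mnist_transform)
mnist_loader = torch.utils.data.DataLoader(mnist_train, batch_size=4, shuffle=True)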

Note

torch.nn only supports mini-batches: the entire torch.nn package only accepts batches of samples, not single samples. For example, nn.Conv2d takes a 4D tensor of shape (batch_size x channels x height x width). If you have a single sample, use input.unsqueeze(0) to add a fake batch dimension. Before continuing, let’s review all the classes we’ve seen so far.
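For instance, a minimal sketch of wrapping a single sample this way (reusing the net defined above):
single_image = torch.randn(1, 32, 32)  # one 1-channel 32x32 image, no batch dimension
batched = single_image.unsqueeze(0)    # shape becomes (1, 1, 32, 32)
print(batched.shape)                   # torch.Size([1, 1, 32, 32])
out = net(batched)                     # nn.Conv2d now accepts it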

Review

  • torch.Tensor – a multi-dimensional array that supports automatic differentiation operations (like backward()). It also holds the gradient with respect to the tensor.
  • nn.Module – a neural network module; a convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, etc.
  • nn.Parameter – a tensor that is automatically registered as a parameter when assigned to a Module.
  • autograd.Function – implements a forward and backward definition of an automatic differentiation operation. Each tensor operation creates at least one Function node that connects to the function that created the tensor and encodes its history.

Now, we have covered the following:

  • Defining a neural network
  • Processing input and calling backward

The remaining content:

  • Calculating loss values
  • Updating the weights of the neural network

Loss Function

A loss function takes a pair (output, target) as input (output is the network’s output, target is the actual value) and computes a value to estimate how far the network’s output is from the target value.
There are several different loss functions in the nn package. A simple loss function is: nn.MSELoss, which computes the mean squared error between the input and the target.
For example:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
tensor(1.5801, grad_fn=<MseLossBackward>)
Now, you can backtrack the loss using its .grad_fn attribute, and you will see a computation graph like the one below:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d -> view -> linear -> relu -> linear -> relu -> linear -> MSELoss -> loss
So, when you call loss.backward(), the entire graph is differentiated for the loss and all tensors in the graph with requires_grad = True, and their .grad tensors accumulate the gradients.
To illustrate, we will backtrack a few steps:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
<MseLossBackward object at 0x0000023193A40E08>
<AddmmBackward object at 0x0000023193A40E48>
<AccumulateGrad object at 0x0000023193A40E08>

Backpropagation

To backpropagate the error, all we need to do is call loss.backward(). However, we must clear the existing gradients first; otherwise, the new gradients will be accumulated onto the existing ones.
Now, we will call loss.backward() and check the gradients of the bias term in conv1 layer before and after backpropagation.
net.zero_grad()     # zeroes the gradient buffers of all parameters
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)
loss.backward()
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 0.0013,  0.0068,  0.0096,  0.0039, -0.0105, -0.0016])
Now we know how to use the loss function.

Further Reading:

The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. The complete documentation can be found here.

The only remaining content:

  • Updating the weights of the network

Updating Weights

The simplest update rule in practice is stochastic gradient descent (SGD).
weight = weight - learning_rate * gradient
We can implement this rule using simple Python code.
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

However, when training neural networks you often want to use other update rules, such as SGD, Nesterov-SGD, Adam, RMSprop, etc. For this, PyTorch provides a small package, torch.optim, that implements all of these rules. Using it is very simple:
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
Note

Observe how the gradient buffers are manually set to zero with optimizer.zero_grad(). This is because gradients accumulate, as explained in the Backpropagation section.
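For example, a minimal sketch of swapping in Adam instead of SGD (the learning rate here is just an illustrative value; net, input, target, and criterion are reused from above):
optimizer = optim.Adam(net.parameters(), lr=1e-3)
optimizer.zero_grad()
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()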

(4) Training a Classifier

You have learned how to define a neural network, compute loss values, and update the network’s weights.

You might now be wondering: where does the data come from?

About Data

Typically, when dealing with image, text, audio, and video data, you can use standard Python packages to load the data into a numpy array and then convert that array into a torch.*Tensor.
  • For images, there are very useful packages like Pillow, OpenCV, etc.
  • For audio, there are packages like scipy and librosa.
  • For text, you can load using raw Python or Cython, or use NLTK and SpaCy.
For vision specifically, we have the torchvision package, which provides data loaders for common datasets such as ImageNet, CIFAR10, and MNIST, as well as image transforms, namely torchvision.datasets and torch.utils.data.DataLoader.
This provides great convenience and avoids code duplication.
In this tutorial, we will use the CIFAR10 dataset, which has the following 10 categories:
'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in this dataset are of size 3x32x32, i.e., 3-channel images of 32x32 pixels.

Training an Image Classifier

We will proceed in the following order:
  • Load and normalize the CIFAR10 training and test sets using torchvision.
  • Define a convolutional neural network.
  • Define the loss function.
  • Train the network on the training set.
  • Test the network on the test set.

1. Loading and Normalizing CIFAR10

It is very easy to load CIFAR10 using torchvision.
%matplotlib inline
import torch
import torchvision
import torchvision.transforms as transforms

The output of the torchvision datasets are PILImage images in the range [0, 1]. We transform them to tensors normalized to the range [-1, 1].

Note

If you encounter a BrokenPipeError while running on Windows, try setting num_workers to 0 in torch.utils.data.DataLoader().
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# This step is a bit slow; it downloads about 340 MB of image data.

We display some interesting training images.
import matplotlib.pyplot as plt
import numpy as np
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()
# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)  # next(dataiter) works across PyTorch versions
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

2. Defining a Convolutional Neural Network

Copy the neural network code from the previous section and modify it to accept 3-channel images instead of the previously accepted single-channel images.
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

3. Defining the Loss Function and Optimizer

We use cross-entropy as the loss function and stochastic gradient descent with momentum as the optimizer.
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

4. Training the Network

This is the moment we have been waiting for. We simply loop over the data iterator, feed the data into the network, and optimize.
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
Save our trained model.
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

5. Testing the Network on the Test Set

We trained the network for two passes over the entire training set, but we still need to check whether the network has actually learned anything.
We will check this by predicting the class label that the neural network outputs and comparing it against the ground truth. If the prediction is correct, we add that sample to the list of correct predictions.
First, let’s display some images from the test set to familiarize ourselves with the content of the images.
dataiter = iter(testloader)
images, labels = next(dataiter)
# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
Next, let’s reload our saved model (note that saving and reloading the model here is not necessary; we just illustrate how to do it):
net = Net()
net.load_state_dict(torch.load(PATH))
Now let’s see what the neural network thinks these images are:
outputs = net(images)
The outputs are scores for the 10 classes. The higher the score for a class, the more the network thinks the image belongs to that class. So let’s get the index of the highest score:
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]                              for j in range(4)))
This result looks very good.
Next, let’s see how the network performs on the entire test set.
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
The result looks better than chance (random guessing would give about 10% accuracy), so the network does seem to have learned something.
So which classes does it predict well, and which classes does it predict poorly?
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
What to do next?
How do we run the neural network on the GPU?

Training on GPU

Just as you move a tensor to the GPU, you move the neural network to the GPU for training. Let's first define our device as the first visible CUDA device if one is available:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming we are on a CUDA machine, this should print a CUDA device:
print(device)
Next, assuming we have a CUDA machine, these methods will recursively traverse all modules and convert their parameters and buffers to CUDA tensors:
net.to(device)
Please remember that you must also move your inputs and targets to the GPU at every step:
inputs, labels = inputs.to(device), labels.to(device)
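For example, here is a minimal sketch of the earlier training loop with every batch moved to the GPU (reusing net, trainloader, criterion, and optimizer from above):
net.to(device)
for epoch in range(2):
    for data in trainloader:
        # move the inputs and labels to the same device as the network
        inputs, labels = data[0].to(device), data[1].to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()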
Why didn’t we notice a significant speedup from the GPU? That’s because the network is very small.

Practice:

Try increasing the width of your network (the second argument of the first nn.Conv2d and the first argument of the second nn.Conv2d; they need to be the same number) and see what kind of speedup you get.

Goals Achieved:

  • Gained a deeper understanding of PyTorch’s tensor library and neural networks
  • Trained a small network to classify images

Training on Multiple GPUs

If you want to use all GPUs to speed things up even more, check out the further reading: [Data Parallelism]:

(https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html)

What to do next?

  • Train a neural network to play video games
  • Train the best ResNet on ImageNet
  • Use generative adversarial networks to train a face generator
  • Train a character-level language model using LSTM networks
  • More examples
  • More tutorials
  • Discuss PyTorch on forums
  • Chat with other users on Slack
Code download link:

https://github.com/fengdu78/machine_learning_beginner/tree/master/PyTorch_beginner

Editor: Yu Tengkai

Proofreader: Lin Yilin

