
Reposted from Zhihu author丨z.defying@Zhihu

Source丨https://zhuanlan.zhihu.com/p/76459295

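This article compiles 13 tips for using PyTorch, including specifying the GPU number, gradient clipping, and expanding the dimensions of a single image, which can help practitioners get their work done more efficiently.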

1. Specify GPU Number

2. View Model Output Details for Each Layer

3. Gradient Clipping

4. Expand Dimensions of a Single Image

5. One Hot Encoding

6. Prevent Out of Memory During Model Validation

7. Learning Rate Decay

8. Freeze Parameters of Certain Layers

9. Use Different Learning Rates for Different Layers

10. Model Related Operations

11. Built-in One Hot Function in PyTorch

12. Network Parameter Initialization

13. Load Built-in Pre-trained Models

1. Specify GPU Number

Make only device 0 visible to the process; PyTorch then addresses it as cuda:0:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
Make devices 0 and 1 visible, addressed as cuda:0 and cuda:1 in that order:
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

This variable must be set before CUDA is initialized, so place it at the top of the script, before any operations related to the neural network (and, to be safe, before importing torch).
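As a minimal sketch of where the setting goes in a script (the tensor shape and the CPU fallback are assumptions added for illustration, not part of the original tip):

```python
import os

# Set this before the first CUDA call; the safest place is before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

# Visible devices are renumbered from 0 inside this process.
# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.zeros(2, 3, device=device)
print(device, x.device)
```

Setting the variable after CUDA has already been initialized in the process has no effect, which is why it belongs at the very top.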

2. View Model Output Details for Each Layer

Keras has a concise API for viewing the output dimensions of each layer of a model, which is very useful when debugging a network. The same functionality is available for PyTorch through the third-party torchsummary package.

It’s easy to use, as shown below:

from torchsummary import summary
summary(your_model, input_size=(channels, H, W))

input_size should be set according to the input size of your network model.
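If torchsummary is not installed, a rough equivalent can be sketched with forward hooks in plain PyTorch. This is a different technique from torchsummary, not its implementation; the helper name layer_output_shapes and the toy model are hypothetical.

```python
import torch
import torch.nn as nn

def layer_output_shapes(model, input_size):
    """Run one dummy forward pass and record each leaf layer's output shape."""
    shapes = []
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            shapes.append((name, tuple(output.shape)))
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf layers only
            hooks.append(module.register_forward_hook(make_hook(name)))

    model.eval()  # note: leaves the model in evaluation mode
    with torch.no_grad():
        model(torch.zeros(1, *input_size))  # batch of one dummy input
    for h in hooks:
        h.remove()
    return shapes

# Hypothetical toy model used only to exercise the helper
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
for name, shape in layer_output_shapes(toy_model, (3, 32, 32)):
    print(name, shape)
```

As with torchsummary, input_size excludes the batch dimension; the sketch assumes every leaf layer returns a single tensor.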

3. Gradient Clipping

import torch.nn as nn
outputs = model(data)
loss = loss_fn(outputs, target)
optimizer.zero_grad()
loss.backward()
# Clip after backward() and before step(), so the optimizer sees the clipped gradients
nn.utils.clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)
optimizer.step()

The parameters of nn.utils.clip_grad_norm_ are:

parameters – an iterable of tensors (or a single tensor) whose gradients will be normalized
max_norm – the maximum allowed norm of the gradients
norm_type – the type of p-norm to use; the default is 2 (the L2 norm)

@不椭的椭圆 pointed out that gradient clipping may consume a lot of computation time on certain tasks; see the comments on the original post for details.
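To see the effect of clipping in isolation, here is a small self-contained sketch. The one-layer model, the scaled random data, and max_norm = 1.0 are assumptions chosen only to provoke large gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
data = torch.randn(8, 4) * 100.0   # large inputs to provoke large gradients
target = torch.randn(8, 2)

loss = nn.functional.mse_loss(model(data), target)
loss.backward()

max_norm = 1.0
# The return value is the total gradient norm measured before clipping
norm_before = nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm, norm_type=2)

# Recompute the total L2 norm over all parameter gradients after clipping
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
print(float(norm_before), float(norm_after))
```

The norm is computed over all gradients together, as if they were concatenated into one vector, so every gradient is rescaled by the same factor.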

4. Expand Dimensions of a Single Image

During training, the data dimensions are generally (batch_size, c, h, w), but when testing, only a single image is input, so the dimensions need to be expanded. There are multiple methods to expand dimensions:

import cv2
import torch
image = cv2.imread(img_path)
image = torch.tensor(image)
print(image.size())
img = image.view(1, *image.size())
print(img.size())
# output:
# torch.Size([h, w, c])
# torch.Size([1, h, w, c])

or, using NumPy:

import cv2
import numpy as np
image = cv2.imread(img_path)
print(image.shape)
img = image[np.newaxis, :, :, :]
print(img.shape)
# output:
# (h, w, c)
# (1, h, w, c)

or (thanks to @coldleaf for the addition)

import cv2
import torch
image = cv2.imread(img_path)
image = torch.tensor(image)
print(image.size())
img = image.unsqueeze(dim=0)
print(img.size())
img = img.squeeze(dim=0)
print(img.size())
# output:
# torch.Size([h, w, c])
# torch.Size([1, h, w, c])
# torch.Size([h, w, c])

tensor.unsqueeze(dim): inserts a new dimension of size 1 at position dim.

tensor.squeeze(dim): removes dimension dim if its size is 1; if its size is greater than 1, the tensor is returned unchanged. If dim is not specified, all dimensions of size 1 are removed.
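Note that cv2.imread returns an array laid out as (h, w, c), while the networks described above expect (batch_size, c, h, w). A minimal sketch of the full conversion, using a zero tensor as a stand-in for a loaded image (the sizes are arbitrary assumptions):

```python
import torch

image = torch.zeros(32, 48, 3, dtype=torch.uint8)  # stand-in for an (h, w, c) image

chw = image.permute(2, 0, 1)           # (h, w, c) -> (c, h, w)
batch_a = chw.unsqueeze(0)             # explicit new leading dimension
batch_b = chw[None]                    # indexing with None does the same
batch_c = chw.reshape(1, *chw.shape)   # reshape also handles non-contiguous tensors

print(batch_a.shape, batch_b.shape, batch_c.shape)
print(chw.squeeze(0).shape)            # dimension 0 has size 3, so nothing is removed
```

All three variants give the same shape; which one to use is mostly a matter of taste.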

5. One Hot Encoding

PyTorch's cross-entropy loss (nn.CrossEntropyLoss) takes integer class indices as targets, so the labels do not need to be converted to one-hot by hand. With MSE loss, however, the labels must be converted to one-hot encoding manually.

import torch
class_num = 8
batch_size = 4
def one_hot(label):
    """
    Convert a one-dimensional tensor of class indices to one-hot encoding
    """
    label = label.view(batch_size, 1)  # scatter_ needs an index of shape (batch_size, 1)
    m_zeros = torch.zeros(batch_size, class_num)
    # scatter_(dim, index, value): write value into the positions given by index along dim
    onehot = m_zeros.scatter_(1, label, 1)
    return onehot.numpy()  # Tensor -> Numpy
label = torch.randint(0, class_num, (batch_size,))  # Random class indices in [0, class_num)
print(one_hot(label))
# example output (the labels are random):
# [[0. 0. 0. 1. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 1. 0. 0. 0.]
#  [0. 0. 1. 0. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0. 0. 0. 0.]]

Note: There is a simpler method in item 11.
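The difference between the two loss functions can be sketched as follows. The random logits, the fixed labels, and the softmax applied before the MSE are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 8)            # (batch_size, class_num) raw model outputs
labels = torch.tensor([3, 4, 2, 1])   # integer class indices

# Cross-entropy consumes the integer labels directly
ce = nn.CrossEntropyLoss()(logits, labels)

# MSE compares tensors of equal shape, so the labels are expanded to one-hot floats
onehot = torch.zeros(4, 8).scatter_(1, labels.view(-1, 1), 1.0)
mse = nn.MSELoss()(torch.softmax(logits, dim=1), onehot)
print(ce.item(), mse.item())
```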

6. Prevent Out of Memory During Model Validation

During model validation, gradient calculation is not needed, so turning off autograd can speed up the process and save memory. If not turned off, it may lead to out of memory errors.

with torch.no_grad():
    # Code for prediction using the model
    pass
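A slightly fuller sketch of a validation step is shown below. The toy model and random batch are assumptions, and the model.eval() / model.train() calls are an addition that is commonly paired with no_grad rather than part of the original tip.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
val_batch = torch.randn(8, 16)

model.eval()                   # switch dropout/batch-norm layers to inference behaviour
with torch.no_grad():          # no computation graph is recorded inside this block
    val_out = model(val_batch)

model.train()                  # back to training mode afterwards
train_out = model(val_batch)   # outside no_grad the graph is recorded again
print(val_out.requires_grad, train_out.requires_grad)
```

Because no graph is kept for val_out, the intermediate activations can be freed immediately, which is where the memory saving comes from.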

Thanks to @zhaz for the reminder; I have updated the explanation of what torch.cuda.empty_cache() is for.

This is the original response:

Unused temporary variables may accumulate during PyTorch training and lead to out of memory errors. The following statement can be used to clean up this unneeded memory:

torch.cuda.empty_cache()

The explanation in the official documentation is:

Releases all unoccupied cached memory currently held by the caching allocator so that those can be used in other GPU applications and visible in nvidia-smi.

This means that PyTorch's caching allocator keeps some GPU memory reserved even when tensors are not actually using all of it, and that reserved memory cannot be used by other applications. The reservation starts with the first CUDA memory access.

torch.cuda.empty_cache() releases the cached GPU memory that the caching allocator currently holds but that no tensor occupies, so that other GPU applications can use it and the release is visible in nvidia-smi. Note that it does not free GPU memory occupied by live tensors.

PyTorch automatically reclaims the GPU memory of data variables that are no longer in use.

For more detailed optimization, you can refer to optimizing GPU memory usage and GPU memory utilization issues.
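The difference between memory occupied by tensors and memory merely held in the cache can be inspected with torch.cuda.memory_allocated() and torch.cuda.memory_reserved(). The sketch below is an illustration only; the helper name cuda_memory_report is hypothetical, and it returns None on machines without CUDA.

```python
import torch

def cuda_memory_report():
    """Return ((allocated, reserved) before, (allocated, reserved) after) empty_cache(), or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    x = torch.zeros(1024, 1024, device="cuda")  # about 4 MB of float32
    del x                                        # the tensor is gone, its block stays cached
    before = (torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
    torch.cuda.empty_cache()                     # hand unoccupied cached blocks back to the driver
    after = (torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
    return before, after

report = cuda_memory_report()
print(report)
```

On a GPU machine the reserved figure is expected to drop (or stay the same) after the call, while the allocated figure is unaffected.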

7. Learning Rate Decay

import torch.optim as optim
from torch.optim import lr_scheduler
# Initialization before training
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, 10, 0.1)  # Every 10 epochs, multiply the learning rate by 0.1
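# During training
for epoch in range(n_epoch):
    ...  # Train for one epoch, calling optimizer.step() for each batch
    scheduler.step()  # Since PyTorch 1.1, call this after optimizer.step()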

You can check the learning rate value at any time: optimizer.param_groups[0]['lr'].
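A runnable sketch of the schedule is given below. The one-layer model, the dummy loss, and the 25-epoch loop are assumptions; only the optimizer and scheduler settings follow the text.

```python
import math
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

net = nn.Linear(4, 1)
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lr_history = []
for epoch in range(25):
    lr_history.append(optimizer.param_groups[0]["lr"])  # learning rate used in this epoch
    loss = net(torch.randn(8, 4)).pow(2).mean()          # dummy loss for illustration
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                     # after optimizer.step()

print(lr_history[0], lr_history[10], lr_history[20])
```

Epochs 0 to 9 run at the initial rate, epochs 10 to 19 at one tenth of it, and so on.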

There are also other ways to update the learning rate:

1. Custom update formula:

scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch:1/(epoch+1))

2. Update learning rate without relying on epoch:

lr_scheduler.ReduceLROnPlateau() lowers the learning rate dynamically based on a metric measured during training; its parameters are described in the PyTorch documentation. One point to note: set mode='min' or mode='max' according to whether the monitored quantity is a loss or an accuracy, i.e., whether you call scheduler.step(loss) or scheduler.step(acc).
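Both alternatives can be tried out with a short sketch. The helper make_optimizer, the SGD optimizer, the patience and factor values, and the constant validation loss are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

def make_optimizer(lr):
    # Hypothetical helper: a fresh one-layer model and SGD optimizer per demo
    return optim.SGD(nn.Linear(2, 1).parameters(), lr=lr)

# 1. LambdaLR: lr = initial_lr * lr_lambda(epoch)
opt1 = make_optimizer(0.1)
sched1 = lr_scheduler.LambdaLR(opt1, lr_lambda=lambda epoch: 1 / (epoch + 1))
lambda_lrs = []
for epoch in range(4):
    lambda_lrs.append(opt1.param_groups[0]["lr"])
    opt1.step()
    sched1.step()

# 2. ReduceLROnPlateau: step() takes the monitored value itself
opt2 = make_optimizer(0.1)
sched2 = lr_scheduler.ReduceLROnPlateau(opt2, mode="min", factor=0.1, patience=2)
plateau_lrs = []
for epoch in range(4):
    opt2.step()
    val_loss = 1.0                      # a validation loss that never improves
    sched2.step(val_loss)
    plateau_lrs.append(opt2.param_groups[0]["lr"])

print(lambda_lrs)
print(plateau_lrs)
```

With patience=2, the rate is only cut once the metric has failed to improve for more than two consecutive epochs, which happens on the fourth call here.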

8. Freeze Parameters of Certain Layers

Reference: https://www.zhihu.com/question/311095447/answer/589307812

When loading a pre-trained model, we sometimes want to freeze the first few layers so that their parameters do not change during training.

We need to know the names of each layer, which can be printed with the following code:

net = Network()  # Get custom network structure
for name, value in net.named_parameters():
    print('name: {0},\t grad: {1}'.format(name, value.requires_grad))

Assuming the information for the first few layers is as follows:

name: cnn.VGG_16.convolution1_1.weight,   grad: True
name: cnn.VGG_16.convolution1_1.bias,   grad: True
name: cnn.VGG_16.convolution1_2.weight,   grad: True
name: cnn.VGG_16.convolution1_2.bias,   grad: True
name: cnn.VGG_16.convolution2_1.weight,   grad: True
name: cnn.VGG_16.convolution2_1.bias,   grad: True
name: cnn.VGG_16.convolution2_2.weight,   grad: True
name: cnn.VGG_16.convolution2_2.bias,   grad: True

The True at the end indicates that the parameter is trainable. Next, we define a list of the parameters to freeze:

no_grad = [
    'cnn.VGG_16.convolution1_1.weight',
    'cnn.VGG_16.convolution1_1.bias',
    'cnn.VGG_16.convolution1_2.weight',
    'cnn.VGG_16.convolution1_2.bias',
]

The method to freeze is as follows:

net = Net.CTPN()  # Get network structure
for name, value in net.named_parameters():
    if name in no_grad:
        value.requires_grad = False
    else:
        value.requires_grad = True

After freezing, we print the information for each layer again:

name: cnn.VGG_16.convolution1_1.weight,   grad: False
name: cnn.VGG_16.convolution1_1.bias,   grad: False
name: cnn.VGG_16.convolution1_2.weight,   grad: False
name: cnn.VGG_16.convolution1_2.bias,   grad: False
name: cnn.VGG_16.convolution2_1.weight,   grad: True
name: cnn.VGG_16.convolution2_1.bias,   grad: True
name: cnn.VGG_16.convolution2_2.weight,   grad: True
name: cnn.VGG_16.convolution2_2.bias,   grad: True

We can see that the weight and bias of the first two layers have requires_grad set to False, indicating they are not trainable.

Finally, when defining the optimizer, pass it only the parameters whose requires_grad is True, so that only those are updated.

optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=0.01)
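The whole procedure can be checked end to end on a toy network. The two-layer nn.Sequential, its parameter names, and the dummy loss are assumptions standing in for the author's network.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Toy stand-in; its parameter names are "0.weight", "0.bias", "2.weight", "2.bias"
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
no_grad = ["0.weight", "0.bias"]

for name, value in net.named_parameters():
    value.requires_grad = name not in no_grad

frozen_before = net[0].weight.detach().clone()
trainable_before = net[2].weight.detach().clone()

optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=0.01)
loss = net(torch.randn(16, 4)).pow(2).mean()   # dummy loss for illustration
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(torch.equal(net[0].weight, frozen_before))      # frozen layer is unchanged
print(torch.equal(net[2].weight, trainable_before))   # trainable layer has moved
```

The frozen parameters receive no gradient at all, so they also save some memory and computation in the backward pass.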

9. Use Different Learning Rates for Different Layers

We use different learning rates for different layers of the model.

Using this model as an example:

net = Network()  # Get custom network structure
for name, value in net.named_parameters():
    print('name: {}'.format(name))
# Output:
# name: cnn.VGG_16.convolution1_1.weight
# name: cnn.VGG_16.convolution1_1.bias
# name: cnn.VGG_16.convolution1_2.weight
# name: cnn.VGG_16.convolution1_2.bias
# name: cnn.VGG_16.convolution2_1.weight
# name: cnn.VGG_16.convolution2_1.bias
# name: cnn.VGG_16.convolution2_2.weight
# name: cnn.VGG_16.convolution2_2.bias

Set different learning rates for convolution1 and convolution2, first separating them into different lists:

conv1_params = []
conv2_params = []
for name, parms in net.named_parameters():
    if "convolution1" in name:
        conv1_params += [parms]
    else:
        conv2_params += [parms]
# Then in the optimizer do the following:
optimizer = optim.Adam(
    [
        {"params": conv1_params, 'lr': 0.01},
        {"params": conv2_params, 'lr': 0.001},
    ],
    weight_decay=1e-3,
)

We divide the model's parameters into two groups and pass them to the optimizer as a list of dictionaries, setting a different learning rate in each dictionary. Options that are the same for both groups are given outside the list as global arguments, such as weight_decay above.

You can also set a global learning rate outside the list; when local learning rates are set in the dictionaries, the local learning rates will be used; otherwise, the global learning rate will be used.
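The fallback to the global learning rate can be verified by inspecting optimizer.param_groups. The TinyNet class and its layer names are hypothetical stand-ins for the author's network.

```python
import torch.nn as nn
import torch.optim as optim

# Toy stand-in; its parameter names are "conv1.weight", "conv1.bias", "conv2.weight", "conv2.bias"
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, 3)
        self.conv2 = nn.Conv2d(4, 4, 3)

net = TinyNet()
conv1_params = [p for n, p in net.named_parameters() if "conv1" in n]
conv2_params = [p for n, p in net.named_parameters() if "conv1" not in n]

optimizer = optim.Adam(
    [
        {"params": conv1_params, "lr": 0.01},  # local learning rate
        {"params": conv2_params},              # falls back to the global lr below
    ],
    lr=0.001,
    weight_decay=1e-3,
)
for group in optimizer.param_groups:
    print(group["lr"], group["weight_decay"])
```

Each entry of param_groups carries its own copy of every option, which is also what learning-rate schedulers modify.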

10. Model Related Operations

This topic is quite extensive, so I wrote it up as a separate article: https://zhuanlan.zhihu.com/p/73893187

11. Built-in One Hot Function in PyTorch

Thanks to @yangyangyang for the addition: since PyTorch 1.1, one-hot encoding can be produced directly with torch.nn.functional.one_hot.

Then I upgraded PyTorch to version 1.2 and tried the one-hot function, which is indeed very convenient.

The specific usage is as follows:

import torch.nn.functional as F
import torch
tensor =  torch.arange(0, 5) % 3  # tensor([0, 1, 2, 0, 1])
one_hot = F.one_hot(tensor)
# Output:
# tensor([[1, 0, 0],
#         [0, 1, 0],
#         [0, 0, 1],
#         [1, 0, 0],
#         [0, 1, 0]])

If num_classes is not given, F.one_hot infers the number of classes as one more than the largest value in the input and generates the corresponding one-hot encoding. We can also specify the number of classes explicitly:

tensor =  torch.arange(0, 5) % 3  # tensor([0, 1, 2, 0, 1])
one_hot = F.one_hot(tensor, num_classes=5)
# Output:
# tensor([[1, 0, 0, 0, 0],
#         [0, 1, 0, 0, 0],
#         [0, 0, 1, 0, 0],
#         [1, 0, 0, 0, 0],
#         [0, 1, 0, 0, 0]])
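Two details are worth checking when using F.one_hot in practice: how the width is inferred, and the dtype of the result. A short sketch with made-up labels:

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 4])          # class 4 is the largest value present
inferred = F.one_hot(labels)           # width inferred as largest value + 1 = 5
explicit = F.one_hot(labels, num_classes=8)

# F.one_hot returns int64; cast to float before feeding a loss such as MSE
target = explicit.float()
print(inferred.shape, explicit.shape, target.dtype)
```

Passing num_classes explicitly is the safer choice when a batch might not contain the highest class.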

Command to upgrade PyTorch (CPU version): conda install pytorch torchvision -c pytorch

(Hope upgrading PyTorch won’t affect project code)

12. Network Parameter Initialization

Initializing neural networks is an important foundational step in the training process, which can significantly affect the model’s performance, convergence, and convergence speed.

The following introduces two commonly used initialization operations.

(1) Use the built-in torch.nn.init method in PyTorch.

Common initialization operations, such as normal distribution, uniform distribution, xavier initialization, kaiming initialization, etc., have been implemented and can be used directly. For details, see the PyTorch documentation on torch.nn.init.

from torch.nn import init
init.xavier_uniform_(net1[0].weight)  # The variant without the trailing underscore is deprecated

(2) For more flexible initialization methods, numpy can be used.

For custom initialization methods, sometimes tensors are not as powerful and flexible as numpy, so you can use numpy to implement the initialization method and then convert it to tensor for use.

for layer in net1.modules():
    if isinstance(layer, nn.Linear):  # Check if it is a linear layer
        param_shape = layer.weight.shape
        # Normal distribution with mean 0 and standard deviation 0.5;
        # cast to float32 because NumPy produces float64 by default
        weights = np.random.normal(0, 0.5, size=param_shape).astype(np.float32)
        layer.weight.data = torch.from_numpy(weights)
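Both approaches can be exercised on a small network, as in the sketch below. The network net1 defined here, the seeded NumPy generator, and the in-place copy_ under torch.no_grad() are assumptions chosen for illustration rather than the author's exact code.

```python
import math
import numpy as np
import torch
import torch.nn as nn
from torch.nn import init

net1 = nn.Sequential(nn.Linear(6, 10), nn.ReLU(), nn.Linear(10, 2))

# (1) Built-in initializer: samples from U(-bound, bound), bound = sqrt(6 / (fan_in + fan_out))
init.xavier_uniform_(net1[0].weight)
bound = math.sqrt(6.0 / (6 + 10))

# (2) NumPy-based initializer, copied into the existing parameter in place
rng = np.random.default_rng(0)
values = rng.normal(0.0, 0.5, size=tuple(net1[2].weight.shape)).astype(np.float32)
with torch.no_grad():
    net1[2].weight.copy_(torch.from_numpy(values))

print(net1[0].weight.abs().max().item() <= bound, net1[2].weight.dtype)
```

Copying into the parameter keeps its dtype and device, so the network still accepts ordinary float32 inputs afterwards.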

13. Load Built-in Pre-trained Models

The torchvision.models module includes the following models, among others:

AlexNet
VGG
ResNet
SqueezeNet
DenseNet

The method to import these models is:

import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()

A very important parameter is pretrained. It defaults to False, which means that only the model structure is created and the weights are randomly initialized.

If pretrained is set to True, the weights pre-trained on the ImageNet dataset are loaded as well. (In torchvision 0.13 and later, pretrained is deprecated in favor of the weights argument.)

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
vgg16 = models.vgg16(pretrained=True)

For more models, please refer to: https://pytorch-cn.readthedocs.io/zh/latest/torchvision/torchvision-models/


13 Tips for Using PyTorch Effectively
