Author: z.defying
https://zhuanlan.zhihu.com/p/76459295
This article is authorized by the author, and reprinting is not allowed without permission
Table of Contents:
- Specify GPU ID
- View details of each layer output of the model
- Gradient Clipping
- Expand the dimension of a single image
- One-Hot Encoding
- Prevent out of memory errors when validating the model
- Learning Rate Decay
- Freeze parameters of certain layers
- Use different learning rates for different layers
1. Specify GPU ID
- To use only device 0 (device name /gpu:0): os.environ["CUDA_VISIBLE_DEVICES"] = "0"
- To use both devices 0 and 1 (names /gpu:0 and /gpu:1): os.environ["CUDA_VISIBLE_DEVICES"] = "0,1", which means device 0 is used first and then device 1.
The command that specifies the GPU must be placed before any neural-network-related operations, as in the sketch below.
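For instance, a minimal sketch (the variable must be set before the first CUDA call, i.e. before PyTorch initializes the device):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # must be set before any CUDA initialization

import torch
print(torch.cuda.device_count())  # reports only the visible devices (2 with the setting above)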
2. View details of each layer output of the model
Keras has a concise API to view the output size of each layer of the model, which is very useful for debugging the network. This functionality can now also be achieved in PyTorch.
It is very simple to use, as shown below:
from torchsummary import summary
summary(your_model, input_size=(channels, H, W))
input_size should be set according to the input size of your own network model.
https://github.com/sksq96/pytorch-summary
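For example, a minimal sketch using a torchvision model (the model and input size here are placeholders, not part of the original article; assumes torchsummary and torchvision are installed):
import torchvision.models as models
from torchsummary import summary

vgg = models.vgg16()
# Prints each layer's output shape and parameter count for a 3x224x224 input
summary(vgg, input_size=(3, 224, 224), device="cpu")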
3. Gradient Clipping
import torch.nn as nn

outputs = model(data)
loss = loss_fn(outputs, target)

optimizer.zero_grad()
loss.backward()
# Clip gradients after backward() and before step():
nn.utils.clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)
optimizer.step()
The parameters of nn.utils.clip_grad_norm_:
- parameters – an iterable of Tensors whose gradients will be normalized
- max_norm – the maximum norm of the gradients
- norm_type – the type of norm to use; the default is L2
4. Expand the dimension of a single image
The data dimensions during training are generally (batch_size, c, h, w), but at test time often only a single image is input, so its dimensions must be expanded. There are several ways to do this:
import cv2
import torch
image = cv2.imread(img_path)
image = torch.tensor(image)
print(image.size())
img = image.view(1, *image.size())
print(img.size())
# output:
# torch.Size([h, w, c])
# torch.Size([1, h, w, c])
Or
import cv2
import numpy as np
image = cv2.imread(img_path)
print(image.shape)
img = image[np.newaxis, :, :, :]
print(img.shape)
# output:
# (h, w, c)
# (1, h, w, c)
Or (thanks to Zhihu user coldleaf for the addition):
import cv2
import torch
image = cv2.imread(img_path)
image = torch.tensor(image)
print(image.size())
img = image.unsqueeze(dim=0)
print(img.size())
img = img.squeeze(dim=0)
print(img.size())
# output:
# torch.Size([h, w, c])
# torch.Size([1, h, w, c])
# torch.Size([h, w, c])
tensor.unsqueeze(dim): expands the dimensions; dim specifies where the new dimension of size 1 is inserted.
tensor.squeeze(dim): removes the dimension specified by dim if it has size 1; if its size is greater than 1, squeeze() has no effect. When dim is not specified, all dimensions of size 1 are removed.
5. One-Hot Encoding
When using the cross-entropy loss function in PyTorch, the label is converted to one-hot form automatically, so there is no need to do it manually; when using MSE, however, the labels must be converted to one-hot encoding manually.
import torch

class_num = 8
batch_size = 4

def one_hot(label):
    """Convert a one-dimensional label tensor to one-hot encoding."""
    label = label.resize_(batch_size, 1)
    m_zeros = torch.zeros(batch_size, class_num)
    # scatter_(dim, index, value): writes value at the positions given by index along dim
    onehot = m_zeros.scatter_(1, label, 1)
    return onehot.numpy()  # Tensor -> NumPy

label = torch.LongTensor(batch_size).random_() % class_num  # random class indices
print(one_hot(label))
# output:
[[0. 0. 0. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0.]]
https://discuss.pytorch.org/t/convert-int-into-one-hot-format/507/3
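As a side note, newer PyTorch releases (1.1 and later, which may postdate this article) ship a built-in helper that does the same thing in one call:
import torch
import torch.nn.functional as F

label = torch.tensor([3, 4, 2, 1])
onehot = F.one_hot(label, num_classes=8)  # shape (4, 8), dtype int64
print(onehot)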
6. Prevent Out of Memory Errors When Validating the Model
When validating the model, there is no need to compute gradients, so turning off autograd can improve speed and save memory. If not turned off, it may cause out of memory errors.
with torch.no_grad():
    # Code for making predictions with the model
    pass
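For instance, a minimal validation sketch (model, val_loader, and device are placeholders, not names from the original article):
model.eval()  # also disables dropout and uses batch-norm running statistics
with torch.no_grad():
    for data, target in val_loader:
        output = model(data.to(device))
        # ... compute metrics here; no computation graph is built, saving memory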
Thanks to Zhihu user zhaz for the reminder; I have updated the explanation of torch.cuda.empty_cache().
The original answer was: the unnecessary temporary variables PyTorch creates during training may keep growing, leading to out of memory errors, and the following statement can be used to clean them up.
The explanation on the official website is:
Releases all unoccupied cached memory currently held by the caching allocator so that those can be used in other GPU application and visible in nvidia-smi.
torch.cuda.empty_cache()
This means that PyTorch’s caching allocator pre-allocates a fixed amount of memory; even if tensors are not actually using all of it, this memory cannot be used by other applications. The allocation is triggered by the first CUDA memory access.
The role of torch.cuda.empty_cache() is to release the cached memory that the caching allocator currently holds but does not occupy, so that it can be used by other GPU applications and becomes visible in nvidia-smi. Note that this command does not release memory occupied by tensors; variables that are no longer referenced are reclaimed, and their memory released, by PyTorch automatically.
For more detailed optimizations, see Optimizing Memory Usage and Memory Utilization Issues.
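A minimal sketch of where such a call might go (validate here is a hypothetical routine, not from the original article):
import torch

with torch.no_grad():
    validate(model)       # hypothetical validation routine
torch.cuda.empty_cache()  # returns cached, unoccupied blocks; in-use tensors are unaffected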
7. Learning Rate Decay
import torch.optim as optim
from torch.optim import lr_scheduler

# Initialization before training
optimizer = optim.Adam(net.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, 10, 0.1)  # multiply the learning rate by 0.1 every 10 epochs

# During training
for n in range(n_epoch):
    scheduler.step()
    ...
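A fuller, hypothetical sketch (net, train_loader, and loss_fn are placeholders). Note that since PyTorch 1.1, scheduler.step() is expected to be called after optimizer.step(), i.e. at the end of each epoch:
for epoch in range(n_epoch):
    for data, target in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(net(data), target)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate once per epoch, after the optimizer has stepped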
8. Freeze Parameters of Certain Layers
Reference: Freezing a Certain Layer of a Pre-trained Model in Pytorch
When loading a pre-trained model, sometimes we want to freeze the first few layers so that their parameters do not change during training.
We first need to know the names of each layer, which can be printed with the following code:
net = Network()  # get the custom network structure
for name, value in net.named_parameters():
    print('name: {0}, grad: {1}'.format(name, value.requires_grad))
Assuming the information of the first few layers is as follows:
name: cnn.VGG_16.convolution1_1.weight, grad: True
name: cnn.VGG_16.convolution1_1.bias, grad: True
name: cnn.VGG_16.convolution1_2.weight, grad: True
name: cnn.VGG_16.convolution1_2.bias, grad: True
name: cnn.VGG_16.convolution2_1.weight, grad: True
name: cnn.VGG_16.convolution2_1.bias, grad: True
name: cnn.VGG_16.convolution2_2.weight, grad: True
name: cnn.VGG_16.convolution2_2.bias, grad: True
The True at the end indicates that the parameters of that layer are trainable. We then define the list of layers to freeze:
no_grad = [
'cnn.VGG_16.convolution1_1.weight',
'cnn.VGG_16.convolution1_1.bias',
'cnn.VGG_16.convolution1_2.weight',
'cnn.VGG_16.convolution1_2.bias'
]
The freezing method is as follows:
net = Net.CTPN()  # get the network structure
for name, value in net.named_parameters():
    if name in no_grad:
        value.requires_grad = False
    else:
        value.requires_grad = True
After freezing, we print the information of each layer again:
name: cnn.VGG_16.convolution1_1.weight, grad: False
name: cnn.VGG_16.convolution1_1.bias, grad: False
name: cnn.VGG_16.convolution1_2.weight, grad: False
name: cnn.VGG_16.convolution1_2.bias, grad: False
name: cnn.VGG_16.convolution2_1.weight, grad: True
name: cnn.VGG_16.convolution2_1.bias, grad: True
name: cnn.VGG_16.convolution2_2.weight, grad: True
name: cnn.VGG_16.convolution2_2.bias, grad: True
It can be seen that requires_grad for the weight and bias of the first two layers is now False, meaning they are not trainable. Finally, when defining the optimizer, only the parameters with requires_grad set to True are passed in:
optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=0.01)
9. Use Different Learning Rates for Different Layers
We will use different learning rates for different layers of the model.
Still using this model as an example:
net = Network()  # get the custom network structure
for name, value in net.named_parameters():
    print('name: {}'.format(name))
# Output:
# name: cnn.VGG_16.convolution1_1.weight
# name: cnn.VGG_16.convolution1_1.bias
# name: cnn.VGG_16.convolution1_2.weight
# name: cnn.VGG_16.convolution1_2.bias
# name: cnn.VGG_16.convolution2_1.weight
# name: cnn.VGG_16.convolution2_1.bias
# name: cnn.VGG_16.convolution2_2.weight
# name: cnn.VGG_16.convolution2_2.bias
To set different learning rates for convolution1 and convolution2, first separate their parameters into different lists:
conv1_params = []
conv2_params = []

for name, parms in net.named_parameters():
    if "convolution1" in name:
        conv1_params += [parms]
    else:
        conv2_params += [parms]

# Then pass the parameter groups to the optimizer:
optimizer = optim.Adam(
    [
        {"params": conv1_params, 'lr': 0.01},
        {"params": conv2_params, 'lr': 0.001},
    ],
    weight_decay=1e-3,
)
We split the model's parameters into two groups and place them in a list; each group corresponds to one of the dictionaries above, with its own learning rate set inside the dictionary. Options shared by both groups, such as weight_decay above, are placed outside the list as global settings.
A global learning rate can also be set outside the list; whenever a group's dictionary sets a local learning rate, the local one is used, otherwise the global one applies, as the sketch below shows.
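A hypothetical variant illustrating that fallback: conv1_params uses its local learning rate of 0.01, while conv2_params, which sets none, falls back to the global learning rate of 0.001:
optimizer = optim.Adam(
    [
        {"params": conv1_params, 'lr': 0.01},  # local learning rate wins
        {"params": conv2_params},              # no local lr: uses the global one
    ],
    lr=0.001,           # global default learning rate
    weight_decay=1e-3,  # applies to both groups
)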