PyTorch Debugging Tool: Automatically Print Tensor Info

Machine Heart Release

Author: zasdfgbnm

This article introduces TorchSnooper, a handy tool for debugging PyTorch code. The author is the creator of TorchSnooper and also one of the PyTorch developers.

GitHub project address: https://github.com/zasdfgbnm/TorchSnooper

Many of you may have run into this kind of trouble: you run your PyTorch code and PyTorch complains that the data types do not match, say it needs a double tensor but you gave it a float; or it needs a CUDA tensor but you gave it a CPU tensor. For example:

RuntimeError: Expected object of scalar type Double but got scalar type Float

Debugging such problems can be very troublesome, because you don't know where the problem started. For example, you might create a CPU tensor with torch.zeros on line 3 of your code; this tensor then goes through several operations, all performed on the CPU without any errors, until line 10, where it has to interact with a CUDA tensor passed in as input, and only then does the error appear. To track down errors like this, you sometimes have to add print statements line by line, which is very tedious.

Or, you may have a clear picture of what an operation on a tensor should produce, yet PyTorch reports that the tensor shapes do not match, or it reports no error at all but the final output shape is not what you expected. Here too, you often don't know at which point things started to deviate from your expectations, and again you may end up inserting a pile of print statements to find the cause.

TorchSnooper is a tool designed to solve this problem. The installation of TorchSnooper is very simple; you just need to execute the standard Python package installation command:

pip install torchsnooper

After installation, you only need to decorate the function you want to debug with @torchsnooper.snoop(). When this function is executed, it will automatically print, for every tensor on each executed line, its shape, data type, device, and whether it requires gradients.
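As a minimal sketch of what that looks like in practice (the function name forward_step and its body here are just placeholders, not part of the examples below):

import torch
import torchsnooper

@torchsnooper.snoop()
def forward_step(x):
    # every executed line inside this function is logged, and each tensor
    # is reported with its shape, dtype, device, and requires_grad status
    h = torch.relu(x)
    return h + 1

forward_step(torch.randn(3))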

Let's illustrate how to use it with two examples.

Example 1

For example, we wrote a very simple function:

def myfunc(mask, x):
    y = torch.zeros(6)
    y.masked_scatter_(mask, x)
    return y

This is how we use this function:

mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda')
source = torch.tensor([1.0, 2.0, 3.0], device='cuda')
y = myfunc(mask, source)

The code above seems fine, but when we run it, it throws an error:

RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'mask'

Where is the problem? Let’s snoop! Decorate the myfunc function with @torchsnooper.snoop():

import torch
import torchsnooper

@torchsnooper.snoop()
def myfunc(mask, x):
    y = torch.zeros(6)
    y.masked_scatter_(mask, x)
    return y

mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda')
source = torch.tensor([1.0, 2.0, 3.0], device='cuda')
y = myfunc(mask, source)

Then we run our script and see the following output:

Starting var:.. mask = tensor<(6,), int64, cuda:0>
Starting var:.. x = tensor<(3,), float32, cuda:0>
21:41:42.941668 call         5 def myfunc(mask, x):
21:41:42.941834 line         6     y = torch.zeros(6)
New var:....... y = tensor<(6,), float32, cpu>
21:41:42.943443 line         7     y.masked_scatter_(mask, x)
21:41:42.944404 exception    7     y.masked_scatter_(mask, x)

Given the error message, we mainly look at the device of each variable in the output to find out where a CPU tensor first appears. We notice this line:

New var:....... y = tensor<(6,), float32, cpu>

This line tells us directly that we created a new variable y and assigned a CPU tensor to it. This line corresponds to the code y = torch.zeros(6). So we realize that when we call torch.zeros without manually specifying the device, the tensor is created on the CPU by default. We change this line to y = torch.zeros(6, device='cuda'), and this issue is fixed.

Although this issue is fixed, our problem is not completely resolved. Running the modified code still throws an error, but now the error has changed to:

RuntimeError: Expected object of scalar type Byte but got scalar type Long for argument #2 'mask'

Alright, this time the error comes from the data type. This error message is quite informative: it is roughly clear that the data type of our mask is wrong. Looking again at the output from TorchSnooper, we notice:

Starting var:.. mask = tensor<(6,), int64, cuda:0>

Indeed, our mask’s type is int64, but it should be uint8. We modify the definition of mask:

mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda', dtype=torch.uint8)

And then it can run.
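
Putting both fixes together, a working version of Example 1 might look like this (with the snoop decorator removed now that debugging is done):

import torch

def myfunc(mask, x):
    # create y on the GPU so it lives on the same device as mask and x
    y = torch.zeros(6, device='cuda')
    y.masked_scatter_(mask, x)
    return y

# uint8 mask, as this error message requires (newer PyTorch versions prefer torch.bool for masks)
mask = torch.tensor([0, 1, 0, 1, 1, 0], device='cuda', dtype=torch.uint8)
source = torch.tensor([1.0, 2.0, 3.0], device='cuda')
y = myfunc(mask, source)
print(y)   # tensor([0., 1., 0., 2., 3., 0.], device='cuda:0')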

Example 2

This time we want to build a simple linear model:

model = torch.nn.Linear(2, 1)

We want to fit the plane y = x1 + 2 * x2 + 3, so we create the following dataset:

x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([3.0, 5.0, 4.0, 6.0])
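
As a quick sanity check (a small snippet that is not part of the original script), each target equals x1 + 2 * x2 + 3 exactly, so a perfect fit with zero loss should be achievable:

# sanity check: each target equals x1 + 2 * x2 + 3
w = torch.tensor([1.0, 2.0])
print(x @ w + 3.0)   # tensor([3., 5., 4., 6.]) -- identical to y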

We use the most common SGD optimizer for optimization, and the complete code is as follows:

import torch

model = torch.nn.Linear(2, 1)

x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([3.0, 5.0, 4.0, 6.0])

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(10):
    optimizer.zero_grad()
    pred = model(x)
    squared_diff = (y - pred) ** 2
    loss = squared_diff.mean()
    print(loss.item())
    loss.backward()
    optimizer.step()

However, when we run it, we find that the loss stops decreasing at around 1.5. This is clearly abnormal: the data we constructed lie exactly on the plane we want to fit, so the loss should be able to go all the way down to 0.

At first glance, it is not obvious where the problem lies. In a try-it-and-see spirit, let's snoop. In this example we did not define a custom function to decorate, but we can activate TorchSnooper with a with statement instead. We put the training loop inside the with block, and the code becomes:

import torch
import torchsnooper

model = torch.nn.Linear(2, 1)

x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([3.0, 5.0, 4.0, 6.0])

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with torchsnooper.snoop():
    for _ in range(10):
        optimizer.zero_grad()
        pred = model(x)
        squared_diff = (y - pred) ** 2
        loss = squared_diff.mean()
        print(loss.item())
        loss.backward()
        optimizer.step()

Running the program, we see a long stream of output. Browsing through it carefully, we notice:

New var:....... model = Linear(in_features=2, out_features=1, bias=True)
New var:....... x = tensor<(4, 2), float32, cpu>
New var:....... y = tensor<(4,), float32, cpu>
New var:....... optimizer = SGD (Parameter Group 0    dampening: 0    lr: 0....omentum: 0    nesterov: False    weight_decay: 0)
02:38:02.016826 line        12     for _ in range(10):
New var:....... _ = 0
02:38:02.017025 line        13         optimizer.zero_grad()
02:38:02.017156 line        14         pred = model(x)
New var:....... pred = tensor<(4, 1), float32, cpu, grad>
02:38:02.018100 line        15         squared_diff = (y - pred) ** 2
New var:....... squared_diff = tensor<(4, 4), float32, cpu, grad>
02:38:02.018397 line        16         loss = squared_diff.mean()
New var:....... loss = tensor<(), float32, cpu, grad>
02:38:02.018674 line        17         print(loss.item())
02:38:02.018852 line        18         loss.backward()
26.979290008544922
02:38:02.057349 line        19         optimizer.step()

By carefully observing the shapes of the tensors here, we can easily find that y has the shape (4,), while pred has the shape (4, 1). When they are subtracted, due to broadcasting, the shape of squared_diff becomes (4, 4).
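
To see this broadcasting behavior in isolation, a quick check with dummy tensors (not part of the training script) reproduces the shape blow-up:

import torch

a = torch.zeros(4)       # same shape as y: (4,)
b = torch.zeros(4, 1)    # same shape as pred: (4, 1)
print((a - b).shape)     # torch.Size([4, 4]) -- broadcasting expands both sides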

This is certainly not the result we want. Fixing this problem is also simple; we just change the definition of pred to pred = model(x).squeeze(). Now, looking at the modified code’s output from TorchSnooper:

New var:....... model = Linear(in_features=2, out_features=1, bias=True)
New var:....... x = tensor<(4, 2), float32, cpu>
New var:....... y = tensor<(4,), float32, cpu>
New var:....... optimizer = SGD (Parameter Group 0    dampening: 0    lr: 0....omentum: 0    nesterov: False    weight_decay: 0)
02:46:23.545042 line        12     for _ in range(10):
New var:....... _ = 0
02:46:23.545285 line        13         optimizer.zero_grad()
02:46:23.545421 line        14         pred = model(x).squeeze()
New var:....... pred = tensor<(4,), float32, cpu, grad>
02:46:23.546362 line        15         squared_diff = (y - pred) ** 2
New var:....... squared_diff = tensor<(4,), float32, cpu, grad>
02:46:23.546645 line        16         loss = squared_diff.mean()
New var:....... loss = tensor<(), float32, cpu, grad>
02:46:23.546939 line        17         print(loss.item())
02:46:23.547133 line        18         loss.backward()
02:46:23.591090 line        19         optimizer.step()

Now the results look normal. After testing, the loss can now decrease to very close to 0. Mission accomplished.
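
For reference, the corrected training script, with the squeeze fix applied and the snoop context removed, might look like this:

import torch

model = torch.nn.Linear(2, 1)

x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([3.0, 5.0, 4.0, 6.0])

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(10):
    optimizer.zero_grad()
    pred = model(x).squeeze()        # squeeze (4, 1) down to (4,) to match y
    squared_diff = (y - pred) ** 2   # element-wise now, shape (4,)
    loss = squared_diff.mean()
    print(loss.item())
    loss.backward()
    optimizer.step()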

