Deploying PyTorch Models on C++ Platforms: A Step-by-Step Guide

Source: Zhihu | Author: Mars Girl

Link: https://zhuanlan.zhihu.com/p/146453159

Recently, for work, I needed to deploy a PyTorch model to a C++ platform. The overall process follows the official tutorial examples, but I ran into many pitfalls along the way, which I record here.

1. Model Conversion

Libtorch does not depend on Python. A model trained in Python must first be converted to a TorchScript module before libtorch can load it for inference. The official docs provide two methods for this step:

Method 1: Tracing

This method is relatively simple: you only need to provide a set of example inputs to the model, run one forward pass, and use torch.jit.trace to record the operations along that path and save them. An example is shown below:

import torch
import torchvision

# An instance of your model.
model = torchvision.models.resnet18()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)

The downside is that if the model contains control flow such as if-else branches, one set of inputs can only exercise one branch, so tracing cannot fully capture the model in such cases.
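A minimal sketch of this pitfall (the function and inputs here are illustrative, not from the original article): tracing a function with a data-dependent branch records only the branch taken by the example input, and the traced module will always follow that branch afterwards.

import torch

def branchy(x):
    if x.sum() > 0:
        return x * 2
    return x - 1

# The example input takes the positive branch, so only x * 2 is recorded.
traced = torch.jit.trace(branchy, torch.ones(3))
print(traced(-torch.ones(3)))  # tensor([-2., -2., -2.]), not the else branch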

Method 2: Scripting

Write the model directly in Torch script, annotate it accordingly, then compile the module with torch.jit.script to convert it into a ScriptModule. An example is shown below:

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

my_module = MyModule(10, 20)
sm = torch.jit.script(my_module)

The forward method will be compiled by default, and methods called within forward will also be compiled in the order they are called.

If you want to compile a method that forward does not call, add @torch.jit.export. If you want a method not to be compiled, use @torch.jit.ignore (https://pytorch.org/docs/master/generated/torch.jit.ignore.html#torch.jit.ignore) or @torch.jit.unused (https://pytorch.org/docs/master/generated/torch.jit.unused.html#torch.jit.unused):

# Same behavior as pre-PyTorch 1.2
@torch.jit.script
def some_fn():
    return 2

# Marks a function as ignored, if nothing
# ever calls it then this has no effect
@torch.jit.ignore
def some_fn2():
    return 2

# As with ignore, if nothing calls it then it has no effect.
# If it is called in script it is replaced with an exception.
@torch.jit.unused
def some_fn3():
    import pdb; pdb.set_trace()
    return 4

# Doesn't do anything, this function is already
# the main entry point
@torch.jit.export
def some_fn4():
    return 2

In this step, I encountered many pitfalls, mainly due to the following two points:

1. Unsupported Operations

The operations supported by TorchScript are a subset of Python. Most operations used in torch have corresponding implementations, but some awkward gaps remain. A detailed list can be found at https://pytorch.org/docs/master/jit_unsupported.html#jit-unsupported; below are some operations I encountered:

1) Variable numbers of parameters/return values are not supported, for example:

def __init__(self, **kwargs):

or

if output_flag == 0:
    return reshape_logits
else:
    loss = self.loss(reshape_logits, term_mask, labels_id)
    return reshape_logits, loss
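One common workaround (my sketch, reusing the variables from the fragment above) is to keep the signature static by always returning the same number of values, with a placeholder when a value is not needed:

# Always return a (logits, loss) pair so the return type is fixed.
if output_flag == 0:
    loss = torch.zeros(1)  # placeholder; callers can ignore it
else:
    loss = self.loss(reshape_logits, term_mask, labels_id)
return reshape_logits, loss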

2) Various iteration operations

eg1.

layers = [int(a) for a in layers]

Results in torch.jit.frontend.UnsupportedNodeError: ListComp aren't supported

Can be changed to:

for k in range(len(layers)):
    layers[k] = int(layers[k])

eg2.

seq_iter = enumerate(scores)
try:
    _, inivalues = seq_iter.__next__()
except:
    _, inivalues = seq_iter.next()

eg3.

line = next(infile)
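A possible rewrite (a sketch, with a made-up scores tensor) is to drop the iterator protocol and index the tensor directly, since plain range() loops are scriptable:

import torch

scores = torch.rand(5, 3)

# Instead of seq_iter = enumerate(scores) and seq_iter.__next__():
inivalues = scores[0]
total = inivalues.clone()
for t in range(1, scores.size(0)):
    total = total + scores[t]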

3) Unsupported statements

eg1. continue is not supported:

torch.jit.frontend.UnsupportedNodeError: continue statements aren't supported

eg2. try-except is not supported:

torch.jit.frontend.UnsupportedNodeError: try blocks aren't supported

eg3. with statements are not supported
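These can usually be rewritten with explicit conditionals. A sketch of removing continue by inverting the condition (the data here is illustrative):

values = [3, -1, 4, -2]
total = 0
for x in values:
    if x >= 0:  # inverted condition replaces `continue` on x < 0
        total += x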

4) Other common ops/modules

eg1. torch.autograd.Variable

Solution: initialize with torch.ones/torch.randn etc., then specify the dtype with .float()/.long().

eg2. torch.Tensor/torch.LongTensor etc.

Solution: Same as above

eg3. requires_grad parameter is only supported in torch.tensor, not available in torch.ones/torch.zeros etc.

eg4. tensor.numpy()

eg5. tensor.bool()

Solution: Use tensor > 0 instead of tensor.bool()

eg6. self.seg_emb(seg_fea_ids).to(embeds.device)

Solution: Explicitly call .cuda() where GPU transfer is needed.
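Most of these have direct substitutions. A sketch of the replacements listed above (variable names are illustrative):

import torch

# Instead of torch.autograd.Variable / torch.LongTensor:
x = torch.ones(2, 3).float()
ids = torch.zeros(4).long()

# requires_grad is only accepted by torch.tensor:
w = torch.tensor([1.0, 2.0], requires_grad=True)

# Instead of tensor.bool():
mask = x > 0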

In summary: try to avoid libraries other than native Python and PyTorch (NumPy included), and lean on PyTorch's own APIs as much as possible.

2. Specifying Data Types

1) Attributes: most member data types can be inferred from their values, but empty lists/dictionaries must have their types specified in advance:

from typing import Dict

class MyModule(torch.nn.Module):
    my_dict: Dict[str, int]

    def __init__(self):
        super(MyModule, self).__init__()
        # This type cannot be inferred and must be specified
        self.my_dict = {}

        # The attribute type here is inferred to be `int`
        self.my_int = 20

    def forward(self):
        pass

m = torch.jit.script(MyModule())

2) Constants: use the Final type annotation:

try:
    from typing_extensions import Final
except ImportError:
    # If you don't have `typing_extensions` installed, you can use a
    # polyfill from `torch.jit`.
    from torch.jit import Final

class MyModule(torch.nn.Module):

    my_constant: Final[int]

    def __init__(self):
        super(MyModule, self).__init__()
        self.my_constant = 2

    def forward(self):
        pass

m = torch.jit.script(MyModule())

3) Variables: TorchScript assumes variables are tensors by default, and a variable's type cannot change, so non-tensor parameters must be annotated explicitly:

def forward(self, batch_size: int, seq_len: int, use_cuda: bool):
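Non-tensor locals can be annotated the same way. A brief sketch (the function and variable names are made up for illustration):

from typing import List
import torch

@torch.jit.script
def make_mask(batch_size: int, seq_len: int, use_cuda: bool) -> torch.Tensor:
    sizes: List[int] = [batch_size, seq_len]  # non-tensor local, annotated
    mask = torch.ones(sizes)
    if use_cuda:
        mask = mask.cuda()
    return mask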

Method 3: Mixing Tracing and Scripting

One way is to call a scripted function from code that you trace, which suits models where only a small part needs control flow. An example is shown below:

import torch

@torch.jit.script
def foo(x, y):
    if x.max() > y.max():
        r = x
    else:
        r = y
    return r


def bar(x, y, z):
    return foo(x, y) + z

traced_bar = torch.jit.trace(bar, (torch.rand(3), torch.rand(3), torch.rand(3)))

The other direction is to use traced submodules inside a script module: when some layers use Python features that scripting does not support, wrap those layers and trace them, leaving the rest of the module unchanged. An example is shown below:

import torch
import torchvision

class MyScriptModule(torch.nn.Module):
    def __init__(self):
        super(MyScriptModule, self).__init__()
        self.means = torch.nn.Parameter(torch.tensor([103.939, 116.779, 123.68])
                                        .resize_(1, 3, 1, 1))
        self.resnet = torch.jit.trace(torchvision.models.resnet18(),
                                      torch.rand(1, 3, 224, 224))

    def forward(self, input):
        return self.resnet(input - self.means)

my_script_module = torch.jit.script(MyScriptModule())
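A quick sanity check that the mixed module behaves like the original (a sketch):

out = my_script_module(torch.rand(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000]) for resnet18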

2. Saving Serialized Models

If you have successfully navigated the pitfalls above, saving the model is very simple: just call save with a filename. Note that if you train on GPU but want to run inference on CPU, you must convert the model to CPU before saving, and remember to call model.eval() first, as follows:

gpu_model.eval()
cpu_model = gpu_model.cpu()
sample_input_cpu = sample_input_gpu.cpu()
traced_cpu = torch.jit.trace(cpu_model, sample_input_cpu)
torch.jit.save(traced_cpu, "cpu.pth")

traced_gpu = torch.jit.trace(gpu_model, sample_input_gpu)
torch.jit.save(traced_gpu, "gpu.pth")
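Before moving to C++, it can be worth reloading the saved files in Python to confirm they round-trip (a sketch, using the file names above):

loaded_cpu = torch.jit.load("cpu.pth", map_location="cpu")
with torch.no_grad():
    ref_output = loaded_cpu(sample_input_cpu)  # compare against the original model's output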

3. Loading the Trained Model in C++

To load the serialized PyTorch model in C++, you need the PyTorch C++ API, also known as LibTorch. Installing libtorch is very simple: download the version matching your environment from the PyTorch website (https://pytorch.org/) and unzip it. You will get a folder structured as follows:

libtorch/
  bin/
  include/
  lib/
  share/

Then you can build the application, a simple example directory structure is as follows:

example-app/
  CMakeLists.txt
  example-app.cpp

The example code for example-app.cpp and CMakeLists.txt is as follows:

#include <torch/script.h> // One-stop header.
#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }

  torch::jit::script::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load(argv[1]);
  }
  catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
}

The CMakeLists.txt:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(custom_ops)

find_package(Torch REQUIRED)

add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 14)

Now you can run the following commands from the example-app/ folder to build the application:

mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
cmake --build . --config Release

Here /path/to/libtorch is the path to the unzipped libtorch folder. If this step succeeds, the build should report 100% completion. Next, run the generated executable, passing it the path to your serialized model; you should see the output "ok". Congratulations!

4. Executing the Script Module

Finally, we have reached the last step! Now, you just need to pass the input to the model and execute forward to get the output. A simple example is shown below:

// Create a vector of inputs.
std::vector<torch::jit::IValue> inputs;
inputs.push_back(torch::ones({1, 3, 224, 224}));

// Execute the model and turn its output into a tensor.
at::Tensor output = module.forward(inputs).toTensor();
std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';

The first two lines create a vector of torch::jit::IValue inputs and add a single input; torch::ones() is the C++ equivalent of torch.ones. We then run the script::Module's forward method and convert the returned IValue to a tensor with toTensor(). The C++ torch API is quite friendly: you can usually find the corresponding implementation under the torch:: namespace, or by appending _ for the in-place variant, for example:

torch::tensor(input_list[j]).to(at::kLong).resize_({batch, 128}).clone()
// torch::tensor corresponds to PyTorch's torch.tensor; at::kLong corresponds to torch.int64; resize_ corresponds to torch.Tensor.resize_

Lastly, check that the output on the C++ side matches that of PyTorch, and you will have successfully completed the process!
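For example, one way to get reference values is to print the same slice from Python (a sketch, assuming the traced CPU model saved earlier):

import torch

model = torch.jit.load("cpu.pth")
with torch.no_grad():
    output = model(torch.ones(1, 3, 224, 224))
print(output[:, :5])  # should match the slice printed by the C++ program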

I stepped into countless pitfalls and lost plenty of hair along the way; much of this was figured out through trial and error. If there are any mistakes, corrections are welcome!

References:

PyTorch C++ API – PyTorch master documentation: https://pytorch.org/cppdocs/

Torch Script – PyTorch master documentation

Loading a TorchScript Model in C++ (tutorial): https://pytorch.org/tutorials/advanced/cpp_export.html
