There is no doubt that artificial intelligence has created real value in digital industries such as media, advertising, finance and retail, and it holds great promise. While deep learning has been widely applied in all walks of life, the hard truth is that training and inference on large datasets still take a great deal of processing power. Processing an image or other data with a deep neural network involves billions of operations: huge multi-dimensional matrices being multiplied, inverted, reshaped, and Fourier-transformed.
For real-time applications such as facial recognition or the detection of defective products on a production line, it is important that the result is generated as quickly as possible, without the need for an expensive and power-hungry CPU or GPU. So I tried to find the best way to solve this problem on edge devices.
In this paper, I will introduce OpenVINO and TensorRT, deep learning inference engines that run on the CPU or GPU of lower-cost edge devices. First, however, you need to train your model with another deep learning framework such as TensorFlow or PyTorch. In my case, I trained the original model with PyTorch on my local host with an Nvidia 1080Ti and exported it to the ONNX format so that it can be converted easily.
Test Environment:
- Host PC:
  - CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  - GPU: Nvidia 1080Ti
  - PyTorch 1.5.0 with CUDA 10.1 and cuDNN 7.6.2
  - onnx 1.7.0
- OpenVINO edge device:
  - CPU: Intel Celeron J4105 Processor @ 1.50GHz
  - GPU: Intel® UHD Graphics 600
  - OpenVINO version 2020.3
  - OpenCL 1.2 NEO
- TensorRT edge device:
  - Jetson Nano
  - CPU: Quad-core ARM® Cortex®-A57 MPCore processor
  - GPU: NVIDIA Maxwell™ architecture with 128 NVIDIA CUDA® cores, 0.5 TFLOPS (FP16)
  - JetPack 4.3 with CUDA 10.2 and TensorRT 7.1.0.16
PyTorch to ONNX:
PyTorch has an easy way to export a model to ONNX.
```python
import torch
```
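As a concrete example, a minimal export sketch is shown below; MyModel, the weights file name, and the input shape are placeholders for your own network, not part of the original script.

```python
import torch

# Assumption: MyModel and model.pth are placeholders for your own network and
# trained weights.
model = MyModel()
model.load_state_dict(torch.load("model.pth", map_location="cpu"))
model.eval()

# Dummy input with the shape the network expects at inference time.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=11,            # try 9, 10 and 11, see the note below
    input_names=["input"],
    output_names=["output"],
)
```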
Note: Some PyTorch operators are not supported in opset versions 9 and 10; the latest version is 11 (for example, bilinear upsampling needs opset version 11). However, OpenVINO and TensorRT do not support every opset perfectly, so you can try exporting the model with opsets 9, 10 and 11 and converting each of them.
OpenVINO:
OpenVINO™ toolkit quickly deploys applications and solutions that emulate human vision. Based on Convolutional Neural Networks (CNNs), the toolkit extends computer vision (CV) workloads across Intel® hardware, maximizing performance. The OpenVINO™ toolkit includes the Deep Learning Deployment Toolkit (DLDT).
OpenVINO™ toolkit:
- Enables CNN-based deep learning inference on the edge
- Supports heterogeneous execution across an Intel® CPU, Intel® Integrated Graphics, Intel® FPGA, Intel® Movidius™ Neural Compute Stick, Intel® Neural Compute Stick 2 and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
- Speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels
- Includes optimized calls for computer vision standards, including OpenCV* and OpenCL™
1. Install:
```bash
# 1. Download the openvino toolkit tar file from the following link.
```
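The remaining steps look roughly like the following; the archive name is a placeholder for the 2020.3 Linux package you actually download, and the install path assumes the default location.

```bash
# 2. Extract and install the toolkit (it installs to /opt/intel/openvino by default).
tar -xvzf l_openvino_toolkit_p_2020.3.XXX.tgz
cd l_openvino_toolkit_p_2020.3.XXX
sudo ./install_openvino_dependencies.sh
sudo ./install.sh

# 3. Load the environment variables before using the Model Optimizer or the
#    Inference Engine.
source /opt/intel/openvino/bin/setupvars.sh
```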
2. Enable the Intel GPU device in OpenVINO:
Note: If you want to use the Intel GPU inside Docker, run the image with --device /dev/dri so that the GPU is available in the container. You can verify your Intel GPU and OpenCL version with clinfo, which can be installed with sudo apt install clinfo. There are two ways to install OpenCL for an Intel GPU: one is Beignet, the other is Intel NEO. For some older CPUs, only beignet-dev > 1.3 can enable OpenCL.
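For example, a container started like this can see the integrated GPU; the image name is a placeholder.

```bash
# --device /dev/dri exposes the Intel integrated GPU to the container.
docker run -it --device /dev/dri <openvino_image> /bin/bash

# Inside the container, check that the GPU shows up as an OpenCL device:
clinfo | grep -i "device name"
```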
- NEO:
```bash
# 1. Go to the install_dependencies directory:
```
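Continuing from that step, a sketch of the NEO driver installation using the script that ships with the toolkit; the path assumes the default /opt/intel/openvino install location.

```bash
cd /opt/intel/openvino/install_dependencies

# 2. Install the Intel NEO OpenCL driver for the integrated GPU:
sudo ./install_NEO_OCL_driver.sh

# 3. Re-login (or reboot) and verify the GPU shows up as an OpenCL device:
clinfo | grep -i "device name"
```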
- Beignet:
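On Ubuntu, Beignet can usually be installed from the package repositories; this is only a sketch, and the package version available depends on your distribution.

```bash
# Assumption: the distribution ships a recent enough beignet package
# (beignet-dev > 1.3 is needed for some older CPUs).
sudo apt update
sudo apt install beignet-dev

# Verify the Intel GPU is exposed as an OpenCL device:
clinfo | grep -i "device name"
```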
3. Convert the ONNX model to an OpenVINO model:
In my case, I run an ONNX model in OpenVINO. If you want to run a TensorFlow or other model in OpenVINO, you can find more details in (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html)
```bash
cd /opt/intel/openvino/deployment_tools/model_optimizer
```
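Run from that directory, a conversion command for the ONNX model might look like the following; the paths, output name, and data type are placeholders to adjust for your own model.

```bash
# Assumption: model.onnx is the file exported earlier; FP16 is also possible
# and halves the size of the generated IR files.
python3 mo.py \
    --input_model /path/to/model.onnx \
    --model_name model \
    --output_dir /path/to/ir/ \
    --data_type FP32
```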
4. Inference Demo:
```python
from openvino.inference_engine import IECore, IENetwork
```
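A minimal inference sketch with the 2020.3 Python API; the IR file names, the target device, and the dummy input are assumptions to adapt to your own model.

```python
import numpy as np
from openvino.inference_engine import IECore, IENetwork

# Assumption: model.xml / model.bin are the IR files produced by the Model Optimizer.
ie = IECore()
net = IENetwork(model="model.xml", weights="model.bin")
input_blob = next(iter(net.inputs))
output_blob = next(iter(net.outputs))

# "GPU" targets the Intel integrated graphics; use "CPU" as a fallback.
exec_net = ie.load_network(network=net, device_name="GPU")

# Dummy NCHW input matching the network's expected shape; replace it with a
# real, preprocessed image.
n, c, h, w = net.inputs[input_blob].shape
image = np.random.rand(n, c, h, w).astype(np.float32)

result = exec_net.infer(inputs={input_blob: image})
print(result[output_blob].shape)
```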
5. Infer Speed test:
- About 13~14 FPS for inference with the FP32 model on the OpenVINO edge device (a rough way to measure this is sketched below).
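The exact benchmarking code is not shown here, but a rough FPS measurement can be done with a simple loop like this, reusing exec_net, input_blob, and image from the demo above.

```python
import time

# Time repeated inference calls and report the average throughput.
n_runs = 100
start = time.time()
for _ in range(n_runs):
    exec_net.infer(inputs={input_blob: image})
elapsed = time.time() - start
print("Inference FPS: %.2f" % (n_runs / elapsed))
```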
TensorRT:
NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications.
- TensorRT-based applications perform up to 40x faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and finally deploy to hyperscale data centers, embedded, or automotive product platforms.
- TensorRT is built on CUDA, NVIDIA’s parallel programming model, and enables you to optimize inference for all deep learning frameworks leveraging libraries, development tools and technologies in CUDA-X for artificial intelligence, autonomous machines, high-performance computing, and graphics.
- TensorRT provides INT8 and FP16 optimizations for production deployments of deep learning inference applications such as video streaming, speech recognition, recommendation and natural language processing. Reduced precision inference significantly reduces application latency, which is a requirement for many real-time services, auto and embedded applications.
- You can import trained models from every deep learning framework into TensorRT. After applying optimizations, TensorRT selects platform specific kernels to maximize performance on Tesla GPUs in the data center, Jetson embedded platforms, and NVIDIA DRIVE autonomous driving platforms.
1. Install:
Note: I use a Jetson Nano to run TensorRT; its OS image (JetPack 4.3) already ships with TensorRT 7.1. If you convert the ONNX model on a different device, make sure that the host PC's TensorRT version is the same as the one on the inference device. For more details about installation, please visit (https://github.com/NVIDIA/TensorRT).
2. Convert the ONNX model to a TensorRT model:
Note: There are two ways to convert the ONNX model to a TensorRT model: one is the command-line program trtexec, which can be found in your TensorRT install path; the other is to use the C++ or Python API.
- Command-line program; see more details in (https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec):
```bash
./trtexec --onnx=model.onnx
```
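A slightly fuller command that also serializes the engine for later use; the flags shown are common trtexec options and the file names are placeholders.

```bash
# Build an FP16 engine from the ONNX model and save it for later deserialization.
./trtexec --onnx=model.onnx \
          --saveEngine=model_fp16.trt \
          --fp16 \
          --workspace=1024
```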
- Python API:
Note: the common.py file is shown at the end of this section.
```python
def build_engine(onnx_file_path):
```
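A sketch of such a function with the TensorRT 7 Python API; the workspace size, the explicit-batch flag, and the output file names are reasonable defaults rather than a verbatim copy of my script.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_file_path):
    """Parse an ONNX file and build a TensorRT engine (illustrative sketch)."""
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        config = builder.create_builder_config()
        config.max_workspace_size = 1 << 30  # 1 GiB workspace for layer tactics
        with open(onnx_file_path, "rb") as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        return builder.build_engine(network, config)

# Serialize the engine so the inference script only needs to deserialize it.
engine = build_engine("model.onnx")
with open("model.trt", "wb") as f:
    f.write(engine.serialize())
```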
3. Inference Demo:
Note: the common.py file is listed below, and pycuda can be installed with pip.
```python
import tensorrt as trt
```
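A minimal inference sketch; it assumes common.py provides allocate_buffers and do_inference_v2 helpers in the style of the NVIDIA TensorRT samples (see the listing below), and the engine path and input shape are placeholders.

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates a CUDA context on import

import common  # assumed helpers: allocate_buffers, do_inference_v2

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(engine_file_path):
    """Deserialize a TensorRT engine that was built and saved earlier."""
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

engine = load_engine("model.trt")
inputs, outputs, bindings, stream = common.allocate_buffers(engine)

with engine.create_execution_context() as context:
    # Dummy input; replace with a preprocessed image of the expected NCHW shape.
    dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
    np.copyto(inputs[0].host, dummy.ravel())
    results = common.do_inference_v2(context, bindings=bindings, inputs=inputs,
                                     outputs=outputs, stream=stream)
    print(results[0].shape)
```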
- common.py file:
```python
from itertools import chain
```
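A trimmed-down sketch of such a common.py, modeled on the helper file from the NVIDIA TensorRT Python samples; only the two helpers used above are included.

```python
import numpy as np
import pycuda.autoinit  # ensures a CUDA context exists
import pycuda.driver as cuda
import tensorrt as trt


class HostDeviceMem:
    """Pairs a pagelocked host buffer with its device allocation."""
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem


def allocate_buffers(engine):
    """Allocate host/device buffers and a stream for every engine binding."""
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding))
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream


def do_inference_v2(context, bindings, inputs, outputs, stream):
    """Copy inputs to the device, run the network, and copy outputs back."""
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    stream.synchronize()
    return [out.host for out in outputs]
```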
4. Infer Speed test:
- Jetson Nano: about 11.52 FPS with FP32 and about 14.25 FPS with FP16.