How to Build a Real-Time Object Detection Program Using TensorFlow and OpenCV

Introduction

In this article, we walk step by step through building your own real-time object detection application in Python 3 using TensorFlow’s (TF) new Object Detection API and OpenCV.

Below is the running application:

Purpose and Motivation

Google has released a new TensorFlow Object Detection API. The first version includes:

Pre-trained models (especially focusing on lightweight models so they can run on mobile devices)
A Jupyter notebook example that uses one of the released models
Very handy scripts for re-training models on your own datasets.

We wanted to get to know this new API, so we spent some time building a simple real-time object detection demo.

Object Detection Demo

First, we pulled the TensorFlow models repository and then reviewed the notebook they published. It basically walks through all the steps of using a pre-trained model. Their example uses the “SSD with Mobilenet” model, but other pre-trained models can be downloaded from what they call the “TensorFlow detection model zoo”. These models are trained on the COCO dataset and vary in speed (slow, medium, and fast) and in detection performance.

Next, we ran the example, which is well documented. Essentially, this is what it does:

Import the necessary packages, such as TensorFlow, PIL, etc.
Define some variables, such as the number of classes, model name, etc.
Download the model (.pb – protobuf) and load it into memory.
Load some helper code, such as the translator from class index to label.
Test the code with two images.
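The model-loading and label-index steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's own code: it assumes the TensorFlow 1.x graph APIs (reached here through tensorflow.compat.v1), and the names load_frozen_graph and build_category_index are hypothetical. The second function is a plain-Python stand-in for the label map helper that ships with the Object Detection API.

```python
def load_frozen_graph(pb_path):
    """Read a frozen .pb file once and return it as a tf.Graph (assumed TF 1.x-style API)."""
    # Imported lazily so the label helper below also works without TensorFlow installed.
    import tensorflow.compat.v1 as tf

    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.io.gfile.GFile(pb_path, "rb") as f:
            graph_def.ParseFromString(f.read())
        # name="" keeps the original tensor names, e.g. "image_tensor:0".
        tf.import_graph_def(graph_def, name="")
    return graph


def build_category_index(categories):
    """Map each numeric class id to its category dict, e.g. {1: {"id": 1, "name": "person"}}."""
    return {category["id"]: category for category in categories}


# Hypothetical two-class excerpt; the real COCO label map has many more entries.
categories = [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}]
category_index = build_category_index(categories)
```

The detector returns numeric class ids, so a lookup such as category_index[18]["name"] turns a raw prediction into a readable label.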

Note: Before running the example, be sure to check the setup instructions. The protobuf compilation part is especially important:

# From tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.

Then, we took their code and made corresponding modifications:

Removed the model download part
The TensorFlow session is not wrapped in a “with” statement, as this is a huge overhead, especially when the session would have to be started again for each frame of the stream.
PIL is not needed, as the video stream in OpenCV is already in numpy arrays (PIL is also a huge overhead, especially when used to read images).
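One way to picture the session and PIL changes is a small wrapper that holds a single long-lived session and feeds it OpenCV frames directly. This is only a sketch under assumptions: the Detector class is hypothetical, the tensor names are the ones commonly exported by the Object Detection API's frozen graphs, and the BGR-to-RGB flip assumes the model expects RGB input.

```python
class Detector:
    """Reuses one session for every frame instead of opening a new one each time."""

    # Output tensor names as typically exported by the Object Detection API (assumption).
    OUTPUT_NAMES = (
        "detection_boxes:0",
        "detection_scores:0",
        "detection_classes:0",
        "num_detections:0",
    )

    def __init__(self, graph, session):
        # The session is created once by the caller, e.g. tf.Session(graph=graph) in TF 1.x.
        self.session = session
        self.image_tensor = graph.get_tensor_by_name("image_tensor:0")
        self.outputs = [graph.get_tensor_by_name(name) for name in self.OUTPUT_NAMES]

    def detect(self, frame_bgr):
        # OpenCV frames are already HxWx3 numpy arrays in BGR order, so no PIL is needed.
        # Indexing adds a batch axis (1xHxWx3) and reverses the channels to RGB.
        batch = frame_bgr[None, :, :, ::-1]
        return self.session.run(self.outputs, feed_dict={self.image_tensor: batch})

    def close(self):
        # Close the session once, when the application shuts down.
        self.session.close()
```

The detect method can then be called inside the capture loop, and close is called only once when the application exits.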

Then, we connected it with the webcam using OpenCV. There are many examples explaining how to do this, even in the official documentation.

In general, the plain OpenCV example implementations are not really optimal; for instance, some OpenCV functions are heavily I/O bound. Therefore, we had to come up with several workarounds:

Reading frames from the webcam causes a lot of I/O. Our first idea was to move this part entirely into a separate Python process using the multiprocessing library, but this did not work. There are some explanations on Stack Overflow of why it fails, but we did not dig deeper into them. Fortunately, a good example on Adrian Rosebrock’s site “pyimagesearch” uses threads instead, and that approach greatly improved our fps.
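The threaded approach can be sketched as below. This is a simplified illustration rather than the pyimagesearch code: the ThreadedFrameReader class is hypothetical, and it assumes only that the capture object has a read() method returning an (ok, frame) pair, as cv2.VideoCapture does.

```python
import threading


class ThreadedFrameReader:
    """Keeps grabbing frames on a background thread so the main loop never blocks on I/O."""

    def __init__(self, capture):
        # capture: any object whose read() returns (ok, frame), e.g. cv2.VideoCapture(0).
        self.capture = capture
        self.ok, self.frame = capture.read()
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._update, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _update(self):
        # Overwrite the stored frame with the newest one until asked to stop.
        while not self._stopped.is_set():
            ok, frame = self.capture.read()
            with self._lock:
                self.ok, self.frame = ok, frame
            if not ok:
                break  # the camera stopped delivering frames

    def read(self):
        # Return the most recent frame immediately, without waiting for the camera.
        with self._lock:
            return self.ok, self.frame

    def stop(self):
        self._stopped.set()
        self._thread.join()
```

The main loop calls start() once, then read() whenever it is ready for the next frame, and stop() on exit. Frames that arrive while the detector is busy are simply dropped, which is usually acceptable for a live preview.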

Loading the frozen model into memory is a significant overhead every time the application starts. We were already reusing a single TF session for each run, but detection was still very slow. To address this, we used the multiprocessing library to spread the heavy object detection workload across multiple processes. The initial startup of the application is slow because each process has to load the model into memory and start its own TF session, but after that the parallelism greatly improves throughput.
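A minimal sketch of this worker pattern is shown below, assuming frames are passed through multiprocessing queues. The names worker, start_workers, stop_workers and the load_model callback are hypothetical, not the original application's code; load_model stands for whatever function loads the frozen graph, opens the session, and returns a callable that runs detection on one frame.

```python
import multiprocessing as mp


def worker(input_q, output_q, load_model):
    """Load the model once, then process frames until a None sentinel arrives."""
    detect = load_model()  # expensive step: runs once per process, not once per frame
    while True:
        frame = input_q.get()
        if frame is None:
            break
        output_q.put(detect(frame))


def start_workers(num_workers, load_model, queue_size=5):
    """Start worker processes that share one input queue and one output queue."""
    # load_model should be a module-level function so it can be sent to child processes.
    input_q = mp.Queue(maxsize=queue_size)
    output_q = mp.Queue(maxsize=queue_size)
    processes = [
        mp.Process(target=worker, args=(input_q, output_q, load_model), daemon=True)
        for _ in range(num_workers)
    ]
    for process in processes:
        process.start()
    return input_q, output_q, processes


def stop_workers(input_q, processes):
    """Send one sentinel per worker, then wait for all of them to exit."""
    for _ in processes:
        input_q.put(None)
    for process in processes:
        process.join()
```

The main process puts frames on input_q and reads annotated results from output_q. With more than one worker, results may come back in a different order than the frames went in, and start_workers should be called from under an if __name__ == "__main__": guard.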

Note: If you are using OpenCV 3.1 on Mac OS X, VideoCapture may crash after a while. If you run into this issue, switching back to OpenCV 3.0 can solve it.
