Click the "Xiaobai Learns Vision" above, select to add "Star" or "Top"
Heavyweight content delivered promptly
(Demo animation: the finished application running real-time object detection.)
Google has released a new TensorFlow Object Detection API. The first version includes:
- Pre-trained models (with a particular focus on lightweight models that can run on mobile devices)
- Jupyter notebook examples using one of the published models
- Very handy scripts for re-training models on your own datasets
Hoping to fully understand this new release, we spent some time building a simple real-time object detection demo.
First, we downloaded the TensorFlow models repository and went through their published notebook, which basically walks through all the steps of using a pre-trained model. Their example uses the “SSD with Mobilenet” model, but you can also download any of the other pre-trained models from what they call the “TensorFlow detection model zoo”. These models trade off speed (slow, medium, fast) against detection performance, measured on the COCO dataset.
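As a reference, here is a minimal sketch of fetching one of these models; the exact archive name used here is an assumption based on the models published at the time and may differ from what is currently available:

```python
# Minimal sketch: fetch a pre-trained detection model and keep only the
# frozen graph. The archive name is an assumption and may have changed.
import tarfile
from urllib.request import urlretrieve

MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

urlretrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
with tarfile.open(MODEL_FILE) as tar:
    for member in tar.getmembers():
        # Only the frozen graph is needed for inference.
        if 'frozen_inference_graph.pb' in member.name:
            tar.extract(member)
```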
Next, we ran the example. The example is actually well documented; essentially, it does the following (a condensed sketch follows the list):
- Import the necessary packages, such as TensorFlow, PIL, etc.
- Define some variables, such as the number of classes, the model name, etc.
- Download the model (.pb – protobuf) and load it into memory.
- Load some helper code, such as an index-to-label translator.
- Test the code on two images.
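To make those steps concrete, here is a condensed sketch of the notebook's core (TensorFlow 1.x API; the tensor names are the standard outputs of Object Detection API frozen graphs, and the test-image path is illustrative):

```python
import numpy as np
import tensorflow as tf
from PIL import Image

PATH_TO_CKPT = 'ssd_mobilenet_v1_coco_11_06_2017/frozen_inference_graph.pb'

# Load the frozen graph (.pb) into memory once.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

# Run detection on a test image.
image_np = np.array(Image.open('test.jpg'))           # HxWx3 RGB array
image_np_expanded = np.expand_dims(image_np, axis=0)  # model expects a batch

with tf.Session(graph=detection_graph) as sess:
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num = detection_graph.get_tensor_by_name('num_detections:0')
    (boxes, scores, classes, num) = sess.run(
        [boxes, scores, classes, num],
        feed_dict={image_tensor: image_np_expanded})
```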
Note: Before running the example, be sure to check the setup instructions. The protobuf compilation part is especially important:
# From tensorflow/models/research/
protoc object_detection/protos/*.proto --python_out=.
Then, we took their code and made the corresponding modifications (the resulting per-frame call is sketched after the list):
- Removed the model download part.
- Don't wrap the TensorFlow session in a “with” statement, as creating a new session for every frame of the stream is a huge overhead.
- Dropped PIL, since OpenCV's video stream already delivers frames as numpy arrays (PIL is also a huge overhead, especially when used for reading images).
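With those changes, the per-frame inference call looks roughly like this sketch, where detection_graph is loaded once at startup (as in the earlier snippet) and sess is a single long-lived session:

```python
import cv2
import numpy as np

def detect_objects(frame_bgr, sess, detection_graph):
    """Run one inference pass on a raw OpenCV frame.

    No PIL involved: the frame is already a numpy array. OpenCV delivers
    BGR, while the detection graph expects RGB.
    """
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    image_np_expanded = np.expand_dims(frame_rgb, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    return sess.run([boxes, scores, classes],
                    feed_dict={image_tensor: image_np_expanded})
```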
Then, we hooked it up to the webcam using OpenCV. There are many examples explaining how to do this, including in the official documentation.
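A minimal version of that wiring could look like this, with the detect_objects helper from above called inside the loop:

```python
import cv2

cap = cv2.VideoCapture(0)  # 0 = default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # frame is a BGR numpy array; this is where detect_objects() is
    # called and the resulting boxes are drawn onto the frame.
    cv2.imshow('Object detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```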
In general, many of the standard OpenCV examples are not really optimal; for instance, some OpenCV functions are heavily I/O bound. So we had to come up with various ways to address this:
Reading frames from the webcam causes a lot of I/O. Our first idea was to move this part entirely into a separate Python process using the multiprocessing library, but that didn't work; there are some explanations on Stack Overflow as to why, but we didn't dig deeper into it. A good example on Adrian Rosebrock's site “pyimagesearch” uses threads instead, which greatly improved our fps.
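A sketch of that threaded reader, following the pyimagesearch pattern (the imutils package ships a ready-made WebcamVideoStream class with the same interface):

```python
from threading import Thread
import cv2

class WebcamVideoStream:
    """Grab frames on a background thread so the main loop never blocks
    on camera I/O; read() always returns the most recent frame."""

    def __init__(self, src=0):
        self.stream = cv2.VideoCapture(src)
        self.grabbed, self.frame = self.stream.read()
        self.stopped = False

    def start(self):
        Thread(target=self.update, daemon=True).start()
        return self

    def update(self):
        # Keep overwriting self.frame with the latest camera frame.
        while not self.stopped:
            self.grabbed, self.frame = self.stream.read()

    def read(self):
        return self.frame

    def stop(self):
        self.stopped = True

# Usage: replaces cv2.VideoCapture in the loop above.
# vs = WebcamVideoStream(src=0).start()
# frame = vs.read()
```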
Loading the frozen model into memory every time the application starts is a significant overhead, and even with a single TF session reused across runs it was still very slow. So we used the multiprocessing library to offload the heavy lifting of the object detection part into multiple processes. The initial startup of the application is slow, because each process needs to load the model into memory and start its own TF session, but after that the parallelism greatly improves throughput.
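A rough sketch of that worker pool, assuming the detect_objects helper from earlier and a hypothetical load_frozen_graph wrapper around the graph-loading snippet shown above:

```python
from multiprocessing import Process, Queue

import tensorflow as tf

def worker(input_q, output_q):
    # Each process pays the model-loading cost once at startup...
    detection_graph = load_frozen_graph(PATH_TO_CKPT)  # hypothetical helper
    sess = tf.Session(graph=detection_graph)
    # ...then serves frames from the shared queue indefinitely.
    while True:
        frame = input_q.get()
        output_q.put(detect_objects(frame, sess, detection_graph))

input_q = Queue(maxsize=5)   # small queues keep latency bounded
output_q = Queue(maxsize=5)
for _ in range(2):           # number of detector processes
    Process(target=worker, args=(input_q, output_q), daemon=True).start()

# Main loop: input_q.put(frame) after reading from the camera,
# then display the result from output_q.get().
```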
Note: If you are using OpenCV 3.1 on Mac OSX, the VideoCapture may crash after a while. If you run into this, switching back to OpenCV 3.0 solves the problem.