Implementing Visual Wake Words Using TensorFlow Lite Micro

By Aakanksha Chowdhery, Software Engineer

Why don’t you get a response when you say “Yo Google” to Google Assistant?

After all, it differs from “Ok Google” by just one word. The reason is that Google Assistant recognizes only specific phrases as ‘wake words’. Wake words are crucial to low-power machine learning design: a model with a very low computational cost runs continuously, and only when it detects the wake word does the device “wake up” to perform more comprehensive, computationally expensive processing. Voice wake words (like “Ok Google”) are widely used to wake AI-assisted devices, which then process the subsequent speech with higher-cost machine learning models.
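
To make the pattern concrete, here is a minimal Python sketch of such a cascade, assuming two TensorFlow Lite models; the file names and the 0.75 threshold are illustrative, not taken from this post:

```python
import tensorflow as tf

# Hypothetical models: a tiny always-on screener and a large follow-up model.
wake = tf.lite.Interpreter(model_path="wake_model.tflite")
heavy = tf.lite.Interpreter(model_path="full_model.tflite")
wake.allocate_tensors()
heavy.allocate_tensors()

def run(interpreter, x):
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], x.astype(inp["dtype"]))
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])

def process(frame):
    # The cheap model screens every frame of sensor data.
    score = run(wake, frame).max()
    if score > 0.75:                # wake word detected
        return run(heavy, frame)    # costly processing runs only now
    return None                     # otherwise stay in low-power mode
```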

The arrival of low-power cameras has given rise to a category of applications that combine visual sensors with microcontrollers to classify whether an object of interest (a person, or some other object) is present in the image frame. These applications are called “visual wake words” because, much like voice wake words in speech recognition, they can wake a device when a person appears.

Edge machine learning refers to models that run on devices without a connection to the cloud. When deploying models alongside voice sensors or low-power cameras, low-cost computing platforms such as microcontrollers are a good compromise. However, existing machine learning models rarely fit the constraints of microcontroller devices (in terms of power consumption and processing capability). For more details, see this paper (https://arxiv.org/pdf/1906.05721.pdf).

At the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019), Google organized the “Visual Wake Words Challenge” to solicit miniature vision models for microcontrollers. The task was to classify images into two categories (person/no person), a common use case for microcontrollers. Google open-sourced the Visual Wake Words dataset, derived from the COCO dataset: label 1 corresponds to images containing at least one person (the object of interest); label 0 corresponds to images without a person (a relabeling sketch follows the links below).

Note: 2019 Conference on Computer Vision and Pattern Recognition

http://cvpr2019.thecvf.com/

Visual Wake Words Dataset

https://github.com/tensorflow/models/blob/master/research/slim/datasets/build_visualwakewords_data.py

COCO Dataset

http://cocodataset.org/
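
For illustration, here is a rough Python sketch of how such labels could be derived from COCO annotations with pycocotools; the real logic lives in build_visualwakewords_data.py (linked above), and the 0.5% minimum-area fraction used here is an assumption, not a quoted parameter:

```python
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")
person_id = coco.getCatIds(catNms=["person"])[0]

def vww_label(img_id, min_area_frac=0.005):
    img = coco.loadImgs(img_id)[0]
    img_area = img["height"] * img["width"]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id, catIds=[person_id]))
    # Label 1 if any person occupies a large-enough fraction of the image.
    return int(any(a["area"] / img_area > min_area_frac for a in anns))

labels = {img_id: vww_label(img_id) for img_id in coco.getImgIds()}
```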

Using the visual wake word model to detect if a person is present in an image

Machine learning on microcontrollers requires rethinking model design and rebalancing memory usage, accuracy, and computational cost. The on-chip memory and flash storage of an ordinary microcontroller are extremely limited: SRAM typically ranges from 100 to 320 KB and flash from 256 KB to 1 MB. The entire neural network model, with its weight parameters and code, must fit within the small flash budget, and the temporary buffers that hold input and output activations during computation must not exceed the on-chip memory. In the Visual Wake Words challenge, entrants designed the most accurate models they could under the following constraints: model size under 250 KB, peak memory usage under 250 KB, and fewer than 60 million multiply-accumulate operations per inference (a model sketch at this scale follows the links below). For talks from the workshop, please watch IEEEtv.

Note: Workshop

https://rebootingcomputing.ieee.org/lpirc/2019

IEEEtv

http://ieeetv.ieee.org/conference-highlights/visual-wake-words-challenge-aakanksha-chowdhery-lpirc-2018?rf=series%7C3
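
The person_detection example referenced later in this post trains a MobileNet v1 with depth multiplier 0.25 on 96x96 grayscale input, a configuration that lands near these budgets once quantized to 8 bits. The Keras sketch below shows a model at that scale; treat it as an illustration of the budget arithmetic, not the winning entries’ exact architecture:

```python
import tensorflow as tf

# A MobileNet v1 shrunk by a 0.25 depth multiplier, on 96x96 grayscale
# input, with a two-class (person / no person) head.
model = tf.keras.applications.MobileNet(
    input_shape=(96, 96, 1),  # grayscale, so a single channel
    alpha=0.25,               # depth multiplier: shrinks every layer
    weights=None,             # train from scratch on Visual Wake Words
    classes=2)                # person / no person

# At 8 bits per weight, flash usage is roughly one byte per parameter,
# so the parameter count printed here approximates the 250 KB budget.
model.summary()
```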

The on-chip memory and flash storage of ordinary microcontrollers are extremely limited. For example, the SparkFun Edge development board ships with 384 KB of RAM and 1 MB of flash.

The submitted models used the model pruning and quantization algorithms included in the TensorFlow Model Optimization Toolkit, along with neural architecture search, to design miniature models that meet the microcontroller constraints (a quantization sketch follows the links below). Entrants then deployed their models on device with TensorFlow Lite Micro, the microcontroller machine learning framework released by the TensorFlow team.

Note: SparkFun Edge

https://www.sparkfun.com/products/15170

TensorFlow Model Optimization Toolkit

https://tensorflow.google.cn/model_optimization

TensorFlow Lite Micro

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/
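
As an illustration of the quantization step, here is a hedged sketch of post-training integer quantization with the TensorFlow Lite converter, one way to shrink a trained Keras model toward a microcontroller’s flash budget; the names model and rep_images (a few hundred training images) are assumed to already exist:

```python
import tensorflow as tf

def representative_data():
    # Calibration samples let the converter pick int8 scaling factors.
    for img in rep_images[:200]:
        yield [img[None, ...].astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # fully integer model, as
converter.inference_output_type = tf.int8  # TFLite Micro expects

tflite_model = converter.convert()
with open("person_detect.tflite", "wb") as f:
    f.write(tflite_model)

# On device, the bytes are typically compiled in as a C array, e.g.:
#   xxd -i person_detect.tflite > person_detect_model_data.cc
```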

The challenge received a tremendous response from the research community, with submissions from companies and universities including ARM, Samsung, Qualcomm, MIT, UC Berkeley, and Oxford University. On the Visual Wake Words dataset, the top entries, from MIT and Qualcomm, achieved classification accuracies of 94.5% and 95% in the “deployable on existing machine learning frameworks” and “deployable on next-generation machine learning frameworks” categories, respectively.

MIT’s winning team presentation

To download the Visual Wake Words dataset and train your own model, refer to the tutorial below.

Note: Tutorial

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/examples/person_detection/training_a_model.md

Acknowledgments

Thanks to everyone involved in this project: Aakanksha Chowdhery, Daniel Situnayake, and Pete Warden; and thanks to the following colleagues for their guidance and advice: Jon Shlens, Andrew Howard, Rocky Rhodes, Nat Jeffres, Bo Chen, Mark Sandler, Meghna Natraj, Andrew Selle, and Jared Duke.
