TensorFlow Lite Empowers Product Implementation

By Khanh LeViet, Developer Advocate


TensorFlow Lite (tensorflow.google.cn/lite) is the official framework for running TensorFlow model inference on edge devices. It is deployed on more than 4 billion edge devices worldwide and supports Android, iOS, Linux-based IoT devices, and microcontrollers.

Since TensorFlow Lite's initial release at the end of 2017, we have been steadily exploring and improving it, keeping it reliable while making it easy for more developers to adopt, whether they are machine learning experts or beginners.

In this article, we focus on recently released features that help your on-device use cases move smoothly from prototype to production.

For related videos, please watch the TensorFlow DevSummit 2020 presentation (Chinese subtitles available): TFLite: From Product Prototype Design to Production.

Prototype: Starting with Cutting-Edge Models

The field of machine learning evolves rapidly, so surveying the latest techniques to see what improvements they enable is an essential step before investing resources in building a feature. We provide a repository of pre-trained models and corresponding sample applications. With the sample applications, you can try out TensorFlow Lite models on a real device without writing any code, and then quickly integrate the models into your own application. This lets you move from a design prototype to testing the user experience far faster than if you first had to train a model of your own.

  • Models: https://tensorflow.google.cn/lite/models

We have released several new pre-trained models, such as the question-answering model and the style transfer model.

Video (Chinese subtitles): Introduction to the TensorFlow Lite Sample Application Repository

  • Question Answering: https://tensorflow.google.cn/lite/models/bert_qa/overview

  • Style Transfer: https://tensorflow.google.cn/lite/models/style_transfer/overview

Additionally, we are committed to bringing cutting-edge models from research teams into TensorFlow Lite. Recently, we added support for three new model architectures: EfficientNet-Lite (paper), MobileBERT (paper), and ALBERT-Lite (paper):

  • EfficientNet-Lite is a novel image classification model that achieves state-of-the-art accuracy with fewer parameters and less computation. It is optimized for TensorFlow Lite, supports quantization with negligible accuracy loss (a conversion sketch follows the benchmark figures below), and is fully supported by the GPU delegate for faster inference. For more details, please refer to our article.

Figure: Benchmarked on a Pixel 4 CPU with four threads, March 2020

  • MobileBERT is an optimized version of the popular BERT (paper) model, achieving state-of-the-art accuracy on a range of NLP tasks such as question answering and natural language inference. Compared to BERT, MobileBERT is approximately 4x faster and smaller, yet maintains the same high level of accuracy.
  • ALBERT is another lightweight version of BERT, optimized for model size while maintaining the same accuracy. ALBERT-Lite is the TensorFlow Lite-compatible version of ALBERT; it is 6x smaller than BERT (and 1.5x smaller than MobileBERT), with comparable latency.

Figure: Benchmarked on a Pixel 4 CPU with four threads, March 2020. Model hyperparameters: sequence length 128, vocabulary size 30K
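The quantization support mentioned above for EfficientNet-Lite uses standard TensorFlow Lite post-training quantization. As a minimal sketch (the SavedModel path here is a placeholder, not an official artifact), converting a model with default optimizations looks like this:

import tensorflow as tf

# Convert a SavedModel to TensorFlow Lite with post-training quantization.
# 'efficientnet_saved_model/' is a placeholder path.
converter = tf.lite.TFLiteConverter.from_saved_model('efficientnet_saved_model/')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open('model_quantized.tflite', 'wb') as f:
    f.write(tflite_model)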

Developing Models: Build Models Without an ML Background

While introducing cutting-edge research models into TensorFlow Lite, we also want you to easily customize these models for your use cases and needs.

We are excited to announce TensorFlow Lite Model Maker, a user-friendly tool that applies state-of-the-art machine learning models to your own dataset through transfer learning. It wraps complex machine learning concepts in an intuitive API, so you can embark on your machine learning journey without any prior expertise. You can train a state-of-the-art image classification model with just four lines of code:

# Imports for TensorFlow Lite Model Maker (package layout may differ by version).
from tflite_model_maker import image_classifier
from tflite_model_maker import ImageClassifierDataLoader

data = ImageClassifierDataLoader.from_folder('flower_photos/')
model = image_classifier.create(data)
loss, accuracy = model.evaluate()
model.export('flower_classifier.tflite', 'flower_label.txt', with_metadata=True)
  • Model Maker: https://github.com/tensorflow/examples/tree/master/tensorflow_examples/lite/model_maker

Model Maker supports many cutting-edge models available on TensorFlow Hub, including the EfficientNet-Lite model. If you want higher accuracy, you can switch to different model architectures by modifying just one line of code (keeping the rest of the training pipeline intact).

# EfficientNet-Lite2.
model = image_classifier.create(data, efficientnet_lite2_spec)

# ResNet 50.
model = image_classifier.create(data, resnet_50_spec)

Model Maker currently supports two use cases, image classification (tutorial) and text classification (tutorial), with more computer vision and NLP use cases to be added soon; a sketch of the text workflow follows the tutorial links below.

  • Image Classification Tutorial: https://colab.research.google.com/github/tensorflow/examples/blob/master/tensorflow_examples/lite/model_maker/demo/image_classification.ipynb

  • Text Classification Tutorial: https://colab.research.google.com/github/tensorflow/examples/blob/master/tensorflow_examples/lite/model_maker/demo/text_classification.ipynb
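For text classification, the workflow mirrors the image example. The snippet below is a hypothetical sketch that assumes the text API follows the same pattern; loader and argument names may differ by version, so consult the tutorial above for the exact calls:

# Hypothetical sketch; see the text classification tutorial for exact APIs.
from tflite_model_maker import text_classifier
from tflite_model_maker import TextClassifierDataLoader

# Load text files grouped into one folder per class (assumed layout).
data = TextClassifierDataLoader.from_folder('movie_reviews/')
model = text_classifier.create(data)
loss, accuracy = model.evaluate()
model.export('review_classifier.tflite', 'review_labels.txt', with_metadata=True)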

Developing Models: Seamless Replacement of Models with Associated Metadata

The TensorFlow Lite file format has always carried the input and output tensor shapes, which works well when the model creator is also the application developer. However, as the on-device machine learning ecosystem grows, these tasks are increasingly handled by different teams within an organization, or even by different organizations working together. To support these scenarios and make information easier to pass along, we have added new metadata fields. They fall into two main categories:

  1. Machine-readable parameters: for example, normalization parameters such as mean and standard deviation, and class label files. Other systems can read these parameters to generate wrapper code; we show an example in the next section, and a sketch of reading these fields back out appears just below.
  2. Human-readable parameters: for example, the model description and the model license. These give application developers the key information needed to use the model correctly, such as strengths or limitations they should be aware of. Fields like the license are also crucial in deciding whether a model can be used at all; attaching them to the model lowers barriers and significantly increases adoption.
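As a sketch of how another system might read these machine-readable fields back out, the tflite_support metadata tools can dump a model's metadata and its packed associated files (the model path here is a placeholder):

from tflite_support import metadata as _metadata

# Read the metadata and associated files packed into a .tflite model.
displayer = _metadata.MetadataDisplayer.with_model_file('model_with_metadata.tflite')
print(displayer.get_metadata_json())                # metadata as JSON
print(displayer.get_packed_associated_file_list())  # e.g. label files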

To bootstrap this effort, metadata has been attached both to models created by TensorFlow Lite Model Maker and to the image-related TensorFlow Lite models on TensorFlow Hub (tensorflow.google.cn/hub). If you are creating your own models, attaching metadata makes them easier to share.

from tflite_support import flatbuffers
from tflite_support import metadata as _metadata
from tflite_support import metadata_schema_py_generated as _metadata_fb

# Creates model info.
model_meta = _metadata_fb.ModelMetadataT()
model_meta.name = "MobileNetV1 image classifier"
model_meta.description = ("Identify the most prominent object in the "
                          "image from a set of 1,001 categories such as "
                          "trees, animals, food, vehicles, person etc.")
model_meta.version = "v1"
model_meta.author = "TensorFlow"
model_meta.license = ("Apache License. Version 2.0 "
                      "http://www.apache.org/licenses/LICENSE-2.0.")
# Describe input and output tensors
# ...

# Writing the metadata to your model
b = flatbuffers.Builder(0)
b.Finish(
    model_meta.Pack(b),
    _metadata.MetadataPopulator.METADATA_FILE_IDENTIFIER)
metadata_buf = b.Output()
populator = _metadata.MetadataPopulator.with_model_file(model_file)
populator.load_metadata_buffer(metadata_buf)
populator.load_associated_files(["your_path_to_label_file"])
populator.populate()

For a complete example of how to populate metadata for MobileNet v1, please refer to this guide (https://tensorflow.google.cn/lite/convert/metadata).

Developing Applications: Automatically Generate Code from Models

Using the machine-readable part of the metadata, a code generator can produce the wrapper code needed for integration, instead of you copying and pasting error-prone boilerplate that converts typed objects such as bitmaps into a ByteArray before passing them to the TensorFlow Lite interpreter.
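For contrast, this is the kind of manual tensor plumbing the generated wrappers replace, shown here with the Python interpreter API (on Android the same steps are written in Java or Kotlin; the model path is a placeholder):

import numpy as np
import tensorflow as tf

# Manual invocation without wrapper code: load, allocate, copy, invoke, read.
interpreter = tf.lite.Interpreter(model_path='flower_classifier.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The caller must shape and type the raw input buffer correctly by hand.
input_data = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]['index'])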

You can use our first code generator built for Android to generate model wrapper classes. We are also working to integrate this tool into Android Studio.

  • First code generator built for Android: https://tensorflow.google.cn/lite/guide/codegen

  • Android Studio integration: https://developer.android.google.cn/studio/preview/features#tensor-flow-lite-models

Developing Applications: Understand Performance with Benchmarking and Profiling Tools

After creating a model, we want to understand how it performs on mobile devices. TensorFlow Lite provides a benchmark tool for measuring model performance. We have added support for benchmarking under every runtime option, including running the model on the GPU or other supported hardware accelerators, specifying the number of threads, and more. You can also break inference latency down to individual-operator granularity to identify the most time-consuming operations and optimize model inference; a quick Python sanity check is sketched after the link below.

  • Benchmark tool: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark
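The benchmark tool itself runs on-device, but as a quick sanity check you can time inference from Python and vary the thread count (the num_threads argument is available in recent TensorFlow releases; the model path is a placeholder):

import time
import numpy as np
import tensorflow as tf

# Rough latency measurement with a configurable number of CPU threads.
interpreter = tf.lite.Interpreter(model_path='model.tflite', num_threads=4)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp['index'], np.zeros(inp['shape'], dtype=inp['dtype']))

start = time.perf_counter()
for _ in range(100):
    interpreter.invoke()
print('Average latency: %.2f ms' % ((time.perf_counter() - start) * 1000 / 100))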

Once the model is integrated into an application, you may hit other performance issues, and at that point the platform's own profiling tools come in. On Android, for example, you can troubleshoot performance with various profiling tools. We have released a TensorFlow Lite performance tracing module for Android that reveals TensorFlow Lite's internal workings; it ships by default in our nightly releases. With tracing, you can check whether resource contention occurs during inference. For details on using the module with the Android benchmark tool, please refer to the documentation.

  • Profiling tools: https://developer.android.google.cn/topic/performance/tracing

  • Documentation: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark/android#to-trace-tensorflow-lite-internals-including-operator-invocation

We will continue to improve TensorFlow Lite performance tools, providing more intuitive and practical tools for measuring and adjusting TensorFlow Lite performance on various devices.

Deployment: Easily Scale Across Multiple Platforms

Nowadays, most applications need to support multiple platforms at once. That is why we built TensorFlow Lite to run seamlessly across Android, iOS, Raspberry Pi, and other Linux-based IoT devices. Every TensorFlow Lite model works out of the box on all officially supported platforms, so you can focus on creating a high-quality model without worrying about how to adapt it to each platform.

Each platform has specific hardware accelerators that can speed up model inference. TensorFlow Lite already supports running models on the NNAPI (for Android) and the GPU (for both iOS and Android). We are pleased to add support for more hardware accelerators, listed below; a Python sketch of attaching a delegate follows the list:

  • On Android, we added support for the Qualcomm Hexagon DSP, which is found on millions of devices. This lets developers take advantage of the DSP on older devices running versions below Android 8.1, where the Android NN API is unavailable.
  • On iOS, we released the Core ML delegate, which supports running TensorFlow Lite models on Apple's Neural Engine.

  • Core ML delegate: https://blog.tensorflow.org/2020/04/tensorflow-lite-core-ml-delegate-faster-inference-iphones-ipads.html
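In Python, delegates are attached when the interpreter is constructed. The sketch below uses tf.lite.experimental.load_delegate with a placeholder library name, since the actual delegate binary is platform-specific:

import tensorflow as tf

# Load a hardware-accelerator delegate from its shared library.
# 'libdelegate.so' is a placeholder; the real name depends on the platform
# (e.g. Hexagon, GPU, and Core ML delegates ship as separate binaries).
delegate = tf.lite.experimental.load_delegate('libdelegate.so')
interpreter = tf.lite.Interpreter(
    model_path='model.tflite',
    experimental_delegates=[delegate])
interpreter.allocate_tensors()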

Additionally, we have also improved the performance of existing supported platforms, as shown in the following figure, where we compare performance from May 2019 to February 2020. By simply upgrading to the latest TensorFlow Lite library, you can enjoy the benefits of these improvements.

Figure: Pixel 4, single-threaded CPU, February 2020

Outlook

In the coming months, we will focus on supporting more use cases. Meanwhile, we will continue to improve the developer experience:

  • Continuing to release state-of-the-art on-device models, including better support for BERT-family models on NLP tasks, as well as new vision models.
  • Publishing new tutorials and showcasing more application examples, including how to perform inference on mobile devices using C/C++ APIs.
  • Enhancing Model Maker to support more tasks, including object detection and multiple NLP tasks. We will add BERT support for NLP tasks (such as question answering), allowing developers without machine learning expertise to build state-of-the-art NLP models through transfer learning.
  • Expanding metadata and Codegen tools to support more use cases, including object detection and more NLP tasks.
  • Releasing more platform integrations to create a smoother end-to-end experience, including better Android Studio and TensorFlow Hub integration.

Feedback

We are committed to continually improving TensorFlow Lite, and we look forward to seeing what you build with it, as well as hearing your feedback. You can submit feedback directly, or leave a message on our WeChat account to share your use cases with us. To report bugs and issues, please file them on GitHub (https://github.com/tensorflow/tensorflow/issues).

Acknowledgments

Thanks to Amy Jang, Andrew Selle, Arno Eigenwillig, Arun Venkatesan, Cédric Deltheil, Chao Mei, Christiaan Prins, Denny Zhou, Denis Brulé, Elizabeth Kemp, Hoi Lam, Jared Duke, Jordan Grimstad, Juho Ha, Jungshik Jang, Justin Hong, Hongkun Yu, Karim Nosseir, Khanh LeViet, Lawrence Chan, Lei Yu, Lu Wang, Luiz Gustavo Martins, Maxime Brénon, Mia Roh, Mike Liang, Mingxing Tan, Renjie Liu, Sachin Joglekar, Sarah Sirajuddin, Sebastian Goodman, Shiyu Hu, Shuangfeng Li, Sijia Ma, Tei Jeong, Tian Lin, Tim Davis, Vojtech Bardiovsky, Wei Wei, Wouter van Oortmerssen, Xiaodan Song, Xunkai Zhang, YoungSeok Yoon, Yuqi Li, Yi Zhou, Zhenzhong Lan, Zhiqing Sun, and others.

If you want to learn more about the topics mentioned in this article, please refer to the following resources, which cover them in greater depth:

  • EfficientNet-Lite: https://tfhub.dev/s?deployment-format=lite&q=efficientnet%20lite (paper: https://arxiv.org/abs/1905.11946)

  • MobileBERT: https://tfhub.dev/tensorflow/mobilebert/1 (paper: https://arxiv.org/abs/2004.02984)

  • ALBERT-Lite: https://tfhub.dev/s?deployment-format=lite&q=albert (paper: https://arxiv.org/abs/1909.11942)

  • BERT: https://github.com/google-research/bert (paper: https://arxiv.org/abs/1810.04805)
