Hello everyone, I am Xiao G.
In recent years, classic image recognition technologies such as face, vehicle, human attribute, ID card, and traffic sign recognition have come to play an extremely important role in our digital work and life.
Of course, top companies in the industry provide callable APIs and SDKs, but these often come with pain points such as poor generalization in customized scenarios, high cost, limited controllability of black-box services, and high technical barriers.
Today, I would like to recommend an open-source project on GitHub that covers 9 classic recognition scenarios including people, vehicles, and OCR, achieving rapid recognition in just 3 milliseconds on CPU and allowing iterative training with just one line of code!
Figure 1: PaddleClas Image Recognition and Classification Application Diagram
Without further ado, here is the link, and interested friends can give it a try.
GitHub Address: https://github.com/PaddlePaddle/PaddleClas
Star it to avoid getting lost.
Next, let’s break down the outstanding aspects of this project!
Figure 2: 9 Major Scene Model Effect Diagram
Highlight 1: Perfect Balance of Accuracy and Speed
From the classic ResNet50 to the currently popular Swin-Transformer, model accuracy records keep being refreshed, but prediction efficiency has not kept pace. Even the smallest Swin-Transformer variant takes more than 100 ms to predict a single image on CPU, which falls far short of the industry's real-time requirements.
On the other hand, lightweight models such as the MobileNet series deliver high prediction efficiency, around 3 ms per image on CPU, but their accuracy often lags noticeably behind larger models.
The PaddleClas ultra-light image classification solution (Practical Ultra Lightweight image Classification, PULC) is designed to resolve exactly this industrial pain point of balancing accuracy and speed.
Table 1: Comparison of Accuracy and Speed Results of Different Models
As shown in the table, its accuracy rivals that of large models such as Swin-Transformer, while prediction is over 30 times faster, with a CPU inference time of only 2 ms!
Highlight 2: Extremely Easy to Use
The PULC solution not only balances accuracy and speed but also caters to the rapid iteration required in industrial practice: a model can be trained with a single command.
Meanwhile, the PaddleClas team has released 9 scene models covering people, vehicles, OCR, and more, so a business POC can be verified in just 2 steps, with the full training, inference, and deployment pipeline available out of the box!
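As a minimal sketch of that two-step POC flow, assuming the pip-installable paddleclas wheel and its Python API (the scene model name and image path below are illustrative; per the repo documentation, training itself is driven by a single `tools/train.py -c <PULC config>` command):

```python
# Step 1: install the package, e.g. `pip install paddleclas` (plus paddlepaddle).
import paddleclas

# Step 2: load a released PULC scene model and run inference on a local image.
# "person_exists" is one of the published scene models; "test.jpg" is a placeholder path.
model = paddleclas.PaddleClas(model_name="person_exists")
results = model.predict(input_data="test.jpg")  # returns a generator of batched predictions
for batch in results:
    print(batch)  # predicted class ids, scores, and label names
```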
Moreover, the project comes with detailed Chinese documentation and industrial practice tutorials.
Figure 3: Usage Documentation and Example Diagram
Highlight 3: Integration of Numerous Core Technologies
The ultra-light image classification solution (PULC) integrates four leading optimization strategies in the industry:
Figure 4: Ultra-light Image Classification Solution (PULC) Diagram
PP-LCNet Lightweight Backbone Network
PP-LCNet, a backbone network tailored for CPU, far exceeds similarly sized networks such as MobileNetV3 in both speed and accuracy. Across the optimized scene models, it runs over 30 times faster than SwinTransformer-based models while scoring about 18 points higher in accuracy than MobileNetV3_small_0.35x.
SSLD Pre-trained Weights
The SSLD semi-supervised distillation algorithm lets a small model learn both the feature representations of a large model and knowledge from the large-scale unlabeled data of ImageNet22k. Using SSLD pre-trained weights to initialize the small model improves the accuracy of the downstream classification models by 1 to 2.5 points across different scenarios.
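As a rough illustration of using SSLD weights as the initialization for PP-LCNet, here is a sketch that assumes the repo's ppcls package exposes a `PPLCNet_x1_0` builder with `pretrained` and `use_ssld` switches (verify the exact import path and arguments against the repository before relying on it):

```python
import paddle
# Assumption: PP-LCNet builders live under ppcls.arch.backbone and accept
# `pretrained` / `use_ssld` flags; check the PaddleClas repo for the exact API.
from ppcls.arch.backbone import PPLCNet_x1_0

# Build PP-LCNet initialized from SSLD-distilled ImageNet weights instead of random init.
model = PPLCNet_x1_0(pretrained=True, use_ssld=True)

x = paddle.randn([1, 3, 224, 224])  # dummy NCHW input
logits = model(x)
print(logits.shape)                 # [1, 1000] for the default ImageNet head
```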
Data Augmentation Strategy Integration
This solution integrates three families of data augmentation: image transformation, image cropping, and image mixing, and supports custom trigger probabilities for each, greatly enhancing the model's generalization ability in real-world scenarios. On top of the previous step, this adds roughly 1 more point of accuracy.
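Conceptually, the trigger-probability idea looks like the sketch below; the function names and probability values are purely illustrative, not PaddleClas's actual operator names or tuned defaults:

```python
import random

def make_augmenter(transform_fn, crop_fn, mix_fn,
                   p_transform=0.5, p_crop=0.3, p_mix=0.2):
    """Illustrative sketch: gate each augmentation family with its own trigger probability."""
    def augment(image, mix_partner=None):
        if random.random() < p_transform:        # image transformation (RandAugment-style ops)
            image = transform_fn(image)
        if random.random() < p_crop:             # image cropping (Cutout/RandomErasing-style ops)
            image = crop_fn(image)
        if mix_partner is not None and random.random() < p_mix:
            image = mix_fn(image, mix_partner)   # image mixing (Mixup/CutMix-style ops)
        return image
    return augment
```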
SKL-UGI Knowledge Distillation
The SKL (symmetric-KL) algorithm adds symmetric information to the classic KL knowledge distillation loss, making the algorithm more robust. The solution also makes it easy to mix unlabeled general image (UGI) data into training, which further improves model performance. This step can raise model accuracy by another 1 to 2 points.
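To make the "symmetric" part concrete, here is a minimal sketch of a symmetric KL distillation loss written with the Paddle API; it is a conceptual illustration, not PaddleClas's exact SKL-UGI implementation:

```python
import paddle
import paddle.nn.functional as F

def symmetric_kl_loss(student_logits, teacher_logits):
    """Conceptual sketch: classic KL distillation uses only the teacher->student term;
    the symmetric variant adds the reverse direction as well."""
    log_p_s = F.log_softmax(student_logits, axis=-1)
    log_p_t = F.log_softmax(teacher_logits, axis=-1)
    p_s = F.softmax(student_logits, axis=-1)
    p_t = F.softmax(teacher_logits, axis=-1)
    # paddle's kl_div expects log-probabilities as `input` and probabilities as `label`
    kl_ts = F.kl_div(log_p_s, p_t, reduction="batchmean")  # KL(teacher || student)
    kl_st = F.kl_div(log_p_t, p_s, reduction="batchmean")  # KL(student || teacher)
    return kl_ts + kl_st

# Unlabeled general images can simply be appended to the training batches:
# the teacher's soft predictions provide the supervision, so no labels are needed.
```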
Group Benefits:
1. Get the live course link detailing the upgrade content of PaddleClas.
2. Obtain a heavyweight 10 GB image classification learning gift pack compiled by the PaddleClas team, including:
Figure 5: PaddleClas Group Gift Package Content Diagram
How to Join:
STEP 1: Scan the QR code with WeChat and fill out the questionnaire
STEP 2: Join the exchange group to receive benefits
Moreover, with the varied software and hardware environments and scene requirements of real industrial applications in mind, the PaddleClas team provides not only the PULC solution but also 20 industrial implementation capabilities in total: 3 training methods, 5 training environments, 3 model compression strategies, and 9 inference and deployment methods:
Table 3: Supported Training, Inference, and Deployment Features of the Image Classification Industrial Implementation Toolkit
Highlights of particular interest include:
The PaddlePaddle distributed training architecture features unique technologies such as 4D hybrid parallelism and end-to-end adaptive distributed training. When training PP-LCNet, 4 machines with 8 GPUs each achieve a 3.48x speedup over a single machine with 8 GPUs, an 87% scaling efficiency, with no loss in accuracy (see the data-parallel sketch after this list).
The PaddlePaddle model compression tool PaddleSlim covers model pruning, quantization, distillation, and NAS. After quantization and pruning, the average prediction time of image classification models on mobile devices is reduced by 24%.
The PaddlePaddle lightweight inference engine Paddle Lite is compatible with over 20 AI acceleration chips, enabling quick deployment of image classification models on mobile, embedded, IoT, and other edge devices.
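For a sense of how multi-GPU training is typically written with the Paddle API, here is a minimal data-parallel sketch on toy data (a generic `paddle.DataParallel` example, not the 4D hybrid-parallel setup itself; the model and hyperparameters are placeholders):

```python
import paddle
import paddle.nn as nn
import paddle.distributed as dist

def train():
    # One process per GPU; typically launched via `python -m paddle.distributed.launch`.
    dist.init_parallel_env()

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))  # toy stand-in model
    model = paddle.DataParallel(model)  # wrap for gradient synchronization across cards
    opt = paddle.optimizer.Momentum(learning_rate=0.1, parameters=model.parameters())

    for _ in range(10):                 # toy training loop on random data
        x = paddle.randn([8, 3, 224, 224])
        y = paddle.randint(0, 1000, [8])
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
        opt.clear_grad()

if __name__ == "__main__":
    train()
```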
All models and code are open-sourced in PaddleClas, along with detailed documentation and example projects. Head over to the repo, explore the code, and give it a Star!
GitHub Address: https://github.com/PaddlePaddle/PaddleClas
To help developers gain a deeper understanding of the newly released PaddleClas features, resolve implementation difficulties, and master the core capabilities of industrial practice, the PaddlePaddle team has prepared a three-day live course from June 15 to June 17 at 20:30!
Senior Baidu engineers will walk through the ultra-light image classification solution in detail, dissect the optimization principles and usage of each scene model, and then teach the full pipeline of industrial cases hands-on, addressing common pain points, with live interaction and Q&A. What are you waiting for? Scan the code to join!
Official website: https://www.paddlepaddle.org.cn
PaddleClas project address:
GitHub: https://github.com/PaddlePaddle/PaddleClas
Gitee: https://gitee.com/paddlepaddle/PaddleClas