Baidu AI Research Series: Open Capabilities

One original article every Monday on 5G, IoT, and artificial intelligence. Follow 【Top Viewpoint】 to put your spare moments to use for learning.

In the previous article, we covered one of Baidu’s open capabilities, speech technology, in detail. Today we continue with another: image technology.

Image technology comprises five general capabilities: image review, vehicle analysis, image recognition, image search, and image effect enhancement.

1. Image Review

An intelligent content review solution based on deep learning that accurately identifies sensitive content in images and videos, such as pornography, violence, terrorism, political sensitivity, micro-business advertisements, and disgusting content. It can also filter images by aesthetics and clarity, quickly and accurately, freeing up human review resources.

1. Pornography Recognition

AI technology for identifying pornography, intelligently recognizing sexual and suggestive content in images and videos, allowing your application to easily pass reviews and avoid compliance risks.

2. Political Sensitivity Recognition

Identifying political figures and sensitive political event scenes, helping UGC, IM, and BBS products avoid related risks during sensitive periods.

3. Ad Detection

Intelligently detecting text, watermarks, QR codes, and barcodes in images, detecting various micro-business advertisements to purify your application.

4. Disgusting Image Recognition

Accurately identifying disgusting or uncomfortable images, including corpses, dissections, insects, physiological abnormalities, and images that may trigger trypophobia.

5. Violence and Terrorism Recognition

Identifying violence, bloody scenes, and images and videos related to terrorist organizations, reducing the risk of violence and terrorism in applications.

6. Public Figure Recognition

Supports facial recognition of 160,000 public figures from both domestic and international sources, including singers, actors, athletes, and political figures.

7. Text and Image Review

Recognizes the text embedded in images and reviews it along multiple dimensions.

8. Image Quality Detection

Identifies the aesthetics and clarity of images, detecting color, composition, and whether there are issues such as blurriness, focus loss, noise, aliasing, or mosaics.

The above capabilities have been packaged into APIs by Baidu, allowing users to upload images and receive results. Supported image formats include PNG, JPG, JPEG, BMP, etc.
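
As a rough illustration of the upload step, the sketch below checks that a file is one of the formats listed above and base64-encodes it for a request body. The payload field names (`image`, `format`) and the function names are assumptions made for this example, not Baidu’s documented API.

```python
import base64

# Leading bytes ("magic numbers") of the formats the article lists.
# JPG and JPEG are the same format, so they share one entry.
MAGIC_PREFIXES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "bmp": b"BM",
}


def detect_format(data: bytes):
    """Return 'png', 'jpeg', or 'bmp' based on the leading bytes, else None."""
    for name, prefix in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return name
    return None


def build_review_payload(data: bytes) -> dict:
    """Build a hypothetical request body; the field names are assumptions."""
    fmt = detect_format(data)
    if fmt is None:
        raise ValueError("unsupported image format")
    return {"image": base64.b64encode(data).decode("ascii"), "format": fmt}
```

Rejecting unsupported files before upload avoids a wasted round trip to the service.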

Application scenarios:

(1) Live video content review: Uses Baidu’s pornography, violence and terrorism, and political sensitivity recognition to review images, videos, and live streams automatically in real time, in a standardized, simple, fast, and cost-effective way.

(2) Social and e-commerce content supervision: Social and e-commerce applications carry large numbers of images containing pornographic, violent, or politically sensitive content, which exposes them to regulatory risk. Integrating Baidu’s image review service identifies such violations automatically, significantly reducing labor costs and business compliance risk.
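
In a review pipeline like the ones described here, an application typically has to turn per-category results into a single decision. The sketch below is one way to do that, assuming the service returns a confidence score between 0 and 1 for each category. The score format, category names, and both thresholds are hypothetical and would need tuning per product.

```python
def decide(scores: dict, block_at: float = 0.9, review_at: float = 0.5) -> str:
    """Map per-category confidence scores to a moderation decision.

    scores: hypothetical mapping such as {"pornography": 0.02, "violence": 0.7}.
    Returns "block", "manual_review", or "pass".
    """
    # The most confident violation decides the outcome.
    worst = max(scores.values(), default=0.0)
    if worst >= block_at:
        return "block"
    if worst >= review_at:
        return "manual_review"
    return "pass"
```

Only the middle band goes to human reviewers, which is how automatic review reduces manual workload.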

2. Vehicle Analysis

Accurately recognizes vehicle-related information in images, providing capabilities such as model recognition, vehicle detection, traffic flow statistics, vehicle attribute recognition, vehicle appearance damage recognition, and vehicle background segmentation.

1. Model Recognition

Identifies specific vehicle models, primarily passenger cars, outputting the brand, model, year, color, and encyclopedic information of the main vehicle in the image; can recognize 3,000 common models with an accuracy rate of over 90%.

2. Vehicle Detection

Identifies the types and positions of all vehicles in an image, counting five types of vehicles: passenger cars, trucks, buses, motorcycles, and tricycles, while also locating the license plate positions of passenger cars, trucks, and buses.

(1) Low to mid altitude (shooting below 30 meters): Detects all vehicles in the image and returns the type and coordinates of each vehicle, covering the five major vehicle types.

(2) High altitude (shooting above 30 meters): Detects all vehicles in the image, returns the coordinates of each vehicle without distinguishing vehicle types, and counts the vehicles.
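
To make the difference between the two modes concrete, the sketch below tallies detections from a hypothetical response: a list of dictionaries, each with a `location` and, in the low-altitude case, a `type`. The response shape and field names are assumptions for illustration only.

```python
from collections import Counter


def summarize(vehicles: list) -> dict:
    """Count detected vehicles, with a per-type breakdown when types exist.

    High-altitude results carry no "type" key, so only the total is reported.
    """
    by_type = Counter(v["type"] for v in vehicles if "type" in v)
    return {"total": len(vehicles), "by_type": dict(by_type)}
```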

3. Traffic Flow Statistics

Detects and tracks vehicles across a sequence of video snapshots, counting the vehicles (including passenger cars, trucks, buses, motorcycles, and tricycles) that enter and exit a specified area to produce dynamic traffic flow statistics.
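
The counting step can be sketched as follows, assuming tracking has already produced, for each vehicle, a list of center points in frame order, and that the specified area is an axis-aligned rectangle. This is a simplified illustration, not Baidu’s implementation; the data layout and function names are made up for the example.

```python
def inside(point, region) -> bool:
    """True if (x, y) lies within the rectangle (left, top, right, bottom)."""
    x, y = point
    left, top, right, bottom = region
    return left <= x <= right and top <= y <= bottom


def count_flow(tracks: dict, region) -> tuple:
    """Count entries into and exits from the region across all tracks.

    tracks: hypothetical mapping of track id -> list of (x, y) centers,
    one per snapshot, in time order.
    """
    entered = exited = 0
    for positions in tracks.values():
        # Compare each position with the next to find boundary crossings.
        for prev, curr in zip(positions, positions[1:]):
            was_in, is_in = inside(prev, region), inside(curr, region)
            if not was_in and is_in:
                entered += 1
            elif was_in and not is_in:
                exited += 1
    return entered, exited
```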

4. Vehicle Attribute Recognition

Detects various types of vehicles in images and identifies 11 exterior attributes for passenger cars, including whether there are rain visors on windows, whether there is a roof rack, and whether there is a passenger in the front seat, suitable for specific vehicle detection and tracking in traffic security scenarios.

5. Vehicle Appearance Damage Recognition

For common passenger car models, recognizes damaged components and types of damage, capable of identifying dozens of vehicle components and five major types of appearance damage (scratches, dents, cracks, wrinkles, perforations).

Application scenarios: Intelligent damage assessment, intelligent vehicle inspection.

6. Vehicle Segmentation

Detects vehicles in images, primarily passenger cars, recognizes the outline of vehicles, separates them from the background, and returns segmented binary images, grayscale images, and foreground cutouts, adapting to multiple vehicles, open doors, and various angles.
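
The relationship between the binary image and the foreground cutout can be shown with a small sketch. It assumes the image and the mask are same-sized nested lists and that nonzero mask values mark vehicle pixels; both the representation and the function name are assumptions for illustration.

```python
def cut_out(image, mask, background=0):
    """Keep pixels where the mask is nonzero; fill the rest with background."""
    same_shape = len(image) == len(mask) and all(
        len(row) == len(mask_row) for row, mask_row in zip(image, mask)
    )
    if not same_shape:
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else background for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
```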

3. Image Recognition

Precisely recognizes over 100,000 types of objects and scenes, providing multiple high-precision recognition capabilities and corresponding API services, fully meeting the business needs of various individual developers and enterprise users.

1. General Object and Scene Recognition

Supports recognition of over 100,000 common objects and scenes, returning names of one or more objects in the image and providing encyclopedic information. Suitable for image or video content analysis, photo recognition, and other business scenarios. It can also support intelligent recommendation based on the images users have browsed.

2. Image Subject Detection

Detects subjects in images, supporting single subject detection and multiple subject detection.

Can identify the location and labels of subjects in the image, facilitating cropping of the corresponding subject area for subsequent image processing, mass image classification, and labeling scenarios.
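
Cropping the detected subject is straightforward once its location is known. The sketch below assumes the location arrives as `left`, `top`, `width`, and `height` in pixels and that the image is a nested list of rows; these field names are assumptions rather than a documented response format.

```python
def crop(image, box: dict):
    """Return the sub-image covered by a hypothetical bounding box."""
    top, left = box["top"], box["left"]
    rows = image[top:top + box["height"]]
    return [row[left:left + box["width"]] for row in rows]
```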

3. Animal Recognition

Recognizes nearly 8,000 types of animals, returning animal names and providing encyclopedic information, suitable for photo recognition apps.

4. Plant Recognition

Supports recognition of over 20,000 common plants and nearly 8,000 types of flowers, returning plant names and providing encyclopedic information, suitable for photo recognition apps.

5. Brand Logo Recognition

Recognizes over 20,000 types of product logos, supporting users in creating their own brand logo library, accurately identifying brand logo names in images, suitable for business scenarios requiring quick access to brand information.

6. Fruit and Vegetable Recognition

Recognizes nearly 1,000 types of fruits and vegetables, suitable for recognizing images containing only one type of fruit or vegetable, allowing customization of the number of recognition results returned, suitable for food-related apps.

7. Dish Recognition

Recognizes over 9,000 types of dishes and lets customers create their own dish library, accurately identifying dish names and locations in images and providing encyclopedic information, suitable for a wide range of dish recognition scenarios.

8. Wine Recognition

Recognizes wine labels in images, returning wine names, countries, regions, wineries, types, sugar content, grape varieties, and wine descriptions, capable of recognizing hundreds of thousands of domestic and foreign wines.

9. Currency Recognition

Recognizes currency types in images, returning currency names, codes, denominations, and year information, capable of recognizing over a hundred commonly used domestic and foreign currencies.

10. Landmark Recognition

Supports recognition of approximately 120,000 famous landmarks and attractions, widely used in photo recognition, image classification, and other scenarios.

11. Photo Reproduction Recognition

For fast-moving consumer goods sales scenarios, accurately recognizes fraudulent photos taken of screens, effectively reducing manual review workloads and minimizing financial losses for brands due to image fraud.

Can be used with EasyDL retail version product detection API to ensure the authenticity of product recognition results such as shelf numbers and distribution rates.

12. Fast-Moving Consumer Goods Detection

API for product detection that can be used directly without training, supporting the recognition of common beverages and daily chemical products, returning product names, specifications, categories, and positions in images. The AI model is specifically optimized for product display scenarios, adapting to various complex shelf environments in large supermarkets, convenience stores, and street shops.

13. Storefront Recognition

Recognizes 200,000 types of preset store facades, supporting the creation of a custom facade library, enabling accurate recognition of facade names and positions in images.

14. EasyDL Classic Version

Customize high-precision AI models without any algorithm knowledge.

(1) Model training with zero algorithm knowledge: Requires no specialized machine learning knowledge; simply upload and label example data to train the model with one click.

(2) Model effect verification: View detailed evaluation reports and validate model performance in a visual interface, allowing targeted data supplementation for training.

(3) Model application deployment: Once satisfied with model performance, deploy the model on the cloud, on a device, or on a private server, or purchase an integrated hardware and software solution.

Visual Operation:

Requires no specialized machine learning knowledge, with a fully visual and convenient process for model creation, data upload, model training, and model release, enabling a high-precision model in as little as 15 minutes.

High-Precision Results:

EasyDL combines Baidu’s AutoDL/AutoML technology at the core, automatically obtaining the optimal network and hyperparameter combinations based on user data, achieving excellent model performance with a small amount of data.

Flexible Deployment:

Trained models can be deployed via public cloud APIs, device SDKs, or private servers, and integrated hardware and software solutions are also provided to adapt flexibly to various usage scenarios and operational environments.

Data Support:

Comprehensive support for high-quality data collection and efficient labeling for training data, supporting continuous data expansion during model iteration to enhance model performance.

4. Image Search

Searches a specified library for identical or similar images using an image as the query, suitable for precise image retrieval, similar material searches, photographing to find similar products, and recommending similar products.

1. Identical Image Search: Suitable for precise image retrieval and duplicate image filtering.

2. Similar Image Search: Searches for collections of semantically similar images, supporting libraries with billions of images.

3. Product Image Search: Search for products by image, photographing to find similar items, and recommending similar products.

4. Illustrated Book Image Search: Photograph to search for children’s books, accurately retrieving the corresponding illustrated books from a self-built library.
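
To give a feel for duplicate image filtering, here is a minimal sketch of a classical technique, the average hash compared by Hamming distance. It is not Baidu’s method, which the article does not describe. It assumes each image has already been converted to a small grayscale grid of equal size, and the `max_distance` threshold is an arbitrary choice for the example.

```python
def average_hash(gray) -> list:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a, b) -> int:
    """Number of positions at which two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_duplicate(gray_a, gray_b, max_distance: int = 2) -> bool:
    """Treat two images as duplicates when their hashes nearly match."""
    return hamming(average_hash(gray_a), average_hash(gray_b)) <= max_distance
```

Hashes like this tolerate small pixel-level noise, which is why they are a common first pass before more expensive similarity search.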

5. Image Effect Enhancement

1. Image Dehazing: Processes images taken in foggy weather to remove haze.

2. Lossless Image Enlargement: Enlarges images by two times in both width and height, maintaining quality without loss.

3. Image Stretch Recovery: Identifies excessively stretched image content and restores it to normal proportions.

4. Image Restoration: Removes unwanted obstructions from images and repairs missing content.

5. Portrait Animation: Customizes unique anime images for users.

6. Image Color Enhancement: Intelligently adjusts the saturation, brightness, and contrast of images.

7. Image Contrast Enhancement: Adjusts the contrast of images that are too dark or too bright.

8. Black and White Image Coloring: Intelligently recognizes the content of black and white images and fills in colors.

9. Image Style Transfer: Transforms images into cartoon or sketch styles.

10. Image Clarity Enhancement: Intelligently removes noise and enhances image texture details, outputting clearer images.

11. Sky Segmentation: Identifies the outline of the sky in images, separating it from the background.
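
For a sense of what contrast enhancement does at the pixel level, the sketch below applies a classical min-max contrast stretch to a grayscale image stored as nested lists. This is a textbook baseline chosen for illustration; the article does not say which method Baidu uses, and a learned model would behave differently.

```python
def stretch_contrast(gray, low: int = 0, high: int = 255):
    """Linearly rescale pixel values so they span the range [low, high]."""
    pixels = [p for row in gray for p in row]
    darkest, brightest = min(pixels), max(pixels)
    if brightest == darkest:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in gray]
    span = brightest - darkest
    return [
        [round(low + (p - darkest) * (high - low) / span) for p in row]
        for row in gray
    ]
```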

Summary:

This concludes the basic introduction to Baidu’s image technology. The reason Baidu can achieve such precision in image processing is the data from Baidu Search. Image processing is a general model capability with high demands on data, accuracy, and computing power, which is why only large companies can develop such general models and open them up as services for small and medium-sized enterprises and individual developers. The earliest breakthrough scenarios were in industries such as manufacturing and transportation. Baidu currently competes with small and medium-sized enterprises for orders, which has held back its ecosystem; if it instead relied on those enterprises to expand scenarios in their vertical fields, the potential would be great.

The next article will cover another of Baidu’s AI capabilities in detail: text recognition. Comments and discussion are welcome.

Disclaimer:

This public account is for personal research and study sharing, not a commercial account with any commercial purpose. If the content of the article infringes on rights or contains illegal information, please contact this account immediately for deletion. Thank you. Contact information: [email protected]
