Baidu AI Series Research: Open Capabilities

One original article per week, focusing on 5G, the Internet of Things, and Artificial Intelligence. Follow my 【Top Viewpoints】 to put your spare moments to steady use for learning.

Today, we will explore Baidu’s “Face and Body Recognition” open capabilities. With the widespread use of cameras, applications in the field of face and body recognition are becoming increasingly common, which is also an important aspect of the implementation of artificial intelligence.

Baidu’s “Face and Body Recognition” capabilities include:

Face Recognition
Portrait Effects
Body Analysis
Behavior Analysis

Face Recognition

Classification

Face Detection: Returns face bounding boxes and key points, recognizing various facial attributes.
Face Comparison: Evaluates the similarity between two face images.
Face Search: Conducts 1:N searches in a specified face library using a given image.
Liveness Detection: Guards the face recognition process against spoofing attacks that use photos, molds, and similar props.

1. Face Detection

Quickly detects faces and returns the bounding box position, outputting the coordinates of 150 key points of the face, accurately identifying various attribute information.

Face Detection Positioning: Detects faces in images and marks face coordinates, supporting simultaneous recognition of multiple faces.
Face Attribute Analysis: Accurately identifies a range of face attributes, including age, gender, attractiveness, expression, emotion, face shape, head pose, whether the eyes are closed, whether glasses are worn, face quality information, and face type.
150 Key Points Recognition: Precisely locates 150 key points, including cheeks, eyebrows, eyes, mouth, nose, and facial contours.
Emotion Recognition: Analyzes the emotion of the detected face and returns a confidence score. Currently, it can recognize nine emotions: anger, disgust, fear, happiness, sadness, surprise, pouting, grimacing, and neutral.
Image Quality Preprocessing: Analyzes the occlusion, blurriness, illumination, pose angle, completeness, and size of the face in the image, so that only images meeting quality standards are passed on, which protects the accuracy of subsequent face comparison and search.
Online Image Liveness Detection: Based on flaws in the portrait within a single image (moiré patterns, imaging distortions, etc.), it determines whether the image is a secondary shot (a photo of a photo or of a screen) and filters out faces that fail this check.
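The image quality preprocessing step above can be pictured as a simple gate in front of comparison and search. The following is a minimal sketch only: the FaceQuality fields, the LIMITS thresholds, and the quality_issues function are hypothetical names and values chosen for illustration, not Baidu's actual response fields or recommended limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaceQuality:
    """Hypothetical quality measurements for one detected face."""
    occlusion: float      # 0.0 (unobstructed) to 1.0 (fully covered)
    blur: float           # 0.0 (sharp) to 1.0 (very blurry)
    illumination: float   # mean brightness, 0 to 255
    yaw: float            # head rotation left/right, in degrees
    pitch: float          # head rotation up/down, in degrees
    completeness: float   # fraction of the face inside the frame
    width_px: int         # face width in pixels


# Illustrative thresholds only; real limits depend on the application.
LIMITS = {
    "max_occlusion": 0.6,
    "max_blur": 0.7,
    "min_illumination": 40.0,
    "max_angle": 20.0,
    "min_completeness": 0.9,
    "min_width_px": 100,
}


def quality_issues(q: FaceQuality) -> List[str]:
    """Return the names of the failed checks; an empty list means the face passes."""
    issues = []
    if q.occlusion > LIMITS["max_occlusion"]:
        issues.append("occlusion")
    if q.blur > LIMITS["max_blur"]:
        issues.append("blur")
    if q.illumination < LIMITS["min_illumination"]:
        issues.append("illumination")
    if abs(q.yaw) > LIMITS["max_angle"] or abs(q.pitch) > LIMITS["max_angle"]:
        issues.append("pose")
    if q.completeness < LIMITS["min_completeness"]:
        issues.append("completeness")
    if q.width_px < LIMITS["min_width_px"]:
        issues.append("size")
    return issues
```

Returning the list of failed checks, rather than a bare pass/fail flag, lets an application tell the user what to fix, for example to move closer or face the camera.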

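Note: What are Moiré Patterns? Moiré patterns are high-frequency interference stripes that appear on the photosensitive element of digital cameras or scanners, causing irregular high-frequency colored stripes in images. They typically show up when a fine regular pattern, such as the pixel grid of a screen, is photographed, which is why they can reveal a secondary shot.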

Application Scenarios:

Intelligent Member Management: Using face detection and tracking, cameras capture the faces of customers entering the store in real time and recognize attributes such as age, gender, and attractiveness. Customer profiles are classified automatically and, combined with purchase records, support more precise customer segmentation and traffic analysis. Combined with product promotion information, the same attributes enable a more engaging interactive marketing experience, improving customer satisfaction and shopping conversion.
Smart Campus Management: Applies face recognition to camera monitoring to detect and locate students, faculty, and strangers in real time. This serves campus security monitoring, attendance, student self-service, classroom attention analysis, and similar needs, forming an intelligent campus management system that improves campus life and safety.
Face Beauty Effects: Based on 150 key point recognition, automatically and accurately locates facial features and contours so that specific facial areas can be beautified individually. It also obtains expression, emotion, and other attribute information, enabling interactive entertainment features such as special-effects cameras and dynamic stickers.
Interactive Entertainment Marketing: Based on face detection and attribute analysis, accurately identifies the 150 key points of faces in images, enabling online interactive marketing formats such as face affinity tests, celebrity face swaps, and attractiveness comparisons. These make the experience more fun for users and help promote entertainment products.

2. Face Comparison

Compares two faces in a 1:1 manner, obtaining the similarity of the faces.

Face Similarity Comparison: Compares the faces in two images and returns a similarity score.
Supports Four Types of Images: Supports face comparison for four types of images: candid photos, ID photos, ID chip photos, and images with mesh patterns.
Image Quality Control: Analyzes the occlusion, blurriness, illumination, pose angle, completeness, and size of the face in each image, so that the similarity score is computed only from images that meet quality standards.
Online Liveness Detection: Analyzes flaws in the portrait within a single image (moiré patterns, imaging distortions, etc.) to determine whether the target is a real person, ensuring the comparison result is genuine and reliable.
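To make the idea of a 1:1 similarity score concrete, here is a minimal sketch. It assumes each face has already been turned into a numeric feature vector by some face model, which the source does not describe; the cosine measure, the 0 to 100 scale, and the threshold of 80 are illustrative assumptions, not Baidu's actual scoring method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length feature vector")
    return dot / (norm_a * norm_b)


def similarity_score(a, b):
    """Map cosine similarity from [-1, 1] onto a 0 to 100 score."""
    return 50.0 * (cosine_similarity(a, b) + 1.0)


def is_same_person(a, b, threshold=80.0):
    """1:1 decision: accept only when the score reaches the threshold.

    The threshold is a hypothetical value; a stricter threshold lowers the
    risk of false accepts at the cost of more false rejects.
    """
    return similarity_score(a, b) >= threshold
```

In a scenario such as remote account opening, the two vectors would come from the ID photo and the live photo, and the threshold would be set according to how costly a false accept is.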

Application Scenarios:

Remote Account Opening in Finance: In the remote identity verification stage, the user's ID card photo and a live photo taken on the spot are compared 1:1 to assess the authenticity of the user's information. Accounts can be opened quickly anytime and anywhere, which streamlines identity verification in high-risk industries like finance, achieves “in-process control and post-process tracking” for systematic management, and completes account opening at low cost and low risk.
Service Personnel Identity Supervision: In service sectors like delivery and housekeeping, service personnel complete a 1:1 face verification before starting their tasks to confirm their identity. This also supports end-to-end service oversight, helping prevent misconduct by service personnel and maintain service quality.
Smart Face Attendance: Face recognition replaces card swiping and fingerprint recognition for attendance, letting multiple people clock in at once while the backend system records in real time. The 1:1 face comparison completes in under one second and accurately identifies attendees, effectively preventing proxy clocking and card theft and improving how the enterprise manages employee information.
Hotel Self-Service Check-In: At check-in, the live photo taken on the spot is compared 1:1 with the ID chip photo read from the ID card, or with a photo of the ID card, to assess the authenticity of the user's information and complete identity verification. This moves hotel operations toward “unmanned” intelligent management: guests check in automatically using face recognition, which reduces manual review costs while keeping guest identities secure.

3. Face Search

Given a photo, compares it against N faces in a face library for 1:N retrieval, identifying the most similar one or more faces and returning similarity scores. Supports management of a million-level face library with millisecond-level recognition response, suitable for applications like identity verification, face attendance, and face-based access.

1:N Search: Searches for a given face image in the face library, returning the most similar one or more faces and their corresponding similarity scores.
M:N Search: If a single image contains M faces, all of them are searched against the face library at once, returning the matching user and similarity score for each face.
1:N Comparison: Compares a single face image with multiple photos of a specified user in the face library, returning the similarity score with that user.
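The difference between 1:N and M:N search can be sketched as follows. This is an illustration under stated assumptions: faces are represented as feature vectors, the library is a plain dictionary from a user ID to that user's registered vectors, and the names search_1_to_n and search_m_to_n are hypothetical. It is not how Baidu's service is implemented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_1_to_n(probe, library, top_k=1):
    """1:N search: rank every user in the library against one probe face.

    `library` maps a user ID to a list of that user's registered vectors;
    a user's score is the best match among their photos.
    """
    scored = []
    for user_id, vectors in library.items():
        best = max(cosine(probe, v) for v in vectors)
        scored.append((user_id, best))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def search_m_to_n(probes, library, top_k=1):
    """M:N search: run a 1:N search for each of the M faces found in one image."""
    return [search_1_to_n(p, library, top_k) for p in probes]
```

For example, with library = {"alice": [[1.0, 0.0], [0.9, 0.1]], "bob": [[0.0, 1.0]]}, a probe of [1.0, 0.05] ranks "alice" first. The linear scan here is only for clarity; a library with millions of faces and millisecond-level responses would normally rely on a dedicated vector index rather than comparing against every entry.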

4. Liveness Detection

Provides six types of online/offline liveness detection services, identifying whether users in business scenarios are “real people,” effectively resisting cheating behaviors using photos, videos, 3D molds, etc., ensuring business security.

Online Image Liveness Detection: Based on flaws in the portrait within a single image (moiré patterns, imaging distortions, etc.), determines whether the target is alive, effectively preventing secondary shooting from screens and other spoofing attacks. Supports judgment logic based on a single image or on multiple images.
Online H5 Video Liveness Detection: The user records a video on the spot while reading out randomly assigned digits, which shows that the video is live rather than pre-recorded. The video is then uploaded to the cloud for liveness analysis, strengthening resistance to attacks.
Action Coordination Liveness Detection: The user follows the SDK's prompts to complete seven preset actions, such as blinking, opening the mouth, shaking the head, turning the head left and right, and nodding up and down. Multiple images are captured at random for liveness judgment, and the set of valid actions and their verification order can be customized.
Offline Infrared Detection: Uses near-infrared imaging to judge liveness at night or without natural light. It remains highly robust because screens do not form an image under near-infrared and different materials differ in reflectivity.
Offline 3D Structured Light Detection: Based on 3D structured light imaging, constructs a depth image from light reflected off the face to determine whether the target is alive, effectively defending against attacks using images, videos, screens, and molds.
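The random-digit step in the H5 video method above is a challenge-response check, and its logic can be sketched briefly. This is an assumption-laden illustration: the function names, the four-digit length, and the 30-second validity window are hypothetical, and the spoken digits are assumed to come from a separate speech recognition step. It covers only the "not pre-recorded" part, not the liveness analysis of the video itself.

```python
import secrets
import time


def issue_challenge(num_digits=4, ttl_seconds=30.0, now=None):
    """Create a random digit code that the user must read aloud on video."""
    issued_at = time.time() if now is None else now
    code = "".join(secrets.choice("0123456789") for _ in range(num_digits))
    return {"code": code, "expires_at": issued_at + ttl_seconds}


def verify_challenge(challenge, spoken_digits, now=None):
    """Accept only if the spoken digits match and the code has not expired."""
    current = time.time() if now is None else now
    if current > challenge["expires_at"]:
        return False
    # Constant-time comparison on bytes, so non-ASCII input cannot raise.
    return secrets.compare_digest(
        spoken_digits.encode("utf-8"), challenge["code"].encode("utf-8")
    )
```

Because the code is unpredictable and short-lived, a video recorded in advance is very unlikely to contain the right digits, which is what makes the recording evidence of immediacy.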

Portrait Effects

Face Fusion: Combines the facial features of two faces to generate a new face image.
Portrait Segmentation: Identifies the outline of the body in the image and separates it from the background.
Anime Characterization: Customizes anime images tailored to individual users.

Body Analysis

1. Body Key Point Recognition: Detects the body in the image and returns the rectangular bounding box position, accurately locating 21 core key points, including the top of the head, facial features, neck, and major limb joints, supporting multi-person detection and complex scenarios with large movements.

Multi-Person Body Detection: Detects all bodies in the image, marking each body's coordinate position; supports an unlimited number of bodies and tolerates slight occlusion or truncation.
Key Point Localization: Accurately locates 21 major key points of the body, including the top of the head, facial features, neck, and major limb joints; supports complex scenarios such as back views, side views, low-angle shots, and large movements.

2. Crowd Flow Statistics: Counts the number of bodies in the image and the flow trend, primarily recognizing heads and shoulders for counting, without needing frontal or full-body photos, adapting to crowded scenarios and various entrances and exits.

Static Crowd Statistics: Applicable for mid-to-long distance overhead shots above 3 meters, counting the instantaneous number of people in the image based on heads; no upper limit on the number of people, widely applicable in crowded places such as airports, stations, shopping malls, exhibitions, and tourist attractions.
Dynamic Crowd Statistics: For entrances and exits such as stores and passageways, uses heads and shoulders as recognition targets for body detection and tracking, determining the direction of entry and exit based on target trajectories, achieving dynamic counting of people entering and exiting areas.
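The way dynamic statistics derive entry and exit counts from trajectories can be illustrated with a simple line-crossing rule. This is a sketch under assumptions: the tracker output format (a dictionary from track ID to a list of (x, y) head positions), the horizontal counting line, and the function name count_crossings are all hypothetical, and the source does not say which rule Baidu uses.

```python
def count_crossings(tracks, line_y):
    """Count entries and exits across a horizontal line at y = line_y.

    `tracks` maps a track ID to the ordered (x, y) positions of one tracked
    head. A track that starts above the line (smaller y) and ends on or
    below it counts as an entry; the reverse counts as an exit. Only the
    first and last points are compared, so a person who crosses and then
    turns back is not counted.
    """
    entered = 0
    exited = 0
    for points in tracks.values():
        if len(points) < 2:
            continue  # a single observation has no direction
        start_y = points[0][1]
        end_y = points[-1][1]
        if start_y < line_y <= end_y:
            entered += 1
        elif end_y < line_y <= start_y:
            exited += 1
    return entered, exited
```

With tracks = {1: [(5, 10), (5, 40), (5, 80)], 2: [(9, 90), (9, 20)], 3: [(3, 10), (3, 30)]} and line_y = 50, the function returns (1, 1): track 1 enters, track 2 exits, and track 3 never reaches the line.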

3. Body Detection and Attributes: Detects all bodies in the image, returning the position coordinates of each body; recognizes over 20 types of body attribute information, including gender, age, clothing category, clothing color, wearing a hat (distinguishing between safety helmets and ordinary hats), wearing a mask, carrying a backpack, smoking, using a phone, etc.

4. Hand Key Point Detection: Detects the hands in the image and returns the rectangular bounding box position, locating 21 major bone nodes of the hand, applicable for custom gesture detection, AR effects, human-computer interaction, and other scenarios.

Behavior Analysis

Driving Behavior Analysis: Identifies driver violations such as smoking and using a phone.
Gesture Recognition: Recognizes 24 common gestures, in both selfies and photos taken by others.
Dangerous Behavior Recognition: Identifies common dangerous behaviors in surveillance video clips of up to 5 seconds.
Fingertip Recognition: Locates the coordinates of the fingertip, for use in point-to-read and point-to-answer scenarios.

Summary

The term “face recognition” is too broad. Following the analysis above, the more precise concepts are face comparison, which is 1:1, and face search, which is 1:N or M:N. With the deployment of 5G infrastructure, video will undoubtedly be the mainstream medium of the future, so face recognition and analysis, which broadly fall under image processing, will become increasingly fine-grained. After completing the research on Baidu’s foundational technology platform, we will publish a batch of interesting AI applications, as many of our readers have requested.

Disclaimer:

This public account shares personal research and study notes and has no commercial purpose. If any content in the article infringes rights or contains illegal information, please contact this account immediately so it can be deleted. Thank you. Contact information: [email protected]
