Introduction to Computer Vision: Facial Beauty Scoring


This article is authorized by Yousan AI

Today we look at facial beauty scoring, an application built on facial recognition technology. What standards are used to evaluate beauty? Since it is called a “value”, can we actually “measure” it?

Overview

In recent years, as facial recognition technology has developed, beauty scoring has attracted widespread attention and research. Human raters themselves disagree widely: some prefer mature looks, others youthful ones. So how can a computer judge beauty? Researchers have in fact studied what makes faces attractive and have been developing corresponding “beauty algorithms”. One example is the “average face” [1]: an algorithm detects facial feature points, segments each face image into regions, applies piecewise affine transformations to align the regions, and takes a weighted average, so that both the shape and the texture of the faces are taken into account. The synthesized image is shown below:


Faces that are symmetrical and have a pleasing skin tone are more likely to be favored by the public. This can be considered a consensus in beauty scoring, hence the feasibility of beauty algorithms. Major companies in China have developed beauty scoring applications; let’s take a look.

Measurement of the same image in different applications

Measurement of different faces in the same application (example: Baidu AI)

Scoring tests with different ages and skin tones in the beauty scoring system:

Beauty measurement is an entertainment application. We randomly selected a few images from the dataset for testing, and the results are as follows:

1. Different skin tones scored without significant differences.

2. Scores mostly hover around 60.

Overall, the performance is quite similar; it is mainly for entertainment.

Facial Beauty Dataset and Scoring Criteria

2.1 Dataset

The address is https://github.com/HCIILAB/SCUT-FBP5500-Database-Release.


The dataset consists of 5500 frontal faces with natural expressions, covering ages 15 to 60. It spans different genders and ethnicities (2000 Asian females, 2000 Asian males, 750 Caucasian males, 750 Caucasian females), with images drawn from sources such as DataTang and the 10k US Adult Faces database. Each image was rated on a 5-level scale by 60 raters, all of them young adults aged 18 to 27. Each image also comes with annotations for 86 facial key points, which makes the dataset suitable for both appearance-based and shape-based models.

The beauty distribution across various demographics is shown below:

The beauty scores were fitted with a Gaussian mixture model containing two components. The red and green curves represent the low-score and high-score components, respectively. Across the four demographic groups, the mean of the high-score component is about 4, while the mean of the low-score component is around 2.5.
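As a rough illustration of what fitting two Gaussian components to a set of scores involves, here is a minimal sketch using plain expectation-maximization (EM). The function name `fit_two_gaussians`, the initialization, and the synthetic scores are all assumptions made for this example; this is not the fitting procedure or the data from [2].

```python
import math
import random

def fit_two_gaussians(scores, iters=100):
    """Fit a 1-D mixture of two Gaussians with plain EM.

    Returns (weights, means, stds); component 0 starts at the low end.
    """
    lo, hi = min(scores), max(scores)
    # Crude initialization (an assumption): one component near each end.
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    stds = [(hi - lo) / 4.0, (hi - lo) / 4.0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        resp = []
        for s in scores:
            p = [w * math.exp(-0.5 * ((s - m) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
                 for w, m, sd in zip(weights, means, stds)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and std of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var = sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # keep the std away from zero
    return weights, means, stds

# Synthetic scores only (NOT SCUT-FBP5500 data): two clusters near 2.5 and 4.
rng = random.Random(0)
scores = ([rng.gauss(2.5, 0.4) for _ in range(600)]
          + [rng.gauss(4.0, 0.3) for _ in range(400)])
weights, means, stds = fit_two_gaussians(scores)
print("weights:", weights, "means:", means, "stds:", stds)
```

On real ratings, the two fitted means would play the role of the "low" and "high" beauty score centers described above.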

Additionally, reference [2] lists several other datasets for further exploration.

2.2 Evaluation Criteria

2.2.1 Pearson Correlation Coefficient

The Pearson correlation coefficient measures the strength of the linear relationship between two variables. It is the covariance of the two variables divided by the product of their standard deviations, so it always lies between -1 and 1. Let N be the number of facial images, with human ratings {x1, x2, …, xi, …, xN} and computed scores {y1, y2, …, yi, …, yN}, where xi is the true (human-rated) value for the i-th image and yi is the predicted beauty score for the i-th image. The correlation coefficient r is calculated as follows:
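For reference, this is the standard textbook definition of Pearson's r written with the symbols above. The means \bar{x} and \bar{y} are notation introduced here and do not appear in the original text.

```latex
r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\quad
\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i
```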

The higher the r value, the closer the predictions of the method are to the human ratings, indicating better performance; conversely, lower values indicate poorer performance, as shown in the figure below.
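The computation can be sketched in a few lines of Python. The function name `pearson_r` and the sample ratings are made up for illustration; in practice a library routine such as `scipy.stats.pearsonr` would normally be used.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between human ratings x and predicted scores y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator terms: sums of squared deviations.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either sequence is constant")
    return cov / math.sqrt(var_x * var_y)

human = [2.1, 3.4, 4.0, 2.8, 3.1]      # made-up human ratings
predicted = [2.4, 3.2, 3.9, 2.5, 3.3]  # made-up model predictions
r = pearson_r(human, predicted)
print("r =", r)
```

A perfectly linear increasing relationship gives r = 1 and a perfectly linear decreasing one gives r = -1.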

2.2.2 Maximum Absolute Error and Root Mean Square Error

Maximum absolute error (MAE) is based on the absolute differences |xi - yi| between true and predicted values, while root mean square error (RMSE) is the square root of the mean of the squared errors; the formulas are straightforward and won’t be listed here.
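A small sketch of these error metrics follows. Because "MAE" is commonly computed as the mean of the absolute errors, while the text here names it "maximum absolute error", both variants are shown; which one a given paper reports is something to check against that paper. All function names are illustrative.

```python
import math

def absolute_errors(true_scores, predicted_scores):
    """Per-image absolute differences |x_i - y_i|."""
    if len(true_scores) != len(predicted_scores) or not true_scores:
        raise ValueError("need two non-empty sequences of equal length")
    return [abs(t - p) for t, p in zip(true_scores, predicted_scores)]

def mean_absolute_error(true_scores, predicted_scores):
    """Average of the absolute errors (the usual meaning of 'MAE')."""
    errs = absolute_errors(true_scores, predicted_scores)
    return sum(errs) / len(errs)

def max_absolute_error(true_scores, predicted_scores):
    """Largest single absolute error."""
    return max(absolute_errors(true_scores, predicted_scores))

def root_mean_square_error(true_scores, predicted_scores):
    """Square root of the mean of the squared errors."""
    errs = absolute_errors(true_scores, predicted_scores)
    return math.sqrt(sum(e * e for e in errs) / len(errs))

true_scores = [1.0, 2.0, 3.0]       # made-up values
predicted_scores = [1.0, 2.0, 5.0]  # made-up values
print(mean_absolute_error(true_scores, predicted_scores),
      max_absolute_error(true_scores, predicted_scores),
      root_mean_square_error(true_scores, predicted_scores))
```

Lower values are better for all three, in contrast to the correlation coefficient.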

Traditional Method Research Approach

The traditional approach relies on manually designed features; the description below follows reference [3].

It is divided into geometric features and appearance features. Geometric features include the positions of key facial feature points, distances between key positions, and area ratios of various facial organs; appearance features include LBP texture features. These two types of features are concatenated to obtain fused features.

Below are the specific steps.

3.1 Image Preprocessing

The collected facial images vary in quality and noise, with significant differences in brightness and gray levels. Preprocessing the images makes subsequent feature extraction and computation easier. Image preprocessing includes grayscale conversion, face position detection, and tilt correction.

1. Convert the image to grayscale.

2. Use a Haar cascade classifier to locate the approximate area of the face.

3. Calculate the tilt angle and correct it.
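Steps 1 and 3 can be sketched with the standard library alone. The source does not say how the tilt angle is obtained; this sketch assumes it is the angle of the line through the two eye centers, and all function names are illustrative. Step 2 (Haar-based face detection) is usually done with a library such as OpenCV and is not reproduced here.

```python
import math

def to_gray(r, g, b):
    """Luminance of one RGB pixel using the common BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def tilt_angle_degrees(left_eye, right_eye):
    """Angle between the eye line and the horizontal axis (assumed tilt measure)."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_point(point, center, angle_degrees):
    """Rotate a point about center with the standard 2-D rotation matrix."""
    a = math.radians(angle_degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))

# Hypothetical eye centers; rotating by the negative tilt levels the eye line.
left_eye, right_eye = (0.0, 0.0), (1.0, 1.0)
angle = tilt_angle_degrees(left_eye, right_eye)
corrected_right_eye = rotate_point(right_eye, left_eye, -angle)
print(angle, corrected_right_eye)
```

In a real pipeline the same rotation would be applied to the whole image, not just to individual points.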

3.2 Geometric Feature Extraction

3.2.1 Global Features

Following the ASM (Active Shape Model) algorithm, concatenate the horizontal and vertical coordinates of the 68 facial feature points into a feature vector that represents the geometry of the face. Because faces in the original images differ in position, size, and angle, the raw vectors are not directly comparable, so the vector must be normalized for (1) translation invariance, (2) scale invariance, and (3) rotation invariance.
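One possible way to obtain the three invariances is sketched below: subtract the centroid (translation), divide by the overall size of the shape (scale), and rotate so that the line between two chosen landmarks is horizontal (rotation). The choice of eye landmarks as the rotation reference, the function name `normalize_shape`, and the x1, y1, x2, y2, ... ordering of the output are assumptions; reference [3] may normalize differently.

```python
import math

def normalize_shape(points, left_eye_idx, right_eye_idx):
    """Return a flattened landmark vector invariant to translation, scale, rotation."""
    n = len(points)
    # (1) Translation: move the centroid to the origin.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # (2) Scale: divide by the root of the summed squared coordinates.
    size = math.sqrt(sum(x * x + y * y for x, y in centered))
    if size == 0:
        raise ValueError("all landmarks coincide")
    scaled = [(x / size, y / size) for x, y in centered]
    # (3) Rotation: make the line between the two reference landmarks horizontal.
    lx, ly = scaled[left_eye_idx]
    rx, ry = scaled[right_eye_idx]
    a = -math.atan2(ry - ly, rx - lx)
    rotated = [(x * math.cos(a) - y * math.sin(a),
                x * math.sin(a) + y * math.cos(a)) for x, y in scaled]
    # Flatten to [x1, y1, x2, y2, ...].
    return [c for p in rotated for c in p]

# Toy 3-point "face" (made up); a real ASM shape would have 68 points.
shape = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
vector = normalize_shape(shape, 0, 1)
print(vector)
```

Applying the function to a shifted, enlarged, and rotated copy of the same shape should give the same vector up to floating-point error.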

3.2.2 Facial Distance Features

After a certain age, the positional information of facial organs does not change with age (except in cases of surgery or accidents). Based on the ASM-68 vector, 18 distance features are defined as follows:
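A distance feature of this kind can be computed as the Euclidean distance between two landmarks, divided by a reference distance so that the value does not depend on image size. The landmark pairs and the reference pair below are placeholders; the actual 18 definitions from the source are not reproduced here.

```python
import math

def distance_features(points, pairs, ref_pair):
    """Distances between landmark pairs, each divided by a reference distance."""
    def dist(i, j):
        return math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
    ref = dist(*ref_pair)
    if ref == 0:
        raise ValueError("reference landmarks coincide")
    return [dist(i, j) / ref for i, j in pairs]

# Toy landmarks and hypothetical index pairs, for illustration only.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
features = distance_features(points, pairs=[(0, 2), (1, 2)], ref_pair=(0, 1))
print(features)
```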

3.2.3 Area Features

Using the key points located by ASM, find the triangles representing the area of facial organs such as the eyes, nose, chin, and mouth. The areas of the resulting 54 triangles can be normalized to obtain area features.
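The area of each triangle follows from the coordinates of its three vertices via the cross-product (shoelace) formula. The sketch below normalizes by the total area of all triangles, which is one plausible choice; the source does not state which normalization it uses, and the triangle index triples are placeholders.

```python
def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c (shoelace formula)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def area_features(points, triangles):
    """Triangle areas divided by their total (assumed normalization)."""
    areas = [triangle_area(points[i], points[j], points[k]) for i, j, k in triangles]
    total = sum(areas)
    if total == 0:
        raise ValueError("all triangles are degenerate")
    return [area / total for area in areas]

# Toy landmarks forming a 4 x 3 rectangle split into two triangles.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
features = area_features(points, triangles=[(0, 1, 2), (1, 2, 3)])
print(features)
```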

3.2 Appearance Feature Extraction

Appearance features represent the overall appearance of the face and the condition of the skin. They reflect information such as texture, skin condition, and color depth. The LBP feature is a mature choice for appearance features.

The LBP (Local Binary Pattern) feature compares each of the 8 neighbors of a pixel against the center pixel value, which acts as the threshold: neighbors greater than the center are marked 1 and those less than or equal to it are marked 0. Reading the 8 bits in a fixed order yields an 8-bit binary number, which serves as the LBP value of the center pixel. Gabor features are also frequently used.
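The basic 3x3 LBP operator described above can be sketched as follows. The bit order (clockwise, starting at the top-left neighbor) is an assumption, since any fixed order works as long as it is used consistently; the function names are illustrative. Library implementations such as `skimage.feature.local_binary_pattern` offer more variants.

```python
def lbp_value(patch):
    """LBP code of the center pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbors read clockwise, starting at the top-left (assumed order).
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for value in neighbors:
        # Bit is 1 when the neighbor is greater than the center, else 0.
        code = (code << 1) | (1 if value > center else 0)
    return code

def lbp_image(gray):
    """LBP codes for every interior pixel of a 2-D grayscale image."""
    h, w = len(gray), len(gray[0])
    return [[lbp_value([row[x - 1:x + 2] for row in gray[y - 1:y + 2]])
             for x in range(1, w - 1)] for y in range(1, h - 1)]

def lbp_histogram(codes):
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature."""
    hist = [0.0] * 256
    flat = [c for row in codes for c in row]
    for c in flat:
        hist[c] += 1.0 / len(flat)
    return hist

gray = [[6, 5, 2],
        [7, 6, 1],
        [9, 8, 7]]  # toy 3x3 image
codes = lbp_image(gray)
hist = lbp_histogram(codes)
print(codes)
```

For this toy patch, the four neighbors on the bottom and left (7, 8, 9, 7) exceed the center value 6, so the code is binary 00001111.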

3.3 Feature Fusion and Classification

3.3.1 Facial Feature Fusion

The geometric features extracted earlier capture the positions of key facial feature points, distance ratios between facial organs, and area features. The appearance features represent the global texture characteristics of the face. Since all features have been normalized, they can be fused directly by concatenation.
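Concatenation simply places the feature vectors end to end. A minimal sketch, with a hypothetical helper `fuse_features` and made-up values:

```python
def fuse_features(*feature_vectors):
    """Concatenate several already-normalized feature vectors into one list."""
    fused = []
    for vector in feature_vectors:
        fused.extend(float(v) for v in vector)
    return fused

geometric = [0.75, 1.25, 0.5]       # e.g. distance and area features (made up)
appearance = [0.1, 0.0, 0.9, 0.0]   # e.g. a tiny LBP histogram (made up)
fused = fuse_features(geometric, appearance)
print(len(fused), fused)
```

The length of the fused vector is the sum of the lengths of its parts, so the classifier sees geometric and appearance information side by side.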

3.3.2 Classification

After feature fusion, the fused feature vector can be fed into a classifier, commonly a support vector machine (SVM).

Reference [2] used 18-dimensional distance features and Gabor filter features to compare linear regression, Gaussian regression, and support vector regression. The results are as follows:

The results indicate that for geometric features, Gaussian regression and support vector regression both outperform linear regression. For texture features, which of Gaussian regression and support vector regression does better depends on the key point extraction method used.

Deep Learning Method Research Approach

In deep learning methods, manual feature extraction is no longer necessary; what remains is choosing the optimization objective and the network architecture. More powerful networks typically yield better performance; let’s examine the training results from reference [2].

The experimental results match this expectation: the most powerful network, ResNeXt-50, achieved the best performance, and all networks outperformed the best traditional methods. There isn’t much more to say about the deep learning approach; feeding the network enough data does most of the work.

Conclusion

In conclusion, although facial beauty is a somewhat subjective matter, scoring algorithms can still yield fairly consistent results. The task can be treated as either a classification problem or a regression problem. Combining deep learning methods with larger, higher-quality datasets can address it effectively. Currently it is mostly used for entertainment in various apps, but beauty algorithms also hold some value for the beauty industry and await further application.

References:

[1] Global Network. British scientists draw the “average face” of women from 41 countries. 2013-9-23

[2] Liang L, Lin L, Jin L, et al. SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction[J]. arXiv preprint arXiv:1801.06345, 2018.

[3] Jiang Ting, Shen Xudong, Lu Wei, et al. Facial Beauty Prediction Based on Multi-Feature Fusion[J]. New Media Technology, 2017, 6(2): 7-13.
