Machine Learning Process: Features, Models, Optimization, and Evaluation

Source: CloudB Computational Thinking and Beauty

This article is about 2200 words long and is recommended for a 7-minute read.
How can "humans" do what they excel at and leave the rest to machines.

[Introduction] Machine learning has been leading the development of artificial intelligence since the 1980s. Its key contribution to AI is the shift from intelligence handed down by humans to intelligence that machines learn on their own. Undoubtedly, the ability to learn and solve problems is the concentrated embodiment of intelligence. How can machines simulate this human ability? Practice has shown that algorithms based on brain-like, massively parallel architectures are more practical than those based on logical rules. How can “humans” do what they excel at and leave the rest to machines? From strong algorithms to strong computing power, and then to strong data, machines keep extending and expanding the boundaries of human capability.

Features

In machine learning, a feature is an independently quantifiable attribute used to describe a data object. A single feature is rarely sufficient to represent an object, so machine learning combines multiple features into a feature vector.

  • For example, to predict house prices, features include: house area, number of rooms, geographical location, year built, proximity to schools and subway stations, etc.
  • To identify objects in an image, features include: raw pixel values, edge detection values, color histograms, features extracted by deep learning models, etc.
  • To determine the sentiment polarity (positive/negative) of a piece of text, features include: word frequencies, word embedding vectors, counts of positive/negative sentiment words, sentence length, etc.
  • To predict future temperatures, features include: historical temperature values, humidity, atmospheric pressure, wind speed, season, time of day, etc.
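
As a concrete illustration, the minimal sketch below encodes a house as a numerical feature vector. The field names and values are made up for illustration only:

```python
import numpy as np

# Hypothetical feature vector for one house (values are made up):
# [area_m2, num_rooms, distance_to_subway_km, year_built, near_school (0/1)]
house = np.array([98.0, 3, 1.2, 2015, 1])

# A dataset is then a matrix: one row per object, one column per feature.
houses = np.array([
    [98.0, 3, 1.2, 2015, 1],
    [145.5, 4, 3.8, 2008, 0],
    [62.0, 2, 0.5, 1999, 1],
])
print(houses.shape)  # (3, 5): 3 objects, 5 features each
```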

Deep learning can replace traditional feature engineering by automatically learning complex features from data, reducing the need for manual intervention. Specifically, deep learning models (especially deep neural networks) can automatically extract hierarchical features from raw data without relying on manually designed features.
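
As a sketch of such automatic feature extraction, the snippet below uses a pretrained ResNet-18 from torchvision as a feature extractor (the choice of model is an assumption, not prescribed by the article): the final classification layer is dropped, and the remaining network maps raw pixels to learned features with no manual feature design.

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

# Load a pretrained CNN and drop its final classification layer,
# keeping everything up to the global average pool as a feature extractor.
backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(backbone.children())[:-1])
extractor.eval()

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)     # a dummy image batch
    features = extractor(image).flatten(1)  # learned features, no manual design
print(features.shape)  # torch.Size([1, 512])
```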

End-to-end learning: the process from raw data to final output can be fully learned by the neural network. Although deep learning has become mainstream in machine learning, traditional feature engineering methods still hold value in scenarios with insufficient data or high interpretability requirements.

Limitations:

  • High data demands: deep learning typically requires large amounts of labeled data for effective training; with insufficient data, it may fail to learn effective features.
  • High training costs: training deep learning models often demands powerful computational resources and long training times.
  • Poor interpretability: the “black box” nature of deep learning models makes them harder to explain than traditional machine learning methods, especially in scenarios where model decisions must be justified.

Models and Evaluation

  • Model: a mathematical representation abstracted from data; a good model depends not only on the algorithm and the data but also on the task requirements.
  • Strategy: the criteria for selecting and comparing different models;
  • Algorithm: the concrete computational procedure, e.g., how to solve the resulting optimization problem efficiently.

Evaluation is the process of analyzing and testing a trained model to determine how it performs on new data. To this end, the data is usually divided into a training set, a validation set, and a test set.

(1) Training Set, Validation Set, and Test Set

  • Training set: used to fit the parameters of the machine learning model, i.e., to estimate the model;
  • Validation set (also called the development set, or dev set): used during training to tune hyperparameters, select among candidate models, and control model complexity; it is the more flexible evaluation set.
  • Test set: used to verify the performance of the final machine learning system, i.e., how well the chosen optimal model performs on unseen data.
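
A common way to produce the three subsets is two successive random splits. The sketch below uses scikit-learn's train_test_split with an illustrative 60/20/20 ratio (the ratio is an assumption, not from the article):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 5), np.random.randint(0, 2, 1000)  # toy data

# First split off the test set, then carve a validation set out of the rest.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```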

  • Bias: measures the gap between the model’s predictions and the true values, i.e., the degree to which the model underfits the training data.
  • Variance: measures the model’s sensitivity to noise in the training data, i.e., the degree to which the model overfits the training data.

The bias-variance dilemma: when a model is under-trained, its fitting ability is weak and bias dominates the error; as training deepens, the model fits the training data ever more closely and variance gradually comes to dominate.
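
The trade-off can be seen by varying model capacity. The sketch below (a toy setup, not from the article) fits polynomials of increasing degree to noisy data: low degrees underfit (high bias), while very high degrees chase the noise (high variance).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy samples of a smooth curve
x_test = np.linspace(0, 1, 100)
y_true = np.sin(2 * np.pi * x_test)

for degree in (1, 4, 15):
    coeffs = np.polyfit(x, y, degree)   # fit on the noisy training data
    err = np.mean((np.polyval(coeffs, x_test) - y_true) ** 2)
    print(f"degree {degree:2d}: test MSE = {err:.3f}")
# degree 1 underfits (bias dominates), degree 15 overfits (variance dominates),
# degree 4 strikes the balance.
```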

(2) Precision vs. Recall

Suppose a pond contains 1,400 carp, 300 shrimp, and 300 turtles, and our target is the carp. Let’s see what the metrics are if we catch all the carp, shrimp, and turtles in the pond:

  • Precision = 1400 / (1400 + 300 + 300) = 70%

  • Recall = 1400 / 1400 = 100%

  • F1 Score = 2 × 70% × 100% / (70% + 100%) ≈ 82.35%

It can be seen that:

  • Precision is the proportion of the target category among everything that was captured;

  • Recall is the proportion of the target category that was actually retrieved;

  • The F1 score combines the two into a single comprehensive metric.
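
The pond numbers above translate directly into code; this minimal sketch just evaluates the three formulas:

```python
# Catching everything in the pond: 1400 carp (target) plus 300 shrimp and 300 turtles.
true_positives = 1400          # carp actually caught
false_positives = 300 + 300    # shrimp and turtles caught by mistake
false_negatives = 0            # no carp left behind in the pond

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.2%}")  # 70.00%
print(f"recall    = {recall:.2%}")     # 100.00%
print(f"F1        = {f1:.2%}")         # 82.35%
```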

Optimization

Most optimization problems in machine learning can be reduced to minimization problems, i.e., finding parameters that minimize the loss function.
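
In symbols, for parameters θ and a loss function L, the generic form is:

```latex
\theta^{*} = \arg\min_{\theta} L(\theta),
\qquad \text{e.g.} \qquad
L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i; \theta) \bigr)^2
```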

1. Parameter Optimization Problems

Linear Regression: Minimizing the Mean Squared Error (MSE) loss function to find the optimal regression coefficients.

Logistic Regression: Minimizing the cross-entropy loss function to find the optimal classification parameters.
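
For linear regression the MSE minimizer even has a closed form. The sketch below (toy data, shapes chosen for illustration) solves the least-squares problem directly and checks it against scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

# Closed-form least squares: minimize ||y - Xb||^2 over the coefficients b.
X1 = np.hstack([np.ones((100, 1)), X])     # add an intercept column
beta = np.linalg.lstsq(X1, y, rcond=None)[0]

model = LinearRegression().fit(X, y)
print(beta[1:], model.coef_)  # both minimize the same MSE objective
```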

2. Regularization Optimization Problems

To prevent overfitting, we usually add regularization terms to the objective function. For example:

  • L2 regularization (also called ridge regression): adds the sum of squared parameters to the loss.
  • L1 regularization (also called lasso regression): adds the sum of absolute parameter values to the loss.

These regularization terms increase the complexity of the optimization problem, aiming to find a solution that fits the data well while being less prone to overfitting.
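
In scikit-learn the two penalties correspond to Ridge and Lasso. The sketch below (toy data, illustrative alpha values) shows the characteristic effect of L1 driving uninformative coefficients exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 200)  # only 2 informative features

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all coefficients smoothly
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: zeroes out uninformative coefficients

print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))      # mostly exact zeros except features 0 and 1
```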

3. Neural Network Optimization Problems

Training a neural network is also an optimization problem, typically solved with gradient-based methods in which backpropagation computes the gradients used to adjust the network's weights and biases. During neural network training:

  • The objective function is the loss function (such as cross-entropy loss, mean squared error loss).
  • The optimization process adjusts the weights and biases of the network using methods such as gradient descent.

Neural network optimization problems often have multiple local minima or saddle points, making them more complex than traditional linear models.
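
A minimal training loop makes these pieces concrete. This sketch assumes PyTorch (not prescribed by the article), with MSE loss as the objective and plain gradient descent (SGD) as the optimizer:

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(256, 4)   # toy inputs
y = torch.randn(256, 1)   # toy targets

net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()                            # the objective function
opt = torch.optim.SGD(net.parameters(), lr=0.05)  # gradient descent steps

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(net(X), y)   # forward pass
    loss.backward()             # backpropagation computes the gradients
    opt.step()                  # adjust weights and biases down the gradient
print(loss.item())
```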

4. Support Vector Machine (SVM) Optimization Problems

The goal of an SVM is to maximize the margin of the classification boundary, separating the two classes of data points while minimizing classification error. The optimization problem includes:

  • The objective function: maximize the margin, which is equivalent to minimizing the norm of the weight vector, subject to the classification constraints.
  • The problem typically involves constraints; the soft-margin SVM, for example, adds slack variables to tolerate some misclassification.
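
In scikit-learn's SVC, the soft-margin trade-off is exposed as the parameter C. A minimal sketch on toy data (the dataset and C value are illustrative assumptions):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# C trades margin width against classification error (soft margin).
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_.shape)  # the points that define the margin
print(clf.score(X, y))             # training accuracy
```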

Optimization algorithms:

  • Gradient Descent is the most common optimization method for minimizing an objective function. The algorithm adjusts the parameters in the direction opposite to the gradient (derivative) of the loss function with respect to the model parameters.
  • Momentum is an improvement on gradient descent that adds “inertia” from past gradients, smoothing the direction of parameter updates to accelerate convergence and dampen oscillation.
  • Adam (Adaptive Moment Estimation) combines momentum with a per-parameter adaptive learning rate, and is widely used in deep learning.
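
The three update rules differ only in how they use the gradient. The sketch below implements each in NumPy on a simple quadratic f(θ) = θ², with hyperparameters chosen purely for illustration:

```python
import numpy as np

def grad(theta):               # gradient of f(theta) = theta^2 (minimum at 0)
    return 2 * theta

theta, lr = 5.0, 0.1
for _ in range(100):           # plain gradient descent
    theta -= lr * grad(theta)

theta_m, v = 5.0, 0.0
for _ in range(100):           # momentum: accumulate an "inertia" term
    v = 0.9 * v + grad(theta_m)
    theta_m -= lr * v

theta_a, m, s = 5.0, 0.0, 0.0
b1, b2, eps = 0.9, 0.999, 1e-8
for t in range(1, 101):        # Adam: momentum plus per-parameter adaptive step
    g = grad(theta_a)
    m = b1 * m + (1 - b1) * g          # first moment (momentum)
    s = b2 * s + (1 - b2) * g * g      # second moment (scale)
    m_hat, s_hat = m / (1 - b1 ** t), s / (1 - b2 ** t)  # bias correction
    theta_a -= lr * m_hat / (np.sqrt(s_hat) + eps)

print(theta, theta_m, theta_a)  # all three approach the minimum at 0
```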

Editor: Huang Jiyan

About Us

Data Pie THU, as a public account for data science, is backed by the Tsinghua University Big Data Research Center. It shares cutting-edge research in data science and big data technology innovation, continuously spreads data science knowledge, and strives to build a platform that gathers data talent and the strongest big data community in China.

Sina Weibo: @Data Pie THU

WeChat Video Account: Data Pie THU

Today’s Headlines: Data Pie THU
