Comprehensive Summary of Machine Learning Basics

Machine learning models are divided into two main categories: supervised learning models and unsupervised learning models.

1. Supervised Learning

Supervised learning typically uses training data with expert-labeled tags to learn a function Y = f(X) that maps an input variable X to an output variable Y. The training data usually take the form (x1, y1), ..., (xn, yn), where n is the size of the training sample and xi and yi are observed values of the variables X and Y, respectively.
Supervised learning can be divided into two categories:
  • Classification problems: Predicting the category of a sample (discrete). For example, determining gender, health status, etc.
  • Regression problems: Predicting the corresponding real number output of a sample (continuous). For example, predicting the average height of people in a region.
Ensemble learning is also a form of supervised learning: it combines the predictions of multiple relatively weak machine learning models to predict new samples.

1.1 Single Model

1.11 Linear Regression

Linear regression refers to a regression model built entirely from linear terms. When the analysis involves only one independent variable and one dependent variable, and their relationship can be approximated by a straight line, it is called univariate linear regression analysis.

If the regression analysis includes two or more independent variables, and the relationship between the dependent variable and the independent variables is linear, it is called multivariate linear regression analysis.
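
As a minimal sketch (assuming scikit-learn and NumPy, which this article does not otherwise reference, and made-up toy data), univariate and multivariate linear regression differ only in the number of feature columns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

y = np.array([2.1, 3.9, 6.2, 8.1])  # toy dependent variable

# Univariate: one independent variable, fitting y ≈ a*x + b
x = np.array([[1.0], [2.0], [3.0], [4.0]])
uni = LinearRegression().fit(x, y)

# Multivariate: two or more independent variables, fitting y ≈ a1*x1 + a2*x2 + b
X = np.array([[1.0, 0.5], [2.0, 1.0], [3.0, 1.5], [4.0, 3.0]])
multi = LinearRegression().fit(X, y)

print(uni.coef_, uni.intercept_)      # slope and intercept of the fitted line
print(multi.coef_, multi.intercept_)  # one coefficient per independent variable
```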

1.12 Logistic Regression

Logistic regression is used to study the influence of X on Y when Y is categorical data. If Y has two categories, such as 0 and 1 (for example, 1 means willing and 0 means not willing, or 1 means purchased and 0 means not purchased), this is called binary logistic regression; if Y has three or more categories, it is called multinomial logistic regression.

The independent variables need not be categorical; they can also be quantitative. If X is categorical, it must be encoded as dummy variables.
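
For illustration, a hedged sketch assuming scikit-learn and pandas (the toy data are invented here): the categorical X is converted to dummy variables, while the quantitative X is used directly.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data: Y is binary (1 = purchased, 0 = not purchased)
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B", "C"],  # categorical X -> needs dummies
    "age":  [23, 35, 31, 46, 52, 28],        # quantitative X -> used as-is
    "purchased": [0, 1, 0, 1, 1, 0],
})
X = pd.get_dummies(df[["city", "age"]], columns=["city"], drop_first=True)
y = df["purchased"]

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X)[:, 1])  # estimated P(Y = 1) for each sample
```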

1.13 Lasso

The Lasso method is a shrinkage estimation method that replaces ordinary least squares. The basic idea of Lasso is to fit a model under an L1 regularization penalty, which shrinks some coefficients and sets others exactly to zero during model building. Once training is complete, the parameters with weights equal to 0 can be discarded, simplifying the model and effectively preventing overfitting. It is widely used for fitting and variable selection in the presence of multicollinearity.
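
A minimal sketch of this coefficient-zeroing behavior, assuming scikit-learn's Lasso and synthetic data in which only two of ten features actually matter:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features influence y; the other eight are noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # the L1 penalty drives most irrelevant coefficients to exactly 0
```
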
1.14 K-Nearest Neighbors (KNN)

The main difference between KNN for regression and for classification lies in the final prediction step. For classification, KNN generally uses majority voting: a sample's class is predicted as the most common class among its K nearest neighbors in the training set.

For regression, KNN generally uses averaging: the predicted value is the mean of the outputs of the K nearest training samples. Otherwise, the underlying theory is the same.
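
A short sketch of the two decision rules, assuming scikit-learn and toy one-dimensional data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[1], [2], [3], [10], [11], [12]])
y_class = np.array([0, 0, 0, 1, 1, 1])
y_reg   = np.array([1.0, 1.2, 0.9, 5.1, 5.0, 4.8])

# Classification: majority vote among the K nearest training samples
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
print(clf.predict([[2.5]]))   # neighbors are 2, 3, 1 -> class 0

# Regression: average of the outputs of the K nearest training samples
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)
print(reg.predict([[11.0]]))  # mean of 5.0, 5.1, 4.8
```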

1.15 Decision Trees

In a decision tree, each internal node represents a splitting question: it specifies a test on an attribute of the instance, dividing the samples that reach that node according to a specific attribute, with each successor branch corresponding to a possible value of that attribute.

A classification tree outputs the mode of the output variable among the samples in a leaf node as its classification result, while a regression tree outputs the mean of the output variable among the samples in a leaf node as its prediction.
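
A minimal sketch of the two leaf rules (mode versus mean), assuming scikit-learn and toy data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[1], [2], [3], [10], [11], [12]])

# Classification tree: a leaf predicts the mode of its samples' classes
clf = DecisionTreeClassifier(max_depth=1).fit(X, [0, 0, 1, 1, 1, 1])
print(clf.predict([[2]]))   # lands in the left leaf -> majority class 0

# Regression tree: a leaf predicts the mean of its samples' outputs
reg = DecisionTreeRegressor(max_depth=1).fit(X, [1.0, 1.2, 0.9, 5.1, 5.0, 4.8])
print(reg.predict([[11]]))  # mean of the right leaf's outputs
```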

1.16 BP Neural Network

BP neural networks are multilayer feedforward networks trained with the error backpropagation algorithm, and they are among the most widely used neural network models today. The BP learning rule uses the steepest descent method, continuously adjusting the network's weights and thresholds via backpropagation to minimize the sum of squared errors of the network's output.

The BP neural network is characterized by forward propagation of signals and backward propagation of errors. For a network containing only one hidden layer, the BP process is divided into two phases (sketched in code after this list):
  • The first phase is forward signal propagation from the input layer through the hidden layer to the output layer;
  • The second phase is backward error propagation from the output layer to the hidden layer, finally reaching the input layer, adjusting the weights and biases from the hidden layer to the output layer and from the input layer to the hidden layer in sequence.
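
The sketch below implements these two phases by hand for a single-hidden-layer network, using only NumPy; the XOR data, layer sizes, and learning rate are illustrative assumptions, not values from this article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: learn XOR with one hidden layer
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights and biases: input -> hidden (2 -> 4) and hidden -> output (4 -> 1)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # Phase 1: forward signal propagation (input -> hidden -> output)
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Phase 2: backward error propagation of the squared-error gradient
    delta_out = (out - y) * out * (1 - out)       # error term at the output layer
    delta_hid = (delta_out @ W2.T) * h * (1 - h)  # error term at the hidden layer

    # Steepest descent: adjust hidden->output, then input->hidden weights/biases
    W2 -= lr * h.T @ delta_out; b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_hid; b1 -= lr * delta_hid.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```
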
1.17 Support Vector Machine (SVM)

Support vector regression (SVR) maps the data into a high-dimensional feature space via a nonlinear mapping, so that in that space the independent and dependent variables exhibit a good linear relationship; after fitting a linear model in the feature space, the solution is mapped back to the original space.

Support Vector Machine Classification (SVM) is a type of generalized linear classifier that performs binary classification of data using supervised learning, with its decision boundary being the maximum margin hyperplane solved from the training samples.
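
A brief sketch of both uses, assuming scikit-learn's SVC and SVR with an RBF kernel and toy data:

```python
import numpy as np
from sklearn.svm import SVC, SVR

X = np.array([[1], [2], [3], [10], [11], [12]])

# SVM classification: the decision boundary is the maximum-margin hyperplane
clf = SVC(kernel="rbf").fit(X, [0, 0, 0, 1, 1, 1])
print(clf.predict([[2.0], [11.0]]))

# SVR: kernel mapping to a high-dimensional feature space,
# then a linear fit in that space
reg = SVR(kernel="rbf").fit(X, [1.0, 1.2, 0.9, 5.1, 5.0, 4.8])
print(reg.predict([[11.0]]))
```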

1.18 Naive Bayes
Given that one event has occurred, we calculate the probability of another event occurring using Bayes' theorem. Given prior knowledge d, the probability that our hypothesis h is true is:

P(h|d) = P(d|h) × P(h) / P(d)

where P(h|d) is the posterior probability of h given d, P(d|h) is the likelihood of the data under h, P(h) is the prior probability of h, and P(d) is the probability of the data.

This algorithm assumes that all features are mutually independent given the class.
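
A minimal sketch, assuming scikit-learn's GaussianNB and synthetic two-class data; predict_proba applies Bayes' theorem under the feature-independence assumption:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two classes with different feature means; features are modeled as independent
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

nb = GaussianNB().fit(X, y)
# Posterior via Bayes' theorem: P(h|d) ∝ P(d|h) * P(h), with P(d|h)
# factored across features by the independence assumption
print(nb.predict_proba([[0.0, 0.5], [3.0, 2.5]]))
```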

1.2 Ensemble Learning

Ensemble learning is a method that combines the results of different learning models (such as classifiers) to further improve accuracy through voting or averaging. Generally, voting is used for classification problems, and averaging is used for regression problems. This approach is based on the idea that “many hands make light work.”
Ensemble algorithms mainly fall into three categories: Bagging, Boosting, and Stacking. This article will not discuss stacking.
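
As a hedged illustration of combining different models by voting (assuming scikit-learn's VotingClassifier and synthetic data; for regression, the analogous step would average the base models' predictions):

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Classification: combine different base models by majority vote
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("nb", GaussianNB()),
                ("tree", DecisionTreeClassifier(max_depth=3))],
    voting="hard",
).fit(X, y)
print(ensemble.predict([[0.0, 0.5], [3.0, 2.5]]))
```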

1.21 GBDT

GBDT is a Boosting algorithm built on CART regression trees. It is an additive model that serially trains a set of CART regression trees and finally sums the predictions of all the trees to obtain a strong learner. Each new tree fits the negative gradient of the current loss function. The summed output of the trees gives the regression result directly, or is passed through a sigmoid or softmax function to obtain binary or multi-class classification results.
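
A compact sketch of this serial, additive training loop, assuming scikit-learn's regression trees and squared loss, for which the negative gradient is simply the residual:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Additive model: each CART regression tree fits the negative gradient
# of the squared loss, which is the current residual y - F(x)
pred, lr = np.zeros_like(y), 0.1
for _ in range(100):
    residual = y - pred                          # negative gradient for squared loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    pred += lr * tree.predict(X)                 # serial, additive update

print(np.mean((y - pred) ** 2))  # training MSE shrinks as trees are added
```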

1.22 AdaBoost

AdaBoost assigns a high weight to learners with low error rates and a low weight to learners with high error rates, combining the weak learners with their corresponding weights to generate a strong learner. The regression and classification variants differ in how error rates are calculated: classification problems generally use the 0/1 loss function, while regression problems generally use squared or linear loss functions.
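
A minimal sketch with scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision stump; the data here are synthetic:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Each round reweights the training samples so the next weak learner focuses
# on previous mistakes; low-error learners receive high weights in the final
# weighted combination
ada = AdaBoostClassifier(n_estimators=50).fit(X, y)
print(ada.score(X, y))  # training accuracy of the combined strong learner
```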

1.23 XGBoost

XGBoost stands for eXtreme Gradient Boosting. It is an optimized, regularized implementation of gradient boosted trees that adds explicit penalties on tree complexity and uses second-order gradient information, making training faster and less prone to overfitting than plain GBDT.
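
As a hedged sketch, assuming the third-party xgboost package is installed (its exact defaults may differ by version) and synthetic data:

```python
import numpy as np
import xgboost as xgb  # assumes the xgboost package is available

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

# Regularized gradient boosting: shrinkage (learning_rate) plus penalties
# on tree complexity distinguish XGBoost from plain GBDT
model = xgb.XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X, y)
print(model.predict(X[:3]))
```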
