Comprehensive Summary of Machine Learning Concepts (Supervised + Unsupervised)

2025-07-18 by AI Agent

Machine learning models fall into two main categories: supervised learning models and unsupervised learning models.

1. Supervised Learning

Supervised learning typically uses training data labeled by experts to learn a function f that maps an input variable X to an output variable Y, so that Y = f(X). The training data usually takes the form of n pairs (x, y), where n is the size of the training sample and x and y are observed values of X and Y, respectively.

Supervised learning can be divided into two categories:

Classification problems: predicting the category to which a sample belongs (discrete). For example, determining a person's gender or whether they are healthy.

Regression problems: predicting the corresponding real number output (continuous) for a sample. For example, predicting the average height of people in a certain area.

Ensemble learning is also treated here as a form of supervised learning. It combines the predictions of multiple, relatively weak machine learning models to predict new samples.

1.1 Single Model

1.11 Linear Regression

Linear regression models the dependent variable as a linear function of the independent variables. If the analysis includes only one independent variable and one dependent variable, and their relationship can be approximated by a straight line, it is called univariate (simple) linear regression. If it includes two or more independent variables that are linearly related to the dependent variable, it is called multiple linear regression (often loosely called multivariate linear regression).
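As a minimal sketch of the univariate case, the least squares line can be computed in closed form from the sample means. The function name fit_line and the toy data are illustrative assumptions, not part of the original article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one independent variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data lying exactly on the line y = 1 + 2x (made up for illustration)
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With several independent variables the same idea applies, but the coefficients are usually found by solving a linear system rather than with this two-line formula.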

1.12 Logistic Regression

Logistic regression is used to study how X influences Y when Y is categorical. If Y has two categories, such as 0 and 1 (for example, 1 for willing and 0 for not willing, or 1 for purchase and 0 for no purchase), it is called binary logistic regression; if Y has three or more categories, it is called multiclass logistic regression.

The independent variables do not have to be categorical; they can also be quantitative. If X is categorical, it must first be encoded as dummy variables.
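The sketch below illustrates both points for the binary case: a categorical variable is dummy-coded, and the predicted probability is the sigmoid of a linear combination of the inputs. The weights, the bias, and the variable names are hypothetical values chosen for illustration; in practice they would be estimated from training data.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dummy_code(value, categories):
    """Dummy-code a categorical value, using the first category as the reference level."""
    return [1.0 if value == c else 0.0 for c in categories[1:]]

def predict_proba(features, weights, bias):
    """Probability that Y = 1 under a binary logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# One quantitative variable (0.5) plus one categorical variable with levels A, B, C
cities = ["A", "B", "C"]
features = [0.5] + dummy_code("B", cities)
p = predict_proba(features, weights=[2.0, -1.0, 0.3], bias=0.0)
```

Dropping one category as the reference level is a common design choice that avoids perfectly collinear dummy columns.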

1.13 Lasso

The Lasso method is a shrinkage estimation method that serves as an alternative to ordinary least squares. Its basic idea is to add an L1 regularization penalty to the model, which shrinks the coefficients and drives some of them exactly to zero during fitting. After training, variables whose coefficients equal 0 can be discarded, which makes the model simpler and helps prevent overfitting. It is widely used for fitting and variable selection when the data exhibit multicollinearity.
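To see why an L1 penalty produces exact zeros, consider the special case of orthonormal inputs, where the Lasso solution is the least squares coefficient passed through a soft-thresholding operator. The sketch below shows only that operator; the coefficient values and the penalty strength 0.5 are made-up assumptions, and a general Lasso fit needs an iterative solver such as coordinate descent.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

# Hypothetical least squares coefficients, then their Lasso-shrunk versions
ols_coefs = [3.0, -0.2, 0.05, -1.5]
lasso_coefs = [soft_threshold(c, 0.5) for c in ols_coefs]

# Indices of the variables that survive variable selection
kept = [i for i, c in enumerate(lasso_coefs) if c != 0.0]
```

Large coefficients are reduced by the penalty, while small ones are set to zero and their variables drop out of the model.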

1.14 K-Nearest Neighbors (KNN)

KNN works on the same principle for regression and classification; the main difference lies in how the final prediction is made. For classification, KNN generally uses majority voting: the sample to be predicted is assigned to the category that occurs most often among its K nearest samples in the training set. For regression, KNN generally uses averaging: the mean output of the K nearest samples is taken as the predicted value.
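A minimal sketch of both variants follows. It shares one neighbor search and differs only in the final step, voting versus averaging. Euclidean distance, the function names, and the toy data are assumptions made for illustration.

```python
from collections import Counter

def k_nearest(train, query, k):
    """Return the k training samples (features, target) closest to query."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sorted(train, key=lambda sample: squared_distance(sample[0], query))[:k]

def knn_classify(train, query, k):
    """Classification: majority vote among the k nearest labels."""
    labels = [target for _, target in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k):
    """Regression: mean of the k nearest target values."""
    values = [target for _, target in k_nearest(train, query, k)]
    return sum(values) / len(values)

cls_data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
reg_data = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 50.0)]

label = knn_classify(cls_data, (0.5, 0.5), k=3)
value = knn_regress(reg_data, (1,), k=3)
```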

1.15 Decision Tree

In a decision tree, each internal node represents a split: it specifies a test on one attribute of the instances, the samples reaching that node are divided according to that attribute, and each branch leaving the node corresponds to one possible value of the attribute. Each leaf node of a classification tree holds a set of samples, and the mode of their output variable is the classification result. Each leaf node of a regression tree likewise holds a set of samples, and the mean of their output variable is the prediction.
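The sketch below shows a single split on a categorical attribute and the two kinds of leaf prediction described above. It does not choose which attribute to split on, which a real tree learner would do with a criterion such as information gain or Gini impurity. The attribute names and samples are hypothetical.

```python
from collections import defaultdict
from statistics import mean, mode

def split_by_attribute(samples, attribute):
    """Send each (features, target) sample down the branch matching its attribute value."""
    branches = defaultdict(list)
    for features, target in samples:
        branches[features[attribute]].append((features, target))
    return dict(branches)

def classification_leaf(samples):
    """Classification tree leaf: the most common target among the samples."""
    return mode([target for _, target in samples])

def regression_leaf(samples):
    """Regression tree leaf: the mean target among the samples."""
    return mean([target for _, target in samples])

weather_samples = [
    ({"weather": "sunny"}, "play"),
    ({"weather": "sunny"}, "play"),
    ({"weather": "sunny"}, "stay"),
    ({"weather": "rainy"}, "stay"),
]
size_samples = [({"size": "small"}, 10.0), ({"size": "small"}, 14.0), ({"size": "large"}, 30.0)]

sunny_class = classification_leaf(split_by_attribute(weather_samples, "weather")["sunny"])
small_value = regression_leaf(split_by_attribute(size_samples, "size")["small"])
```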

1.16 BP Neural Network

The BP (backpropagation) neural network is a multilayer feedforward network trained by the error backpropagation algorithm, and it is one of the most widely used neural network models. Its learning rule uses the steepest descent method to continuously adjust the weights and thresholds (biases) of the network through backpropagation, minimizing the network's sum of squared errors.

The defining characteristic of the BP neural network is that signals propagate forward while errors propagate backward. For a network with only one hidden layer, the process consists of two stages: the first stage is the forward propagation of signals from the input layer through the hidden layer to the output layer; the second stage is the backward propagation of errors from the output layer to the hidden layer and finally to the input layer, adjusting in turn the weights and biases from the hidden layer to the output layer and those from the input layer to the hidden layer.
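The two stages can be sketched for a tiny network with two inputs, two hidden units, and one output. Sigmoid activations, the squared-error loss, the learning rate, and the starting weights are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Stage 1: propagate the signal from input to hidden layer to output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def train_step(x, target, w_hidden, b_hidden, w_out, b_out, lr=0.5):
    """Stage 2: propagate the error backward and take one gradient descent step."""
    hidden, output = forward(x, w_hidden, b_hidden, w_out, b_out)
    # Loss is 0.5 * (output - target)^2; delta_out is its gradient at the output unit's input.
    delta_out = (output - target) * output * (1.0 - output)
    # Error reaching each hidden unit, computed with the old output weights.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    # Adjust hidden-to-output parameters, then input-to-hidden parameters.
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_b_out = b_out - lr * delta_out
    new_w_hidden = [[w - lr * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b - lr * d for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out

# Hypothetical starting parameters: (w_hidden, b_hidden, w_out, b_out)
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.2, -0.1], 0.0)
x, target = [1.0, 0.5], 1.0

error_before = 0.5 * (forward(x, *params)[1] - target) ** 2
for _ in range(100):
    params = train_step(x, target, *params)
error_after = 0.5 * (forward(x, *params)[1] - target) ** 2
```

Repeating the step on this single sample should drive the squared error down; a real training loop would cycle over many samples.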

1.17 Support Vector Machine (SVM)

Support Vector Regression (SVR) uses a non-linear mapping to project the data into a high-dimensional feature space in which the relationship between the independent and dependent variables is approximately linear. The regression is fitted in that feature space and then mapped back to the original space.

Support Vector Machine (SVM) classification is a generalized linear classifier that performs binary classification under supervised learning; its decision boundary is the maximum-margin hyperplane computed from the training samples.
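The sketch below does not solve the SVM optimization problem. It assumes a separating hyperplane w·x + b = 0 has already been found and shows how that hyperplane classifies points and how wide its margin is. The values of w and b and the sample points are hypothetical.

```python
import math

def decision_value(w, b, x):
    """Signed score w·x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two margin boundaries w·x + b = +1 and w·x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 - 3 = 0
w, b = [1.0, 1.0], -3.0

negative_point = (1.0, 1.0)   # lies on the boundary w·x + b = -1
positive_point = (2.0, 2.0)   # lies on the boundary w·x + b = +1
width = margin_width(w)
```

Points that lie exactly on the margin boundaries, like the two above, are the support vectors; training an SVM amounts to choosing w and b so that this width is as large as possible.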

1.18 Naive Bayes

Bayes' theorem gives the probability of one event given that another event has occurred. Taking d as what is already known (the observed data), the probability that our hypothesis h is true is:

P(h|d) = P(d|h) × P(h) / P(d)

The naive Bayes algorithm assumes that all input variables are conditionally independent of one another given the class.
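Under the independence assumption, the likelihood of the observed data is the product of per-feature likelihoods, so the posterior of each hypothesis can be computed as in the sketch below. The class names and probabilities are made up for illustration.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """
    priors:      {hypothesis: P(h)}
    likelihoods: {hypothesis: {feature: P(feature | h)}}
    observed:    list of features present in the data d
    """
    scores = {}
    for h, prior in priors.items():
        score = prior
        for feature in observed:
            # Independence assumption: multiply the per-feature likelihoods.
            score *= likelihoods[h][feature]
        scores[h] = score          # equals P(d|h) * P(h)
    total = sum(scores.values())   # equals P(d), the normalizing constant
    return {h: s / total for h, s in scores.items()}

priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.5, "meeting": 0.1},
    "ham": {"offer": 0.1, "meeting": 0.5},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["offer"])
```

Here the word "offer" raises the probability of "spam" from its prior of 0.4 to 0.2 / 0.26, roughly 0.77.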

1.2 Ensemble Learning

Ensemble learning is a method that combines the results of different learning models (such as classifiers), through voting or averaging, to further improve accuracy. Generally, voting is used for classification problems and averaging for regression problems. The approach rests on the idea that “two heads are better than one.”
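The two combination rules can be sketched in a few lines. The predictions passed in are made-up outputs of three hypothetical models.

```python
from collections import Counter

def majority_vote(predictions):
    """Classification: return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Regression: return the mean of the models' numeric predictions."""
    return sum(predictions) / len(predictions)

class_result = majority_vote(["cat", "dog", "cat"])
value_result = average([2.0, 3.0, 4.0])
```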

Ensemble algorithms mainly consist of three categories: Bagging, Boosting, and Stacking. This article will not discuss stacking.

Boosting

1.21 GBDT

GBDT (Gradient Boosting Decision Tree) is a Boosting algorithm that uses CART regression trees as base learners. It is an additive model that trains a sequence of CART regression trees one after another, with each new tree fitting the negative gradient of the current loss function. The predictions of all the trees are summed to form a strong learner. This sum is used directly as the result for regression, or passed through the sigmoid or softmax function to obtain binary or multiclass results.
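A minimal sketch of the serial training loop for regression follows. It assumes squared loss, for which the negative gradient is simply the residual, and uses depth-1 trees (stumps) on a single feature in place of full CART trees. The learning_rate shrinkage factor, the function names, and the data are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: choose the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_trees=20, learning_rate=0.5):
    base = sum(ys) / len(ys)            # initial constant prediction
    trees = []
    predictions = [base] * len(xs)
    for _ in range(n_trees):
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    # The strong learner is the sum of all trees' (shrunken) predictions.
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

model = gradient_boost([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])
```

For classification, the same summed score would be passed through a sigmoid or softmax, and the residuals would be replaced by the negative gradient of the corresponding loss.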

1.22 AdaBoost

AdaBoost assigns high weights to learners with low error rates and low weights to learners with high error rates, then combines the weak learners according to these weights to produce a strong learner. The regression and classification versions differ in how the error rate is calculated: classification generally uses a 0/1 loss function, while regression typically uses a squared loss or a linear loss function.
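For binary classification, the classic AdaBoost formula turns a weak learner's weighted error rate e into the weight 0.5 × ln((1 − e) / e). The sketch below shows that formula and the weighted vote only; it omits the reweighting of training samples between rounds. The error rates and predictions are made up for illustration.

```python
import math

def learner_weight(error_rate):
    """Classic AdaBoost weight for binary classification: lower error gives higher weight."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)

def weighted_vote(predictions, alphas):
    """Combine +1/-1 predictions from the weak learners using their weights."""
    score = sum(alpha * p for alpha, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1

# Three hypothetical weak learners with error rates 0.1, 0.4 and 0.45
alphas = [learner_weight(e) for e in (0.1, 0.4, 0.45)]

# The accurate learner says +1, the two weaker ones say -1
combined = weighted_vote([1, -1, -1], alphas)
```

In this example the single accurate learner outweighs the two weaker ones, so the combined prediction is +1 even though a simple majority vote would give -1.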

1.23 XGBoost

XGBoost stands for
