Understanding the XGBoost Algorithm

XGBoost (eXtreme Gradient Boosting) has become very popular in machine-learning competitions in recent years thanks to its excellent predictive performance. Below, we walk through the principles behind it.

Principle

First, it is important to understand that XGBoost is an improvement on the GBDT (gradient boosting decision tree) algorithm.

During the k-th iteration, the loss function of GBDT can be written as L(y, F_k(x)). Its second-order Taylor expansion at F_{k-1}(x) is

L(y, F_k(x)) \approx L(y, F_{k-1}(x)) + \frac{\partial L(y, F_{k-1}(x))}{\partial F_{k-1}(x)} \bigl( F_k(x) - F_{k-1}(x) \bigr) + \frac{1}{2} \frac{\partial^2 L(y, F_{k-1}(x))}{\partial F_{k-1}(x)^2} \bigl( F_k(x) - F_{k-1}(x) \bigr)^2

According to the forward distribution algorithm, we have

F_k(x) = F_{k-1}(x) + f_k(x)

Substituting this into the previous loss function, letting a equal the first derivative and b equal the second derivative, the loss function can be expressed as

L(y, F_k(x)) \approx L(y, F_{k-1}(x)) + a \, f_k(x) + \frac{1}{2} b \, f_k(x)^2

where a and b are

a = \frac{\partial L(y, F_{k-1}(x))}{\partial F_{k-1}(x)}, \qquad b = \frac{\partial^2 L(y, F_{k-1}(x))}{\partial F_{k-1}(x)^2}
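For example, with the squared-error loss L(y, F) = \frac{1}{2}(y - F)^2, these derivatives are simply a = F_{k-1}(x) - y (the negative residual) and b = 1.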

Thus, for all samples, the loss function is

L = \sum_{i=1}^{N} \Bigl[ L(y_i, F_{k-1}(x_i)) + a_i \, f_k(x_i) + \frac{1}{2} b_i \, f_k(x_i)^2 \Bigr]

Optimizing the above expression is equivalent to optimizing

\sum_{i=1}^{N} \Bigl[ a_i \, f_k(x_i) + \frac{1}{2} b_i \, f_k(x_i)^2 \Bigr]    (1)
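As a concrete illustration, below is a minimal sketch of how the per-sample derivatives a_i and b_i could be computed, assuming the squared-error loss from the example above (the function name and array layout are chosen for illustration only):

import numpy as np

def grad_hess_squared_error(y, prev_pred):
    # For L(y, F) = 1/2 * (y - F)^2, evaluated at the previous prediction F_{k-1}(x_i):
    #   a_i = dL/dF   = prev_pred - y   (the negative residual)
    #   b_i = d2L/dF2 = 1
    a = prev_pred - y
    b = np.ones_like(y, dtype=float)
    return a, b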

This is because L(y_i, F_{k-1}(x_i)) is already fixed in the k-th round and therefore acts as a constant with respect to f_k, so dropping it does not change the minimizer; the same reasoning was discussed when introducing the AdaBoost algorithm, so we will not elaborate further here.

Regularization

To prevent overfitting, we often need to add a regularization term when optimizing the loss function.

Then, the loss function with the regularization term added is

L = \sum_{i=1}^{N} \Bigl[ a_i \, f_k(x_i) + \frac{1}{2} b_i \, f_k(x_i)^2 \Bigr] + \Omega(f_k)    (2)

In the previous article, we introduced the representation of the k-th tree as a sum over its leaf regions,

f_k(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m)    (3)

that is, every sample falling into leaf region R_m receives the same predicted value c_m.

The regularization term is constructed by the following formula:

\Omega(f_k) = \alpha M + \frac{1}{2} \beta \sum_{m=1}^{M} c_m^2    (4)

where M is the number of leaf nodes of the tree, and α and β are regularization parameters that control the complexity of the model.
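For instance, for a tree with M = 3 leaves and leaf values c = (0.5, -0.2, 0.1), taking α = 0.1 and β = 1 gives Ω(f_k) = 0.1 × 3 + 0.5 × (0.25 + 0.04 + 0.01) = 0.45.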

Substituting (3) and (4) into (2) and grouping the samples by the leaf region R_m they fall into, we get

L = \sum_{m=1}^{M} \Bigl[ \Bigl( \sum_{x_i \in R_m} a_i \Bigr) c_m + \frac{1}{2} \Bigl( \sum_{x_i \in R_m} b_i + \beta \Bigr) c_m^2 \Bigr] + \alpha M    (5)

To find the minimum of (5), we take its partial derivative with respect to c_m:

\frac{\partial L}{\partial c_m} = \sum_{x_i \in R_m} a_i + \Bigl( \sum_{x_i \in R_m} b_i + \beta \Bigr) c_m    (6)

Setting (6) to 0, we get

c_m = - \frac{\sum_{x_i \in R_m} a_i}{\sum_{x_i \in R_m} b_i + \beta}    (7)
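As a small sketch, the optimal leaf value in (7) could be computed from the samples that fall into one leaf as follows (the function name and arguments are illustrative; a_leaf and b_leaf hold the a_i and b_i of the samples in leaf R_m):

import numpy as np

def optimal_leaf_value(a_leaf, b_leaf, beta):
    # Equation (7): c_m = -sum(a_i) / (sum(b_i) + beta), summed over samples in leaf R_m
    return -np.sum(a_leaf) / (np.sum(b_leaf) + beta)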

Next, we substitute (7) back into (5) to obtain the loss function L as

L = - \frac{1}{2} \sum_{m=1}^{M} \frac{\bigl( \sum_{x_i \in R_m} a_i \bigr)^2}{\sum_{x_i \in R_m} b_i + \beta} + \alpha M    (8)

Equation (8) is the minimum loss obtained in the k-th round for a given tree structure, and it serves as the score by which candidate tree structures are evaluated.
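As an illustrative sketch, the score in (8) can be evaluated leaf by leaf; the helper below is hypothetical and simply mirrors the formula (leaf_a_sums[m] and leaf_b_sums[m] hold the sums of a_i and b_i over the samples in leaf R_m):

import numpy as np

def structure_score(leaf_a_sums, leaf_b_sums, alpha, beta):
    # Equation (8): L = -1/2 * sum_m (sum a_i)^2 / (sum b_i + beta) + alpha * M
    leaf_a_sums = np.asarray(leaf_a_sums, dtype=float)
    leaf_b_sums = np.asarray(leaf_b_sums, dtype=float)
    M = len(leaf_a_sums)
    return -0.5 * np.sum(leaf_a_sums ** 2 / (leaf_b_sums + beta)) + alpha * M

A structure with a lower score is preferred; in practice, XGBoost uses the decrease of this score to decide whether splitting a node is worthwhile.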

Looking back at the entire process, we can see that XGBoost essentially performs a second-order Taylor expansion of the loss function, solves for the optimal leaf values, and substitutes them back into the loss. In other words, it uses this solution to guide the construction of the decision trees, thereby reducing the residuals and improving the model's performance.
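For completeness, here is a small usage sketch with the open-source xgboost Python package (the data is synthetic and the hyperparameter values are purely illustrative). In that library, the per-leaf penalty roughly corresponding to α is called gamma, and the L2 penalty on leaf values corresponding to β is called reg_lambda:

import numpy as np
import xgboost as xgb

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

model = xgb.XGBRegressor(
    n_estimators=50,     # number of boosting rounds (trees)
    max_depth=3,
    learning_rate=0.1,
    gamma=0.1,           # per-leaf complexity penalty, playing the role of alpha above
    reg_lambda=1.0,      # L2 penalty on leaf values, playing the role of beta above
)
model.fit(X, y)
print(model.predict(X[:5]))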
