XGBoost Split Point Algorithm Explained

Introduction

The previous article introduced the algorithm principles of XGBoost, including the scoring function (objective function) that measures the quality of a tree structure; the best split point is chosen by comparing this score before and after splitting on a feature value. A detailed introduction to the node-splitting algorithm, however, was not given there. This article describes XGBoost's split-point algorithms in detail, following Dr. Tianqi Chen's paper "XGBoost: A Scalable Tree Boosting System".

Table of Contents

  1. Parallel Principle

  2. Greedy Algorithm for Split Point

  3. Quantile Algorithm for Split Point

  4. Weighted Quantile Algorithm for Split Point

  5. Splitting Algorithm for Sparse Data

  6. Conclusion

1. Parallel Principle

XGBoost builds its CART trees sequentially, but it can process features in parallel. This parallelism shows up in the selection of the optimal split point. Suppose the sample data has M features; the best-split-point search during the construction of a CART tree proceeds as follows:

(Figure: per-feature block structure for the parallel split-point search)

1. The red box indicates sorting the training data by each feature's values and saving the result as a block structure; the number of blocks equals the number of features.

2. The green bar indicates finding the best split point for the feature in each block structure, where the splitting criterion is the decrease of the objective function, as introduced in the previous article.

3. The black box indicates comparing the objective-function decreases of each block's best split point and selecting the overall best split point.
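
As a rough illustration of this block layout, here is a minimal sketch in Python (the function and variable names are illustrative assumptions, not XGBoost's internal code): each feature column is argsorted once, and the sorted order is reused by every per-feature split search; those per-feature searches are independent of one another, which is what makes them easy to parallelize.

```python
import numpy as np

def build_blocks(X):
    """One block per feature: the pre-sorted sample order for that column.

    The sort is done once, up front, and reused every time a node searches
    for its best split on that feature.
    """
    return [np.argsort(X[:, j], kind="stable") for j in range(X.shape[1])]

def best_split(X, g, h, blocks, split_on_feature):
    """Scan every feature block independently (these scans are what can run
    in parallel) and keep the candidate with the largest gain.

    g, h are the per-sample first- and second-order gradients;
    `split_on_feature` stands in for the per-feature search described in
    the following sections and returns (gain, threshold) for one column.
    """
    best_gain, best_feature, best_threshold = -np.inf, None, None
    for j, order in enumerate(blocks):
        gain, threshold = split_on_feature(X[order, j], g[order], h[order])
        if gain > best_gain:
            best_gain, best_feature, best_threshold = gain, j, threshold
    return best_gain, best_feature, best_threshold
```

The per-feature search passed in as `split_on_feature` is exactly what the remaining sections describe, in its greedy, quantile, and weighted-quantile variants.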

2. Greedy Algorithm for Split Point

The split-point search follows the same idea for every block structure, so this section focuses on the algorithm for a single block.

XGBoost's greedy algorithm: sort the samples by the feature's values, then enumerate split points from the smallest value to the largest, compute the decrease of the objective function for each candidate split, and select the point with the maximum decrease as the optimal split point for that feature. Finally, compare the objective-function decreases of the optimal split points across the different block structures and select the feature whose decrease is largest as the overall optimal split point.
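
For reference, the objective-function decrease used to score a candidate split, as given in the paper, is the following, where $G_L, H_L$ and $G_R, H_R$ are the sums of the first- and second-order gradients $g_i, h_i$ of the samples falling into the left and right branches, and $\lambda, \gamma$ are the regularization parameters:

$$\text{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda} + \frac{G_R^{2}}{H_R+\lambda} - \frac{(G_L+G_R)^{2}}{H_L+H_R+\lambda}\right] - \gamma$$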

The flowchart is as follows:

(Figure: flowchart of the greedy split-point algorithm)

[Example] The table below shows the feature values of a certain column in the samples.

(Table: feature values of one column of the samples)

Re-sort the samples by feature value:

(Table: samples re-sorted by feature value)

Greedy algorithm split nodes:

(Figure: greedy enumeration of candidate split points)

The red arrows mark each candidate split point; the point where the objective function decreases the most is selected as the split node.
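
A minimal sketch of this exact greedy scan over a single feature column, under the assumption that the per-sample gradients g and h have already been computed (names are illustrative, not XGBoost's internal API): it walks the sorted values, maintains running sums for the left branch, and evaluates the gain formula above at every boundary between distinct feature values.

```python
import numpy as np

def exact_greedy_split(x, g, h, lam=1.0, gamma=0.0):
    """Exhaustively scan one feature column for the best split point.

    x: feature values; g, h: first- and second-order gradients per sample;
    lam, gamma: the lambda and gamma regularization terms.
    Returns (best_gain, best_threshold).
    """
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    G, H = g.sum(), h.sum()          # totals for the parent node
    GL = HL = 0.0                    # running sums for the left branch
    parent_score = G * G / (H + lam)
    best_gain, best_threshold = -np.inf, None
    for i in range(len(x) - 1):
        GL += g[i]; HL += h[i]
        if x[i] == x[i + 1]:         # can only split between distinct values
            continue
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL * GL / (HL + lam)
                      + GR * GR / (HR + lam)
                      - parent_score) - gamma
        if gain > best_gain:
            best_gain = gain
            best_threshold = (x[i] + x[i + 1]) / 2
    return best_gain, best_threshold
```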

3. Quantile Algorithm for Split Point

If the feature takes continuous values, the computational cost of the greedy algorithm above is extremely high. When the sample size is sufficiently large, quantiles of the feature are used as candidate split points instead. The flowchart is as follows:

(Figure: flowchart of the quantile-based split-point algorithm)

[Example] The table below shows the feature values of a certain column in the samples, using the tertile points of the feature as candidate split points.

(Table: feature values of one column of the samples)

Sort the samples by feature value:

(Table: samples sorted by feature value)

Use the feature's tertile points as candidate split points:

(Figure: tertile points of the feature used as candidate split points)

The red arrows mark each candidate split point; the point where the objective function decreases the most is selected as the split node.
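
A minimal sketch of the quantile-based variant under the same assumptions as before: instead of scanning every boundary, candidate thresholds are taken at evenly spaced quantiles of the feature values (tertiles in the example above), and only those candidates are scored with the gain formula.

```python
import numpy as np

def quantile_candidates(x, num_quantiles=3):
    """Candidate thresholds at evenly spaced quantiles of the feature values
    (num_quantiles=3 corresponds to the tertile example above)."""
    qs = np.linspace(0, 1, num_quantiles + 1)[1:-1]   # interior quantiles only
    return np.quantile(x, qs)

def approximate_split(x, g, h, lam=1.0, gamma=0.0, num_quantiles=3):
    """Evaluate the split gain only at the quantile candidates."""
    G, H = g.sum(), h.sum()
    parent_score = G * G / (H + lam)
    best_gain, best_threshold = -np.inf, None
    for t in quantile_candidates(x, num_quantiles):
        left = x <= t
        GL, HL = g[left].sum(), h[left].sum()
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL * GL / (HL + lam)
                      + GR * GR / (HR + lam)
                      - parent_score) - gamma
        if gain > best_gain:
            best_gain, best_threshold = gain, t
    return best_gain, best_threshold
```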

4. Weighted Quantile Algorithm for Split Point

The previous section assumed all samples carry equal weight, in which case plain quantiles of the feature values spread the loss evenly. When sample weights differ, using plain sample quantiles to distribute the loss evenly introduces bias; this section therefore uses the sample weights themselves to distribute the loss evenly.

The loss function at step t is:

$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2}\, h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$

Completing the square in $f_t(x_i)$ transforms it into:

$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \tfrac{1}{2}\, h_i \left( f_t(x_i) - \left( -\tfrac{g_i}{h_i} \right) \right)^{2} + \Omega(f_t) + \text{constant}$$

The loss term for sample $x_i$ can thus be viewed as a squared error against the label $-g_i/h_i$, weighted by $h_i$. In other words, $x_i$ contributes to the loss with weight $h_i$, so taking quantiles with respect to these sample weights is what distributes the loss evenly across the candidate buckets.

The previous section assumed equal sample weights and used plain quantiles of the feature values as split points. This section instead takes $h_i$ as the weight of sample $i$; the steps for choosing split points that distribute this weight evenly are:

(1) Sort the samples by feature value.

(2) Define the sorting function:

$$r_k(z) = \frac{1}{\sum_{(x,h)\in \mathcal{D}_k} h} \sum_{(x,h)\in \mathcal{D}_k,\; x < z} h$$

where $x$ and $z$ denote values of the $k$-th feature and $\mathcal{D}_k$ is the set of (feature value, second-order gradient $h$) pairs for that feature.

(3) Use the quantiles of the sorting (rank) function as the split points.

[Example] See the figure below:

(Figure: example samples for the weighted quantile split)

The relationship between features and corresponding sorting function values is shown in the table below:

(Table: feature values and their corresponding rank-function values)

The red arrow indicates using the tertile point of the rank function as the split point.

Finally, select the optimal split point.
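
A minimal sketch of the weighted candidate selection (illustrative names; XGBoost's actual weighted quantile sketch is an approximate streaming data structure, which is not reproduced here): the rank function accumulates the second-order gradients h of all samples whose feature value lies below a point, normalized by the total, and candidates are placed at evenly spaced values of that rank.

```python
import numpy as np

def weighted_quantile_candidates(x, h, num_quantiles=3):
    """Candidate thresholds at evenly spaced values of the rank function
    r(z) = (sum of h over samples with feature value < z) / (total sum of h)."""
    order = np.argsort(x)
    x_sorted, h_sorted = x[order], h[order]
    # rank of each sorted sample = weight mass strictly below it
    rank = (np.cumsum(h_sorted) - h_sorted) / h_sorted.sum()
    targets = np.linspace(0, 1, num_quantiles + 1)[1:-1]
    # for each target rank, take the first feature value whose rank reaches it
    idx = np.clip(np.searchsorted(rank, targets), 0, len(x_sorted) - 1)
    return np.unique(x_sorted[idx])
```

The resulting candidates can then be scored with the same gain computation used in the earlier sketches.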

5. Splitting Algorithm for Sparse Data

Sparse data is very common in practical projects, mainly arising from: 1. missing values; 2. frequent zero entries in the statistics; 3. one-hot encoded feature representations.

The splitting point algorithm for sparse data:

(Figure: sparsity-aware split-point algorithm)

Samples whose feature value is missing are sent down a default direction. When searching for the optimal split point of a node, only the non-missing feature values are enumerated; for each candidate split, the missing-value samples are tentatively assigned first to the left branch and then to the right branch, the objective-function decrease is computed for both cases, and the direction with the larger decrease is kept as the default direction associated with that split.
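
A minimal sketch of this default-direction idea (illustrative names, not the paper's exact pseudocode): only samples with a present feature value are scanned; at each candidate threshold the missing-value samples are tried once in the left branch and once in the right branch, and whichever direction yields the larger gain is recorded as the default direction for that split.

```python
import numpy as np

def sparse_aware_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy scan that also learns a default direction for missing values.

    Missing entries in x are np.nan.
    Returns (best_gain, best_threshold, default_direction).
    """
    present = ~np.isnan(x)
    G, H = g.sum(), h.sum()                  # totals include the missing samples
    Gmiss, Hmiss = g[~present].sum(), h[~present].sum()
    xp, gp, hp = x[present], g[present], h[present]
    order = np.argsort(xp)
    xp, gp, hp = xp[order], gp[order], hp[order]
    parent_score = G * G / (H + lam)
    best = (-np.inf, None, None)             # (gain, threshold, default direction)
    GL = HL = 0.0
    for i in range(len(xp) - 1):
        GL += gp[i]; HL += hp[i]
        if xp[i] == xp[i + 1]:
            continue
        threshold = (xp[i] + xp[i + 1]) / 2
        # try sending the missing-value samples left, then right
        for direction, (GLd, HLd) in (("left",  (GL + Gmiss, HL + Hmiss)),
                                      ("right", (GL, HL))):
            GR, HR = G - GLd, H - HLd
            gain = 0.5 * (GLd * GLd / (HLd + lam)
                          + GR * GR / (HR + lam)
                          - parent_score) - gamma
            if gain > best[0]:
                best = (gain, threshold, direction)
    return best
```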

6. Conclusion

This article presents the author's understanding of Dr. Tianqi Chen's paper "XGBoost: A Scalable Tree Boosting System". If you read it differently, feel free to get in touch and discuss.

References:

Tianqi Chen, Carlos Guestrin. "XGBoost: A Scalable Tree Boosting System." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.

Recommended Reading Articles

Summary of XGBoost Algorithm Principles
