Ultimate Comparison of Machine Learning Algorithms: Tree Models vs Neural Networks


Selected from towardsdatascience

Author: Andre Ye
Translated by: Machine Heart
Editor: Chen Ping
Tree models and neural networks are like two sides of the same coin. In some cases, tree models even outperform neural networks.


Because of their complexity, neural networks are often treated as the “Holy Grail” for solving every machine learning problem. Tree-based methods, by contrast, have not received equal attention, mainly because these algorithms appear simple. Yet the two families, though seemingly different, are two sides of the same coin, and both matter.
Tree Models vs Neural Networks
Tree-based methods often outperform neural networks. Essentially, the two belong in the same category because both solve problems through incremental decomposition, rather than partitioning the entire dataset at once with a complex boundary, as support vector machines or logistic regression do.
It is clear that tree-based methods incrementally partition the feature space along different features to optimize information gain. Less obvious is that neural networks also handle tasks in a similar fashion. Each neuron monitors a specific part of the feature space (with various overlaps). When an input enters that space, certain neurons get activated.
Neural networks view this piece-by-piece model fitting from a probabilistic perspective, while tree-based methods adopt a deterministic one. Either way, the performance of both depends on model depth, since their components correspond to different parts of the feature space.
Models with too many components (nodes in tree models, neurons in neural networks) overfit, while models with too few cannot produce meaningful predictions at all. (An overfit model of either kind begins memorizing data points rather than learning to generalize.)
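The entropy-driven partitioning described above can be sketched in a few lines. This is an illustrative toy, not any particular library's implementation: it scores candidate axis-aligned thresholds on a one-dimensional dataset by information gain and picks the best one.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(xs, ys, threshold):
    """Entropy reduction from splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(ys) - children

# Toy 1-D dataset: class 0 below 3, class 1 above.
xs = [1.0, 2.0, 2.5, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

# The tree picks the axis-aligned threshold with the highest information gain.
candidates = [1.5, 2.25, 3.25, 4.5, 5.5]
best = max(candidates, key=lambda t: information_gain(xs, ys, t))
print(best)  # 3.25 separates the two classes perfectly (gain = 1.0 bit)
```

A real decision tree repeats this search recursively, one feature at a time, which is exactly the incremental decomposition the text describes.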
To gain a more intuitive understanding of how neural networks partition the feature space, you can read this article introducing the universal approximation theorem: https://medium.com/analytics-vidhya/you-dont-understand-neural-networks-until-you-understand-the-universal-approximation-theory-85b3e7677126.
Although decision trees have many powerful variants, such as random forests, gradient boosting, AdaBoost, and deep forests, generally speaking, tree-based methods are essentially simplified versions of neural networks.
  • Tree-based methods solve problems piece by piece with vertical and horizontal (axis-aligned) splits chosen to minimize entropy, the tree's counterpart to an optimizer and a loss. Neural networks solve problems piece by piece with activation functions.

  • Tree-based methods are deterministic, not probabilistic. This brings some nice simplifications, such as automatic feature selection.

  • Condition nodes activated in decision trees are similar to activated neurons in neural networks (information flow).

  • Neural networks transform inputs through fitted parameters, indirectly guiding the activation of subsequent neurons. Decision trees explicitly fit parameters that direct the information flow. (This again follows from the deterministic-versus-probabilistic distinction.)
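
The deterministic-versus-probabilistic routing in the bullets above can be made concrete with a toy sketch (the functions and values are purely illustrative): a tree condition node routes an input with a hard 0/1 test, while a sigmoid neuron produces a soft, graded activation for the same input.

```python
import math

def tree_gate(x, threshold):
    """A decision-tree condition node: hard 0/1 routing."""
    return 1.0 if x > threshold else 0.0

def neuron_gate(x, weight, bias):
    """A sigmoid neuron: soft, graded routing of the same signal."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

x = 5.0
print(tree_gate(x, 3.0))                    # 1.0 (the branch either fires or it doesn't)
print(round(neuron_gate(x, 1.0, -3.0), 3))  # 0.881 (the neuron fires "mostly")
```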


Information flows similarly in both models, but the flow in tree models is simpler.
1 and 0 Choices in Tree Models vs Probabilistic Choices in Neural Networks
Of course, this is an abstract conclusion and may even be controversial. Admittedly, there are many barriers to establishing this connection. Regardless, this is an important part of understanding when and why tree-based methods outperform neural networks.
For decision trees, handling tabular or structured data is very natural. Most people agree that using neural networks for regression and prediction on tabular data is overkill, so tree models make some simplifications here. Choosing 0/1 outputs instead of probabilities is the main root of the difference between the two algorithms, and it is why tree-based methods succeed in settings that do not require probabilities, such as structured data.
For example, tree-based methods perform well on the MNIST dataset because each digit has several basic features. There is no need to calculate probabilities, and the problem is not very complex, which is why well-designed tree ensemble models can perform comparably to modern convolutional neural networks, or even better.
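
As an illustration of how little machinery a tree ensemble needs on digit data, here is a minimal sketch using scikit-learn (assumed installed) and its small built-in 8×8 digits dataset, a lightweight stand-in for the full MNIST set discussed above:

```python
# Assumes scikit-learn is installed; the 8x8 digits set is a small MNIST-like
# dataset, used here because it downloads nothing and runs in seconds.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
print(f"test accuracy: {forest.score(X_test, y_test):.3f}")  # typically ~0.97
```

A plain random forest with default-ish settings lands in the same accuracy range as a small convolutional network on this data, which is the point the paragraph above is making.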
People often say that “tree-based methods just memorize rules,” and that is true. Neural networks do the same, except they can memorize more complex, probabilistic rules. Rather than explicitly returning true/false for a condition like x > 3, a neural network amplifies the input to a very large value, so that the sigmoid effectively outputs 1, or it produces a continuous expression.
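The amplification trick can be demonstrated directly. In this sketch (the scale factors and threshold are chosen purely for illustration), scaling the input by a growing factor k makes sigmoid(k * (x - 3)) approach the hard rule "x > 3":

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# As the scale k grows, sigmoid(k * (x - 3)) approaches a hard 0/1 step at x = 3.
for k in (1, 10, 100):
    below = sigmoid(k * (2.9 - 3.0))  # input just below the threshold
    above = sigmoid(k * (3.1 - 3.0))  # input just above the threshold
    print(k, round(below, 4), round(above, 4))
```

At k = 1 the two outputs hover near 0.5; by k = 100 they are effectively 0 and 1, i.e. the neuron has learned a deterministic-looking rule through probabilistic machinery.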
On the other hand, due to their complexity, neural networks can do a lot of things. Convolutional layers and recurrent layers are outstanding variants of neural networks because the data they handle often requires the nuances of probabilistic calculations.
Few images can be modeled with pure 0/1 values. Decision trees cannot handle datasets with many intermediate values (like 0.5), which is why they perform well on MNIST, whose pixel values are almost all black or white, but poorly on datasets such as ImageNet, whose pixel values are not. Similarly, text carries too much information and too many irregularities to be expressed in deterministic terms.
This is also why neural networks dominate these fields, and why neural network research stagnated before the 2000s, when large amounts of image and text data were not yet accessible. Other common uses of neural networks are confined to massive-scale prediction, such as YouTube's video recommendation algorithm, which operates at enormous scale and must rely on probabilities.
Any company’s data science team will likely use tree-based models instead of neural networks unless it is building a demanding application such as blurring the background of a Zoom video. For everyday business classification tasks, the deterministic nature of tree-based methods keeps these tasks lightweight, while their approach remains similar to a neural network's.
In many practical cases, deterministic modeling is more natural than probabilistic modeling. For example, predicting whether a user will purchase a certain item from an e-commerce site is a good choice for tree models, as users naturally follow a rule-based decision process. A user’s decision process might look like this:
  1. Have I had a pleasant shopping experience on this platform before? If so, continue.

  2. Do I need this item now? (For example, should I buy sunglasses and swim trunks in winter?) If so, continue.

  3. Based on my user statistics, is this a product I am interested in buying? If so, continue.

  4. Is this item too expensive? If not, continue.

  5. Do other customers’ reviews of this product give me enough confidence to purchase it? If so, continue.

In general, humans follow a rule-based and structured decision-making process. In these cases, probabilistic modeling is unnecessary.
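The five-step decision process above maps naturally onto explicit conditionals. In this hypothetical sketch, the will_purchase function and every field name are invented for illustration; they do not come from any real system:

```python
def will_purchase(user, item):
    """Hypothetical rule-based purchase decision; every field name is invented."""
    if not user["good_past_experience"]:   # rule 1: pleasant experience before?
        return False
    if not user["needs_item_now"]:         # rule 2: needed right now?
        return False
    if not user["matches_interests"]:      # rule 3: matches user statistics?
        return False
    if item["too_expensive"]:              # rule 4: price acceptable?
        return False
    if not item["reviews_convincing"]:     # rule 5: reviews give confidence?
        return False
    return True

user = {"good_past_experience": True, "needs_item_now": True,
        "matches_interests": True}
item = {"too_expensive": False, "reviews_convincing": True}
print(will_purchase(user, item))  # True: every rule passed

item["too_expensive"] = True
print(will_purchase(user, item))  # False: rule 4 stops the flow
```

Each failed condition halts the flow immediately, exactly like an unsatisfied condition node pruning a branch of a decision tree; no probability ever enters the picture.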
Conclusion
  • It is best to view tree-based methods as a scaled-down version of neural networks that performs feature classification, optimization, and information flow in a simpler way.

  • The main practical difference between tree-based and neural network methods is deterministic (0/1) versus probabilistic modeling; structured (tabular) data is better modeled deterministically.

  • Do not underestimate the power of tree methods.

Reference link: https://towardsdatascience.com/when-and-why-tree-based-models-often-outperform-neural-networks-ceba9ecd0fd8