Deep Learning vs Machine Learning Models: Performance Comparison of GBRT and DNN in Time Series Forecasting

In recent years, classical parametric (autoregressive) methods in the field of time series forecasting have largely been replaced by complex deep learning frameworks, such as DeepGlo or LSTNet. These novel deep learning-based methods are not only considered superior to traditional methods like ARIMA and simple machine learning models like GBRT, but they have also raised expectations in the field of machine learning time series forecasting models, with the belief that deep learning is necessary to provide state-of-the-art forecasting results.

However, to keep the reported progress in any area of machine learning honest, deep learning methods need to be compared regularly against simple but effective baselines. Beyond the growing complexity of time series forecasting models, a further motivation is that current research is heavily biased toward deep learning-based methods, which limits the diversity of solutions available for the highly diverse problems encountered in practice.

https://arxiv.org/pdf/2101.02118

This paper revolves around the following two research questions:

  1. What effect does carefully configuring the input and output structure of the GBRT model (a gradient-boosted decision tree ensemble) have in a window-based learning framework for time series forecasting?
  2. How does a simple yet well-configured GBRT model perform compared to state-of-the-art deep learning time series forecasting frameworks?

Evaluation Datasets

| Dataset | # Series (n) | Length (T) | Sampling rate (s) | Target channels (L) | Auxiliary channels (M) | Forecast window (h) | Training points (t0) | Test points (τ) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Electricity | 70 | 26,136 | Hourly | 1 | 0 | 24 | 25,968 | 168 |
| Traffic | 90 | 10,560 | Hourly | 1 | 0 | 24 | 10,392 | 168 |
| ElectricityV2 | 370 | 6,000 | Hourly | 1 | 0 | 24 | 5,832 | 168 |
| TrafficV2 | 963 | 4,151 | Hourly | 1 | 0 | 24 | 3,983 | 168 |
| PeMSD7(M) | 228 | 12,672 | Every 5 minutes | 1 | 0 | 9 | 11,232 | 1,440 |
| Exchange-Rate | 8 | 7,536 | Daily | 1 | 0 | 24 | 6,048 | 1,488 |
| Solar-Energy | 137 | 52,600 | Every 10 minutes | 1 | 0 | 24 | 42,048 | 10,512 |
| Beijing PM2.5 | 1 | 43,824 | Hourly | 1 | 16 | 1, 3, 6 | 35,064 | 8,760 |
| Urban Air Quality | 1 | 2,891,387 | Hourly | 1 | 16 | 6 | 1,816,285 | 1,075,102 |
| SML 2010 | 1 | 4,137 | Every minute | 1 | 26 | 1 | 3,600 | 537 |
| NASDAQ 100 | 1 | 40,560 | Every minute | 1 | 81 | 1 | 37,830 | 2,730 |

Benchmark Model: GBRT

Transforming Gradient Boosting Regression Trees (GBRT) into a window-based regression framework, and then engineering the model's input and output structure to exploit additional contextual information, lifts this simple machine learning method to the level of competitive DNN time series forecasting models.

Window-Based Input Setup

To enhance the predictive performance of the GBRT model, a window-based input setup is adopted: it allows a regressor with no built-in notion of time to exploit the sequential structure of the data.


Data Transformation Steps

  1. Create Windows:
    • Segment the original time series into multiple small windows, each containing data from consecutive time points.
    • For instance, daily temperature data for a year can be split into windows of 30 days each.
  2. Flatten Windows:
    • Each window holds data from multiple time points; this multi-dimensional data is converted into a one-dimensional vector.
    • Specifically, the target values (e.g., temperature) and covariates (e.g., date, humidity) of each time point are concatenated into one long vector.
    • For example, a 5-day window with a temperature and a humidity value per day becomes a one-dimensional vector of 10 numbers.
  3. Handle Covariates:
    • Within each window, only the covariates (e.g., humidity) of the last time point are kept as the representative of the whole window.
    • This reduces the dimensionality of the input while retaining the contextual information of the current time point.
  4. Multi-Output Prediction:
    • Rather than predicting each future time point individually, multiple future time points are predicted at once.
    • This is achieved with a multi-output GBRT model that outputs values for several future time points simultaneously.
    • For example, to forecast the temperature for the next 5 days, the model outputs all 5 predictions at once.
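The steps above can be sketched end to end in Python with scikit-learn. This is a minimal illustration on toy data, not the paper's code: the function name, window/horizon sizes, and the synthetic series are all assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

def make_windows(target, covariates, window, horizon):
    """Turn a series into flattened window samples.

    Following the setup described above, all `window` past target values
    are kept, but only the covariates of the *last* time point in the
    window are retained, shrinking the input while keeping context.
    """
    X, Y = [], []
    for t in range(len(target) - window - horizon + 1):
        past_targets = target[t:t + window]            # w target values
        last_covs = covariates[t + window - 1]         # last point's covariates
        X.append(np.concatenate([past_targets, last_covs]))
        Y.append(target[t + window:t + window + horizon])  # h future values
    return np.array(X), np.array(Y)

# Toy data: 200 time points, one target channel, two covariate channels
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) / 10) + 0.1 * rng.standard_normal(200)
covs = rng.standard_normal((200, 2))

X, Y = make_windows(y, covs, window=30, horizon=5)

# Multi-output GBRT: one boosted ensemble per forecast step,
# so all 5 future values are produced in a single predict call
model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50))
model.fit(X[:-1], Y[:-1])
pred = model.predict(X[-1:])
print(pred.shape)  # (1, 5)
```

Note that scikit-learn's `MultiOutputRegressor` realizes multi-output prediction by fitting one independent GBRT per forecast step, which is one common way to give a single-output learner a multi-step forecast window.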

Deep Time Series Forecasting Methods

*Temporal Regularized Matrix Factorization (TRMF)*

This model is based on matrix factorization and is highly scalable because it models the global structure of the data. Although it is one of the earlier models in this study and can only capture linear dependencies in time series data, it still shows competitive results.

*Long- and Short-term Time-series Network (LSTNet)*

This model emphasizes local multivariate patterns (modeled through convolutional layers) and long-term dependencies (captured through recurrent structures). LSTNet originally had two versions, "LSTNet-Skip" and "LSTNet-Attn"; because "LSTNet-Attn" is not reproducible, "LSTNet-Skip" is evaluated in the subsequent experiments.

*Dual-Stage Attention-Based RNN (DARNN)*

It first passes model inputs through an input attention mechanism and then employs an encoder-decoder model with an additional temporal attention mechanism. This model was initially evaluated on two multivariate datasets, but since it is directly applicable to univariate datasets as well, it has seen some subsequent applications there.

*Deep Global Local Forecaster (DeepGlo)*

This model is based on a global matrix factorization structure regularized by a temporal convolutional network. The model includes additional channels derived from dates and timestamps and was initially evaluated on univariate datasets.

*Temporal Fusion Transformer (TFT)*

This model combines recurrent layers for local processing with self-attention layers that capture long-term dependencies in the data. During learning, the model not only dynamically focuses on relevant features but also suppresses features deemed irrelevant through gating mechanisms.

*DeepAR*

An autoregressive probabilistic RNN model that estimates the parameters of the time series distribution using temporal and categorical covariates. An open-source implementation of DeepAR is available through GluonTS.

*Deep State Space Model (DeepState)*

A probabilistic generative model that learns to parameterize linear state space models using RNNs. As with DeepAR, an open-source implementation is available via GluonTS.

*Deep Air Quality Forecasting Framework (DAQFF)*

This framework consists of a two-stage feature representation: the data is first passed through three 1D convolutional layers, followed by two bidirectional LSTM layers, and predictions are finally made through a linear layer. As the name suggests, the framework is built specifically for air quality prediction and has therefore been evaluated on the corresponding multivariate datasets listed in Table 1.
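The described stack (three 1D convolutions, two bidirectional LSTMs, a linear output layer) could be sketched in PyTorch roughly as follows. Channel widths, kernel sizes, and taking the last time step for the forecast head are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DAQFFSketch(nn.Module):
    """Hypothetical sketch of the two-stage DAQFF structure:
    3x Conv1d -> 2-layer bidirectional LSTM -> Linear head."""

    def __init__(self, in_channels=17, horizon=6, hidden=32):
        super().__init__()
        self.convs = nn.Sequential(  # stage 1: local feature extraction
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, hidden, num_layers=2,  # stage 2: temporal modeling
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, horizon)     # multi-step forecast

    def forward(self, x):                    # x: (batch, channels, time)
        h = self.convs(x)                    # (batch, 32, time)
        h, _ = self.lstm(h.transpose(1, 2))  # (batch, time, 2*hidden)
        return self.head(h[:, -1])           # predict from the last time step

# 1 target + 16 auxiliary channels (as in Beijing PM2.5), 24-step input window
out = DAQFFSketch()(torch.randn(8, 17, 24))
print(out.shape)  # torch.Size([8, 6])
```

The channel count of 17 mirrors the Beijing PM2.5 row of Table 1 (1 target + 16 auxiliary channels) and the horizon of 6 matches one of its forecast windows; both are merely plausible instantiations.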

Model Performance Comparison

Univariate Time Series Forecasting Experimental Results

The goal of univariate time series forecasting is to predict a single target variable based on its own historical data.


Overall, the results show that the window-based GBRT is strongly competitive across all compared models.

Multivariate Time Series Forecasting Experimental Results

In the multivariate time series forecasting setup, the dataset inherently provides multiple features, but only a single target variable needs to be predicted.


Results indicate that even deep learning frameworks designed specifically for multivariate prediction, such as DARNN, can be surpassed by a well-configured simple GBRT baseline.

Summary of Experimental Results

1. Effectiveness of simple models: Despite its conceptual simplicity, the GBRT model can, through appropriate feature engineering and a windowed input setup, match the performance of complex DNN models across multiple time series forecasting tasks, and even outperform them in some cases.
2. Importance of machine learning baselines: Simple machine learning baselines should not be underestimated; when carefully configured, they can deliver competitive predictive performance. Ensuring the authenticity of progress in time series forecasting therefore requires more careful configuration and optimization of these baseline models.
3. Future work: Future work could apply this window-based input setup to other simple machine learning models, such as multi-layer perceptrons (MLP) and support vector machines (SVM), whose potential in time series forecasting has yet to be fully explored.

Editor / Fan Ruiqiang

Reviewer / Fan Ruiqiang

Verification / Fan Ruiqiang
