Deep Learning vs Machine Learning Models: Performance Comparison of GBRT and DNN in Time Series Forecasting

In recent years, classical parametric (autoregressive) methods in the field of time series forecasting have largely been replaced by complex deep learning frameworks, such as DeepGlo or LSTNet. These novel deep learning-based methods are not only considered superior to traditional methods like ARIMA and simple machine learning models like GBRT, but they have also raised expectations in the field of machine learning time series forecasting models, with the belief that deep learning is necessary to provide state-of-the-art forecasting results.

However, to keep the reported progress in any area of machine learning honest, deep learning methods need to be compared regularly against simple but effective baselines. Beyond the growing complexity of time series forecasting models, a further motivation is that current research is heavily biased toward deep learning-based methods, which limits the diversity of solutions available for the highly diverse problems encountered in practice.

https://arxiv.org/pdf/2101.02118

This paper revolves around the following two research questions:

  1. What effect does carefully configuring the input and output structure of the GBRT model (a gradient-boosted decision tree ensemble) have in a window-based learning framework for time series forecasting?
  2. How does a simple yet well-configured GBRT model perform compared to state-of-the-art deep learning time series forecasting frameworks?

Evaluation Datasets

| Dataset | # Series (n) | Length (T) | Sampling rate (s) | Target channels (L) | Auxiliary channels (M) | Forecast window (h) | Training points (t0) | Test points (τ) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Electricity | 70 | 26,136 | Hourly | 1 | 0 | 24 | 25,968 | 168 |
| Traffic | 90 | 10,560 | Hourly | 1 | 0 | 24 | 10,392 | 168 |
| ElectricityV2 | 370 | 6,000 | Hourly | 1 | 0 | 24 | 5,832 | 168 |
| TrafficV2 | 963 | 4,151 | Hourly | 1 | 0 | 24 | 3,983 | 168 |
| PeMSD7(M) | 228 | 12,672 | Every 5 minutes | 1 | 0 | 9 | 11,232 | 1,440 |
| Exchange-Rate | 8 | 7,536 | Daily | 1 | 0 | 24 | 6,048 | 1,488 |
| Solar-Energy | 137 | 52,600 | Every 10 minutes | 1 | 0 | 24 | 42,048 | 10,512 |
| Beijing PM2.5 | 1 | 43,824 | Hourly | 1 | 16 | 1, 3, 6 | 35,064 | 8,760 |
| Urban Air Quality | 1 | 2,891,387 | Hourly | 1 | 16 | 6 | 1,816,285 | 1,075,102 |
| SML 2010 | 1 | 4,137 | Every minute | 1 | 26 | 1 | 3,600 | 537 |
| NASDAQ 100 | 1 | 40,560 | Every minute | 1 | 81 | 1 | 37,830 | 2,730 |

Benchmark Model: GBRT

Transforming Gradient Boosting Regression Trees (GBRT) into a window-based regression framework, and then engineering the model's input and output structure to exploit additional contextual information, lifts this simple machine learning method to the level of competitive DNN time series forecasting models.

Window-Based Input Setup

To enhance the predictive performance of the GBRT model, a window-based input setup is adopted: it allows a regressor with no built-in notion of time to exploit the sequential structure of the data.


Data Transformation Steps

  1. Create Windows:
    • Segment the original time series into multiple small windows, each containing data from consecutive time points.
    • For instance, daily temperature data for a year can be split into windows of 30 days each.
  2. Flatten Windows:
    • Each window holds data from multiple time points; this multi-dimensional data is converted into a one-dimensional vector.
    • Specifically, the target values (e.g., temperature) and covariates (e.g., date, humidity) of each time point are concatenated into one long vector.
    • For example, a 5-day window with a temperature and a humidity value per day becomes a one-dimensional vector of 10 numbers.
  3. Handle Covariates:
    • Within each window, only the covariates (e.g., humidity) of the last time point are kept as the representative of the whole window.
    • This reduces the dimensionality of the input while retaining the contextual information of the current time point.
  4. Multi-Output Prediction:
    • Rather than predicting each future time point individually, multiple future time points are predicted at once.
    • This is achieved with a multi-output GBRT model that outputs values for several future time points simultaneously.
    • For example, to forecast the temperature for the next 5 days, the model outputs all 5 predictions at once.
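The steps above can be sketched end to end in Python with scikit-learn. This is a minimal illustration on toy data, not the paper's code: the function name, window/horizon sizes, and the synthetic series are all assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

def make_windows(target, covariates, window, horizon):
    """Turn a series into flattened window samples.

    Following the setup described above, all `window` past target values
    are kept, but only the covariates of the *last* time point in the
    window are retained, shrinking the input while keeping context.
    """
    X, Y = [], []
    for t in range(len(target) - window - horizon + 1):
        past_targets = target[t:t + window]            # w target values
        last_covs = covariates[t + window - 1]         # last point's covariates
        X.append(np.concatenate([past_targets, last_covs]))
        Y.append(target[t + window:t + window + horizon])  # h future values
    return np.array(X), np.array(Y)

# Toy data: 200 time points, one target channel, two covariate channels
rng = np.random.default_rng(0)
y = np.sin(np.arange(200) / 10) + 0.1 * rng.standard_normal(200)
covs = rng.standard_normal((200, 2))

X, Y = make_windows(y, covs, window=30, horizon=5)

# Multi-output GBRT: one boosted ensemble per forecast step,
# so all 5 future values are produced in a single predict call
model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50))
model.fit(X[:-1], Y[:-1])
pred = model.predict(X[-1:])
print(pred.shape)  # (1, 5)
```

Note that scikit-learn's `MultiOutputRegressor` realizes multi-output prediction by fitting one independent GBRT per forecast step, which is one common way to give a single-output learner a multi-step forecast window.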

Deep Time Series Forecasting Methods

*Temporal Regularized Matrix Factorization (TRMF)*

This model is based on matrix factorization and is highly scalable because it models the global structure of the data. Although it is one of the earlier models in this study and can only capture linear dependencies in time series data, it still shows competitive results.

*Long- and Short-term Time-series Network (LSTNet)*

This model emphasizes local multivariate patterns (modeled through convolutional layers) and long-term dependencies (captured through recurrent structures). LSTNet originally had two versions, "LSTNet-Skip" and "LSTNet-Attn"; because "LSTNet-Attn" is not reproducible, "LSTNet-Skip" is evaluated in the subsequent experiments.

*Dual-Stage Attention-Based RNN (DARNN)*

It first passes model inputs through an input attention mechanism and then employs an encoder-decoder model with an additional temporal attention mechanism. This model was initially evaluated on two multivariate datasets, but since it is directly applicable to univariate datasets as well, it has seen some subsequent applications there.

*Deep Global Local Forecaster (DeepGlo)*

This model is based on a global matrix factorization structure regularized by a temporal convolutional network. The model includes additional channels derived from dates and timestamps and was initially evaluated on univariate datasets.

*Temporal Fusion Transformer (TFT)*

This model combines recurrent layers for local processing with self-attention layers that capture long-term dependencies in the data. During learning, the model not only dynamically focuses on relevant features but also suppresses features deemed irrelevant through gating mechanisms.

*DeepAR*

An autoregressive probabilistic RNN model that estimates the parameters of the time series distribution using temporal and categorical covariates. An open-source implementation of DeepAR is available through GluonTS.

*Deep State Space Model (DeepState)*

A probabilistic generative model that learns to parameterize linear state space models using RNNs. As with DeepAR, an open-source implementation is available via GluonTS.

*Deep Air Quality Forecasting Framework (DAQFF)*

This framework consists of a two-stage feature representation: the data is first passed through three 1D convolutional layers, followed by two bidirectional LSTM layers, and predictions are finally made through a linear layer. As the name suggests, the framework is built specifically for air quality prediction and has therefore been evaluated on the corresponding multivariate datasets listed in Table 1.
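The described stack (three 1D convolutions, two bidirectional LSTMs, a linear output layer) could be sketched in PyTorch roughly as follows. Channel widths, kernel sizes, and taking the last time step for the forecast head are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DAQFFSketch(nn.Module):
    """Hypothetical sketch of the two-stage DAQFF structure:
    3x Conv1d -> 2-layer bidirectional LSTM -> Linear head."""

    def __init__(self, in_channels=17, horizon=6, hidden=32):
        super().__init__()
        self.convs = nn.Sequential(  # stage 1: local feature extraction
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, hidden, num_layers=2,  # stage 2: temporal modeling
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, horizon)     # multi-step forecast

    def forward(self, x):                    # x: (batch, channels, time)
        h = self.convs(x)                    # (batch, 32, time)
        h, _ = self.lstm(h.transpose(1, 2))  # (batch, time, 2*hidden)
        return self.head(h[:, -1])           # predict from the last time step

# 1 target + 16 auxiliary channels (as in Beijing PM2.5), 24-step input window
out = DAQFFSketch()(torch.randn(8, 17, 24))
print(out.shape)  # torch.Size([8, 6])
```

The channel count of 17 mirrors the Beijing PM2.5 row of Table 1 (1 target + 16 auxiliary channels) and the horizon of 6 matches one of its forecast windows; both are merely plausible instantiations.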

Model Performance Comparison

Univariate Time Series Forecasting Experimental Results

The goal of univariate time series forecasting is to predict a single target variable based on its own historical data.


Overall, the results show that the window-based GBRT is strongly competitive across all compared models.

Multivariate Time Series Forecasting Experimental Results

In the multivariate time series forecasting setup, the dataset inherently provides multiple features, but only a single target variable needs to be predicted.


Results indicate that even deep learning frameworks designed specifically for multivariate prediction, such as DARNN, can be surpassed by a well-configured simple GBRT baseline.

Summary of Experimental Results

1. Effectiveness of simple models: Despite its conceptual simplicity, the GBRT model can, through appropriate feature engineering and a windowed input setup, match the performance of complex DNN models across multiple time series forecasting tasks, and even outperform them in some cases.
2. Importance of machine learning baselines: Simple machine learning baselines should not be underestimated; when carefully configured, they can deliver competitive predictive performance. Ensuring the authenticity of progress in time series forecasting therefore requires more careful configuration and optimization of these baseline models.
3. Future work: Future work could apply this window-based input setup to other simple machine learning models, such as multi-layer perceptrons (MLP) and support vector machines (SVM), whose potential in time series forecasting has yet to be fully explored.

Editor / Fan Ruiqiang

Reviewer / Fan Ruiqiang

Verification / Fan Ruiqiang
