Summary of RNN, LSTM, GRU, ConvLSTM, ConvGRU, and ST-LSTM

Introduction

I rarely write summary articles, but it is worth periodically consolidating related concepts, so I wrote this one. Since my work mainly concerns time series and spatio-temporal prediction, I focus on RNN, LSTM, GRU, ConvLSTM, ConvGRU, and ST-LSTM.

1. RNN

The RNN is the most basic recurrent neural network. It is essentially a fully connected network designed to take past information into account: the output depends not only on the current input but also on a state that summarizes the previous inputs.

1.1 Structure Diagram

[Figure: RNN structure diagram]

1.2 Formula
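
A common way to write the vanilla RNN is sketched below. The notation (W_xh, W_hh, W_hy, b_h, b_y) is an assumption and not necessarily the one used in the original figure.

```latex
\begin{aligned}
h_t &= \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right) \\
y_t &= W_{hy}\, h_t + b_y
\end{aligned}
```

Here x_t is the current input, h_{t-1} is the previous state, and y_t is the output, which matches the description above: the output is determined by the state and the current input.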

1.3 Advantages and Disadvantages

1.3.1 Advantages

① RNN is very suitable for processing sequential data because it considers previous information.

② It can be used together with CNN to achieve better task performance.

1.3.2 Disadvantages

① Gradient vanishing and gradient explosion.

② RNN uses more GPU memory than CNNs and fully connected networks, which makes it harder to train.

③ With tanh or ReLU as the activation function, it cannot handle very long sequences.
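
The vanishing and exploding gradient problem can be illustrated with a toy example. The sketch below assumes a scalar RNN h_t = act(w * h_{t-1} + x) with a constant input; the function name and parameter values are hypothetical and chosen only for illustration. It multiplies the per-step derivatives, which is what backpropagation through time does.

```python
import math

def backprop_factor(w, steps, x=0.1, use_tanh=True):
    """Return d h_T / d h_0 for the scalar RNN h_t = act(w * h_{t-1} + x)."""
    h, grad = 0.0, 1.0
    for _ in range(steps):
        pre = w * h + x
        if use_tanh:
            h = math.tanh(pre)
            local = 1.0 - h * h  # derivative of tanh at the pre-activation
        else:
            h = pre              # identity activation
            local = 1.0
        grad *= w * local        # chain rule through one time step
    return grad

for steps in (1, 10, 50):
    vanishing = backprop_factor(0.5, steps)
    exploding = backprop_factor(1.5, steps, use_tanh=False)
    print(steps, vanishing, exploding)
```

With |w| < 1 the product shrinks geometrically as the sequence gets longer, and with |w| > 1 and an unbounded activation it grows geometrically, which is why long sequences are difficult for a plain RNN.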

2. LSTM

To mitigate vanishing and exploding gradients, and to better predict and classify sequential data, the RNN gradually evolved into the LSTM.

2.1 Structure Diagram

[Figure: LSTM structure diagram]

2.2 Formula
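
A commonly used form of the LSTM, without peephole connections, is sketched below. The symbol names are assumptions and may differ from the original figure; σ is the sigmoid function and ⊙ is element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi}\, x_t + W_{hi}\, h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf}\, x_t + W_{hf}\, h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo}\, x_t + W_{ho}\, h_{t-1} + b_o\right) \\
g_t &= \tanh\left(W_{xg}\, x_t + W_{hg}\, h_{t-1} + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

In this form the three gates i, f, and o depend only on x_t and h_{t-1}; the cell state c enters only through the last two lines.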

2.3 Extensions

In practice, single-layer LSTMs are rarely used on their own; multi-layer LSTMs are preferred, and bidirectional LSTMs perform quite well on many time series datasets.

2.3.1 Bidirectional LSTM

[Figure: bidirectional LSTM structure diagram]
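
The bidirectional idea can be sketched in a few lines. To keep the example short, a scalar tanh RNN stands in for the LSTM cell; that substitution, the function names, and the weights are all assumptions made for illustration.

```python
import math

def run_rnn(xs, w_x, w_h):
    """Run a scalar tanh RNN over xs and return the hidden state at every step."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def run_bidirectional(xs, fwd_params, bwd_params):
    """Pair each step's forward state with the backward state for the same step."""
    forward = run_rnn(xs, *fwd_params)
    # Process the reversed sequence, then flip the result back so that
    # backward[t] summarizes xs[t:] while forward[t] summarizes xs[:t + 1].
    backward = run_rnn(xs[::-1], *bwd_params)[::-1]
    return list(zip(forward, backward))

outputs = run_bidirectional([0.5, -1.0, 2.0], (1.0, 0.5), (0.8, 0.3))
print(outputs)
```

Each output step then has access to both past and future context, which is only possible when the whole input sequence is available before prediction.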

2.3.2 Deep Bidirectional LSTM

[Figure: deep bidirectional LSTM structure diagram]

3. GRU

Because LSTM trains slowly, GRU was introduced as a slightly simplified variant that can train considerably faster while keeping similar accuracy, which has made it quite popular.

3.1 Structure Diagram

[Figure: GRU structure diagram]

3.2 Formula
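
One common way to write the GRU is sketched below. The notation is an assumption, and conventions differ between sources: some swap the roles of z_t and 1 - z_t, and some apply the reset gate before the multiplication by W_hh rather than after it.

```latex
\begin{aligned}
z_t &= \sigma\left(W_{xz}\, x_t + W_{hz}\, h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_{xr}\, x_t + W_{hr}\, h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_{xh}\, x_t + r_t \odot \left(W_{hh}\, h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot \tilde{h}_t + z_t \odot h_{t-1}
\end{aligned}
```

The GRU keeps a single state h and has no separate cell state c.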

3.3 Structural Differences between LSTM and GRU

See 【Deep Learning】 for a detailed explanation of the formulas of the LSTM and GRU units and the differences between them.

[Figure: structural comparison of LSTM and GRU units]
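
One concrete structural difference is the number of weight blocks: an LSTM has three gates plus a candidate cell update, while a GRU has two gates plus a candidate state. The sketch below counts parameters under the assumption of one bias vector per block (some libraries use two); the function names are hypothetical.

```python
def recurrent_param_count(input_size, hidden_size, num_blocks):
    """Weights and biases for num_blocks affine maps of [x_t, h_{t-1}] to hidden."""
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block

def lstm_param_count(input_size, hidden_size):
    # Input, forget, and output gates plus the candidate cell update.
    return recurrent_param_count(input_size, hidden_size, 4)

def gru_param_count(input_size, hidden_size):
    # Update and reset gates plus the candidate hidden state.
    return recurrent_param_count(input_size, hidden_size, 3)

print(lstm_param_count(64, 128), gru_param_count(64, 128))
```

For the same input and hidden sizes, the GRU therefore has three quarters of the LSTM's parameters, which is one reason it tends to train faster.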

4. ConvLSTM and ConvGRU

To build spatio-temporal sequence prediction models that capture both temporal and spatial information, the fully connected weights in LSTM are replaced with convolutions.

4.1 ConvLSTM Structure Diagram

[Figure: ConvLSTM structure diagram]

4.2 ConvLSTM Formula (from original paper)
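
The ConvLSTM equations, as they are usually quoted from the original ConvLSTM paper, are sketched below; treat the exact symbols as an assumption. Here * denotes convolution and ∘ denotes the Hadamard (element-wise) product.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Note that in this form the i, f, and o gates each contain a term in C.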

4.3 ConvGRU (from original paper)
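
The ConvGRU equations, as they are usually written following the trajectory GRU paper, are sketched below. The exact symbols, the omission of bias terms, and the choice of activation f (a leaky ReLU in that line of work) should be treated as assumptions.

```latex
\begin{aligned}
Z_t &= \sigma\left(W_{xz} * X_t + W_{hz} * H_{t-1}\right) \\
R_t &= \sigma\left(W_{xr} * X_t + W_{hr} * H_{t-1}\right) \\
H'_t &= f\left(W_{xh} * X_t + R_t \circ \left(W_{hh} * H_{t-1}\right)\right) \\
H_t &= \left(1 - Z_t\right) \circ H'_t + Z_t \circ H_{t-1}
\end{aligned}
```

As before, * denotes convolution and ∘ the Hadamard product.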

4.4 Discussing a Minor Issue

Dr. Shi Xingjian proposed ConvLSTM. According to his description, each weight changes from an ordinary fully connected weight to a convolution. Going from LSTM to ConvLSTM, the i, f, and o gates should therefore be determined only by X and H_{t-1}, with no C term. By that reasoning, a structure whose gates also take C as input should not exist.

Now consider Dr. Shi Xingjian's follow-up NIPS 2017 paper, in which ConvGRU likewise has no C term and corresponds one-to-one with the GRU formula.


I am not sure whether Dr. Shi Xingjian implemented it with C and obtained good results, whether including or omitting C makes little difference to the final experimental results, or whether it was simply a writing error; I cannot draw a conclusion here. What is certain is that in the trajectory GRU article, the step from GRU to ConvGRU is a direct conversion with no discrepancy. I therefore looked at a few journal papers and top conference papers.

[Figure: excerpt from an ECCV 2018 paper]

[Figure: excerpt from an IEEE Transactions paper]

I then surveyed more articles and found, interestingly, that top conference papers generally do not use the form with C, while most journal papers do. I also checked implementations on GitHub: most of them convert LSTM directly into ConvLSTM, so the three gates have no C term. When I first reproduced the model I did the same, implementing LSTM and then modifying it into ConvLSTM without paying much attention to this. This time I ran the experiment myself: the structure without C performs well in spatio-temporal prediction, while adding C leads to gradient problems. Readers can form their own judgment.

I personally recommend converting LSTM directly into ConvLSTM. I will write follow-up articles explaining how to code this.
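
As a rough illustration of what a direct conversion means, the following sketch implements one ConvLSTM step in which every gate is computed from X and H only, with no C term. It is a minimal sketch under strong assumptions (single channel, pure Python lists, square odd-sized kernels, scalar biases), and all names are hypothetical; a real model would use a deep learning framework with multi-channel convolutions.

```python
import math

def conv2d_same(grid, kernel):
    """Zero-padded 2D cross-correlation that keeps the input height and width."""
    rows, cols = len(grid), len(grid[0])
    k = len(kernel) // 2  # kernel is assumed square with odd side length
    out = [[0.0] * cols for _ in range(rows)]
    for row in range(rows):
        for col in range(cols):
            total = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = row + dr, col + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += grid[rr][cc] * kernel[dr + k][dc + k]
            out[row][col] = total
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gate_preactivation(x, h, w_x, w_h, bias):
    """W_x * X_t + W_h * H_{t-1} + b, with * as convolution and no C term."""
    cx, ch = conv2d_same(x, w_x), conv2d_same(h, w_h)
    return [[a + b + bias for a, b in zip(ra, rb)] for ra, rb in zip(cx, ch)]

def convlstm_step(x, h_prev, c_prev, params):
    """One ConvLSTM step; params maps 'i', 'f', 'o', 'g' to (w_x, w_h, bias)."""
    pre = {name: gate_preactivation(x, h_prev, *params[name]) for name in "ifog"}
    rows, cols = len(x), len(x[0])
    h_new = [[0.0] * cols for _ in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    for row in range(rows):
        for col in range(cols):
            i = sigmoid(pre["i"][row][col])
            f = sigmoid(pre["f"][row][col])
            o = sigmoid(pre["o"][row][col])
            g = math.tanh(pre["g"][row][col])
            # The cell state is used only here, exactly as in a plain LSTM.
            c_new[row][col] = f * c_prev[row][col] + i * g
            h_new[row][col] = o * math.tanh(c_new[row][col])
    return h_new, c_new

# Toy usage: a 3x3 frame, zero initial states, and the same kernel for every gate.
kernel = [[0.0, 0.1, 0.0], [0.1, 0.2, 0.1], [0.0, 0.1, 0.0]]
params = {name: (kernel, kernel, 0.0) for name in "ifog"}
x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [[0.0] * 3 for _ in range(3)]
h, c = convlstm_step(x, zeros, zeros, params)
print(h[1][1], c[1][1])
```

The only change relative to a plain LSTM step is that the matrix products are replaced by `conv2d_same`, so the states H and C keep the spatial layout of the input.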

5. ST-LSTM

This section mainly presents the ST-LSTM structure and formulas.

5.1 ST-LSTM Structure Diagram

[Figure: ST-LSTM structure diagram]

5.2 ST-LSTM Formula
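
The ST-LSTM equations, as they are usually quoted from the PredRNN paper, are sketched below; the exact symbols should be treated as an assumption. The superscript l is the layer index, M is the spatio-temporal memory passed between layers, * is convolution, and ∘ is the Hadamard product.

```latex
\begin{aligned}
g_t &= \tanh\left(W_{xg} * X_t + W_{hg} * H^{l}_{t-1} + b_g\right) \\
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H^{l}_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H^{l}_{t-1} + b_f\right) \\
C^{l}_t &= f_t \circ C^{l}_{t-1} + i_t \circ g_t \\
g'_t &= \tanh\left(W'_{xg} * X_t + W_{mg} * M^{l-1}_t + b'_g\right) \\
i'_t &= \sigma\left(W'_{xi} * X_t + W_{mi} * M^{l-1}_t + b'_i\right) \\
f'_t &= \sigma\left(W'_{xf} * X_t + W_{mf} * M^{l-1}_t + b'_f\right) \\
M^{l}_t &= f'_t \circ M^{l-1}_t + i'_t \circ g'_t \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H^{l}_{t-1} + W_{co} * C^{l}_t + W_{mo} * M^{l}_t + b_o\right) \\
H^{l}_t &= o_t \circ \tanh\left(W_{1 \times 1} * \left[C^{l}_t, M^{l}_t\right]\right)
\end{aligned}
```

The first four lines are a standard ConvLSTM update of the temporal memory C, the next four are a parallel update of the spatio-temporal memory M, and the last two combine both memories into the hidden state.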

5.3 Stacking Structure

[Figure: ST-LSTM stacking structure]

I will soon write a dedicated article on reproducing and coding this model. Because of this direct stacking structure, training requires some tricks, such as Scheduled Sampling.
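
The idea behind Scheduled Sampling can be sketched as follows: during training, each next input is either the true frame or the model's own prediction, and the probability of using the true frame is lowered over time. This is a minimal sketch, not the PredRNN implementation; `predict_next`, the linear schedule, and all parameter names are hypothetical.

```python
import random

def rollout_with_scheduled_sampling(frames, predict_next, truth_prob, rng):
    """Predict frames[1:], feeding ground truth with probability truth_prob."""
    predictions = []
    current_input = frames[0]
    for t in range(1, len(frames)):
        prediction = predict_next(current_input)
        predictions.append(prediction)
        # Choose the next input: the true frame (teacher forcing) or the
        # model's own prediction, which is what it will see at test time.
        if rng.random() < truth_prob:
            current_input = frames[t]
        else:
            current_input = prediction
    return predictions

def linear_decay(step, total_steps, start=1.0, end=0.0):
    """Hypothetical schedule: move truth_prob linearly from start to end."""
    fraction = min(step / total_steps, 1.0)
    return start + (end - start) * fraction

# Toy usage: numbers stand in for frames and "add one" stands in for the model.
frames = [0, 10, 20, 30]
rng = random.Random(0)
always_truth = rollout_with_scheduled_sampling(frames, lambda v: v + 1, 1.0, rng)
never_truth = rollout_with_scheduled_sampling(frames, lambda v: v + 1, 0.0, rng)
print(always_truth, never_truth)
```

Early in training the model mostly sees true frames, which stabilizes learning; later it mostly sees its own predictions, which reduces the mismatch between training and multi-step inference.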
