Step-by-Step Guide to Understanding LSTM

1. What is LSTM

LSTM stands for Long Short-Term Memory, a type of recurrent neural network (RNN) that handles sequential data and is widely used in fields such as natural language processing and speech recognition. Compared with the original RNN, LSTM greatly alleviates the vanishing gradient problem and can therefore learn dependencies across long sequences, which has made it one of the most widely used RNN variants.

2. Examples of LSTM Applications

Suppose the model reads a sentence one word at a time and must classify each word. Consider two sentences: (1) “arrive Beijing on November 2nd,” where “Beijing” is the destination; (2) “leave Beijing on November 2nd,” where “Beijing” is the departure location. A standard feedforward network has no memory, so whenever the input is “Beijing” the output is the same. However, we want “Beijing” to be recognized as a destination when preceded by “arrive” and as a departure location when preceded by “leave”. This is where LSTM comes in handy, because it can remember historical information. When it reads “Beijing”, the LSTM still knows whether the preceding word was “arrive” or “leave”, so it can produce different outputs for the same input depending on the context.
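The difference can be illustrated with a toy sketch. The two functions below are hand-written stand-ins, not trained networks and not an LSTM; the names `stateless_tag` and `stateful_tag` and the label strings are assumptions made for illustration. The only point is that a function of the current word alone must give “Beijing” one fixed label, while a function that carries memory can label it by context.

```python
def stateless_tag(word):
    """Label a word from the word alone, as a network without memory must."""
    return "place" if word == "Beijing" else "other"


def stateful_tag(words):
    """Label each word from the word plus a memory of earlier words."""
    memory = None  # hand-written stand-in for what an LSTM would learn to store
    labels = []
    for word in words:
        if word in ("arrive", "leave"):
            memory = word  # remember the most recent verb
        if word != "Beijing":
            labels.append("other")
        elif memory == "arrive":
            labels.append("destination")
        elif memory == "leave":
            labels.append("departure")
        else:
            labels.append("place")  # no context seen yet
    return labels


sentence_1 = ["arrive", "Beijing", "on", "November", "2nd"]
sentence_2 = ["leave", "Beijing", "on", "November", "2nd"]

print(stateless_tag("Beijing"))     # same label in both sentences
print(stateful_tag(sentence_1)[1])  # label depends on "arrive"
print(stateful_tag(sentence_2)[1])  # label depends on "leave"
```

In a real LSTM the memory and the rules for using it are learned from data rather than written by hand.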

3. Analysis of LSTM Structure

A standard neuron has one input and one output, as shown in the figure:

[Figure: a standard neuron h1 with input x1 and output y1]

For the neuron h1, the input is x1, and the output is y1. LSTM replaces the standard neuron with an LSTM unit.

[Figure: an LSTM unit with four inputs: the model input, the forget gate, the input gate, and the output gate]

From the figure, we can see that an LSTM unit has four inputs: the model input and three gate signals (forget gate, input gate, and output gate). Each of the four has its own weights, so an LSTM has roughly four times as many parameters as a standard network of the same size. The three gate signals are real numbers between 0 and 1, where 1 means fully open and 0 means fully closed.

The forget gate determines whether the memory from the previous moment is retained. When the forget gate is open, the previous memory is preserved; when it is closed, the previous memory is cleared. The input gate determines how much of the current input is retained, since the input is not equally important at every moment. When the input is completely useless, the input gate is closed and the current input is discarded. The output gate determines how much of the current memory is output immediately: when it is open, all of the memory is output, and when it is closed, none of it is.
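The factor of four can be checked with a little arithmetic. The sketch below assumes that each of the four inputs is computed from one weight matrix applied to the previous output concatenated with the current input, plus one bias vector; the function names and the sizes 10 and 20 are hypothetical. Library implementations may keep extra bias terms, so their counts can differ slightly.

```python
def rnn_params(input_size, hidden_size):
    """Parameters of one block: weights over [previous output, input] plus a bias."""
    return hidden_size * (hidden_size + input_size) + hidden_size


def lstm_params(input_size, hidden_size):
    """Four such blocks: forget gate, input gate, output gate, and model input."""
    return 4 * rnn_params(input_size, hidden_size)


print(rnn_params(10, 20))   # 20 * (20 + 10) + 20 = 620
print(lstm_params(10, 20))  # 4 * 620 = 2480
```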

4. Derivation of LSTM Formulas

With the above knowledge, deriving the LSTM formulas is straightforward. In the figure, $f_t$ represents the forget gate, $i_t$ represents the input gate, and $o_t$ represents the output gate. $C$ is the memory cell that stores memory information: $C_{t-1}$ represents the memory information from the previous moment, $C_t$ represents the current memory information, and $h$ is the output of the LSTM unit.

[Figure: internal structure of the LSTM unit, showing the forget gate, input gate, output gate, and memory cell]

Forget gate calculation:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

Here $[h_{t-1}, x_t]$ means concatenating two vectors: the previous output $h_{t-1}$ and the current input $x_t$. The sigmoid function $\sigma$ is used to obtain a number between 0 and 1, which acts as the control signal for the forget gate.
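The forget gate calculation can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the helper name `gate`, the weights `W_f`, the bias `b_f`, and all numeric values are made up for the example. List concatenation plays the role of joining the previous output with the current input.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gate(W, b, h_prev, x):
    """Compute sigmoid(W . [h_prev, x] + b), one value per hidden unit."""
    concat = h_prev + x  # concatenate the previous output and the current input
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + bias)
        for row, bias in zip(W, b)
    ]


h_prev = [0.5, -0.5]  # previous output, hidden size 2 (illustrative values)
x = [1.0]             # current input, input size 1
W_f = [[0.1, 0.2, 0.3],
       [-0.3, 0.2, -0.1]]  # one row per hidden unit, one column per concat entry
b_f = [0.0, 0.0]

f_t = gate(W_f, b_f, h_prev, x)
print(f_t)  # two values, each strictly between 0 and 1
```

The same helper applies to the other gates, since they differ only in their weights and biases.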

Input gate calculation:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

Current input:

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

Updating current memory information:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

From this formula, we can see that the previous memory information $C_{t-1}$, passed through the forget gate $f_t$, and the current input $\tilde{C}_t$, passed through the input gate $i_t$, are combined to update the current memory information $C_t$. Here $\odot$ denotes element-wise multiplication.
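The effect of the two gates on the memory can be seen by plugging in extreme gate values. The sketch below is illustrative only: the function name `update_memory` and the numbers are assumptions, and the gates are set by hand rather than computed from weights.

```python
def update_memory(f_t, i_t, c_prev, c_input):
    """Element-wise update: forget gate times old memory plus input gate times current input."""
    return [f * c + i * g for f, i, c, g in zip(f_t, i_t, c_prev, c_input)]


c_prev = [2.0, -1.0]  # memory from the previous moment
c_input = [0.5, 0.5]  # current input after the tanh

# Forget gate open, input gate closed: the old memory is preserved.
keep = update_memory([1.0, 1.0], [0.0, 0.0], c_prev, c_input)

# Forget gate closed, input gate open: the memory is replaced by the current input.
overwrite = update_memory([0.0, 0.0], [1.0, 1.0], c_prev, c_input)

# Both gates half open: a blend of old memory and current input.
blend = update_memory([0.5, 0.5], [0.5, 0.5], c_prev, c_input)

print(keep)       # [2.0, -1.0]
print(overwrite)  # [0.5, 0.5]
print(blend)      # [1.25, -0.25]
```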

Output gate calculation:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

The output of LSTM is determined by the output gate and the current memory information:

$$h_t = o_t \odot \tanh(C_t)$$
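Putting the pieces together, one forward step of an LSTM unit can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not how Keras or PyTorch implement it: the parameter names, the random initialization, the hidden size of 3, and the one-hot vectors standing in for the words “arrive”, “leave”, and “Beijing” are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W . v + b for a weight matrix stored as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(p, h_prev, c_prev, x):
    """One forward step: returns the new output h_t and the new memory c_t."""
    concat = h_prev + x  # previous output joined with the current input
    f_t = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], concat)]      # forget gate
    i_t = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], concat)]      # input gate
    c_in = [math.tanh(z) for z in affine(p["W_C"], p["b_C"], concat)]   # current input
    c_t = [f * c + i * g for f, i, c, g in zip(f_t, i_t, c_prev, c_in)]  # memory update
    o_t = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], concat)]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                  # unit output
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (an arbitrary choice for the demo)."""
    rng = random.Random(seed)
    p = {}
    for name in ("f", "i", "C", "o"):
        p["W_" + name] = [
            [rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
            for _ in range(hidden_size)
        ]
        p["b_" + name] = [0.0] * hidden_size
    return p


def run(p, sequence, hidden_size):
    """Feed a sequence through the unit, starting from zero output and zero memory."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(p, h, c, x)
    return h


params = init_params(input_size=3, hidden_size=3)
arrive, leave, beijing = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

h_arrive = run(params, [arrive, beijing], hidden_size=3)
h_leave = run(params, [leave, beijing], hidden_size=3)
print(h_arrive)
print(h_leave)
```

Even with untrained weights, the two outputs differ although the last input is the same in both runs, because the memory carries information about the earlier word.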

This completes the forward computation of LSTM. Given the forward propagation algorithm, deriving the backpropagation algorithm is straightforward: the parameters are updated iteratively by gradient descent, and the key step is computing the partial derivative of the loss function with respect to every parameter. That derivation is not detailed here.
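To make the idea of updating parameters from partial derivatives concrete, the sketch below performs one gradient descent step on a scalar LSTM with hidden size 1. It is not backpropagation: the partial derivatives are approximated by finite differences, a simpler technique swapped in for illustration. The squared-error loss, the initial parameter value 0.5, the input sequence, the target, and the learning rate are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, xs):
    """Run a scalar LSTM (hidden size 1) over xs and return the final output h."""
    h, c = 0.0, 0.0
    for x in xs:
        # Each of f, i, o, c has a weight for h, a weight for x, and a bias.
        z = {g: params[g + "_h"] * h + params[g + "_x"] * x + params[g + "_b"]
             for g in ("f", "i", "o", "c")}
        f_t, i_t, o_t = sigmoid(z["f"]), sigmoid(z["i"]), sigmoid(z["o"])
        c = f_t * c + i_t * math.tanh(z["c"])
        h = o_t * math.tanh(c)
    return h


def loss(params, xs, target):
    """Squared error between the final output and a target value."""
    return 0.5 * (forward(params, xs) - target) ** 2


def numerical_gradient(params, xs, target, eps=1e-5):
    """Approximate the partial derivative of the loss for every parameter."""
    grad = {}
    for name in params:
        up = dict(params, **{name: params[name] + eps})
        down = dict(params, **{name: params[name] - eps})
        grad[name] = (loss(up, xs, target) - loss(down, xs, target)) / (2 * eps)
    return grad


params = {g + s: 0.5 for g in ("f", "i", "o", "c") for s in ("_h", "_x", "_b")}
xs, target, learning_rate = [1.0, 0.5, -0.5], 0.8, 0.1

loss_before = loss(params, xs, target)
grad = numerical_gradient(params, xs, target)
# Gradient descent: move every parameter a small step against its derivative.
params = {name: value - learning_rate * grad[name] for name, value in params.items()}
loss_after = loss(params, xs, target)

print(loss_before, loss_after)  # the loss should be lower after the update
```

Backpropagation computes the same derivatives analytically and far more efficiently, which is why libraries use it instead of finite differences.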

Summary

Although LSTM has a complex structure, it is not difficult to master once the various parts and their relationships are clarified. In practical use, algorithm libraries such as Keras and PyTorch can be utilized, but understanding the LSTM model structure is still necessary.

References

https://zybuluo.com/hanbingtao/note/581764
http://www.cnblogs.com/pinard/p/6519110.html
http://blog.echen.me/2017/05/30/exploring-lstms/
