Selected from Machine Learning Mastery
Author: Jason Brownlee
Translated by Machine Heart
Contributed by: Li Zenan
How should we handle very long input sequences with LSTM recurrent neural networks? Jason Brownlee offers 6 solutions.
Long Short-Term Memory (LSTM) recurrent neural networks can learn and remember over long sequences of input. If your problem has one output for each input (as in time series forecasting and text translation tasks), LSTMs can perform quite well. However, LSTMs struggle when an extremely long input sequence maps to only one or a few outputs. This problem is usually referred to as sequence labeling or sequence classification.
Some examples include:
- Sentiment classification of text content containing thousands of words (natural language processing).
- Classification of EEG data with thousands of time steps (medicine).
- Classification of coding/non-coding gene sequences with thousands of DNA base pairs (genomics).
When using recurrent neural networks (like LSTMs), these so-called sequence classification tasks require special handling. In this article, you will discover 6 methods for dealing with long sequences.
1. Keep It As Is
Feeding in very long sequences as they are may lead to dramatically longer training times. In addition, attempting to backpropagate through very long sequences can cause vanishing gradients, which in turn make the model unreliable. In practice, large LSTM models are usually limited to sequences of around 250-500 time steps.
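To make this baseline concrete, here is a minimal sketch of the kind of sequence classification model the rest of the article assumes: a single LSTM layer reading the full sequence, and one output for the whole sequence. It uses the Keras API; the sequence length, feature count, and layer sizes are illustrative placeholders, not values from the original article.

```python
# Minimal "as is" baseline: one LSTM layer over the full (very long) sequence,
# with a single classification output for the entire sequence.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_timesteps = 5000   # illustrative: a very long sequence, kept unchanged
n_features = 1       # illustrative: one feature per time step

model = Sequential()
model.add(LSTM(100, input_shape=(n_timesteps, n_features)))
model.add(Dense(1, activation='sigmoid'))   # one output per whole sequence
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
```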
2. Truncate Sequences
The most intuitive way to handle very long sequences is to truncate them. This can be done by selectively deleting some time steps at the beginning or end of the input sequence. This approach shortens the sequence to a manageable length at the cost of losing some data, and the risk is evident: some data that could aid in accurate predictions may be lost in the process.
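One common way to implement truncation, assuming the Keras preprocessing utilities are available, is `pad_sequences`, which can cut time steps from either end of each sequence. The toy sequences and the `maxlen` value below are purely illustrative.

```python
# Truncate (and pad) sequences to a fixed length with Keras' pad_sequences.
from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],   # too long: will be truncated
    [11, 12, 13],                      # too short: will be padded
]

# truncating='pre' removes time steps from the beginning of long sequences,
# truncating='post' removes them from the end.
truncated = pad_sequences(sequences, maxlen=5, truncating='pre', padding='pre')
print(truncated)
# [[ 6  7  8  9 10]
#  [ 0  0 11 12 13]]
```

Whether to truncate from the beginning or the end depends on where in the sequence the most predictive information is expected to lie.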
3. Summarize Sequences
In certain domains, we can try to summarize the content of the input sequence. For example, when the input sequence is text, we can remove all words that fall below a specified frequency, or equivalently keep only the words whose frequency across the entire training dataset exceeds a chosen threshold. Summarizing can help the model focus on the most relevant content while shortening the length of the input sequence.
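As a rough illustration of the word-frequency idea (not code from the original article), the sketch below counts word occurrences over a hypothetical training corpus and drops any word below a chosen threshold; the corpus and the threshold are made up for the example.

```python
# Summarize text sequences by dropping low-frequency words.
from collections import Counter

train_docs = [
    "the quick brown fox jumps over the lazy dog".split(),
    "the dog barks at the quick fox".split(),
]

# Count word frequencies across the entire training dataset.
counts = Counter(word for doc in train_docs for word in doc)

min_occurrence = 2  # illustrative threshold: keep words seen at least twice
vocab = {word for word, count in counts.items() if count >= min_occurrence}

# Keep only the retained words, shortening each sequence.
summarized = [[word for word in doc if word in vocab] for doc in train_docs]
print(summarized)
# [['the', 'quick', 'fox', 'the', 'dog'], ['the', 'dog', 'the', 'quick', 'fox']]
```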
4. Random Sampling
A less systematic way of shortening sequences is random sampling. We can randomly select time steps from the sequence and delete them, shortening the sequence to a specified length. Alternatively, we can randomly select contiguous subsequences and stitch them together into a new sequence of the desired total length, with or without overlap.
When there is no obvious systematic way to shorten the sequences, this approach can be effective. It can also serve as a form of data augmentation, producing many different input sequences from the same data, which can improve the robustness of the model when the available data is limited.
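The sketch below illustrates both variants with NumPy: randomly dropping time steps down to a target length while keeping the surviving steps in their original order, and cutting out a single random contiguous block. The sequence, target length, and random seed are illustrative assumptions.

```python
# Random sampling of time steps to shorten a long sequence.
import numpy as np

rng = np.random.default_rng(seed=1)
sequence = np.arange(1000)      # stand-in for a long sequence of time steps
target_len = 250                # illustrative target length

# (a) Keep a random subset of time steps; sorting preserves temporal order.
keep = np.sort(rng.choice(len(sequence), size=target_len, replace=False))
sampled = sequence[keep]

# (b) Keep one random contiguous block of the target length.
start = rng.integers(0, len(sequence) - target_len + 1)
block = sequence[start:start + target_len]

print(sampled.shape, block.shape)   # (250,) (250,)
```

Drawing several such samples from the same long sequence yields multiple distinct training examples, which is what makes this usable for data augmentation.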
5. Truncated Backpropagation Through Time
Rather than updating the model from the entire sequence, we can estimate the gradient from only the last few time steps. This method is known as