Neural networks are the carriers of deep learning, and among neural network models, the most classic is arguably the RNN. Although it is not perfect, it has the ability to learn from historical information. Whether in the encoder-decoder framework, attention models, self-attention models, or the more powerful BERT family of models, they all stand on the shoulders of RNNs, continuously evolving and growing stronger.
This article elaborates on all aspects of RNNs, including model structure, advantages and disadvantages, various applications of RNN models, commonly used activation functions, the shortcomings of RNNs, and how GRU and LSTM attempt to address these issues, as well as RNN variants.
The main feature of this article is that it is illustrated, with concise language and comprehensive summaries.
Overview
The architecture of a traditional RNN
Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while maintaining hidden states. They are typically represented as follows:
For each time step $t$, the activation $a^{\langle t \rangle}$ and the output $y^{\langle t \rangle}$ are expressed as follows:

$$a^{\langle t \rangle} = g_1\big(W_{aa}\, a^{\langle t-1 \rangle} + W_{ax}\, x^{\langle t \rangle} + b_a\big), \qquad y^{\langle t \rangle} = g_2\big(W_{ya}\, a^{\langle t \rangle} + b_y\big)$$

Here, $W_{aa}$, $W_{ax}$, $W_{ya}$, $b_a$, $b_y$ are weight coefficients shared across the time dimension of the network, and $g_1$, $g_2$ are activation functions.
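To make the two equations concrete, here is a minimal NumPy sketch of a single forward step. The weight names follow the notation above; the choice of tanh for $g_1$ and softmax for $g_2$ is an assumption for illustration, not a fixed prescription.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a column vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, a_prev, Waa, Wax, Wya, ba, by):
    """One RNN time step; all vectors are column vectors of matching shapes."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)   # a<t> = g1(Waa a<t-1> + Wax x<t> + ba)
    y_t = softmax(Wya @ a_t + by)                  # y<t> = g2(Wya a<t> + by)
    return a_t, y_t
```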

The table below summarizes the advantages and disadvantages of typical RNN architectures:
| Advantages | Disadvantages |
|---|---|
| Handles inputs of arbitrary length | Slow computation |
| Model size does not change with input length | Difficult to capture information from long ago |
| Computation takes historical information into account | Cannot consider any future inputs for the current state |
| Weights shared across time | |
Applications of RNNs
RNN models are primarily applied in the fields of natural language processing and speech recognition. The table below summarizes different applications:
| RNN Type | Illustration | Examples |
|---|---|---|
| 1-to-1 | (figure omitted) | Traditional neural networks |
| 1-to-Many | (figure omitted) | Music generation |
| Many-to-1 | (figure omitted) | Sentiment classification |
| Many-to-Many | (figure omitted) | Named entity recognition |
| Many-to-Many | (figure omitted) | Machine translation |
Loss Function
For an RNN, the loss function over all time steps is defined in terms of the loss at each time step, as follows:

$$\mathcal{L}(\hat{y}, y) = \sum_{t=1}^{T_y} \mathcal{L}\big(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle}\big)$$
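As a minimal sketch of this sum (assuming a softmax output and one-hot targets, which the article does not specify), the total loss is simply accumulated over time steps:

```python
import numpy as np

def sequence_loss(y_hats, ys, eps=1e-12):
    """Sum of per-time-step cross-entropy losses L(y_hat<t>, y<t>)."""
    return sum(-float(np.sum(y * np.log(y_hat + eps)))
               for y_hat, y in zip(y_hats, ys))
```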
Backpropagation Through Time
Backpropagation is performed at each point in time. At time step $T$, the partial derivative of the loss $\mathcal{L}$ with respect to the weight matrix $W$ is expressed as follows:

$$\frac{\partial \mathcal{L}^{(T)}}{\partial W} = \sum_{t=1}^{T} \left.\frac{\partial \mathcal{L}^{(T)}}{\partial W}\right|_{(t)}$$
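Below is a minimal NumPy sketch of BPTT for a vanilla RNN with tanh hidden units, a softmax output, and a per-step cross-entropy loss; the parameter names follow the notation above, and the output/loss choices are assumptions for illustration. The key point is that, because the weights are shared across time, each gradient is accumulated as a sum of per-time-step contributions, matching the formula above.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bptt(xs, ys, a0, Waa, Wax, Wya, ba, by):
    """xs, ys: lists of column vectors (inputs and one-hot targets)."""
    # Forward pass, caching what the backward pass needs.
    a_prev, cache, loss = a0, [], 0.0
    for x, y in zip(xs, ys):
        a = np.tanh(Waa @ a_prev + Wax @ x + ba)
        y_hat = softmax(Wya @ a + by)
        loss += -float(np.sum(y * np.log(y_hat + 1e-12)))
        cache.append((x, a_prev, a, y_hat, y))
        a_prev = a

    # Backward pass: accumulate gradients over all time steps.
    grads = {name: np.zeros_like(p) for name, p in
             dict(Waa=Waa, Wax=Wax, Wya=Wya, ba=ba, by=by).items()}
    da_next = np.zeros_like(a0)          # gradient arriving from step t+1
    for x, a_prev, a, y_hat, y in reversed(cache):
        dy = y_hat - y                   # d(loss)/d(pre-softmax output)
        grads["Wya"] += dy @ a.T
        grads["by"] += dy
        da = Wya.T @ dy + da_next        # from the output and from step t+1
        dz = da * (1.0 - a ** 2)         # through tanh
        grads["Waa"] += dz @ a_prev.T    # shared weights: sum over time
        grads["Wax"] += dz @ x.T
        grads["ba"] += dz
        da_next = Waa.T @ dz
    return loss, grads
```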
Handling Long-Term Dependencies
Commonly Used Activation Functions
The most commonly used activation functions in RNN modules are described as follows:
| Sigmoid | Tanh | ReLU |
|---|---|---|
| $g(z) = \dfrac{1}{1 + e^{-z}}$ | $g(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$ | $g(z) = \max(0, z)$ |
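For reference, here is a trivial NumPy sketch of the three activations in the table above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                  # squashes values into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)          # zero for negative inputs, identity otherwise
```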
Gradient Vanishing/Explosion
The vanishing and exploding gradient phenomena are frequently encountered in RNNs. They occur because the multiplicative gradients can decrease or increase exponentially with the number of layers (time steps), which in turn makes it difficult to capture long-term dependencies.
Gradient Clipping
Gradient clipping is a technique used to address the exploding gradient problem encountered during backpropagation. By capping the maximum value of the gradient, this phenomenon is kept under control in practice.
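A minimal sketch of one common variant, clipping by the global norm of all gradients; the threshold value here is an arbitrary assumption:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale every gradient if their combined L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```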
Types of Gates
To address the vanishing gradient problem, certain types of RNNs use specific gates, each with a well-defined purpose. A gate is typically written as:

$$\Gamma = \sigma\big(W x^{\langle t \rangle} + U a^{\langle t-1 \rangle} + b\big)$$

where $W$, $U$, $b$ are gate-specific coefficients and $\sigma$ is the sigmoid function. The main gates are summarized in the table below:
| Type of Gate | Role | Used in |
|---|---|---|
| Update gate $\Gamma_u$ | How much does the past matter now? | GRU, LSTM |
| Relevance (reset) gate $\Gamma_r$ | Should previous information be dropped? | GRU, LSTM |
| Forget gate $\Gamma_f$ | Should a cell be erased or kept? | LSTM |
| Output gate $\Gamma_o$ | How much of a cell should be revealed? | LSTM |
GRU/LSTM
The Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) unit address the vanishing gradient problem encountered by traditional RNNs, with the LSTM being a generalization of the GRU. The table below summarizes the characteristic equations of each structure:

| Characterization | GRU | LSTM |
|---|---|---|
| $\tilde{c}^{\langle t \rangle}$ | $\tanh\big(W_c[\Gamma_r \star a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\big)$ | $\tanh\big(W_c[\Gamma_r \star a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\big)$ |
| $c^{\langle t \rangle}$ | $\Gamma_u \star \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) \star c^{\langle t-1 \rangle}$ | $\Gamma_u \star \tilde{c}^{\langle t \rangle} + \Gamma_f \star c^{\langle t-1 \rangle}$ |
| $a^{\langle t \rangle}$ | $c^{\langle t \rangle}$ | $\Gamma_o \star c^{\langle t \rangle}$ |
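As a minimal sketch of the GRU column above: the gates use the sigmoid, `*` denotes element-wise multiplication, and weight names such as `Wu`, `Wr`, `Wc` are illustrative assumptions rather than a fixed API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, a_prev, Wu, Wr, Wc, bu, br, bc):
    """One GRU time step; a_prev and x_t are column vectors."""
    concat = np.concatenate([a_prev, x_t])                  # [a<t-1>, x<t>]
    gamma_u = sigmoid(Wu @ concat + bu)                     # update gate
    gamma_r = sigmoid(Wr @ concat + br)                     # relevance (reset) gate
    c_tilde = np.tanh(Wc @ np.concatenate([gamma_r * a_prev, x_t]) + bc)  # candidate memory
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * a_prev      # c<t>; in a GRU, a<t> = c<t>
    return c_t
```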
Variants of RNNs
The table below summarizes other commonly used RNN models:
| Bidirectional (BRNN) | Deep (DRNN) |
|---|---|
| (figure omitted) | (figure omitted) |
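A minimal sketch of the bidirectional idea (the step function and parameter layout are assumptions): one vanilla RNN reads the sequence from left to right, a second one reads it from right to left, and the hidden states of both directions are concatenated at every time step.

```python
import numpy as np

def _step(x, a_prev, Waa, Wax, ba):
    """One vanilla RNN step with tanh activation."""
    return np.tanh(Waa @ a_prev + Wax @ x + ba)

def birnn(xs, a0, fwd_params, bwd_params):
    fwd, a = [], a0
    for x in xs:                          # left-to-right pass
        a = _step(x, a, *fwd_params)
        fwd.append(a)
    bwd, a = [], a0
    for x in reversed(xs):                # right-to-left pass
        a = _step(x, a, *bwd_params)
        bwd.append(a)
    bwd.reverse()                         # align backward states with time order
    # each time step now sees both past (fwd) and future (bwd) context
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```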
