Latest Review Paper on Attention Mechanisms and Related Code

Introduction

The attention mechanism originates from mimicking human thinking patterns and has been widely applied in machine translation, sentiment classification, automatic summarization, automatic question answering, dependency parsing, and other machine learning applications. The editor has compiled a review on the application of attention mechanisms in NLP, titled "An Introductory Survey on Attention Mechanisms in NLP Problems", along with some related code links.

Overview

In the figure below, the left side shows the traditional Seq2Seq model (encode a sequence, then decode it), built on a standard LSTM. In its decoder, the hidden state at a given timestep depends only on the hidden state and output of the previous timestep. The right side shows the attention-based Seq2Seq model, where the decoder's output additionally depends on a context feature c, obtained as a weighted average of the encoder hidden states over all timesteps; the weights are the attention scores a between the current decoder timestep and each encoder timestep.

[Figure: traditional Seq2Seq model (left) vs. attention-based Seq2Seq model (right)]
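To make the decoder-side computation concrete, here is a minimal NumPy sketch of one attention-based decoder step. The `cell` argument is a hypothetical stand-in for an LSTM/GRU update; the dot-product scoring is one simple choice, not necessarily the one used in any particular paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def decoder_step(h_prev, y_prev, enc_states, cell):
    """One attention-based decoder step (sketch): score the previous
    decoder state against every encoder hidden state, take the weighted
    average as the context feature c, and feed [y_prev; c] together with
    h_prev to the recurrent cell. `cell` is a hypothetical stand-in for
    an LSTM/GRU update: (input, state) -> new_state."""
    e = enc_states @ h_prev                 # unnormalized attention scores
    a = softmax(e)                          # attention weights over encoder steps
    c = a @ enc_states                      # context feature (weighted average)
    h_new = cell(np.concatenate([y_prev, c]), h_prev)
    return h_new, a

# Toy cell: a single tanh layer standing in for an LSTM update.
rng = np.random.default_rng(0)
d, n = 4, 5
Wx = rng.normal(size=(d, 2 * d))
cell = lambda x, h: np.tanh(Wx @ x + h)
h, a = decoder_step(rng.normal(size=d), rng.normal(size=d),
                    rng.normal(size=(n, d)), cell)
```

Note that `a` is a proper probability distribution over the encoder timesteps, so `c` always lies in the convex hull of the encoder states.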

General Form of Attention

The following formulas give the basic form of attention (basic attention): u is the matching feature vector for the current task, used to interact with the context; vi is the feature vector at timestep i of the sequence; ei is the unnormalized attention score; ai is the normalized attention score; and c is the context feature for the current timestep, computed from the attention scores and the feature sequence v.

$$e_i = a(u, v_i), \qquad a_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)}, \qquad c = \sum_i a_i v_i$$

In most cases, ei can be calculated using several methods as shown below:

$$e_i = u^\top v_i \ \text{(dot product)}, \qquad e_i = u^\top W v_i \ \text{(bilinear)}, \qquad e_i = w^\top \tanh(W_1 u + W_2 v_i) \ \text{(additive)}$$
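The basic form can be sketched in a few lines of NumPy. The dot-product and bilinear scores are shown; the shapes and the `W=None` default are assumptions of this sketch, not part of any specific library API.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def basic_attention(u, V, W=None):
    """Basic attention: score each v_i against the matching vector u,
    normalize the scores, and average the sequence with the weights.
    W=None gives the plain dot-product score u.v_i; passing a (d, d)
    matrix W gives the bilinear score u^T W v_i."""
    e = V @ u if W is None else V @ (W @ u)   # unnormalized scores e_i
    a = softmax(e)                            # normalized scores a_i
    c = a @ V                                 # context c = sum_i a_i * v_i
    return a, c

rng = np.random.default_rng(1)
u = rng.normal(size=4)
V = rng.normal(size=(6, 4))
a, c = basic_attention(u, V)
```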

In practical applications, in addition to the basic Attention, there are various variants of Attention. Below we introduce some common variants:

Variant – Multi-dimensional Attention

For each u, Basic Attention generates an Attention Score ai for each vi, meaning each u corresponds to a 1-D Attention Score vector. Multi-dimensional Attention produces a higher-dimensional Attention matrix aimed at capturing Attention features in different feature spaces, such as some forms of 2D Attention shown below:

[Figure: examples of 2-D attention]
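One common 2-D form replaces the scalar score ei with a d-dimensional score vector, so every feature dimension gets its own attention distribution over the sequence. The additive scoring and the (d, d) projections W1, W2 below are assumptions of this sketch, chosen for illustration.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along a given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multidim_attention(u, V, W1, W2):
    """2-D attention sketch: produce an (n, d) score matrix instead of
    n scalar scores, then normalize over the sequence separately for
    each feature dimension. W1, W2 are assumed learned projections."""
    E = np.tanh(V @ W1.T + W2 @ u)   # (n, d) score matrix
    A = softmax(E, axis=0)           # normalize over timesteps, per dimension
    c = (A * V).sum(axis=0)          # feature-wise weighted context (d,)
    return A, c

rng = np.random.default_rng(2)
d, n = 4, 6
A, c = multidim_attention(rng.normal(size=d), rng.normal(size=(n, d)),
                          rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

Each column of `A` sums to 1, i.e. each feature dimension carries its own distribution over the sequence.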

Variant – Hierarchical Attention

Some Attention algorithms consider Attention between different semantic levels. For example, the model below sequentially uses word-level and sentence-level Attention to obtain better features:

[Figure: hierarchical attention with word-level and sentence-level attention]
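A minimal sketch of this two-level pooling, assuming dot-product attention and fixed query vectors standing in for the learned word-level and sentence-level context vectors:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attend(query, seq):
    """Basic dot-product attention pooling: weights, then weighted average."""
    a = softmax(seq @ query)
    return a @ seq

def hierarchical_encode(doc, word_query, sent_query):
    """Hierarchical attention sketch: word-level attention pools each
    sentence (an (n_words, d) array) into one vector, then
    sentence-level attention pools those vectors into a document
    feature."""
    sent_vecs = np.stack([attend(word_query, s) for s in doc])
    return attend(sent_query, sent_vecs)

rng = np.random.default_rng(3)
d = 4
doc = [rng.normal(size=(5, d)), rng.normal(size=(3, d))]  # two sentences
doc_vec = hierarchical_encode(doc, rng.normal(size=d), rng.normal(size=d))
```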

Variant – Self Attention

Replacing u in the above formula with vi from the context sequence itself gives self attention. In NLP, self attention can capture dependency relationships between words in a sentence. Moreover, in some tasks the meaning of a word is closely tied to its context: in the following two sentences, the word "bank" refers to a financial institution in the first and the edge of a river in the second. To determine which meaning "bank" takes, we must rely on the context of the sentence.

I arrived at the bank after crossing the street.

I arrived at the bank after crossing the river.
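Substituting each vi for the query u gives a pairwise weight matrix over the sequence; a minimal sketch with plain dot-product scores (no learned projections, which real models typically add):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along a given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(V):
    """Self attention: each v_i takes the role of the query u in turn,
    so position i is re-encoded as a mixture of the whole sequence,
    weighted by its similarity to every other position."""
    E = V @ V.T               # e_ij = v_i . v_j, pairwise scores
    A = softmax(E, axis=-1)   # row i: distribution of position i over the sentence
    C = A @ V                 # context vector for every position
    return A, C

rng = np.random.default_rng(4)
V = rng.normal(size=(5, 4))   # 5 tokens, dimension 4
A, C = self_attention(V)
```

For the "bank" example, the row of `A` for "bank" would ideally put weight on "street" or "river", letting the re-encoded vector disambiguate the word.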

Variant – Memory-based Attention

The form of memory-based attention is as follows, where {(ki, vi)} is called the memory and is essentially a key-value re-representation of the input. In particular, when ki and vi are equal, memory-based attention reduces to basic attention.

$$e_i = a(q, k_i), \qquad a_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)}, \qquad c = \sum_i a_i v_i$$

For example, in QA questions, Memory-based Attention can iteratively update Memory to shift attention to the location of the answer.
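A minimal sketch of key-value memory reading with iterative hops. The update rule here (adding each read-out back into the query, as in end-to-end memory networks) is one common choice among several, assumed for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def memory_read(q, K, V):
    """One memory read: score the query against the keys k_i,
    read out the matching mixture of values v_i."""
    a = softmax(K @ q)
    return a @ V

def multi_hop(q, K, V, hops=3):
    """Iterative memory attention sketch: fold each read-out back into
    the query so attention can shift toward the answer across hops."""
    for _ in range(hops):
        q = q + memory_read(q, K, V)
    return q

rng = np.random.default_rng(5)
d, n = 4, 6
q_out = multi_hop(rng.normal(size=d), rng.normal(size=(n, d)),
                  rng.normal(size=(n, d)))
```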

[Figure: iterative memory updates shifting attention toward the answer in QA]

Evaluation of Attention

Attention can be evaluated quantitatively through intrinsic or extrinsic methods. Intrinsic evaluation compares attention weights against labeled alignment data and therefore requires substantial manual annotation. Extrinsic evaluation is simpler: it directly compares model performance on downstream tasks. Its drawback is that it is hard to tell whether a performance gain actually comes from the attention mechanism.

Qualitative evaluation is typically done through visualized heatmaps:

[Figure: attention heatmap visualization]
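Real heatmaps are usually drawn with a plotting library, but the idea can be sketched even in plain text: darker characters for larger weights. The character ramp and the toy attention matrix below are made up for illustration.

```python
import numpy as np

def ascii_heatmap(A, chars=" .:-=+*#%@"):
    """Render an attention weight matrix as text, with denser characters
    for larger weights: a rough stand-in for the usual color heatmap."""
    idx = (A / A.max() * (len(chars) - 1)).astype(int)
    return "\n".join("".join(chars[j] for j in row) for row in idx)

A = np.eye(4) * 0.8 + 0.05   # toy attention matrix, strongest on the diagonal
print(ascii_heatmap(A))
```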

Related Attention Code

References:

Please follow the Zhuanzhi Official Account (scan the QR code below, or click on the blue Zhuanzhi above)

  • Reply “ISAM” in the background to obtain the latest PDF download link~

-END-
