The attention mechanism was mentioned in both of the previous articles: how to make chatbot conversations more informative, and how to automatically generate text summaries.
Today, let’s take a look at what attention is.
This paper is considered the first work to use the attention mechanism in NLP. The authors applied attention to Neural Machine Translation (NMT), which is a typical sequence-to-sequence model, or equivalently an encoder-decoder model.
The encoder uses a bidirectional RNN, so the representation of each word contains information from both the preceding and the following words: the forward RNN reads the input in order and the backward RNN reads it in reverse, each producing its own sequence of hidden states. The two hidden states at each time step are then concatenated into a single annotation vector, which carries context from both directions and is later used by the decoder.
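The concatenation step above can be sketched with plain numpy, using simple tanh RNN cells and toy, randomly initialized weights (all names and sizes here are hypothetical, not from the paper):

```python
import numpy as np

def rnn_states(xs, W_x, W_h, reverse=False):
    """Run a simple tanh RNN over the inputs and return all hidden states."""
    if reverse:
        xs = xs[::-1]
    h = np.zeros(W_h.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    if reverse:
        states = states[::-1]  # re-align with the original input order
    return states

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 4, 3  # toy sizes (hypothetical)
xs = [rng.standard_normal(d_in) for _ in range(T)]
W_x = rng.standard_normal((d_h, d_in))
W_h = rng.standard_normal((d_h, d_h))

fwd = rnn_states(xs, W_x, W_h)                 # forward state j summarizes x_1..x_j
bwd = rnn_states(xs, W_x, W_h, reverse=True)   # backward state j summarizes x_j..x_T
# Each annotation h_j concatenates both directions:
h = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(len(h), h[0].shape)  # 5 (6,)
```

In a real system the two directions would have separate weights and gated cells (GRU/LSTM); the sketch only shows how each annotation ends up seeing both past and future context.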
The conditional probability here is as follows:
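The equation image did not survive extraction; assuming the standard attention-NMT formulation, it reads:

```latex
p(y_i \mid y_1, \dots, y_{i-1}, \mathbf{x}) = g(y_{i-1}, s_i, c_i),
\qquad
s_i = f(s_{i-1}, y_{i-1}, c_i)
```

where $s_i$ is the decoder hidden state at step $i$ and $c_i$ is the context vector for the $i$-th target word.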
The difference from a plain encoder-decoder model is that this conditional probability is conditioned on a distinct context vector c_i for each target word, rather than on a single vector for the whole sentence.
Each c_i is calculated from the annotations h_j obtained earlier.
The weight α is calculated from e, where α_ij can be read as the probability that the target word y_i is aligned to (generated from) the source word x_j; it reflects the importance of h_j for producing y_i.
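Concretely, in the standard formulation these quantities are computed as follows (a reconstruction, since the original equation images were lost):

```latex
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j,
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})},
\qquad
e_{ij} = a(s_{i-1}, h_j)
```

so each context vector is a weighted average of the encoder annotations, with the weights obtained by a softmax over the scores e.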
This is where the attention mechanism is applied, allowing the decoder to decide which parts of the input sentence to focus on.
With the attention mechanism, it is no longer necessary to transform all input information into a fixed-length vector.
e_ij is a score that evaluates how well the input around position j matches the output at position i.
a is the alignment model, a feedforward neural network trained jointly with the rest of the translation model.
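One full decoder step of this scoring-and-weighting procedure can be sketched in numpy. The additive form of the alignment model (e_ij = vᵀ tanh(W s_{i-1} + U h_j)) is an assumption consistent with a small feedforward network; the parameter names and sizes are hypothetical:

```python
import numpy as np

def attention_step(s_prev, H, W, U, v):
    """One decoder step of additive attention.

    s_prev : previous decoder state s_{i-1}, shape (d_s,)
    H      : encoder annotations h_1..h_Tx, shape (Tx, d_h)
    W, U, v: parameters of the feedforward alignment model
    """
    # e_ij = v^T tanh(W s_{i-1} + U h_j): matching score for each source position
    e = np.tanh(s_prev @ W.T + H @ U.T) @ v   # shape (Tx,)
    # alpha_ij: softmax over source positions (subtract max for stability)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    # c_i = sum_j alpha_ij h_j: weighted average of the annotations
    c = alpha @ H                             # shape (d_h,)
    return alpha, c

rng = np.random.default_rng(1)
Tx, d_h, d_s, d_a = 6, 4, 3, 5                # toy sizes (hypothetical)
H = rng.standard_normal((Tx, d_h))
s_prev = rng.standard_normal(d_s)
W = rng.standard_normal((d_a, d_s))
U = rng.standard_normal((d_a, d_h))
v = rng.standard_normal(d_a)

alpha, c = attention_step(s_prev, H, W, U, v)
print(alpha.sum().round(6), c.shape)  # 1.0 (4,)
```

Because the weights sum to 1, the context vector c is an expectation over the annotations, which is what lets the decoder "focus" on different source positions at each step.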
The article below includes a Keras implementation of seq2seq with attention:
[keras implementation of seq2seq](http://www.jianshu.com/p/c294e4cb4070)