Abstract:
In humans, attention is a core attribute of all perceptual and cognitive operations. Given our limited capacity to process competing sources of information, the attention mechanism selects, modulates, and focuses on the information most relevant to behavior.
For decades, the concept and function of attention have been studied in philosophy, psychology, neuroscience, and computer science. In the past six years, this property has been extensively explored in deep neural networks, and today many of the main advances in deep learning rest on neural attention models across a wide range of application domains.
This paper provides a comprehensive overview and analysis of the development of neural attention models. We systematically review hundreds of architectures in this field, identifying and discussing those that have had a significant impact. We have also developed a set of automated methodological tools and made them publicly available to support research in the area. Through a critical analysis of 650 papers, we describe the main uses of attention in convolutional, recurrent, and generative models, identifying shared subgroups of use and application.
Additionally, we describe the impact of attention in different application domains and its implications for the interpretability of neural networks. Finally, we outline trends and opportunities for further research, in the hope that this review offers a concise overview of the main attention models in the field and guides researchers in developing future approaches that drive further improvements.
To assess the widespread application of attention in deep neural networks, we conducted a systematic review of the field, critically analyzing 650 papers. As the main contributions of our work, we highlight:
1. A replicable research methodology. In the appendix, we provide a detailed process for data collection, including the scripts used to collect papers and the charts we created;
2. An in-depth overview of the field. We critically analyzed 650 papers, extracting different metrics and using various visualization techniques to highlight overall trends in the field;
3. We describe the main attention mechanisms (a minimal code sketch of the most widely used formulation follows this list);
4. We present the main neural architectures that employ attention mechanisms, describing their contributions to the field of neural networks;
5. We introduce how attention modules or interfaces are applied in classical DL architectures, extending the family of neural networks.
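To make the central mechanism concrete before surveying the architectures, the following is a minimal NumPy sketch of scaled dot-product attention, the formulation popularized by the Transformer[37]. The function names, shapes, and toy usage are illustrative assumptions of ours, not code from any surveyed work.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention.

    Q: (n_queries, d_k); K: (n_keys, d_k); V: (n_keys, d_v).
    Returns the attended values and the attention weights.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # compatibility of each query with each key
    weights = softmax(scores, axis=-1)   # distribution over keys for each query
    return weights @ V, weights          # weighted sum of the values

# Toy usage: 2 queries attend over 4 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 16))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # (2, 16); each row of weights sums to 1
```

The "select and focus" behavior described above corresponds to the softmax step: each query distributes a fixed budget of weight across the competing keys.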
Historically, research on computational attention systems began in the 1980s. It was not until mid-2014 that Neural Attention Networks (NANs) appeared in Natural Language Processing (NLP), where attention delivered significant advances through simple, scalable networks and promising results. Attention enabled a shift toward complex tasks that were previously very challenging, such as machine comprehension of dialogue, sentiment analysis, machine translation, question answering, and transfer learning. Subsequently, NANs emerged in other fields central to artificial intelligence, such as computer vision, reinforcement learning, and robotics. There are now many attention architectures, but only a few have had a significantly higher impact, as shown in Figure 2. In that figure, we depict the most relevant set of works organized by citation level and innovation, among which RNNSearch[44], Transformer[37], Memory Networks[38], "show, attend and tell"[45], and RAM[46] are the most important developments.
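As a concrete contrast with the Transformer's dot-product scoring, below is a small sketch of the additive (Bahdanau-style) alignment model that RNNSearch[44] introduced for machine translation. The parameter names, shapes, and the split into Wa/Ua/va follow the usual textbook presentation and are our own illustrative assumptions rather than the original model's code.

```python
import numpy as np

def additive_attention(s, H, Wa, Ua, va):
    """Additive (Bahdanau-style) attention over encoder annotations.

    s:  decoder state, shape (d_s,)
    H:  encoder annotations, shape (T, d_h)
    Wa: (d_a, d_s); Ua: (d_a, d_h); va: (d_a,) -- learned parameters.
    Returns the context vector and the alignment weights over the T positions.
    """
    # Alignment scores e_t = va^T tanh(Wa s + Ua h_t): a tiny feed-forward net
    e = np.tanh(s @ Wa.T + H @ Ua.T) @ va   # (T,)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                    # softmax over source positions
    context = alpha @ H                     # expected annotation under alpha
    return context, alpha
```

Unlike dot-product attention, the score here is produced by a small learned network, which scales less well but does not require queries and keys to share a dimensionality.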
Currently, hybrid models leverage the major developments of attention in deep learning (Figure 6) and attract considerable interest from the scientific community. Hybrid models based on Transformers, GATs, and memory networks have emerged in multimodal learning and several other application areas. Hyperbolic Attention Networks (HAN)[122], Hyperbolic Graph Attention Networks (GHN)[123], Temporal Graph Networks (TGN)[124], and Memory-based Graph Networks (MGN)[87] are the most promising developments. Hyperbolic networks are a new class of architecture that combines the advantages of self-attention, memory, graphs, and hyperbolic geometry, enabling neural networks to reason with high capacity over the embeddings produced by deep networks. Since 2019, these networks have emerged as a new research branch, achieving state-of-the-art generalization in neural machine translation, graph learning, and visual question answering while keeping neural representations compact. GATs have also attracted much attention since 2019 for their ability to learn complex relationships or interactions across a wide range of problems, from biology, particle physics, and social networks to recommender systems. To improve node representations and extend GATs to dynamic data (i.e., features or connections that change over time), architectures that combine memory modules and a temporal dimension, such as MGNs and TGNs, have been proposed.
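To show how a GAT learns such relationships, here is a dense-matrix sketch of one single-head graph attention layer. The split of the attention vector into a_src and a_dst and the additive masking of non-edges are standard implementation conveniences that we assume for clarity; this is an illustration, not the code of any cited architecture.

```python
import numpy as np

def gat_layer(X, A, W, a_src, a_dst, slope=0.2):
    """One single-head graph attention layer (dense sketch).

    X: node features (N, F); A: adjacency with self-loops, (N, N) in {0, 1};
    W: (F, F_out); a_src, a_dst: (F_out,) -- the two halves of the attention vector.
    """
    H = X @ W                                      # project node features, (N, F_out)
    # e[i, j] = LeakyReLU(a_src . h_i + a_dst . h_j) for every pair (i, j)
    e = (H @ a_src)[:, None] + (H @ a_dst)[None, :]
    e = np.where(e > 0, e, slope * e)              # LeakyReLU
    e = np.where(A > 0, e, -1e9)                   # keep only edges before the softmax
    w = np.exp(e - e.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)              # attention over each node's neighbors
    return w @ H                                   # aggregate neighbor features

# Toy usage: 4 nodes on a ring, 3-dim features projected to 5 dims.
rng = np.random.default_rng(1)
A = np.eye(4) + np.roll(np.eye(4), 1, axis=1) + np.roll(np.eye(4), -1, axis=1)
out = gat_layer(rng.normal(size=(4, 3)), A,
                rng.normal(size=(3, 5)), rng.normal(size=5), rng.normal(size=5))
print(out.shape)  # (4, 5)
```

The temporal extensions mentioned above (TGNs, MGNs) keep this neighborhood-softmax core and add memory states that are updated as new edges arrive over time.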

