A Detailed Guide to Self-Attention Mechanism

Author: Greatness Comes from Perseverance @ Zhihu (authorized). Source: https://zhuanlan.zhihu.com/p/410776234. Self-Attention is the core idea of the Transformer. Recently, I re-read the paper and gained some new insights, so I wrote this article to share my thoughts with readers. When I first encountered Self-Attention, the most confusing part for me was the three matrices Q, K, … Read more
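The three matrices the excerpt mentions are learned projections that turn the same input into queries, keys, and values. A minimal NumPy sketch of the resulting scaled dot-product self-attention (the shapes and names here are illustrative, not taken from the article):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (n, d_model)."""
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token offers to be matched against
    V = X @ W_v  # values: the content that gets mixed together
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n, n) pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # each output row is a weighted sum of all value rows

rng = np.random.default_rng(0)
n, d_model = 4, 8
X = rng.standard_normal((n, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

Each output row has the same dimensionality as the values, and every token attends to every other token, which is exactly what makes the mechanism "self"-attention.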

Understanding BERT: The Essence, Principles, and Applications of BERT

This article covers the essence, principles, and applications of BERT (Bidirectional Encoder Representations from Transformers). 1. The Essence of BERT. BERT is a pre-trained language model based on a multi-layer Transformer encoder that captures the bidirectional context of text through tokenization, various embeddings, and task-specific output … Read more

Understanding Attention Mechanism in Neural Networks

This article covers the essence of Attention, the principle of Attention, and its applications, to help you understand the Attention mechanism. 1. The Essence of Attention. The core logic: from focusing on everything to focusing on key points. The Attention mechanism can grasp the key points when processing long texts without losing important information. The Attention mechanism … Read more

Understanding Self-Attention and Multi-Head Attention in Neural Networks

With the rapid rise in popularity of the Transformer model, Self-Attention and Multi-Head Attention have become core components in Natural Language Processing (NLP). This article analyzes these two attention mechanisms from three aspects: brief introduction, workflow, and comparison. 1. Brief Introduction. Self-Attention: allows each element in the input sequence to focus on and weight … Read more
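The relationship between the two mechanisms can be sketched briefly: multi-head attention runs several independent self-attention "heads" on slices of the model dimension and concatenates their outputs. A minimal NumPy illustration (all names and shapes are illustrative assumptions, not from the article):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """Multi-head self-attention: h parallel heads over slices of d_model."""
    n, d_model = X.shape
    d_head = d_model // h
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    heads = []
    for i in range(h):
        s = slice(i * d_head, (i + 1) * d_head)  # this head's slice of the dim
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)
        heads.append(softmax(scores) @ V[:, s])  # ordinary self-attention per head
    return np.concatenate(heads, axis=-1) @ W_o  # mix heads back to d_model

rng = np.random.default_rng(1)
n, d_model, h = 5, 16, 4
X = rng.standard_normal((n, d_model))
W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, h)
print(out.shape)  # (5, 16)
```

Because each head attends over a different learned subspace, the heads can specialize in different relationships (e.g. syntax vs. coreference), which is the usual motivation for using multiple heads rather than one large one.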

The Evolution of Large Models: From Transformer to DeepSeek-R1

📖 Reading Time: 19 minutes 🕙 Release Date: February 14, 2025. At the beginning of … Read more

Diffusion-TS: Interpretable Diffusion for General Time Series Generation

Introduction: Time series data is ubiquitous in fields such as finance, healthcare, retail, and climate modeling. However, data sharing can lead to privacy breaches, limiting the … Read more

Transformers as Support Vector Machines: A New Perspective

Reprinted from: Machine Heart | Edited by: Egg Sauce, Xiao Zhou. SVM is all you need; support vector machines never go out of style. The Transformer is a support vector machine (SVM), a new theoretical … Read more

Is Transformer Indispensable? Latest Review on State Space Model (SSM)

In the post-deep-learning era, the Transformer architecture has demonstrated powerful performance in pre-trained large models and various downstream tasks. However, its significant computational demands have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been invested in designing more efficient methods. Among these, the … Read more

When Transformer Meets U-Net: A Review of Medical Image Segmentation

Author: Amusi | Source: CVer. Introduction: There are not many abbreviations left for the combination of Transformer + U-Net… Previously, I reviewed the 5 currently published papers on Transformer + medical image segmentation at MICCAI 2021; see: … Read more

Introducing ∞-former: Infinite Long-Term Memory for Any Length Context

Reported by the Machine Heart Editorial Team. Can it hold context of any length? Here is a new model called ∞-former. In the past few years, the Transformer has dominated the entire NLP field and has also crossed into other areas such as computer vision. However, it has its weaknesses, such as not being … Read more