Unlocking the Secrets of Self-Attention Extrapolation Defects: Ant Group’s New Transformer

The MLNLP community is a well-known machine learning and natural language processing community in China and abroad, covering NLP graduate students, university teachers, and corporate researchers. The community's vision is to promote communication and progress between academia and industry in natural language processing and machine learning, especially for beginners. Reprinted from | Machine … Read more

Thoughts on Upgrading Transformer: Simple Considerations on Multimodal Position Encoding

©PaperWeekly Original · Author | Su Jianlin Affiliation | Scientific Space Research Direction | NLP, Neural Networks In the second article of this series, “The Path of Transformer Upgrade: A Rotary Position Encoding that Draws on the Strengths of Many,” the author proposes Rotary Position Embedding (RoPE), a method for achieving relative position encoding … Read more
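The rotary scheme named in that teaser works by rotating each pair of query/key dimensions through an angle proportional to the token's position, so that the attention dot product ends up depending only on the relative offset between two positions. Below is a minimal NumPy sketch of that idea; the function name `rope_rotate`, the vector size, and the base 10000 are illustrative assumptions here, not code taken from the article.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    # Rotate consecutive pairs (x[2i], x[2i+1]) by the angle pos * theta_i,
    # where theta_i = base ** (-2i / d), in the spirit of the rotary scheme.
    d = x.shape[-1]
    assert d % 2 == 0, "rotary encoding pairs up dimensions, so d must be even"
    theta = base ** (-np.arange(0, d, 2) / d)   # (d/2,) per-pair frequencies
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

# The attention score <rotate(q, m), rotate(k, n)> depends only on m - n:
rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
score_a = rope_rotate(q, 3) @ rope_rotate(k, 1)    # offset m - n = 2
score_b = rope_rotate(q, 10) @ rope_rotate(k, 8)   # same offset, shifted absolute positions
print(np.isclose(score_a, score_b))                # True
```

Because the score comes out as a function of the offset alone, the rotation is applied to each vector like an absolute encoding yet behaves as a relative encoding inside the attention score, which is the property the teaser points to.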

Discussion on Absolute, Relative, and Rotary Position Encoding in Transformers

Reprinted from Zhihu: Yao Yuan. Link: https://zhuanlan.zhihu.com/p/17311602488 1. Introduction The attention mechanism in the Transformer [1] can effectively model correlations between tokens, bringing significant performance improvements on many tasks. However, the attention mechanism itself does not have the … Read more
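The teaser is truncated by the aggregator, but the standard motivation behind all three encoding families in the title is that plain scaled dot-product attention has no built-in notion of token order: permuting the input tokens simply permutes the outputs in the same way. The sketch below checks that property with a self-contained NumPy attention function; it illustrates the general point and is not code from the linked article.

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    # Plain scaled dot-product self-attention with no position information anywhere.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
n_tokens, d = 5, 4
X = rng.normal(size=(n_tokens, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(n_tokens)

# Shuffling the tokens just shuffles the outputs the same way: the mechanism
# cannot tell what order the tokens arrived in, hence the need for position encoding.
out_of_shuffled = attention(X[perm], Wq, Wk, Wv)
shuffled_output = attention(X, Wq, Wk, Wv)[perm]
print(np.allclose(out_of_shuffled, shuffled_output))  # True
```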