Unveiling the Mathematical Principles of Transformers

Machine Heart Reports, Editor: Zhao Yang. Recently, a paper published on arXiv offered a new interpretation of the mathematical principles behind Transformers. The content is extensive and rich in detail, and I highly recommend reading the original. In 2017, Vaswani et al. published “Attention Is All You Need,” marking a significant milestone in the … Read more
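
For reference, the core operation that any mathematical treatment of Transformers builds on is the scaled dot-product attention defined in “Attention Is All You Need.” This is standard background, not a quote from the arXiv paper discussed above:

```latex
% Scaled dot-product attention (Vaswani et al., 2017):
% Q, K, V are the query, key, and value matrices; d_k is the key dimension.
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]
```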

Where Does the Context Learning Ability of Transformers Come From?

Machine Heart Reports, Machine Heart Editorial Department. With a theoretical foundation, we can perform deep optimization. Why do Transformers perform so well? Where does the in-context learning ability they bring to so many large language models come from? In the field of artificial intelligence, Transformers have become the dominant model in deep … Read more
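
To make the term concrete before reading on: in-context learning means a frozen model picks up a pattern purely from examples placed in its prompt, with no weight updates. A generic illustration (my own sketch, not an example from the paper):

```python
# A minimal, hypothetical sketch of an in-context learning prompt:
# the model's weights are frozen; the "learning" happens purely from
# the examples placed in the context window.
few_shot_prompt = (
    "Translate English to French.\n"
    "sea otter -> loutre de mer\n"
    "cheese -> fromage\n"
    "peppermint -> menthe poivrée\n"
    "plush giraffe ->"  # the model is expected to continue the pattern
)
# Feeding this prompt to any sufficiently large language model should
# yield the French translation, even though no gradient step was taken.
```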

Attention Mechanism in Deep Learning

Introduction: Alexander J. Smola, head of machine learning at Amazon Web Services, presented on the attention mechanism in deep learning at the ICML 2019 conference, detailing its evolution from the earliest Nadaraya-Watson estimator (NWE) to the latest multi-head attention. Authors | Alex Smola, Aston Zhang; Translator | Xiaowen. The report is divided into six … Read more
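
To make the starting point of that evolution concrete, here is a minimal NumPy sketch (my own illustration, not code from the talk) of the Nadaraya-Watson estimator, which can be read as attention with a Gaussian kernel: each training target y_i is weighted by how close its input x_i sits to the query x.

```python
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, bandwidth=0.5):
    """Kernel-regression 'attention': weight each y_i by a softmax over
    Gaussian-kernel similarities between x_query and x_i."""
    # Attention scores: negative squared distance, scaled by the bandwidth
    scores = -0.5 * ((x_query[:, None] - x_train[None, :]) / bandwidth) ** 2
    # Softmax turns the scores into attention weights that sum to 1
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    # The prediction is the attention-weighted average of the training targets
    return weights @ y_train

# Toy usage: recover a noisy sine curve
x_train = np.sort(np.random.rand(50) * 5)
y_train = np.sin(x_train) + 0.1 * np.random.randn(50)
x_query = np.linspace(0, 5, 100)
y_pred = nadaraya_watson(x_query, x_train, y_train)
```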

Understanding the Principles Behind AgentGPT

Start a new objective: analyze the principles of AgentGPT and summarize the results. New task: research the development and architecture of the GPT model. New task: analyze the internal processes and algorithms of AgentGPT. New task: summarize the investigation results and submit a comprehensive report on the principles behind AgentGPT. Executing “Research the development of … Read more
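
The excerpt above is AgentGPT's own task log. The control flow behind such a log can be sketched roughly as the following loop (a simplified, hypothetical reconstruction, not AgentGPT's actual source code): the LLM decomposes an objective into tasks, executes each one, and may append new tasks based on the results.

```python
# A simplified, hypothetical sketch of an AgentGPT-style loop.
# `llm` stands in for any chat-completion callable; it is not a real API here.
def run_agent(objective, llm, max_steps=10):
    # Ask the model to break the objective into an initial task list
    tasks = llm(f"Break this objective into tasks: {objective}").splitlines()
    results = []
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.pop(0)
        # Execute the current task with the objective as context
        result = llm(f"Objective: {objective}\nTask: {task}\nExecute it.")
        results.append((task, result))
        # Let the model propose follow-up tasks based on the result
        new = llm(f"Given result '{result}', list any new tasks (or 'none').")
        if new.strip().lower() != "none":
            tasks.extend(new.splitlines())
    # Summarize everything into a final report, mirroring the last task in the log
    return llm(f"Summarize these results into a report: {results}")
```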

Local Invocation of Llama3 Large Model Development

1. Test using the trained weights

from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline
import torch

# Load the tokenizer and the base GPT-2 Chinese model from the local Hugging Face cache
tokenizer = AutoTokenizer.from_pretrained(r"E:\大模型AI开发\AI大模型\projects\gpt2\model\models--uer--gpt2-chinese-cluecorpussmall\snapshots\c2c0249d8a2731f269414cc3b22dff021f8e07a3")
model = AutoModelForCausalLM.from_pretrained(r"E:\大模型AI开发\AI大模型\projects\gpt2\model\models--uer--gpt2-chinese-cluecorpussmall\snapshots\c2c0249d8a2731f269414cc3b22dff021f8e07a3")

# Load our own trained weights (Chinese poetry)
model.load_state_dict(torch.load("net.pt"))

# Use the built-in pipeline tool to generate content
pipeline = TextGenerationPipeline(model, tokenizer, device=0)
print(pipeline("天高", max_length=24))

The performance is actually not good:

2. Post-process the AI-generated results # Customized … Read more
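
The excerpt cuts off before the customized post-processing code appears. As an illustration of what such a cleanup step might look like (my own sketch, not the author's code), output from this GPT-2 Chinese pipeline typically needs its inter-character spaces removed and trailing fragments trimmed:

```python
def postprocess(generated_text):
    """Hypothetical cleanup for GPT-2 Chinese pipeline output: the tokenizer
    inserts spaces between characters, and generation often stops mid-phrase."""
    text = generated_text.replace(" ", "")  # drop inter-character spaces
    for token in ("[UNK]", "[CLS]", "[SEP]"):
        text = text.replace(token, "")      # strip special tokens
    # Trim anything after the last Chinese punctuation mark
    for i in range(len(text) - 1, -1, -1):
        if text[i] in "。！？，":
            return text[: i + 1]
    return text

# Example: postprocess(pipeline("天高", max_length=24)[0]["generated_text"])
```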

10 Essential Python Tools for Natural Language Processing

Hello everyone, I’m Hao! Today I will introduce 10 incredibly useful Python tools for Natural Language Processing (NLP). As a Python developer, I understand how important it is to have a handy tool when dealing with text data. These tools not only help us better understand and analyze text but also make our work much … Read more

Natural Language Processing in Python: 5 Useful Libraries!

Hello everyone! I am Hao Ge. Today I want to share with you a particularly interesting topic – Natural Language Processing in Python. Simply put, it is the technology that allows computers to understand and process human language. As a Python enthusiast, I find that many friends are particularly interested in this field. Below, I … Read more
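
The excerpt ends before the five libraries are named. As one concrete example of the kind of library such roundups typically cover (an illustration on my part, not necessarily one of the article's five), NLTK can tokenize and part-of-speech tag a sentence in a few lines:

```python
import nltk

# One-time downloads of the tokenizer and tagger models
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Python makes natural language processing surprisingly approachable."
tokens = nltk.word_tokenize(sentence)  # split the sentence into word tokens
tags = nltk.pos_tag(tokens)            # attach a part-of-speech tag to each token
print(tags)  # e.g. [('Python', 'NNP'), ('makes', 'VBZ'), ...]
```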

Exploring 17 Attention Mechanisms in Deep Learning

Attention mechanisms have become a foundational component of model design; it is almost taken for granted that a good model should incorporate some form of attention. This article summarizes the current state of attention mechanisms by introducing 17 mainstream types, explaining their basic principles and computational methods, and providing their sources along with corresponding … Read more
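
As shared background for the 17 variants, most of them modify one of two classic scoring functions. A minimal NumPy sketch of both (a generic reference on my part, not code from the article; the projection weights W_q, W_k, v in the additive variant are illustrative learnable parameters):

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

# Variant 1: scaled dot-product attention (Vaswani et al., 2017)
def dot_product_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # similarity = scaled inner product
    return softmax(scores) @ V

# Variant 2: additive attention (Bahdanau et al., 2015)
def additive_attention(Q, K, V, W_q, W_k, v):
    Qp = Q @ W_q.T  # (n, h) projected queries
    Kp = K @ W_k.T  # (m, h) projected keys
    # score(q, k) = v . tanh(W_q q + W_k k), broadcast over all query-key pairs
    scores = np.tanh(Qp[:, None, :] + Kp[None, :, :]) @ v  # (n, m)
    return softmax(scores) @ V

# Toy usage
n, m, d, h, dv = 4, 6, 8, 16, 10
Q, K, V = np.random.randn(n, d), np.random.randn(m, d), np.random.randn(m, dv)
W_q, W_k, v = np.random.randn(h, d), np.random.randn(h, d), np.random.randn(h)
out1 = dot_product_attention(Q, K, V)            # shape (4, 10)
out2 = additive_attention(Q, K, V, W_q, W_k, v)  # shape (4, 10)
```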