Understanding the Mathematical Principles of Transformers

Author: Fareed Khan; Translator: Zhao Jiankai; Proofreader: Zhao Ruxuan. The transformer architecture may seem intimidating, and you may have seen various explanations on YouTube or blogs. However, in my blog I will clarify its principles by providing a comprehensive mathematical example. By doing so, I hope to simplify the understanding of the transformer architecture. Let’s get started! … Read more
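The core of that mathematical walkthrough is scaled dot-product attention. As a rough sketch of the computation (toy shapes and values, not the article's worked example):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability; rows sum to 1
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scores: (seq, seq); scaling by sqrt(d_k) keeps the softmax well-behaved
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V  # each output row is a weighted mix of value vectors

Q = K = V = np.arange(12.0).reshape(4, 3)  # 4 tokens, d_k = 3
out = attention(Q, K, V)
```

Multi-head attention repeats this with several independent projections of Q, K, and V and concatenates the results.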

Latest Overview of Transformer Models: Essential for NLP Learning

Reprinted from QbitAI (Quantum Bit); report by Xiao Xiao from Aofeisi. WeChat Official Account: QbitAI. What are the differences between Longformer, a model capable of efficiently processing long texts, and BigBird, which is considered an “upgraded version” of the Transformer? And what do the various other variants of the Transformer model (X-former) look like? … Read more

Mamba Can Replace Transformer, But They Can Also Be Combined

Follow the public account to discover the beauty of CV technology. This article is reprinted from Machine Heart, edited by Panda W. Transformers are powerful but not perfect, especially when dealing with long sequences. State Space Models (SSMs) perform quite well on long sequences. Researchers proposed last year that SSMs could replace Transformers, as seen … Read more
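The reason SSMs scale well on long sequences is the linear recurrence at their core. A minimal illustrative sketch (random placeholder matrices, not the selective, trained parameters Mamba actually uses):

```python
import numpy as np

def ssm_scan(A, B, C, u):
    # discrete linear state-space recurrence: x_k = A x_{k-1} + B u_k, y_k = C x_k
    # one step per sequence element, so cost is O(L) in sequence length
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B * u_k  # state update
        ys.append(C @ x)     # readout
    return np.array(ys)

rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)          # a stable state-transition matrix
B = rng.standard_normal(4)
C = rng.standard_normal(4)
y = ssm_scan(A, B, C, rng.standard_normal(1000))
```

Because the recurrence is linear, it can also be computed as a parallel scan or a long convolution, which is what makes SSM layers fast in practice.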

Building Instruction-Based Intelligent Agents: Insights from Transformer

Source | The Robot Brains Podcast; Translation | Xu Jiayu, Jia Chuan, Yang Ting. In 2017, Google released the paper “Attention Is All You Need,” which proposed the Transformer architecture. It has become one of the most influential technological innovations in the field of neural networks over the past decade and has been widely applied in … Read more

Overview of Transformer Pre-trained Models in NLP

The revolution the Transformer has brought to natural language processing (NLP) is hard to overstate. Recently, researchers from the Indian Institute of Technology and the biomedical AI startup Nference.ai conducted a comprehensive survey of Transformer-based pre-trained models in NLP and compiled the results into a review paper. This article will roughly translate and introduce … Read more

Understanding Vision Transformers with Code

Source: Deep Learning Enthusiasts. This article is about 8,000 words long and is recommended to be read in 16 minutes. It details the Vision Transformer (ViT) introduced in "An Image is Worth 16×16 Words". Since "Attention Is All You Need" was published in 2017, Transformer models have quickly emerged in … Read more
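The "16×16 words" idea is that an image is split into non-overlapping patches, each flattened into a token vector. A rough sketch of the patch-splitting step (the learned linear projection and position embeddings are omitted):

```python
import numpy as np

def patchify(img, patch=16):
    # split an (H, W, C) image into non-overlapping patch tokens
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    grid = img.reshape(H // patch, patch, W // patch, patch, C)
    # reorder so each (patch, patch, C) block is contiguous, then flatten it
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)

# a 224x224 RGB image yields 14 * 14 = 196 tokens of dimension 16*16*3 = 768
tokens = patchify(np.zeros((224, 224, 3)))
```

ViT then prepends a class token and feeds the sequence through a standard Transformer encoder, exactly as if the patches were words.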

ShapeFormer: Shapelet Transformer for Multivariate Time Series Classification

Source: Time Series Research. This article is approximately 3,400 words long and is recommended for a 5-minute read. It introduces the Transformer for multivariate time series classification. Multivariate time series classification (MTSC) has attracted extensive research attention due to its diverse real-world applications. Recently, utilizing Transformers for MTSC has achieved state-of-the-art performance. However, existing … Read more
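A shapelet is a short, discriminative subsequence; the classic shapelet feature is the minimum distance between a series and that subsequence over all sliding windows. A rough sketch of that basic idea (ShapeFormer's actual shapelet discovery and attention mechanism are considerably more involved):

```python
import numpy as np

def shapelet_distance(series, shapelet):
    # minimum Euclidean distance between the shapelet and any window of the series
    L = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, L)
    return np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min()

series = np.sin(np.linspace(0, 4 * np.pi, 100))
spike = series[10:20]  # a subsequence taken from the series itself
```

A series containing the shapelet exactly has distance zero to it, which is what makes these distances usable as classification features.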

Why Transformers Are Slowly Replacing CNNs in CV

Author: Pranoy Radhakrishnan; Translator: wwl; Proofreader: Wang Kehan. This article is about 3,000 words and is recommended to be read in 10 minutes. It discusses the application of Transformer models in the field of computer vision and compares them with CNNs. Before understanding Transformers, consider why researchers are interested in studying Transformers when there … Read more

Common Techniques for Accelerating Transformers

Source: DeepHub IMBA. This article is about 1,800 words long and is recommended to be read in 5 minutes. It summarizes some commonly used acceleration strategies. The Transformer is a powerful architecture, but during training the model can easily run out of memory (OOM) or hit the runtime limits of the GPU due to its … Read more
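One standard remedy for OOM during training is gradient accumulation: split a batch into micro-batches, sum their gradients, and take one optimizer step. A sketch of the equivalence, using a linear least-squares model so it is easy to verify (not the article's specific recipe):

```python
import numpy as np

def grad(w, X, y):
    # gradient of 0.5 * ||X w - y||^2 with respect to w
    return X.T @ (X @ w - y)

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 5))
y = rng.standard_normal(32)
w = np.zeros(5)

full = grad(w, X, y)                 # one big-batch gradient (may OOM for real models)
acc = np.zeros_like(w)
for Xb, yb in zip(np.split(X, 4), np.split(y, 4)):
    acc += grad(w, Xb, yb)           # accumulate over 4 smaller micro-batches
```

Because the loss is a sum over examples, the accumulated gradient equals the full-batch gradient, so the optimization trajectory is unchanged while peak memory drops.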

Revisiting Transformer: Inversion More Effective, New SOTA for Real-World Prediction

The Transformer has shown strong capabilities in time series forecasting, capable of describing pairwise dependencies and extracting multi-level representations from sequences. However, researchers have also questioned the effectiveness of Transformer-based predictors. These predictors often embed multiple variables at the same timestamp into indistinguishable channels and attend to these time tokens to capture temporal dependencies. Considering … Read more
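The contrast between the two embeddings can be sketched in a few lines. Conventional predictors turn each timestamp (all variates mixed) into one token; the inverted view turns each variate's whole series into one token. The projection matrices below are random stand-ins for learned layers, and the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, d = 96, 7, 64                  # timesteps, variates, model dimension
series = rng.standard_normal((T, N)) # a multivariate series

# conventional: one token per timestamp, variates mixed into a single channel
time_tokens = series @ rng.standard_normal((N, d))       # shape (T, d)

# inverted: one token per variate, embedding its entire history
variate_tokens = series.T @ rng.standard_normal((T, d))  # shape (N, d)
```

With variate tokens, attention operates across variables rather than across timestamps, so it models cross-variate dependencies directly instead of blending them into indistinguishable channels.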