Transformers Mimic Brain Functionality and Outperform 42 Models

Reprinted from QbitAI (Quantum Bit); by Pine from Aofeisi. Few AI application models today can be discussed without mentioning one model structure: the Transformer. It abandons the traditional CNN and RNN structures and is built entirely on the attention mechanism. Transformers not only … Read more
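
The excerpt names the core mechanism, so here is a minimal sketch of scaled dot-product attention, the building block it refers to. This is illustrative NumPy, not code from the article; the shapes and variable names are assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

# Toy example: 3 tokens with 4-dimensional projections.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```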

Understanding the Mathematical Principles of Transformers

Author: Fareed Khan; Translator: Zhao Jiankai; Proofreader: Zhao Ruxuan. The transformer architecture may seem intimidating, and you may have seen various explanations on YouTube or in blogs. In this post, however, I will clarify its principles by working through a comprehensive mathematical example, which I hope will make the transformer architecture easier to understand. Let's get started! … Read more
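
For reference, the equations any such walkthrough builds toward are the attention and multi-head formulas from "Attention Is All You Need"; this is the paper's standard notation, not reproduced from the article itself:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad
\mathrm{head}_i = \mathrm{Attention}\!\left(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V}\right)
```

where $d_k$ is the key dimension and the $W_i$ are learned per-head projection matrices.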

Mamba Can Replace Transformer, But They Can Also Be Combined

Reprinted from Machine Heart, edited by Panda W. Transformers are powerful but not perfect, especially when dealing with long sequences, where State Space Models (SSMs) perform quite well. Researchers proposed last year that SSMs could replace Transformers, as seen … Read more
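
As a rough sketch of what a discretized linear state space layer computes (the recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t; Mamba's selective, input-dependent variant is more involved), with toy dimensions chosen here purely for illustration:

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Discrete linear SSM: h_t = A h_{t-1} + B x_t,  y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:              # sequential scan over the input sequence
        h = A @ h + B * x_t    # state update
        ys.append(C @ h)       # readout
    return np.array(ys)

# 1-D input sequence, 2-D hidden state.
A = np.array([[0.9, 0.0], [0.1, 0.8]])
B = np.array([1.0, 0.5])
C = np.array([1.0, -1.0])
print(ssm_scan(A, B, C, x=np.sin(np.linspace(0, 3, 8))))
```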

Understanding Vision Transformers with Code

Source: Deep Learning Enthusiasts. This article is about 8,000 words long; a 16-minute read is recommended. It details the Vision Transformer (ViT) introduced in "An Image is Worth 16×16 Words". Since "Attention Is All You Need" introduced the Transformer in 2017, Transformer models have quickly emerged in … Read more
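
The "16×16 words" idea boils down to a reshape: cut the image into fixed-size patches, flatten each one, and project it linearly into the token dimension. A minimal sketch with assumed toy sizes, not the article's own code:

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into flattened rows, one per patch."""
    H, W, C = image.shape
    grid = image.reshape(H // patch, patch, W // patch, patch, C)
    grid = grid.transpose(0, 2, 1, 3, 4)           # group pixels by patch
    return grid.reshape(-1, patch * patch * C)     # one flattened "word" per patch

rng = np.random.default_rng(0)
tokens = patchify(rng.normal(size=(224, 224, 3)))  # (196, 768): 14x14 patches
W_embed = rng.normal(size=(768, 512))              # stands in for the learned projection
print((tokens @ W_embed).shape)                    # (196, 512) patch embeddings
```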

Illustrated Guide to Transformers

Step 1: Define the Dataset. For demonstration purposes, the dataset here contains only three English sentences, small enough that the numerical calculations can be followed by hand. In real applications, much larger datasets are used to train neural network models; ChatGPT, for example, was trained on about 570 GB of data. Our entire dataset contains … Read more
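
In the same spirit as that first step, a three-sentence toy corpus and its word-level vocabulary might look like the following; the sentences are stand-ins, since the excerpt cuts off before listing the article's own:

```python
# Hypothetical stand-in for the article's three-sentence dataset.
dataset = ["the cat sat", "the dog ran", "a cat ran"]

# Build a word-level vocabulary: one integer id per unique word.
vocab = {w: i for i, w in enumerate(sorted({w for s in dataset for w in s.split()}))}
encoded = [[vocab[w] for w in s.split()] for s in dataset]

print(vocab)    # {'a': 0, 'cat': 1, 'dog': 2, 'ran': 3, 'sat': 4, 'the': 5}
print(encoded)  # token-id sequences, e.g. [[5, 1, 4], [5, 2, 3], [0, 1, 3]]
```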

What You Need to Know About Transformers

Author: Xiao Mo; From: Aze's Learning Notes. 1. Introduction: This blog post collects my "encounters, thoughts, and solutions" from learning about Transformers, organized as sixteen questions and answers ("sixteen shots") to help readers better understand the issues. 2. Sixteen Shots: Why do we … Read more

Recent Advances in Graph Transformer Research

Source: Algorithm Advancement. This article is approximately 4,500 words long; a 9-minute read is recommended. It introduces the Graph Transformer, a novel and powerful neural network model capable of effectively encoding and processing graph-structured data. Graph neural networks (GNNs) and Transformers represent a recent advancement in machine learning, providing a new type … Read more
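
One common way to specialize attention to graphs, sketched here as a plausible illustration rather than the article's specific model, is to mask the attention scores with the adjacency matrix so each node attends only to its neighbours (and itself):

```python
import numpy as np

def graph_attention(X, adj):
    """Self-attention over node features X, restricted to graph edges."""
    scores = X @ X.T / np.sqrt(X.shape[-1])        # Q = K = V = X for brevity
    mask = adj + np.eye(len(adj))                  # add self-loops
    scores = np.where(mask > 0, scores, -np.inf)   # forbid non-edges
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # masked row-wise softmax
    return weights @ X

# 4-node path graph 0-1-2-3 with random 8-d node features.
adj = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 8))
print(graph_attention(X, adj).shape)               # (4, 8)
```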

Understanding Vision Transformers in Deep Learning

Since "Attention Is All You Need" introduced the Transformer in 2017, the model has quickly risen in Natural Language Processing (NLP) and established a leading position there. In 2021, the idea that an image is worth 16×16 words brought the Transformer into computer vision tasks. Since then, numerous … Read more

Understanding Transformer Principles and Implementation in 10 Minutes

Adapted from Deep Learning This Little Thing. Models based on the Transformer from the paper "Attention Is All You Need" (such as BERT) have achieved revolutionary results in various natural language processing tasks … Read more
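
One component every quick implementation includes is the sinusoidal positional encoding from "Attention Is All You Need"; here is the standard formula in a few lines, not the article's exact code:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(seq_len=50, d_model=64)
print(pe.shape)   # (50, 64); added to token embeddings before the first layer
```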

Hidden Traps of Gradient Accumulation: Flaws and Fixes in the Transformers Library

Source: DeepHub IMBA. This article is 4,000 words long; a 10-minute read is recommended. The study not only points out a long-ignored technical issue but also provides important optimization directions for future model-training practice. When fine-tuning large language models (LLMs) in a local environment, it is often difficult to … Read more
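
The trap in question concerns loss normalization across micro-batches: dividing each micro-batch loss by the number of accumulation steps only matches true large-batch training when every micro-batch contributes the same number of tokens. For token-averaged losses such as causal LM cross-entropy, the robust fix is to sum per-token losses and divide once by the total token count. A schematic PyTorch sketch of that fix, illustrative rather than the library's internal code:

```python
import torch

# Toy model standing in for an LLM; each row plays the role of one token.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

micro_batches = [
    (torch.randn(3, 4), torch.randint(0, 2, (3,))),  # 3 tokens
    (torch.randn(7, 4), torch.randint(0, 2, (7,))),  # 7 tokens: unequal sizes!
]

optimizer.zero_grad()
total_tokens = sum(len(t) for _, t in micro_batches)
for x, t in micro_batches:
    # Sum (not mean) per-token losses so normalization happens once, globally.
    loss = torch.nn.functional.cross_entropy(model(x), t, reduction="sum")
    (loss / total_tokens).backward()  # not: mean loss / num_micro_batches
optimizer.step()
```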