Attention Mechanism Bug: Softmax as the Culprit Affecting All Transformers

“I found a bug in the attention formula, and no one has noticed it for eight years. All Transformer models, including GPT and LLaMA, are affected.” Recently, statistician and engineer Evan Miller stirred up a storm in the AI field with this claim. We know that the attention formula in machine learning is … Read more
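
For readers who want the gist of the claim: Miller’s post “Attention Is Off By One” argues that the standard softmax forces every attention head to distribute weight even when no key is relevant, and proposes adding 1 to the denominator (often called softmax_1 or “quiet softmax”) so a head can output near-zero weights. Below is a minimal NumPy sketch of the two variants; the function names are our own labels, not a library API.

```python
import numpy as np

def softmax(x):
    # Standard softmax: weights are forced to sum to exactly 1.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_one(x):
    # Miller's proposed softmax_1: exp(x_i) / (1 + sum_j exp(x_j)).
    # Folding the implicit zero logit into the max keeps it numerically stable.
    m = np.maximum(np.max(x), 0.0)
    e = np.exp(x - m)
    return e / (np.exp(-m) + e.sum())

scores = np.array([-8.0, -9.0, -7.5])   # every key looks irrelevant
print(softmax(scores))      # still sums to 1: the head must attend to something
print(softmax_one(scores))  # all weights near 0: the head can stay quiet
```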

Unlocking Model Performance with Attention Mechanism

The author of this article is Teacher Tom ▷ PhD from a “Double First-Class” university in China, affiliated with a national key laboratory ▷ Published 12 papers at top international conferences, holds 2 national invention patents, and serves as a reviewer for multiple international journals ▷ Has supervised more than ten doctoral and master’s students. Research areas: general visual-language cross-modal models … Read more

RAG Mastery Manual: Understanding the Technology Behind RAG

In a previous article, RAG Mastery Manual: Is RAG Sounding the Death Knell? Does Long Context in Large Models Mean Vector Retrieval Is No Longer Important?, we explained why RAG remains indispensable for mitigating the hallucination problem of large models and reviewed how vector databases can strengthen RAG in practice. Today, … Read more
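
For readers new to the mechanics being discussed, the core RAG loop is: embed the query, retrieve the most similar documents from a vector store, and place them in the prompt. Here is a minimal, self-contained sketch; the embed function is a toy bag-of-words placeholder, not any particular vector database’s API.

```python
import numpy as np

# Toy corpus standing in for a real document store.
docs = [
    "RAG grounds a language model's answer in retrieved documents.",
    "Vector databases index embeddings for fast similarity search.",
    "Long-context models can read more text but still hallucinate.",
]

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: hash words into a fixed-size bag-of-words
    # vector. A real system would call an embedding model instead.
    v = np.zeros(64)
    for word in text.lower().split():
        v[hash(word) % 64] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2) -> list[str]:
    # Cosine similarity reduces to a dot product on normalized vectors.
    sims = doc_vecs @ embed(query)
    return [docs[i] for i in np.argsort(-sims)[:k]]

query = "How does RAG reduce hallucinations?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would then be sent to the LLM
```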

Anker’s Yang Meng Discusses GPU, Transformers, and the Future of Robotics

How does Anker, with its focus on robotics, view the future of large models and general-purpose robots? The Chinese public often sees Anker as a power bank company, but in fact power banks account for less than 10% of its revenue. In 2022, Anker generated $2 billion in revenue, achieving top-tier status in … Read more

Unlocking Effective Combination of CNN and Transformer: ByteDance Proposes Next-Gen Visual Transformer

Report from Machine Heart (Machine Heart Editorial Department). Researchers from ByteDance have proposed a next-generation visual Transformer, Next-ViT, which can be deployed effectively in real industrial scenarios. Next-ViT runs inference as fast as a CNN while retaining the strong performance of a ViT. Due to their complex attention mechanisms and model designs, most existing visual Transformers … Read more

Understanding Transformers: A Comprehensive Guide

Transformers have fundamentally changed deep learning models since their introduction. Today, we will unveil the core concepts behind Transformers: the attention mechanism, the encoder-decoder architecture, multi-head attention, and more. Through Python code snippets, you’ll gain a deeper understanding of their principles. 1. Understanding the Attention Mechanism The attention mechanism is a fascinating concept in … Read more
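
As a taste of what the guide covers, here is the scaled dot-product attention at the heart of every Transformer, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, as a short self-contained NumPy sketch (single head, no masking, for illustration only).

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

# Tiny example: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 8)
```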

CNN + Transformer = SOTA! Global Information Recovered by Transformer

New Intelligence Report | Source: Microsoft | Editors: LRS, Xiao Yun. [New Intelligence Guide] Microsoft has published a new paper on arXiv that brings the CNN into the Transformer so that global and local information are considered simultaneously. In the development of computer vision, the most important model has been the Convolutional Neural Network (CNN), which serves as the foundation for other … Read more

New Architecture Surpasses Transformer? CMU and Princeton Launch Model with 5x Inference Speedup and Performance Gains

Big Data Digest | Authorized reprint from Leading Technology | Author: Congerry. The Transformer is being challenged! In June 2017, eight Google researchers published a groundbreaking paper titled “Attention Is All You Need”. It is called groundbreaking because it proposed a new neural network architecture, the Transformer, which opened a new era of generative artificial intelligence and large models. … Read more

Mamba Architecture Scaled Up: Transformer Hybrid Defeats Transformer

Feng Se, from Aofeisi. Quantum Bit | Public account QbitAI. Exciting news: the first project to truly scale the popular Mamba architecture to a sufficiently large size has arrived. At 52 billion parameters, it uses a Mamba+Transformer hybrid architecture. Its name is Jamba. By taking the strengths of both architectures, it achieves both model quality and … Read more
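
To make “taking the strengths of both architectures” concrete, here is a schematic PyTorch sketch of an interleaved hybrid stack. The ToyStateSpaceBlock is a simple gated causal convolution standing in for a real Mamba layer, and the one-attention-layer-in-four ratio is illustrative, not Jamba’s actual configuration.

```python
import torch
import torch.nn as nn

class ToyStateSpaceBlock(nn.Module):
    """Stand-in for a Mamba/SSM layer: a causal depthwise conv with gating.
    Real Mamba uses a selective state-space scan; this only mimics its
    local, recurrent-style mixing role in the stack."""
    def __init__(self, d_model: int, kernel: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel,
                              padding=kernel - 1, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        h = self.conv(x.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)
        return x + h * torch.sigmoid(self.gate(x))   # gated residual

class AttentionBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return x + out                           # residual connection

class HybridStack(nn.Module):
    """Interleave SSM-style blocks with occasional attention blocks,
    echoing the Mamba+Transformer hybrid idea (the ratio is made up)."""
    def __init__(self, d_model: int = 64, n_layers: int = 8, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0
            else ToyStateSpaceBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 16, 64)                      # (batch, seq, d_model)
print(HybridStack()(x).shape)                   # torch.Size([2, 16, 64])
```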

Mamba Evolution Disrupts Transformer: A100 Achieves 140K Context

New Intelligence Report | Editor: Editorial Department. [New Intelligence Guide] The production-grade Mamba model with 52B parameters is here! This powerful variant, Jamba, has just set a new record: it can compete directly with Transformers, features a 256K ultra-long context window and a threefold throughput increase, and its weights are available for free download. The Mamba architecture, which … Read more