Introducing ∞-former: Infinite Long-Term Memory for Any Length Context

Reported by the Machine Heart Editorial Team. Can a model hold context of any length? Here is a new model called ∞-former. In the past few years, the Transformer has dominated the entire NLP field and has also crossed into other areas such as computer vision. However, it has its weaknesses, such as not being … Read more

Thoughts on Upgrading Transformer: Simple Considerations on Multimodal Encoding Positions

©PaperWeekly Original · Author | Su Jianlin · Affiliation | Scientific Space · Research Direction | NLP, Neural Networks. In the second article of this series, "The Path of Transformer Upgrade: A Rotary Position Encoding that Draws on the Strengths of Many," the author proposes Rotary Position Embedding (RoPE), a method to achieve relative position encoding … Read more
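
As a rough illustration of the idea behind RoPE, here is a minimal NumPy sketch that rotates paired feature dimensions of a query or key matrix by position-dependent angles, so the dot product between a rotated query and key depends only on their relative offset. The function name, the `base` constant, and the pairing of adjacent dimensions follow common convention and are assumptions for illustration, not Su Jianlin's exact formulation.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim).

    Sketch only: adjacent feature pairs (x[2i], x[2i+1]) are rotated by an
    angle position * base**(-2i/dim), which grows with the token position.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    freqs = base ** (-np.arange(0, dim, 2) / dim)[None, :]   # (1, dim/2)
    angles = positions * freqs                                # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

if __name__ == "__main__":
    q = np.random.randn(8, 16)   # 8 tokens, 16-dim queries
    print(rope(q).shape)         # (8, 16)
```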

Changes in Transformer Architecture Since 2017

Reading articles about LLMs, you often see phrases like “we use the standard Transformer architecture.” But what does “standard” mean, and has it changed since the original paper? Interestingly, despite the rapid growth in the NLP field over the past five years, the Vanilla Transformer still adheres to the Lindy Effect, which suggests that the … Read more

Cross-Domain Models: Using Transformers for Object Detection

Report by Machine Heart. Contributors: Racoon, Du Wei, Zhang Qian. Since its introduction in 2017, the Transformer has swept the entire NLP field, with popular models like BERT and GPT-2 adopting Transformer-based architectures. Since it is so effective, why not apply it to CV? Recently, researchers at Facebook AI have attempted this by applying Transformers to object … Read more

BERT and GPT Outperform Transformers Without Attention or MLPs

Reported by Machine Heart. Editors: Du Wei, Ze Nan. This article explores the Monarch Mixer (M2), a new architecture that is sub-quadratic in both sequence length and model dimension and demonstrates high hardware efficiency on modern accelerators. From language models like BERT, GPT, and Flan-T5 to image models like SAM and Stable Diffusion, Transformers are sweeping the … Read more
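
To make the "sub-quadratic" point concrete, here is a hedged NumPy sketch of a Monarch-style structured multiply: block-diagonal, permute, block-diagonal. For an input of length n = b * b it touches on the order of n * sqrt(n) entries instead of the n^2 of a dense layer. The function name, shapes, and the transpose used as the permutation are illustrative assumptions; the actual Monarch Mixer implementation differs in its details.

```python
import numpy as np

def monarch_like_multiply(x, blocks1, blocks2):
    """Sketch of a structured 'block-diag, permute, block-diag' multiply.

    x: vector of length n = b * b; blocks1, blocks2: arrays of shape
    (b, b, b), i.e. b dense b-by-b blocks each. Cost is O(n * sqrt(n)),
    the kind of sub-quadratic scaling M2 targets.
    """
    b = blocks1.shape[0]
    y = x.reshape(b, b)                      # split into b chunks of size b
    y = np.einsum('bij,bj->bi', blocks1, y)  # first block-diagonal multiply
    y = y.T.copy()                           # permutation (transpose the grid)
    y = np.einsum('bij,bj->bi', blocks2, y)  # second block-diagonal multiply
    return y.reshape(-1)

if __name__ == "__main__":
    b = 4
    x = np.random.randn(b * b)
    B1, B2 = np.random.randn(b, b, b), np.random.randn(b, b, b)
    print(monarch_like_multiply(x, B1, B2).shape)  # (16,)
```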

Applications of Transformer in Quantitative Investment

The WeChat public account on quantitative investment and machine learning is a leading independent media outlet in the industry, focusing on fields such as quantitative investment, hedge funds, fintech, artificial intelligence, and big data. The account has more than 300,000 followers from mutual funds, private equity, securities, futures, banks, insurance, and universities, and has … Read more

Fourier Transform Replaces Transformer Self-Attention Layer

Reported by the Machine Heart Editorial Team. The research team from Google shows that replacing the Transformer self-attention layer with a Fourier transform retains 92% of BERT's accuracy on the GLUE benchmark, while training 7 times faster on GPUs and 2 times faster on TPUs. Since its introduction in 2017, the Transformer architecture has dominated the … Read more
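
The core of the idea can be shown in a few lines: the learned self-attention block is swapped for a parameter-free 2D discrete Fourier transform over the sequence and hidden dimensions, keeping only the real part. This is a minimal sketch of the FNet-style mixing step, not Google's official implementation; the function name and the choice to mix with `np.fft.fft2` are illustrative assumptions.

```python
import numpy as np

def fourier_mixing(x):
    """FNet-style token mixing: replace self-attention with a 2D FFT.

    x: array of shape (seq_len, hidden_dim). Applies a discrete Fourier
    transform over both axes and keeps the real part; the layer has no
    learned parameters, which is where the speedup comes from.
    """
    return np.fft.fft2(x).real

if __name__ == "__main__":
    tokens = np.random.randn(128, 64)      # 128 tokens, 64-dim hidden states
    print(fourier_mixing(tokens).shape)    # (128, 64)
```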

Practical Experience in Transformer Quantization Deployment Based on Journey 5 Chip

Introduction: On March 28, the 16th lecture of the "New Youth in Autonomous Driving" series organized by Zhixingshi concluded successfully. In this lecture, Yang Zhigang, a core developer of the Horizon toolchain, gave a live talk on "Practical Experience in Transformer Quantization Deployment Based on the Journey 5 Chip." Yang Zhigang first introduced the … Read more

Mamba Architecture Expanded: Hybrid Transformer Triumphs

This article is reprinted with authorization from the AI new-media outlet Quantum Bit (public account ID: qbitai); please contact the source for reprint permission. It is approximately 1,200 words long and is recommended as a 5-minute read. It introduces the hybrid model Jamba. Exciting news: the first real scale-up of the Mamba architecture has finally … Read more

Understanding the Working Principle of GPT’s Transformer Technology

Introduction: The Transformer was proposed in the paper "Attention Is All You Need" and is now the recommended reference model for Google Cloud TPU. By introducing self-attention mechanisms and positional encoding layers, it effectively captures long-distance dependencies in input sequences and performs excellently when handling long sequences. Additionally, the parallel computing capabilities of the Transformer model … Read more
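
To ground the "self-attention captures long-distance dependencies" claim, here is a minimal single-head scaled dot-product attention sketch in NumPy. The function and parameter names (`w_q`, `w_k`, `w_v`) are illustrative assumptions; it omits multi-head splitting, masking, and the positional encoding that the article goes on to describe.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (sketch).

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projections.
    Every token attends to every other token, which is what lets the
    Transformer model long-distance dependencies, and the matrix products
    parallelize across the whole sequence.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (seq_len, d_k)

if __name__ == "__main__":
    d_model, d_k, seq_len = 32, 16, 10
    x = np.random.randn(seq_len, d_model)
    w_q, w_k, w_v = (np.random.randn(d_model, d_k) for _ in range(3))
    print(self_attention(x, w_q, w_k, w_v).shape)     # (10, 16)
```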