Latest Overview of Transformer Models: Essential for NLP Learning

Reprinted from Quantum Bit (QbitAI). Xiao Xiao, reporting from Aofeisi | WeChat Official Account QbitAI. What are the differences between Longformer, a model built to process long texts efficiently, and BigBird, which is considered an “upgraded version” of the Transformer? And what do the many other Transformer variants (X-formers) look like? … Read more

Mamba Can Replace Transformer, But They Can Also Be Combined

This article is reprinted from Machine Heart, edited by Panda W. Transformers are powerful but not perfect, especially when dealing with long sequences, where State Space Models (SSMs) perform quite well. Researchers proposed last year that SSMs could replace Transformers, as seen … Read more

Building Instruction-Based Intelligent Agents: Insights from Transformer

Source | The Robot Brains Podcast. Translation | Xu Jiayu, Jia Chuan, Yang Ting. In 2017, Google released the paper “Attention Is All You Need,” which proposed the Transformer architecture. It has since become one of the most influential technological innovations in the field of neural networks over the past decade and has been widely applied in … Read more

Overview of Transformer Pre-trained Models in NLP

The revolution the Transformer has brought to natural language processing (NLP) is beyond words. Recently, researchers from the Indian Institute of Technology and the biomedical AI startup Nference.ai conducted a comprehensive survey of Transformer-based pre-trained models in NLP and compiled the results into a review paper. This article roughly translates and introduces … Read more

ShapeFormer: Shapelet Transformer for Multivariate Time Series Classification

Source: Time Series Research. This article is approximately 3400 words long and is recommended as a 5-minute read. It introduces the Transformer for multivariate time series classification. Multivariate time series classification (MTSC) has attracted extensive research attention due to its diverse real-world applications. Recently, Transformers have achieved state-of-the-art performance on MTSC. However, existing … Read more

Revisiting Transformer: Inversion More Effective, New SOTA for Real-World Prediction

The Transformer has shown strong capabilities in time series forecasting, capable of describing pairwise dependencies and extracting multi-level representations from sequences. However, researchers have also questioned the effectiveness of Transformer-based predictors. These predictors often embed multiple variables at the same timestamp into indistinguishable channels and attend to these time tokens to capture temporal dependencies. Considering … Read more

Introduction to Transformer Models

Source: Madio.net (Mathematics China) | Editor: Only Tulips’ Garden. The essence of the Transformer is an Encoder-Decoder structure, as shown in the figure. Before the advent of the Transformer, most sequence-to-sequence (Encoder-Decoder) models were based on CNNs and RNNs. In this article, we have already introduced the Attention and Self-attention mechanisms, and the Transformer is based on the … Read more
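The self-attention mechanism this blurb refers to can be made concrete with a minimal NumPy sketch. This is a single-head, scaled dot-product attention only (an illustrative assumption, not a full multi-head Transformer layer), with randomly chosen toy shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (sketch only)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise token similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))              # 3 tokens, model dimension 4
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (3, 4): one attended representation per token
```

Each output row is a convex combination of the value vectors, with mixing weights determined by query-key similarity; stacking such layers (plus feed-forward blocks and residual connections) gives the Encoder-Decoder structure described above.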

ViTGAN: A New Approach to Image Generation Using Transformers

Transformers have brought tremendous advancements to various natural language tasks and have recently begun to penetrate the field of computer vision, starting to show potential in tasks previously dominated by CNNs. A recent study from the University of California, San Diego, and Google Research proposed using visual Transformers to train GANs. To effectively apply this … Read more

Illustrated Guide to Transformers

Step 1 — Define the Dataset. For demonstration purposes, the dataset here contains only three English sentences; such a tiny dataset makes it possible to walk through the numerical calculations by hand. In real applications, far larger datasets are used to train neural network models; ChatGPT, for example, was trained on some 570 GB of data. Our entire dataset contains … Read more

Time Series + Transformer: Understanding iTransformer

This article is about 3500 words long and is recommended as a 10-minute read. It will help you understand iTransformer and make better use of the attention mechanism for multivariate correlation. 1 Introduction. Transformers perform excellently in natural language processing and computer vision, but they do not perform as well as linear models in … Read more
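The "inversion" behind iTransformer can be sketched as follows: instead of one token per timestamp, each variate's entire history is embedded as a single token, so attention runs across variates and captures multivariate correlation. This is an illustrative NumPy sketch with made-up shapes, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inverted_attention(series, W_embed, Wq, Wk, Wv):
    """iTransformer-style sketch: one token per variate, attention
    over variates rather than over time steps."""
    # series: (num_variates, seq_len) -> tokens: (num_variates, d_model)
    tokens = series @ W_embed
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # variate-by-variate affinities
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
series = rng.normal(size=(5, 96))        # 5 variates, 96 time steps each
W_embed = rng.normal(size=(96, 16))      # per-variate linear embedding
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = inverted_attention(series, W_embed, Wq, Wk, Wv)
print(out.shape)  # (5, 16): one mixed representation per variate
```

Contrast this with the conventional setup criticized above, where all variates at one timestamp are fused into a single time token before attention is applied.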