Transformers and Their Variants in NLP

Author: Jiang Runyu, Harbin Institute of Technology SCIR. In recent years, the most impressive achievement in NLP has undoubtedly been the pre-trained models represented by Google’s BERT. They continuously break records (both in task metrics and … Read more

Layer-by-Layer Function Introduction and Detailed Explanation of Transformer Architecture

Source: Deephub Imba. This article is about 2,700 words; recommended reading time: 5 minutes. It gives you an understanding of the overall architecture of the Transformer. Deep learning has been evolving for many years, and its practice emphasizes using a large number of parameters to extract useful … Read more

Understanding Transformer Principles and Implementation in 10 Minutes

Source | Zhihu. Address | https://zhuanlan.zhihu.com/p/80986272. Author | Chen Chen. Editor | Machine Learning Algorithms and Natural Language Processing. The model built … Read more

Understanding the Mathematical Principles of Transformers

Author: Fareed Khan. Translator: Zhao Jiankai. Proofreader: Zhao Ruxuan. The Transformer architecture may seem intimidating, and you may have seen various explanations on YouTube or in blogs. In this post, I clarify its principles with a comprehensive mathematical example, which I hope simplifies understanding of the Transformer architecture. Let’s get started! … Read more
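The computation at the heart of any such worked mathematical example is scaled dot-product attention, softmax(QKᵀ/√d_k)V. As a minimal, dependency-free sketch (the toy Q, K, V values below are invented purely for illustration, not taken from the article):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V are lists of row vectors (lists of floats)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Two query tokens attending over three key/value tokens (d_k = 2).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = attention(Q, K, V)
```

Because the attention weights sum to 1, every output coordinate stays inside the range of the corresponding value-matrix column, which is a quick sanity check when following such calculations by hand.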

Understanding Vision Transformers with Code

Source: Deep Learning Enthusiasts. This article is about 8,000 words; recommended reading time: 16 minutes. It details the Vision Transformer (ViT) introduced in “An Image is Worth 16×16 Words”. Since “Attention Is All You Need” appeared in 2017, Transformer models have quickly emerged in … Read more
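ViT’s first step is to split the image into fixed-size patches that become the “words” of the input sequence. This can be sketched with a toy 4×4 single-channel image and 2×2 patches (sizes chosen only for illustration; the paper uses 16×16 patches on real multi-channel images):

```python
def image_to_patches(img, p):
    """Split an H x W image (list of rows) into flattened p x p patches,
    scanning patches left-to-right, top-to-bottom, as in ViT."""
    H, W = len(img), len(img[0])
    patches = []
    for i in range(0, H, p):
        for j in range(0, W, p):
            # Flatten one p x p block in row-major order.
            patch = [img[i + di][j + dj] for di in range(p) for dj in range(p)]
            patches.append(patch)
    return patches

img = [[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]]
patches = image_to_patches(img, 2)
# Yields 4 patches of length 4; the first is the top-left 2x2 block.
```

In the full model, each flattened patch is then linearly projected to the embedding dimension and a position embedding is added, after which the sequence is processed by a standard Transformer encoder.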

Introduction to Transformer Models

Source: Madio.net (Mathematics China). Editor: Only Tulips’ Garden. The essence of the Transformer is an Encoder-Decoder structure, as shown in the figure. Before the advent of Transformers, most sequence-to-sequence (Encoder-Decoder) models were based on CNNs and RNNs. We have already introduced the Attention and Self-attention mechanisms, and the Transformer is based on the … Read more

Understanding Transformers in Graph Neural Networks

Compiled by: ronghuaiyang. The aim of this article is to build intuition for the Transformer architecture in NLP and its connection to Graph Neural Networks. Engineer friends often ask me: “graph deep learning” sounds … Read more

Illustrated Guide to Transformers

Step 1 — Define the Dataset. For demonstration purposes, the dataset here contains only three English sentences, small enough that the numerical calculations can be followed by hand. Real applications train neural network models on far larger datasets; ChatGPT, for example, was trained on roughly 570 GB of data. Our entire dataset contains … Read more

Time Series + Transformer: Understanding iTransformer

This article is about 3,500 words; recommended reading time: 10 minutes. It will help you understand iTransformer and make better use of the attention mechanism for multivariate correlation. 1. Introduction. Transformers perform excellently in natural language processing and computer vision, but they do not perform as well as linear models in … Read more
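iTransformer’s key move is to “invert” the tokenization: instead of one token per time step containing all variates, it builds one token per variate containing that variate’s whole series, so attention models correlations across variates rather than across time steps. A minimal sketch of that inversion, with toy numbers chosen for illustration:

```python
def invert_tokens(series):
    """iTransformer-style inversion: a multivariate series given as
    time-major rows (T x N) is transposed to variate-major tokens (N x T),
    so attention can be applied across the N variates."""
    T, N = len(series), len(series[0])
    return [[series[t][n] for t in range(T)] for n in range(N)]

# 3 time steps, 2 variates -> 2 tokens, each holding one variate's full series.
series = [[1.0, 10.0],
          [2.0, 20.0],
          [3.0, 30.0]]
tokens = invert_tokens(series)
```

In the full model, each variate token is then embedded and fed through standard attention blocks, which is what lets the attention map be read directly as a variate-to-variate correlation structure.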

Understanding Transformers: A Simplified Guide

Source: Python Data Science. This article is approximately 7,200 words; recommended reading time: 14 minutes. In this article, we explore the Transformer model and understand how it works. 1. Introduction. The BERT model released by Google achieved SOTA results on 11 NLP tasks, igniting the entire NLP community. One … Read more