Alibaba’s Tora: A Trajectory-Controlled DiT Video Generation Model

This article introduces Tora: Trajectory-oriented Diffusion Transformer for Video Generation, in which Alibaba proposes Tora, a trajectory-controlled DiT video generation model. Paper link: https://arxiv.org/abs/2407.21705 Project link: https://ali-videoai.github.io/tora_video/ Background: Video generation models have recently made significant progress. For example, OpenAI’s Sora and domestic models like Vidu … Read more

Detailed Module Analysis of DETR Structure

Transformers shine in the field of computer vision, and the Detection Transformer (DETR) is a successful application of Transformers in object detection. By utilizing the attention mechanism in Transformers, it effectively models long-range dependencies in images, simplifying the object detection pipeline and constructing an end-to-end object detector. Object detection can be understood as a set … Read more

Detailed Explanation of ViT Model and PyTorch Implementation

Introduction: This article implements the ViT model from scratch in PyTorch and trains it on the CIFAR-10 dataset for image classification. Architecture of ViT: The architecture of ViT is inspired by BERT, an encoder-only transformer model typically used for supervised learning tasks in NLP such as text classification or named … Read more

A Detailed Guide to Self-Attention Mechanism

Author: Greatness Comes from Perseverance @ Zhihu (Authorized) Source: https://zhuanlan.zhihu.com/p/410776234 Self-Attention is the core idea of Transformer. Recently, I re-read the paper and gained some new insights. Thus, I wrote this article to share my thoughts with readers. When I first encountered Self-Attention, the most confusing part for me was the three matrices Q, K, … Read more

Understanding BERT: The Essence, Principles, and Applications of BERT

This article covers the essence, principles, and applications of BERT (Bidirectional Encoder Representations from Transformers). 1. The Essence of BERT. BERT Architecture: A pre-trained language model based on a multi-layer Transformer encoder that captures the bidirectional context of text through tokenization, various embeddings, and task-specific output … Read more

Understanding Attention Mechanism in Neural Networks

This article covers the essence of Attention, the principle of Attention, and its applications to help you understand the Attention mechanism. 1. The Essence of Attention. The core logic: from focusing on everything to focusing on key points. The Attention mechanism can grasp the key points when processing long texts, without losing important information. The Attention mechanism … Read more

Understanding Self-Attention and Multi-Head Attention in Neural Networks

With the rapid rise of the Transformer model, Self-Attention and Multi-Head Attention have become core components in the field of Natural Language Processing (NLP). This article analyzes these two attention mechanisms from three aspects: brief introduction, workflow, and comparison. 1. Brief Introduction. Self-Attention: Allows each element in the input sequence to focus on and weight … Read more

The Evolution of Large Models: From Transformer to DeepSeek-R1

📖 Reading Time: 19 minutes 🕙 Release Date: February 14, 2025 At the beginning of … Read more

Diffusion-TS: Interpretable Diffusion for General Time Series Generation

Introduction: Time series data is ubiquitous in fields such as finance, healthcare, retail, and climate modeling. However, data sharing can lead to privacy breaches, limiting the … Read more

Transformers as Support Vector Machines: A New Perspective

Reprinted from: Machine Heart | Edited by: Egg Sauce, Xiao Zhou. SVM is all you need; support vector machines never go out of style. Transformer is a support vector machine (SVM), a new theoretical … Read more