The Evolution of Large Models: From Transformer to DeepSeek-R1

📖 Reading Time: 19 minutes 🕙 Release Date: February 14, 2025 At the beginning of … Read more

BERT and GPT Without Attention or MLPs Outperform Transformers

Reported by Machine Heart. Editors: Du Wei, Ze Nan. This article explores Monarch Mixer (M2), a new architecture that is sub-quadratic in both sequence length and model dimension and achieves high hardware efficiency on modern accelerators. From language models like BERT, GPT, and Flan-T5 to image models like SAM and Stable Diffusion, Transformers are sweeping the … Read more

Understanding the Working Principle of GPT’s Transformer Technology

Introduction: The Transformer was proposed in the paper “Attention Is All You Need” and is now the recommended reference model for Google Cloud TPU. By introducing self-attention mechanisms and positional encoding layers, it captures long-distance dependencies in input sequences and performs well on long sequences. Additionally, the parallel computing capabilities of the Transformer model … Read more

What Is the Transformer Model?

Welcome to the special winter vacation column “High-Tech Lessons for Kids” brought to you by Science Popularization China! Artificial intelligence, as one of the most cutting-edge technologies today, is changing our lives at an astonishing speed. From smart voice assistants to self-driving cars, from AI painting to machine learning, it opens up a future full … Read more

How to Use GPT to Write Long Articles

Why can't GPT write long articles well? Despite using GPT for a long time, many people still struggle to use it for writing long texts. Vague prompts and the tendency to specify … Read more

What Is the Transformer Model?

Welcome to the special winter vacation column “High-Tech Lessons for Kids” presented by Science Popularization China! Artificial intelligence, as one of the most cutting-edge technologies today, is changing our lives at an astonishing pace. From smart voice assistants to self-driving cars, from AI painting to machine learning, it opens up a future full of … Read more

AI Capabilities as Core Competencies

Humanity has entered a new era of comprehensive intelligence, where artificial intelligence (AI) represents a historic opportunity. For institutions, enterprises, and individuals across society, the ability to utilize AI has become a core competency. 1. The Revolutionary Significance of Large Models The breakthrough growth of generative artificial intelligence (AIGC), particularly the explosion of large models … Read more

Must-See! Princeton's Chen Danqi's Latest Course on Understanding Large Language Models (2022)

The MLNLP community is a well-known machine learning and natural language processing community, both domestically and internationally, whose members include NLP graduate students, university teachers, and industry researchers. The vision of the community is to promote communication and progress between the academic and industrial circles of natural language processing and machine learning, especially for the progress … Read more

Educational Applications of Large Language Models: Principles, Status, and Challenges

Abstract: Large Language Models (LLMs) are natural language processing technologies used to describe vast amounts of text through vector representations and generative probabilities. Recently, with the emergence of representative products like ChatGPT, which has garnered widespread attention in the education sector due to its excellent capabilities in generation, comprehension, logical reasoning, and dialogue, research on … Read more

Post-BERT: Pre-trained Language Models and Natural Language Generation

Author: Tea Book Club of Lao Song. Zhihu Column: NLP and Deep Learning. Research Direction: Natural Language Processing. Source: AINLP. Introduction: BERT has achieved great success in the field of natural language understanding, but it performs poorly in natural language generation due to the language model used … Read more