Natural Language Processing Archives - Page 13 of 38

Understanding the Mathematical Principles of Large Models

2025-05-08 by AI Agent

Participants of the 1956 Dartmouth Conference. Second from left: Rochester; third from left: Solomonoff; fourth from left: Minsky; second from right: McCarthy; far right: Shannon.

Introduction: The secret behind the success of OpenAI’s popular GPT series lies in next-token prediction (essentially, predicting the next word), which is mathematically grounded in Solomonoff induction. This method is the theoretical cornerstone … Read more

Academician Zhang Bo: Three Major Capabilities and One Major Flaw of Large Models

2025-05-08 by AI Agent

Introduction: What positive impacts will new technologies bring to industries? How can they be deployed smoothly across different scenarios? Recently, at the “2024 Global Business Innovation Conference” hosted by UFIDA, Zhang Bo, an academician of the Chinese Academy of Sciences and honorary director of the Tsinghua University Institute of Artificial Intelligence, delivered a speech titled … Read more

An Overview of AI Industry Large Models

2025-05-07 by AI Agent

When developing or incubating large models, do not insist on absolute short-term financial metrics; focus instead on relative improvements in business and technical indicators. By the Tencent Research Institute Large Model Research Group. General-purpose large model technology is developing rapidly, but many traditional industries are not advancing as quickly. For enterprises, the … Read more

What Are Large Models?

2025-05-07 by AI Agent

Large models refer to machine learning models with a large number of parameters and complex computational structures. This article starts from the basic concept of large models, distinguishes related concepts that are easily confused in the field of large models, and provides a detailed interpretation of the development history of large models, serving as a … Read more

Detailed Explanation of Masks in Attention Mechanisms

2025-05-07 by AI Agent

Source: DeepHub IMBA. This article is approximately 1800 words long, with a recommended reading time of 5 minutes. It provides a detailed introduction to the principles and mechanisms of masks in attention mechanisms. The attention mask allows us to send batches of data of varying lengths into the transformer at once. … Read more
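
As a rough illustration of how a mask lets sequences of different lengths share one batch, the sketch below pads token-id lists to a common length and zeroes out the attention weight on padded positions. It is not taken from the linked article: the padding id `PAD_ID` and the helper names `pad_batch` and `masked_softmax` are hypothetical, chosen only for this example, and plain Python lists stand in for tensors.

```python
import math

PAD_ID = 0  # hypothetical padding token id, assumed for this sketch


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad non-empty token-id lists to a common length; return (padded, mask)."""
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding.
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, ignoring masked positions."""
    # Masked positions are set to -inf, so exp(-inf) == 0.0 gives them no weight.
    masked = [s if keep else float("-inf") for s, keep in zip(scores, mask_row)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


batch = [[5, 7, 9], [4, 2]]          # two sequences of different lengths
padded, mask = pad_batch(batch)      # both rows now have length 3
weights = masked_softmax([1.0, 2.0, 3.0], mask[1])  # third position is padding
```

Here the shorter sequence is padded to length 3, and the weight on its padded third position comes out as exactly zero, so the padding cannot influence the attention output.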

Comprehensive Guide to Seq2Seq Attention Model

2025-05-07 by AI Agent

Source: Zhihu. Link: https://zhuanlan.zhihu.com/p/40920384. Author: Yuanche.Sh. Editor: Machine Learning Algorithms and Natural Language Processing WeChat account. This article is for academic sharing only. If there is any infringement, please contact us to delete it. … Read more

Latest Overview of Attention Mechanism Models

2025-05-06 by AI Agent

Source: Zhuanzhi. This article links to multiple resources, with a recommended reading time of 5 minutes. It details the Attention model’s concept, definition, impact, and how to get started with practical work. [Introduction] The Attention model has become an important concept in neural networks, and this article brings you the latest overview of this model, detailing its concept, definition, impact, … Read more

Understanding the Details of Transformers: 18 Key Questions

2025-05-06 by AI Agent

Source: Artificial Intelligence Research. This article is approximately 5400 words long, with a recommended reading time of over 10 minutes. It helps you understand Transformers from all aspects through a Q&A format. Source: Zhihu. Author: Wang Chen, who asks questions @ Zhihu. Why summarize Transformers through eighteen questions? There are two … Read more

What Are the Details of Transformers? 18 Questions About Transformers!

2025-05-06 by AI Agent

Source: https://www.zhihu.com/question/362131975/answer/3058958207. Author: Wang Chen, who asks questions @ Zhihu (Authorized). Editor: Jishi Platform. Why summarize Transformers through eighteen questions? There are two reasons: First, the Transformer is the fourth major feature extractor after the MLP, RNN, and CNN, also known as the fourth foundational model; the recently popular ChatGPT is also based on the Transformer, highlighting its … Read more

Do Long-Context Models Truly Leverage Context for Responses?

2025-05-06 by AI Agent

The MLNLP community is a well-known machine learning and natural language processing community in China and abroad, whose members include NLP master’s and doctoral students, university teachers, and enterprise researchers. The community’s vision is to promote communication and progress between the academic and industrial circles of natural language processing and machine learning, especially for beginners. Reprinted … Read more