Can Transformers Plan for Future Tokens?

Do language models plan for future tokens? This paper provides the answer. “Don’t let Yann LeCun see this.” Yann LeCun says it’s too late: he has already seen it. Today we introduce this paper that “LeCun must see,” which explores the question: is the Transformer a far-sighted language model? When it performs inference at a … Read more

Finally, Someone Visualized the Transformer!

Is there anyone who still doesn’t understand how the Transformer works in 2024? Come and try this interactive tool. In 2017, Google introduced the Transformer in the paper “Attention Is All You Need,” a major breakthrough in deep learning. The paper has been cited nearly 130,000 times, and all models in the subsequent GPT … Read more
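Before reaching for the tool, the core operation it animates fits in a few lines. The sketch below is an illustration in NumPy, not code from the tool or the paper; it implements the scaled dot-product attention of “Attention Is All You Need”:

```python
# Minimal sketch of scaled dot-product attention (illustrative only).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Row-wise softmax, shifted for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted sum of the values

# Toy example: 3 tokens, 4-dimensional vectors.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```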

Complete Illustrated Guide to GPT-2: Just Read This Article (Part Two)

Source | Zhihu, https://zhuanlan.zhihu.com/p/79872507 Author | Machine Heart Editor | Machine Learning Algorithms and Natural Language Processing public account. In the … Read more

How to Use BERT and GPT-2 in Your Models

Recommended by New Intelligence Source: Zhuanzhi (ID: Quan_Zhuanzhi) Editor: Sanshi [New Intelligence Guide] Various advanced tools have emerged recently in the field of NLP, but practice is what counts: how do you apply them to your own models? This article takes up that question. Recently in NLP, various pre-trained language models such as ELMo, GPT, … Read more
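As a taste of the how-to the article covers, here is a minimal sketch of plugging a pre-trained encoder into your own network. It assumes the Hugging Face transformers library and a hypothetical wrapper class; the article itself may use different tooling:

```python
# Minimal sketch: wrap a pre-trained BERT encoder in your own classifier.
# Assumes the Hugging Face `transformers` library; illustrative only.
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertClassifier(nn.Module):  # hypothetical name, for illustration
    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.head = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token as sentence embedding
        return self.head(cls)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["an example sentence"], return_tensors="pt", padding=True)
model = BertClassifier()
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # torch.Size([1, 2])
```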

Beyond ReLU: The GELU Activation Function in BERT and GPT-2

Reported by the Machine Heart Editorial Team. At least in the field of NLP, GELU has become the choice of many industry-leading models. As the “switch” that decides whether a neuron passes information onward, the activation function is crucial to a neural network. But is the ReLU commonly used today really the most efficient choice? … Read more
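For reference, GELU is defined as GELU(x) = x·Φ(x), where Φ is the standard normal CDF; BERT and GPT-2 compute it with a tanh-based approximation. A minimal NumPy sketch (illustrative, not the models’ production code):

```python
# Minimal sketch of GELU: exact form and the tanh approximation
# used in BERT / GPT-2 (illustrative only).
import numpy as np
from math import erf

def gelu_exact(x: np.ndarray) -> np.ndarray:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    phi = np.array([0.5 * (1.0 + erf(v / np.sqrt(2.0))) for v in x])
    return x * phi

def gelu_tanh(x: np.ndarray) -> np.ndarray:
    """Tanh approximation from the GELU paper."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(np.round(gelu_exact(xs), 4))
print(np.round(gelu_tanh(xs), 4))  # agrees with the exact form to ~1e-3
```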