Finally, Someone Visualized the Transformer!

Is there anyone who still doesn’t understand how the Transformer works in 2024? Come and try this interactive tool.
In 2017, Google introduced the Transformer in the paper “Attention Is All You Need,” a major breakthrough in deep learning. The paper has been cited nearly 130,000 times, and every model in the subsequent GPT family is built on the Transformer architecture, a testament to its wide impact.
As a neural network architecture, the Transformer is popular for a variety of tasks from text to vision, especially in the currently booming AI chatbot field.

However, for many non-experts, the internal workings of the Transformer remain opaque, hindering their understanding of and participation in the field. It is therefore especially worthwhile to demystify this architecture. Many blogs, video tutorials, and 3D visualizations emphasize mathematical complexity and implementation details, which can leave beginners at a loss, while visualizations designed for AI practitioners focus on neuron- and layer-level interpretability, which is challenging for non-experts.
Thus, researchers from Georgia Tech and IBM Research developed “Transformer Explainer,” a web-based, open-source interactive visualization tool that helps non-experts understand both the high-level model structure and the low-level mathematical operations of the Transformer, as shown in Figure 1 below.

[Figure 1]

The Transformer Explainer explains the inner workings of the Transformer through text generation, using a Sankey-diagram visual design inspired by recent work that views the Transformer as a dynamical system and emphasizing how input data flows through the model’s components. The Sankey diagram effectively illustrates how information is transmitted within the model and how the input is processed and transformed by Transformer operations.
In terms of content, the Transformer Explainer tightly integrates a model overview summarizing the structure of the Transformer and allows users to smoothly transition between multiple levels of abstraction to visualize the interaction between low-level mathematical operations and high-level model structures, helping them understand the complex concepts within the Transformer.
Functionally, the Transformer Explainer not only provides a web-based implementation but also features real-time inference capabilities. Unlike many existing tools that require custom software installation or lack inference functionality, it integrates a real-time GPT-2 model, running in the browser using modern frontend frameworks. Users can interactively experiment with their input text and observe in real-time how the internal components and parameters of the Transformer work together to predict the next token.
In terms of significance, the Transformer Explainer broadens access to modern generative AI technology without requiring advanced computational resources, installation, or programming skills. GPT-2 was chosen for its wide recognition, fast inference speed, and architectural similarity to more advanced models such as GPT-3 and GPT-4.

  • Paper link: https://arxiv.org/pdf/2408.04619
  • GitHub link: http://poloclub.github.io/transformer-explainer/
  • Online demo link: https://t.co/jyBlJTMa7m
Since it supports custom input, I tried “what a beautiful day,” and the result is shown in Figure 2 below.

[Figure 2]

The Transformer Explainer has received high praise from many netizens. Some say it is a very cool interactive tool.

Some mentioned that they have been waiting for an intuitive tool to explain self-attention and positional encoding, and that is exactly what the Transformer Explainer does. It will be a game-changing tool.

Others showcased a Chinese project for LLM visualization.

Showcase link: http://llm-viz-cn.iiiai.com/llm
This brings to mind another well-known educator, Andrej Karpathy, who has produced many tutorials on reproducing GPT-2, covered previously in “Pure C Language Handcrafted GPT-2, Former OpenAI and Tesla Executive’s New Project is Hot” and “Karpathy’s Latest Four-Hour Video Tutorial: Reproducing GPT-2 from Scratch, Running Overnight to Get It Done.” Now that a visualization tool for the Transformer’s internals is available, using the two together should make learning even more effective.
System Design and Implementation of Transformer Explainer
The Transformer Explainer visualizes how a trained, Transformer-based GPT-2 model processes text input and predicts the next token. The frontend uses Svelte and D3 for interactive visualization, while inference relies on ONNX Runtime and HuggingFace’s Transformers library to run the GPT-2 model directly in the browser.
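To make that stack concrete, here is a minimal Python-side sketch of the export-and-run step it implies. It uses HuggingFace’s optimum library to convert GPT-2 to ONNX and run it with ONNX Runtime; this is an illustrative assumption, not necessarily the authors’ actual export pipeline (their runtime is the browser-based onnxruntime-web).

```python
# Hedged sketch: export GPT-2 to ONNX and run it with ONNX Runtime.
# Uses HuggingFace's `optimum` library (an assumption for illustration;
# the tool's in-browser setup uses onnxruntime-web instead).
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)  # PyTorch -> ONNX

inputs = tokenizer("what a beautiful day", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))

# model.save_pretrained("gpt2-onnx")  # the exported .onnx file is what a
# browser runtime would then load
```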
During the design process of the Transformer Explainer, a major challenge was managing the complexity of the underlying architecture, as showing all details at once could overwhelm users. To address this, the researchers focused on two key design principles.
First, they reduced complexity through multi-level abstraction. The tool is structured to present information at different levels of abstraction, allowing users to start with a high-level overview and drill into details as needed, avoiding information overload. At the highest level, the tool shows the complete processing flow: receiving user-provided text as input (Figure 1A), embedding it, processing it through multiple Transformer blocks, and ranking the most likely next-token predictions from the processed data, as sketched below.
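That end-to-end flow can be traced in a few lines of Python by stepping through HuggingFace’s GPT-2 internals by hand. The attribute names (wte, wpe, h, ln_f, lm_head) are the library’s own; the manual pass below is a simplified sketch, not the tool’s code.

```python
# Hedged sketch of the flow the overview shows: embed -> Transformer
# blocks -> rank next-token candidates (HuggingFace GPT-2 internals).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tokenizer("what a beautiful day", return_tensors="pt").input_ids
positions = torch.arange(ids.size(1)).unsqueeze(0)

with torch.no_grad():
    # 1. Embedding: token embeddings + positional embeddings (Figure 1A).
    h = model.transformer.wte(ids) + model.transformer.wpe(positions)
    # 2. A stack of identical Transformer blocks (12 in GPT-2 small).
    for block in model.transformer.h:
        out = block(h)
        h = out[0] if isinstance(out, tuple) else out
    h = model.transformer.ln_f(h)  # final layer norm
    # 3. Project back onto the vocabulary and rank next-token candidates.
    logits = model.lm_head(h)[0, -1]
    top = torch.topk(logits, k=5)
    print([tokenizer.decode(i) for i in top.indices])
```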
Intermediate operations, such as the computation of the attention matrix (Figure 1C), are collapsed by default so that only the significant results are shown; users can expand them to watch the derivation unfold through an animated sequence. The researchers also adopted a consistent visual language, such as stacking attention heads and collapsing repeated Transformer blocks, to help users recognize repeating patterns in the architecture while keeping the end-to-end flow of data in view.
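The computation behind that collapsed attention view is the standard scaled dot-product attention with a causal mask, as used in GPT-2. A minimal sketch with illustrative shapes (not the tool’s code):

```python
# Minimal sketch of the attention-matrix computation (Figure 1C):
# scaled dot-product attention with a causal mask, one head, toy shapes.
import torch
import torch.nn.functional as F

seq_len, d_head = 4, 64
Q = torch.randn(seq_len, d_head)  # queries, one per input token
K = torch.randn(seq_len, d_head)  # keys
V = torch.randn(seq_len, d_head)  # values

scores = Q @ K.T / d_head ** 0.5                    # (seq_len, seq_len)
causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
scores = scores.masked_fill(causal, float("-inf"))  # no attending to the future
attn = F.softmax(scores, dim=-1)                    # the attention matrix
out = attn @ V                                      # weighted mix of values
```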
Second, they enhanced understanding and engagement through interactivity. The temperature parameter is crucial in shaping the Transformer’s output probability distribution, determining whether next-token prediction is more deterministic (at low temperatures) or more random (at high temperatures), yet existing educational resources about Transformers often overlook it. With this tool, users can adjust the temperature in real time (Figure 1B) and visualize its key role in controlling prediction determinism (Figure 2).
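Mechanically, the slider divides the logits by the temperature T before the softmax. A small self-contained demonstration of the effect (toy logits, not the tool’s code):

```python
# Hedged sketch of what the temperature slider (Figure 1B) does:
# logits are divided by T before the softmax over the vocabulary.
import torch
import torch.nn.functional as F

logits = torch.tensor([4.0, 2.0, 1.0, 0.5])  # toy next-token logits

for T in (0.2, 1.0, 2.0):
    probs = F.softmax(logits / T, dim=-1)
    print(f"T={T}:", [round(p, 3) for p in probs.tolist()])
# Low T sharpens the distribution toward the top token (near-deterministic);
# high T flattens it, making sampling more random.
```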

[Figure 2]

Additionally, users can select from provided examples or input their own text (Figure 1A). Support for custom input lets users engage more deeply, analyzing the model’s behavior under different conditions and interactively testing their own hypotheses against different text inputs.
What are some practical application scenarios?
Professor Rousseau is modernizing the curriculum content of her natural language processing course to highlight the latest advancements in generative AI. She noticed that some students view Transformer-based models as an elusive “magic,” while others want to understand how these models work but are unsure where to start.
To address this, she guides students to use the Transformer Explainer, which provides an interactive overview of the Transformer (Figure 1) and encourages them to experiment and learn actively. Her class has over 300 students, so the fact that the Transformer Explainer runs entirely in the browser, without software installation or special hardware, is a significant advantage: students need not worry about managing software or hardware setups.
The tool introduces students to complex mathematical operations, such as attention calculations, through animations and interactive reversible abstractions (Figure 1C). This approach helps students gain both a high-level understanding of the operations and a deeper understanding of the underlying details that produce these results.
Professor Rousseau also realized that the technical capabilities and limitations of the Transformer are sometimes anthropomorphized (for example, treating the temperature parameter as a control for “creativity”). By encouraging students to experiment with the temperature slider (Figure 1B), she shows them how temperature actually modifies the probability distribution of the next token (Figure 2), balancing between deterministic and more creative outputs.
Moreover, when the system visualizes the token processing flow, students can see that there is no “magic” involved: regardless of the input text (Figure 1A), the model follows a well-defined sequence of operations, sampling a single token at a time with the Transformer architecture and then repeating the process, as sketched below.
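That loop, stripped to its essentials, looks like the following hedged Python sketch (illustrative settings, not the tool’s code):

```python
# Minimal sketch of the autoregressive loop the tool makes visible:
# predict a distribution, sample one token, append it, and repeat.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tokenizer("what a beautiful day", return_tensors="pt").input_ids
temperature = 0.8  # illustrative setting

for _ in range(10):  # generate 10 tokens
    with torch.no_grad():
        logits = model(ids).logits[0, -1]              # next-token scores
    probs = F.softmax(logits / temperature, dim=-1)    # temperature-scaled
    next_id = torch.multinomial(probs, num_samples=1)  # sample one token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append, then repeat

print(tokenizer.decode(ids[0]))
```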
Future Work
The researchers are enhancing the tool’s interactive explanations to improve the learning experience. They are also working on improving inference speed through WebGPU and reducing the model size using compression techniques. They plan to conduct user research to evaluate the effectiveness and usability of the Transformer Explainer, observing how AI novices, students, educators, and practitioners use the tool, and gathering feedback on additional features they would like to support.
What are you waiting for? Try it out yourself, break the “magic” illusion of the Transformer, and truly understand the principles behind it.
  • Paper link: https://arxiv.org/pdf/2408.04619
  • GitHub link: http://poloclub.github.io/transformer-explainer/
  • Online demo link: https://t.co/jyBlJTMa7m

– EOF –

Editor / Garvey

Review / Fan Ruiqiang

Verification / Fan Ruiqiang
