Visualizing Transformers: An Interactive Tool for Understanding

It’s 2024, and there are still people who don’t understand how Transformers work? Come try this interactive tool. In 2017, Google introduced the Transformer in the paper “Attention Is All You Need,” marking a major breakthrough in deep learning. The paper has been cited nearly 130,000 times, and every model in the subsequent GPT family is built on the Transformer architecture, underscoring its widespread impact. As a neural network architecture, the Transformer is popular across tasks ranging from text to vision, and especially in the red-hot field of AI chatbots.


However, for many non-professionals, the internal workings of the Transformer remain opaque, hindering their understanding and engagement, so demystifying the architecture is particularly worthwhile. Many blogs, video tutorials, and 3D visualizations emphasize the mathematical complexity and implementation details, which can confuse beginners, while visualizations designed for AI practitioners focus on neuron- and layer-level interpretability, which remains challenging for non-professionals. Researchers from Georgia Tech and IBM Research have therefore developed an open-source interactive visualization tool called “Transformer Explainer,” which helps non-professionals understand both the high-level structure and the low-level mathematical operations of Transformers, as shown in Figure 1 below.

[Figure 1: Overview of the Transformer Explainer interface]

The Transformer Explainer explains the internal workings of the Transformer through text generation, using a Sankey-diagram visualization design inspired by recent work that views Transformers as dynamic systems, emphasizing how input data flows through the model’s components. The results show that the Sankey diagram effectively illustrates how information is transmitted within the model and how the input is processed and transformed by the Transformer’s operations.

In terms of content, the Transformer Explainer tightly integrates a model overview summarizing the Transformer’s structure and lets users smoothly transition between multiple levels of abstraction, visualizing the interplay between low-level mathematical operations and the high-level model structure so that they can comprehensively understand the complex concepts inside Transformers.

Functionally, the Transformer Explainer is not only web-based but also offers real-time inference. Unlike many existing tools that require custom software installation or lack inference altogether, it integrates a live GPT-2 model that runs locally in the browser using modern front-end frameworks. Users can interactively experiment with their own input text and observe in real time how the Transformer’s internal components and parameters work together to predict the next token.

Significantly, the Transformer Explainer broadens access to modern generative AI without requiring heavy computational resources, installation, or programming skills. GPT-2 was chosen for its name recognition, fast inference speed, and architectural similarity to more advanced models such as GPT-3 and GPT-4.
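To make the “predict the next token” step concrete, here is a minimal Python sketch of the same computation using Hugging Face’s transformers library. The tool itself runs GPT-2 in the browser via ONNX Runtime; this sketch, with a prompt and top-5 cutoff of my own choosing, only mirrors what it visualizes.

```python
# Minimal sketch: feed a prompt through GPT-2 and rank candidate next
# tokens by probability, mirroring what the Transformer Explainer shows.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "what a beautiful day"  # illustrative input, as in Figure 2
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits          # shape: (1, seq_len, vocab_size)

probs = torch.softmax(logits[0, -1], dim=-1)  # distribution over the next token

top = torch.topk(probs, k=5)                  # top-5 is an arbitrary cutoff
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}: {p.item():.4f}")
```

The ranked list printed here corresponds to the next-token predictions the tool displays for a given input.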


  • Paper URL: https://arxiv.org/pdf/2408.04619
  • GitHub URL: http://poloclub.github.io/transformer-explainer/
  • LLM Visualization Online Experience URL: https://t.co/jyBlJTMa7m

Since it supports custom input, I tried “what a beautiful day,” and the results are shown in Figure 2.

[Figure 2: Prediction results for the input “what a beautiful day”]

Many netizens have praised the Transformer Explainer highly; some call it a very cool interactive tool.


Some mention that they have been waiting for an intuitive tool to explain self-attention and positional encoding, and that the Transformer Explainer is exactly that: a game-changing tool.


Others have showcased a Chinese project for LLM visualization.


Showcase URL: http://llm-viz-cn.iiiai.com/llm

This brings to mind another big name in the science popularization field, Karpathy, who has previously written many tutorials on reproducing GPT-2, including “Pure C Language Handcrafted GPT-2, Former OpenAI and Tesla Executive’s New Project is Hot” and “Karpathy’s Latest Four-Hour Video Tutorial: Reproducing GPT-2 from Scratch, Running Overnight is Enough.” Now that there is a visualization tool for the Transformer’s internal principles, using the two together should enhance the learning effect.

System Design and Implementation of Transformer Explainer

The Transformer Explainer visualizes how a trained, Transformer-based GPT-2 model processes text input and predicts the next token. The front end uses Svelte and D3 for the interactive visualization, while the back end uses ONNX Runtime and HuggingFace’s Transformers library to run the GPT-2 model in the browser.

A major challenge in designing the Transformer Explainer was managing the complexity of the underlying architecture, since displaying every detail at once would overwhelm users. To address this, the researchers followed two key design principles.

First, they reduced complexity through multi-level abstraction, structuring the tool to present information at different levels so that users can start with a high-level overview and drill into details as needed, avoiding information overload. At the highest level, the tool displays the complete processing flow: receiving user-provided text as input (Figure 1A), embedding it, processing it through multiple Transformer blocks, and ranking the most likely next-token predictions from the processed data. Intermediate operations, such as the calculation of the attention matrix (Figure 1C), are collapsed by default so that the importance of the results is conveyed visually, and users can expand them to watch the derivation as an animated sequence (a code sketch of this computation follows below). A consistent visual language, such as stacking attention heads and collapsing repeated Transformer blocks, helps users recognize recurring patterns in the architecture while preserving the end-to-end flow of data.

Second, they enhanced understanding and engagement through interactivity. The temperature parameter is crucial in shaping the Transformer’s output probability distribution, making next-token predictions more deterministic (at low temperatures) or more random (at high temperatures), yet existing educational resources about Transformers often overlook it. With this tool, users can adjust the temperature parameter in real time (Figure 1B) and visualize its critical role in controlling prediction determinism (Figure 2).
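As promised above, here is a minimal NumPy sketch of the attention-matrix computation the tool collapses by default (Figure 1C): scaled dot-product attention for a single head with a causal mask. The shapes, random seed, and variable names are illustrative assumptions, not code from the Transformer Explainer.

```python
# Minimal sketch: the attention matrix for one head, with a causal mask
# so each token can only attend to itself and earlier tokens.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_k = 4, 8                      # toy sizes: 4 tokens, 8-dim head
rng = np.random.default_rng(0)
Q = rng.normal(size=(seq_len, d_k))      # queries
K = rng.normal(size=(seq_len, d_k))      # keys
V = rng.normal(size=(seq_len, d_k))      # values

scores = Q @ K.T / np.sqrt(d_k)          # raw scaled dot-product scores
causal = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[causal] = -np.inf                 # mask out future positions
attn = softmax(scores, axis=-1)          # the attention matrix; rows sum to 1
output = attn @ V                        # weighted mix of value vectors
print(attn.round(2))
```

This is the matrix whose derivation the tool animates when the collapsed view is expanded.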


Additionally, users can choose from the provided examples or enter their own text (Figure 1A). Supporting custom input lets users engage more deeply, analyzing the model’s behavior under different conditions and interactively testing their hypotheses against different inputs.

So what are the practical application scenarios?

Professor Rousseau is modernizing the content of her natural language processing course to highlight the latest advances in generative AI. She has noticed that some students view Transformer-based models as inscrutable “magic,” while others want to understand how these models work but are unsure where to start. To address this, she directs her students to the Transformer Explainer, which provides an interactive overview of the Transformer (Figure 1) and encourages them to experiment and learn actively. Her class has over 300 students, and because the Transformer Explainer runs entirely in the browser, with no software installation or special hardware required, students need not worry about managing software or hardware setups, a significant advantage.

The tool introduces students to complex mathematical operations, such as attention computation, through animations and interactive, reversible abstractions (Figure 1C). This helps them gain a high-level understanding of the operations while also digging into the underlying details that produce the results.

Professor Rousseau also realizes that the Transformer’s technical capabilities and limitations are sometimes anthropomorphized (for example, treating the temperature parameter as a “creativity” control). By encouraging students to experiment with the temperature slider (Figure 1B), she shows them how temperature actually modifies the probability distribution over the next token (Figure 2), thereby controlling the randomness of predictions and balancing deterministic against more creative outputs. Moreover, when the system visualizes the token processing flow, students can see that there is no “magic” involved: whatever the input text (Figure 1A), the model follows a well-defined sequence of operations, sampling one token at a time with the Transformer architecture and then repeating the process (a code sketch of this loop closes this article).

Future Work

The researchers are enriching the tool’s interactive explanations to improve the learning experience. They are also working on speeding up inference with WebGPU and shrinking the model with compression techniques. They plan to conduct user studies to evaluate the Transformer Explainer’s effectiveness and usability, observing how AI novices, students, educators, and practitioners use it and collecting feedback on additional features users would like to see.

What are you waiting for? Try it out yourself, break the illusion that Transformers are “magic,” and truly understand the principles behind them.
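As promised, here is a hedged Python sketch of that temperature-controlled, token-by-token loop, again using Hugging Face’s transformers library for illustration rather than the tool’s in-browser ONNX Runtime implementation. Dividing the logits by the temperature before the softmax is what the temperature slider controls: values below 1 sharpen the distribution toward deterministic output, values above 1 flatten it toward more random output. The function name, prompt, and defaults are illustrative choices, not the tool’s code.

```python
# Hedged sketch: temperature-controlled, token-by-token sampling with GPT-2.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def generate(prompt: str, max_new_tokens: int = 10, temperature: float = 0.8) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]              # logits for the next position
        # Temperature scaling: low T sharpens the distribution (more
        # deterministic), high T flattens it (more random).
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token
        ids = torch.cat([ids, next_id.unsqueeze(0)], dim=-1)  # append and repeat
    return tokenizer.decode(ids[0])

print(generate("what a beautiful day", temperature=0.2))  # near-greedy, stable
print(generate("what a beautiful day", temperature=1.5))  # noticeably more varied
```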

