Editor's note: An illustrated overview of neural network architectures, along with tools and techniques for understanding specific modules.
Baseline Model
AlexNet is a groundbreaking architecture that made Convolutional Neural Networks (CNNs) the dominant machine learning approach for large-scale image classification. The paper introducing AlexNet presents a nice diagram, but something seems to be missing…

It is easy to see that the top of this diagram was accidentally cropped, and the cropped version has propagated through subsequent slides, citations, and so on. In my opinion, this suggests that visualization is undervalued in deep learning research (though there are exceptions, such as the online journal Distill).
Some might argue that developing new algorithms and tuning parameters is the true science/engineering, while visual presentation belongs to the realm of art and is of no value. I completely disagree with this viewpoint!
Of course, for a computer running the program, lack of indentation or ambiguous variable names may not matter much. But it is different for humans. Academic papers are not a means of discovery, but a means of communication.
Take another complex theory, quantum field theory, as an example. Suppose you want to depict an electron-positron pair annihilating to produce a muon-antimuon pair; the diagram below is a Feynman diagram (the first-order term):

Isn't it cute? But there is nothing artistic about it. It is merely a graphical representation of a scattering amplitude: each line represents a propagator, and each vertex represents an interaction. The diagram can be translated directly into the following:
I tend to prefer making things simpler, which is how I handle complex tensor operations in JavaScript, and visualizing the results up front is pretty cool. In both quantum mechanics and deep learning, a great deal of linear algebra can be carried out on tensor structures. In fact, Einstein summation convention has even been implemented in PyTorch.
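As a quick illustration of that convention, here is what Einstein summation looks like in NumPy (`torch.einsum` accepts the same subscript strings, so these carry over to PyTorch unchanged); the index letters are arbitrary:

```python
import numpy as np

# Einstein summation: repeated indices are summed over, so one compact
# subscript string covers many different tensor operations.
A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

matmul = np.einsum("ij,jk->ik", A, B)                  # matrix product, same as A @ B
trace = np.einsum("ii->", np.eye(3))                   # sum over the diagonal
outer = np.einsum("i,j->ij", np.ones(2), np.ones(3))   # outer product
```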
Explaining Layers of Neural Networks
Before understanding the network architecture, let’s first focus on the basic building blocks of the network—layers. For example, the Long Short-Term Memory (LSTM) unit can be described as follows:
Of course, if you are familiar with matrix multiplication, you can easily solve these equations. But solving these equations is one thing; understanding them is another. I could solve the formulas of LSTM the first time I saw them, but I didn’t know what they meant.
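For reference, the equations in question are the standard LSTM update rules, where $\sigma$ is the logistic sigmoid and $\odot$ the element-wise product:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Solvable at a glance, yet the roles of the forget, input, and output gates are anything but obvious from the algebra alone.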
What I mean by “understanding” is not a mental enlightenment, but rather establishing a mental model that we can use (for explanation, simplification, modification, and predicting what-if scenarios, etc.). Generally speaking, diagrams are clearer than verbal explanations:

“Understanding LSTM Networks” is a good article about LSTMs, explaining the principles of LSTM step by step. This article gave me a flash of insight, transforming a seemingly random set of multiplications into a reasonable way to write (and read) data.
The diagram below is a clearer LSTM diagram:

I believe that a good diagram is worth a thousand formulas.
This applies to almost any module. We can visualize concepts such as dropout:

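To make the picture concrete, here is a minimal NumPy sketch of (inverted) dropout; the function name and rates are illustrative, not from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during
    training and rescale survivors so the expected value is unchanged."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p      # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

x = np.ones((4, 8))
y = dropout(x, p=0.5)           # entries are either 0.0 or 2.0
z = dropout(x, training=False)  # identity at inference time
```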
Diagrams can also explain composite modules built from smaller ones (e.g., several convolutions in sequence). Take a look at this Inception module diagram:

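The structural idea of the module, parallel branches whose outputs are concatenated along the channel axis, can be sketched at the shape level. The branch channel counts below are those of GoogLeNet's inception (3a) block; the convolutions themselves are faked as placeholders:

```python
import numpy as np

def branch(x, out_channels):
    """Placeholder for a convolutional branch: keeps the batch and
    spatial dimensions, produces `out_channels` channels."""
    n, _, h, w = x.shape
    return np.zeros((n, out_channels, h, w))

def inception_module(x):
    branches = [
        branch(x, 64),    # stands in for the 1x1 conv branch
        branch(x, 128),   # stands in for 1x1 -> 3x3 convs
        branch(x, 32),    # stands in for 1x1 -> 5x5 convs
        branch(x, 32),    # stands in for 3x3 max pool -> 1x1 conv
    ]
    # The defining step: concatenate all branches along the channel axis.
    return np.concatenate(branches, axis=1)

x = np.zeros((1, 192, 28, 28))
y = inception_module(x)   # shape (1, 64 + 128 + 32 + 32, 28, 28)
```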
Every visualization is different, not only in style but in what it emphasizes and what it abstracts away. So what matters most? The number of layers, the connections between layers, the convolution kernel sizes, or the activation functions? It all depends. Abstraction means considering the properties and relationships of things independently of the things themselves. The challenge lies in deciding what to emphasize and what can be summarized briefly.
For example, in this Batch Normalization diagram, the focus is on the backward pass:

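The forward pass being differentiated there can be sketched as follows (a minimal NumPy version, without the running statistics used at inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization forward pass: normalize each feature to zero
    mean and unit variance over the batch axis, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))
y = batch_norm(x)   # per-feature mean ~0, standard deviation ~1
```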
Data Visualization and Data Art
You might think I am trying to make deep learning articles look more attractive. But making charts look better isn’t a bad thing. When I explore data, I usually use good color schemes to provide readers with a better reading experience. My main point is to transform visual images into a more efficient means of communication.
So, does looking better mean better? Not necessarily. Lisa Charlotte Rost’s article “The Line between Data Vis and Data Art” insightfully explains the difference between the two.

Take the following diagram as an example:

Isn’t it beautiful? To me, it looks alive—like a cell with organelles. But what can we infer from it? Can you guess that it is actually AlexNet?
Here’s another example, a more aesthetic illustration of a multi-layer perceptron:

To be clear: as long as we don’t confuse artistic value with educational value, data art has its own value. If you like my viewpoint, then I encourage you to visualize real convolutional networks with 3D animations like Spark or colorful brains.
Sometimes the trade-off is not so clear. Like the diagram below, does it represent data visualization or data art?

I suspect you would say, "This is obviously data visualization." In that case, our opinions differ. Although the diagram has a nice color scheme and the repetition of similar structures is pleasing to the eye, it would be difficult to implement the network from it (at least without a magnifying glass). You can certainly extract the key points of the architecture (the number of layers and the structure of the modules), but these alone are not enough to reproduce the network.
To make the image clear, publications usually leave some space for data art. For example, in a network used to detect skin conditions, we can see the diagram of the Inception v3 feature extraction layer. It is clear that the author simply used the model and represented it with a diagram without explaining its internal workings:

And how would you classify the two images below, which show the visual patterns that activate selected channels?

I would consider the diagram below a great example of data visualization. Psychedelic images do not necessarily mean data art. The focus of this example is on the abstraction of network architecture and the presentation of related data (input images that activate given channels).
Explanatory Architecture Diagrams
We’ve looked at some examples of layer diagrams and data art related to neural network architecture.
Next, let’s understand the (data) visualization of neural network architectures. The diagram below is the architecture of VGG16, a standard network for image classification.

We can see the tensor size at each step, along with the operations (marked by color). The diagram is not fully abstract: the sizes of the boxes relate to the tensor shapes, though the thickness is not proportional to the number of channels.
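Those tensor sizes are easy to cross-check: VGG16's 3x3 "same" convolutions preserve the spatial size, and each 2x2 max pooling halves it, so the block output shapes can be computed directly (a sketch of the feature extractor only, ignoring the final classifier):

```python
# Channel counts per convolutional block, as given in the VGG paper.
VGG16_BLOCK_CHANNELS = [64, 128, 256, 512, 512]

def vgg16_shapes(h=224, w=224):
    """Feature-map shape (channels, height, width) after each block's
    convolutions; each block ends with a 2x2 max pooling that halves
    the spatial dimensions."""
    shapes = []
    for channels in VGG16_BLOCK_CHANNELS:
        shapes.append((channels, h, w))   # after the block's convolutions
        h, w = h // 2, w // 2             # after the block's max pooling
    return shapes

shapes = vgg16_shapes()
# [(64, 224, 224), (128, 112, 112), (256, 56, 56),
#  (512, 28, 28), (512, 14, 14)]
```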
Another similar way is to show the values of each channel, such as in the working example of DeepFace:

Such diagrams are not limited to computer vision. Below is an example of converting text into color:

If the goal is to present the network architecture while explaining its internal workings, such diagrams are very useful. They seem especially useful in tutorials, such as http://karpathy.github.io/2015/05/21/rnn-effectiveness/.
Abstract Architecture Diagrams
However, for large models, explanatory diagrams may be too complex or too specific to present all possible layers in one diagram. Therefore, abstract diagrams are needed. Generally, nodes represent operations, and arrows represent tensor flow. Comparing VGG-19 and ResNet-34:

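The arrows that distinguish ResNet-34 from VGG-19 in such diagrams are the skip connections. Stripped of the convolutions, a residual block is just the following (a toy sketch, with an arbitrary function standing in for the learned block):

```python
import numpy as np

def residual_block(x, f):
    """Residual connection: the block computes F(x) and the input is
    added back, so the output is x + F(x). In ResNet, f would be a
    stack of convolutions; here it is an arbitrary function."""
    return x + f(x)

x = np.ones(5)
y = residual_block(x, lambda t: 2.0 * t)                  # 1 + 2 = 3 everywhere
identity = residual_block(x, lambda t: np.zeros_like(t))  # f == 0 passes x through
```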
We can see that the diagram above is somewhat redundant, since some units are reused. Because such images can get long, it is better to find the repeating patterns and merge them. This kind of hierarchy makes concepts easier to understand and present visually (unless we merely want to create a data-art rendition of GoogLeNet).
For example, let’s take a look at the diagram of Inception-ResNet-v1:

I like the composition of this diagram—we can see what is happening and which modules are reused.
Another diagram that clarifies concepts for me is the U-Net diagram used for image segmentation:

Note that here the nodes represent tensors, and the arrows represent operations. I find this diagram very clear—we can see the tensor shapes, convolution operations, and pooling operations. Since the original U-Net architecture is not very complex, we can ignore its hierarchical structure.
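The horizontal arrows in the U-Net diagram are skip connections that concatenate an encoder feature map with the upsampled decoder feature map of matching spatial size. A shape-level NumPy sketch (with nearest-neighbor upsampling standing in for the learned up-convolution, and illustrative channel counts):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor upsampling of a (channels, h, w) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Feature maps as (channels, height, width):
encoder_feat = np.zeros((64, 128, 128))   # saved on the contracting path
decoder_feat = np.zeros((128, 64, 64))    # arriving on the expanding path

# The U-Net skip connection: upsample, then concatenate along channels.
# A convolution would follow to mix the merged channels.
merged = np.concatenate([encoder_feat, upsample2x(decoder_feat)], axis=0)
```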
Things get a bit more complicated when we want clear diagrams of more complex building blocks. To reproduce a network, we need to know its details:
- the number of channels;
- the number of convolutions between max poolings;
- the number of max poolings;
- whether batch normalization or dropout is used;
- the activation functions (is it ReLU? before or after BN?).
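The last bullet is not pedantry: applying ReLU before or after batch normalization yields different functions, as a quick NumPy check shows (a toy batch norm without learned parameters):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bn(x, eps=1e-5):
    """Toy batch norm: per-feature normalization, no learned gamma/beta."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4))

a = relu(bn(x))   # BN -> ReLU: output is non-negative
b = bn(relu(x))   # ReLU -> BN: re-centered, so negative values reappear
```

A diagram (or at least its caption) has to pin this ordering down, or the network cannot be reproduced faithfully.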
Below is a great example of an abstract diagram:

This diagram could be better in terms of color, but I like its simplicity. It also clearly illustrates the number of channels, breaking down each complex layer into its building blocks while retaining all details (note the three-level hierarchy).
There is also an interesting way to represent the hierarchy of neural network modules:

Automated Tools for Visualizing Neural Network Architectures
You can draw networks by hand: with Inkscape like Chris Olah, with TikZ if you like LaTeX, or with other tools. You can also generate diagrams automatically.
I hope you realize that you are already using visual representations—code (text is a form of visual representation!)—to interact with computers. For some projects, code is sufficient, especially if you are using concise frameworks (like Keras or PyTorch). For more complex architectures, diagrams add some explanatory value.
TensorBoard: Graphs
TensorBoard is arguably the most commonly used network visualization tool. The diagram below shows a TensorFlow network graph:

Does this diagram provide a readable overview of the neural network?
I think not.
Although this diagram presents the computational structure, it is still a bit verbose (for example, adding biases as separate operations). Furthermore, it obscures the most important parts of the network: core parameters in operations (like the size of convolution kernels) and the size of tensors. Despite all these shortcomings, I still recommend reading the full paper: Visualizing Dataflow Graphs of Deep Learning Models in TensorFlow (http://idl.cs.washington.edu/files/2018-TensorFlowGraph-VAST.pdf)
This paper offers insight into the challenges of creating readable network graphs from the ground up. Since a raw TensorFlow graph contains every operation, including auxiliary ones (such as initialization and logging utilities), producing a universal, readable graph remains difficult. Unless we take into account what the reader cares about, we cannot build a general tool that turns TensorFlow computation graphs into useful (e.g., publication-ready) diagrams.
Keras
Keras is a high-level deep learning framework, so it has great potential for generating beautiful visualization graphs. (Note: If you want to use interactive training graphs for Jupyter Notebook, I wrote one: livelossplot (https://github.com/stared/livelossplot).) However, in my opinion, Keras’s default visualization options (using GraphViz) are not top-notch:

I think it not only hides important details but also provides redundant data (repeated tensor sizes). Aesthetically, I don’t like it either.
I wrote another one for teaching purposes (pip install keras_sequential_ascii):

This works for small sequential architectures. I find it useful for teaching and courses such as "Starting deep learning hands-on: image classification on CIFAR-10." But it is of no use for more advanced projects (some suggested I use the branch visualization from git log (https://stackoverflow.com/questions/1057564/pretty-git-branch-graphs)). Clearly, I am not the only one trying to prettify neural network visualizations with ASCII:

I think the most aesthetically pleasing graph I found is in Keras.js:

This project is not actively developed, but it supports TensorFlow.js. Since it is open-source and modular (using the Vue.js framework), it can serve as a starting point for creating standalone visualization projects. Ideally, it would work in Jupyter Notebook or a separate browser window, just like using displaCy to parse sentences.
Conclusion
We have seen many examples of neural network visualizations, all of which made trade-offs in the following aspects:
- Data visualization vs. data art (utility vs. aesthetics)
- Clarity vs. ambiguity
- Shallow vs. hierarchical
- Static vs. interactive (providing more information)
- Specific vs. general (does it apply to a wide range of neural network families?)
- Direction of data flow (top-down, bottom-up, or left-to-right?)
Each of these topics could be the subject of a master’s thesis, and merging them all could result in a doctoral thesis (especially regarding how people visualize and what content should be abstracted in detail).
Original link: https://medium.com/inbrowserai/simple-diagrams-of-convoluted-neural-networks-39c097d2925b