In computer vision, CNNs are indispensable.
However, what do convolution, pooling, and Softmax actually look like, and how are they interconnected?
Picturing all of this from the code alone can be daunting, so someone built a complete 3D visualization using Unity.
It shows more than just the network structure; the training process is presented clearly as well.
For example, the real-time changes in each layer during the training process are shown as the epoch (iteration count) changes.
To better showcase network details, users can freely collapse or expand each layer.
For example, switching the feature maps between linear and grid layouts.
Or collapsing the feature-map outputs of the convolutional layers.
Or applying edge bundling to the fully connected layers, and so on.
This visualization can be constructed by loading TensorFlow checkpoints.
It can also be designed in the Unity editor.
Pretty amazing, isn’t it?
Recently, this project has gone viral on social media.
Netizens have commented:
“If I could see this process during training, I could endure it for a longer time.”
“Please make it open-source.”
The project’s author is a 3D visual effects artist from Vienna.
According to him, he built this CNN visualization tool because, when he was first learning about neural networks, he often found it hard to understand how convolutional layers connect with each other and with other types of layers.
The main features of this tool include visual representations of convolutional, max pooling, and fully connected layers, as well as various mechanisms for clearer visualization.
In summary, it aims to help beginners grasp the key points of CNNs in the most intuitive way.
How to Create a 3D Network with Unity
Before diving into Unity, the author first built a visual 3D network prototype in Houdini software.
In other words, it gave him a construction approach for the Unity version of the 3D network in advance, working out how to present convolution calculations, the shapes of feature maps, edge bundling effects, and other details.
Its node editor looks like this:
Then, you can build the 3D neural network in Unity.
First, you need to preset the “shape” of the neural network.
Since the author had never used Unity before, he first learned about shaders and procedural geometry.
In the process, the author ran into some limitations: he used ShaderLab, Unity’s shader programming language, which does not allow color variation and can only pass predefined variables between the vertex, geometry, and pixel shaders.
Moreover, it cannot assign arbitrary vertex attributes, only predefined properties such as position, color, and UV. (This may also be one of the reasons the 3D network cannot change colors in real time.)
After researching some concepts related to instancing, the author planned to use geometry shaders to generate the connections of the neural network. The starting and ending points are passed to the vertex shader and forwarded directly to the geometry shader.
Each of these lines can consist of up to 120 vertices, since the output of a geometry shader in Unity is limited to 1024 scalar floating-point values.
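As a rough sanity check on that number, here is a tiny back-of-the-envelope calculation; the per-vertex attribute layout (a float4 position plus a float4 color) is my assumption, not a figure given by the author.

```python
# Back-of-the-envelope check of the geometry-shader budget mentioned above.
scalar_budget = 1024           # max scalar float outputs per geometry-shader invocation
floats_per_vertex = 4 + 4      # assumed layout: float4 clip-space position + float4 color
print(scalar_budget // floats_per_vertex)  # 128, so capping each line at ~120 vertices stays within budget
```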
The designed network shape looks something like this:
Next, generate the corresponding 3D neural network image from the model’s TensorFlow code.
A file in TensorFlow’s native .ckpt format needs to store the data required to reconstruct the model graph, the binary weight values, the activation values, and the names of the specific layers.
For example, with a grayscale version of the CIFAR-10 dataset, a checkpoint file needs to be written with randomly initialized weights.
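A minimal sketch of what such a script could look like (my own assumption, not the author’s actual code): define a tiny CNN for 32×32 grayscale images, initialize it randomly, and save a native .ckpt checkpoint. The layer names and sizes here are illustrative.

```python
# Assumed sketch: build a small CNN, randomly initialize it, save a .ckpt checkpoint.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, [None, 32, 32, 1], name="input")  # grayscale 32x32 images

with tf.variable_scope("conv1"):
    w1 = tf.get_variable("kernel", [5, 5, 1, 8])
    conv1 = tf.nn.relu(tf.nn.conv2d(x, w1, strides=[1, 1, 1, 1], padding="SAME"), name="relu")
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME", name="pool1")

with tf.variable_scope("conv2"):
    w2 = tf.get_variable("kernel", [5, 5, 8, 16])
    conv2 = tf.nn.relu(tf.nn.conv2d(pool1, w2, strides=[1, 1, 1, 1], padding="SAME"), name="relu")
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME", name="pool2")

with tf.variable_scope("fc1"):
    wf = tf.get_variable("kernel", [16 * 8 * 8, 10])
    logits = tf.matmul(tf.reshape(pool2, [-1, 16 * 8 * 8]), wf, name="logits")

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())       # randomly initialized weights
    saver.save(sess, "./ckpt/cifar10_grey.ckpt")       # write the checkpoint to disk
```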
After that, these checkpoint files are loaded, a TensorFlow session is started, and training examples are fed in to query each layer’s activations.
Then a JSON file is written that stores each layer’s shape, name, weights, and activations for easy reading; the weight values are used to assign color data to each layer’s Unity Mesh.
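A sketch of that export step (again an assumption, with layer names matching the sketch above rather than the author’s code): restore the checkpoint, feed one example through the graph, read back each layer’s activations and weights, and write them to JSON.

```python
# Assumed sketch: restore the checkpoint, query activations, dump layer data to JSON.
import json
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

with tf.Session() as sess:
    saver = tf.train.import_meta_graph("./ckpt/cifar10_grey.ckpt.meta")
    saver.restore(sess, "./ckpt/cifar10_grey.ckpt")
    graph = tf.get_default_graph()

    x = graph.get_tensor_by_name("input:0")
    example = np.random.rand(1, 32, 32, 1).astype(np.float32)  # stand-in for one grayscale image

    layers = {}
    for name in ["conv1/relu", "pool1", "conv2/relu", "pool2", "fc1/logits"]:
        act = sess.run(graph.get_tensor_by_name(name + ":0"), feed_dict={x: example})
        layers[name] = {"shape": list(act.shape), "activations": act.tolist()}

    # Export the weights too; on the Unity side they are used to color each layer's mesh.
    weights = {v.op.name: sess.run(v).tolist() for v in tf.trainable_variables()}

with open("network.json", "w") as f:
    json.dump({"layers": layers, "weights": weights}, f)
```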
The final result looks quite good:
The author also recorded a development video, which can be found at the end of the article.
There is a Lot of Related Research
In fact, many scholars have conducted research on neural network visualization.
For example, last May, a Chinese PhD student visualized convolutional neural networks, clearly displaying the changes at each layer; simply clicking on the corresponding neuron reveals its “operation”.
This was a pre-trained 10-layer model loaded using TensorFlow.js, allowing CNN models to run in the browser and interactively display changes in neurons in real-time.
However, this was still a 2D project.
There are now also 3D visualizations of neural networks, similar to the one above:
This project likewise uses edge bundling, ray tracing, and other techniques, combined with feature extraction, fine-tuning, and normalization, to visualize neural networks.
This project aims to estimate the importance of different parts of the neural network using these technologies.
To achieve this, the author represents each part of the neural network with different colors, predicting their interconnections based on the importance of the nodes in the network.
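The article does not show that project’s code, but the idea of coloring by importance can be sketched generically; the function name and the blue-to-red mapping below are illustrative assumptions, not the nn_vis implementation.

```python
# Generic illustration (not nn_vis's actual code): map a node-importance score
# in [0, 1] to an RGB color, blending from blue (low) to red (high importance).
def importance_to_rgb(importance: float) -> tuple:
    t = max(0.0, min(1.0, importance))   # clamp the score to [0, 1]
    return (t, 0.0, 1.0 - t)             # red channel grows, blue channel shrinks

print(importance_to_rgb(0.8))  # roughly (0.8, 0.0, 0.2)
```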
The general processing flow is as follows:
If you are interested in this type of 3D neural network visualization, you can find the corresponding open-source project link at the end of the article.
Author Introduction
Stefan Sietzen, currently residing in Vienna, was previously a freelancer in the field of 3D visual effects.
He is currently a master’s student at Vienna University of Technology, very interested in visual computing; this 3D neural network is one of the projects he worked on during his master’s studies.
Development process: https://vimeo.com/stefsietz
Open-source 3D neural network project: https://github.com/julrog/nn_vis
Reference links:
https://www.reddit.com/r/MachineLearning/comments/leq2kf/d_convolution_neural_network_visualization_made/
https://mp.weixin.qq.com/s/tmx59J75wuRii4RuOT8TTg
https://vimeo.com/stefsietz
http://portfolio.stefansietzen.at/
http://visuality.at/vis2/detail.html