Visualizing CNNs: A Comprehensive 3D Representation

In computer vision, CNNs are indispensable.

However, what do convolution, pooling, and Softmax actually look like, and how are they interconnected?

Picturing all of this from code alone can be daunting, so someone has built a complete 3D visualization in Unity.

It shows more than the network's structure; the training process is also clearly presented.

For example, it shows how each layer changes in real time as the epoch count (the number of passes over the training data) increases.

To better showcase network details, users can freely collapse or expand each layer.

For example, switching between linear and grid layouts for feature maps.
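The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's Unity code: the function names, the `spacing` parameter, and the near-square grid are all assumptions.

```python
import math

def linear_layout(count, spacing=1.0):
    """Place feature maps side by side along the x axis, centred on the origin."""
    if count <= 0:
        return []
    offset = (count - 1) * spacing / 2.0
    return [(i * spacing - offset, 0.0) for i in range(count)]

def grid_layout(count, spacing=1.0):
    """Place feature maps row by row in a near-square grid, centred on the origin."""
    if count <= 0:
        return []
    cols = math.ceil(math.sqrt(count))   # assumed: as square as possible
    rows = math.ceil(count / cols)
    x_offset = (cols - 1) * spacing / 2.0
    y_offset = (rows - 1) * spacing / 2.0
    return [((i % cols) * spacing - x_offset, (i // cols) * spacing - y_offset)
            for i in range(count)]
```

Switching layouts then amounts to moving each feature map from its position in one list to its position in the other.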

Collapsing the output of feature maps from the convolutional layers.

Applying edge bundling to the fully connected layers, etc.

This visualization can be constructed by loading TensorFlow checkpoints.

It can also be designed in the Unity editor.

Doesn't it look rather amazing?

Recently, this project has gone viral on social media.

Netizens have commented:

“If I could see this process during training, I could endure it for a longer time.”

“Please make it open-source.”

The project’s author is a 3D visual effects artist from Vienna.

According to him, the reason for creating such a CNN visualization tool was that he often found it difficult to understand how convolutional layers connected with each other and with different types of layers when he was first learning about neural networks.

The main features of this tool include visual representations of convolutional, max pooling, and fully connected layers, as well as various mechanisms for clearer visualization.
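How these layer types connect is easiest to see by following the tensor shape from one layer to the next. The sketch below is a minimal illustration with made-up layer sizes (a 32x32 greyscale input, 16 filters of size 3x3, 10 output classes); none of these numbers or function names come from the project itself.

```python
import math

def conv2d_shape(height, width, kernel, filters, stride=1, padding=0):
    """Output (height, width, channels) of a square-kernel convolution."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return out_h, out_w, filters

def max_pool_shape(height, width, channels, pool=2):
    """Output shape of non-overlapping max pooling (stride equals pool size)."""
    return height // pool, width // pool, channels

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical network: 32x32 greyscale image, 16 filters of 3x3, 10 classes.
conv = conv2d_shape(32, 32, kernel=3, filters=16)   # (30, 30, 16)
pooled = max_pool_shape(*conv)                      # (15, 15, 16)
flat = pooled[0] * pooled[1] * pooled[2]            # 3600 inputs to the dense layer
fc_weights = flat * 10                              # 36000 connections to 10 outputs
```

Each convolutional filter produces one feature map, pooling shrinks every map, and the fully connected layer links every remaining value to every output, which is why that last stage is the one that benefits most from bundling its many edges.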

In summary, it aims to help beginners grasp the key points of CNNs in the most intuitive way.

How to Create a 3D Network with Unity

Before diving into Unity, the author first built a visual 3D network prototype in Houdini software.

The prototype served as a blueprint for the Unity version: it let him work out in advance how to demonstrate convolution calculations, the shapes of feature maps, the edge bundling effect, and other issues.

Its node editor looks like this:

[Figure: node editor of the Houdini prototype]

Then, you can build the 3D neural network in Unity.

First, you need to preset the “shape” of the neural network.

Since the author had never used Unity before, he first learned about shaders and procedural geometry.

Along the way, the author ran into some limitations. He used ShaderLab, the language Unity provides for shader programming, which does not allow color variations and can only pass predefined variables between vertex, geometry, and pixel shaders.

Moreover, it cannot assign arbitrary vertex attributes, only predefined properties such as position, color, and UV. (This might also be one of the reasons why the 3D network cannot change colors in real time.)

After researching some concepts related to instancing, the author planned to use geometry shaders to generate the connections of the neural network. The starting and ending points are passed to the vertex shader and forwarded directly to the geometry shader.

These lines can consist of up to 120 vertices, as the scalar floating-point number limit for variables created by geometry shaders in Unity is 1024.
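To make the idea concrete, here is a CPU-side Python sketch of what such a shader has to do: take a start point and an end point and emit a capped number of vertices along a curve between them. The quadratic Bezier curve, the `control` point, and all names are assumptions made for illustration; the article does not say which curve the author used. The per-vertex budget in the comment is simply 1024 divided by 120, read under the assumption that the limit covers all vertices emitted for one line.

```python
MAX_LINE_VERTICES = 120          # per-line vertex cap quoted in the article
SCALAR_LIMIT = 1024              # scalar float limit quoted in the article
# Assumed reading: the limit is shared by all vertices of one line,
# which leaves 8 scalar floats per vertex.
SCALARS_PER_VERTEX = SCALAR_LIMIT // MAX_LINE_VERTICES

def connection_vertices(start, end, control, count=MAX_LINE_VERTICES):
    """Sample `count` points on a quadratic Bezier curve from start to end."""
    count = max(2, min(count, MAX_LINE_VERTICES))  # never exceed the cap
    points = []
    for i in range(count):
        t = i / (count - 1)
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append(tuple(a * s + b * m + c * e
                            for s, m, e in zip(start, control, end)))
    return points
```

Pulling the `control` points of many connections toward a shared location is one simple way to get a bundled look, though a real edge bundling implementation is more involved.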

The designed network shape looks something like this:

[Figure: the designed network shape]

Next, generate the corresponding 3D neural network image from the model’s TensorFlow code.

Files in TensorFlow's native .ckpt format need to store the data required to reconstruct the model graph, the binary weight data, the activation values, and the names of specific layers.

For example, using the Cifar10-greyscale dataset, a checkpoint file needs to be written, and randomly initialized weights need to be set.

After that, these checkpoint files need to be loaded, a TensorFlow session started, and training examples fed in to query the activations of each layer.

Then, write a JSON file that stores the shape, name, weights, and activations of each layer so they are easy to read back. The weight values are then used to assign color data to the Unity Mesh of each layer.
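A minimal sketch of this export step, using only Python's standard library, might look as follows. The JSON field names, the flat weight lists, and the red/blue color scheme are assumptions for illustration; the article does not describe the actual file layout or color mapping, and the Unity side (which would read the file and color the mesh) is not shown.

```python
import json

def weight_to_rgb(weight, max_abs):
    """Assumed color scheme: positive weights red, negative weights blue."""
    if max_abs == 0:
        return (0.0, 0.0, 0.0)
    strength = min(abs(weight) / max_abs, 1.0)
    return (strength, 0.0, 0.0) if weight >= 0 else (0.0, 0.0, strength)

def export_layers(layers, path):
    """Write per-layer shape, name, weights, and activations to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"layers": layers}, f, indent=2)

def load_layer_colors(path):
    """Read the JSON file back and derive one color per weight, per layer."""
    with open(path, encoding="utf-8") as f:
        layers = json.load(f)["layers"]
    colors = {}
    for layer in layers:
        weights = layer["weights"]
        max_abs = max((abs(w) for w in weights), default=0.0)
        colors[layer["name"]] = [weight_to_rgb(w, max_abs) for w in weights]
    return colors
```

Normalizing by the largest absolute weight in each layer keeps the colors comparable within a layer even when layers have very different weight scales.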

The final result looks quite good:

[Figure: final rendering of the 3D network]

The author also recorded a development video, which can be found at the end of the article.

There Is a Lot of Related Research

In fact, many scholars have conducted research on neural network visualization.

For example, last May, a Chinese PhD student visualized convolutional neural networks, clearly displaying the changes at each layer; simply clicking on the corresponding neuron reveals its “operation”.

This was a pre-trained 10-layer model loaded using TensorFlow.js, allowing CNN models to run in the browser and interactively display changes in neurons in real-time.

However, this was still a 2D project.

Others have since built 3D neural network visualizations similar to the one above:

[Figure: another 3D neural network visualization]

This project also uses edge bundling, ray tracing, and other techniques, combined with feature extraction, fine-tuning, and normalization, to visualize neural networks.

This project aims to estimate the importance of different parts of the neural network using these technologies.

To achieve this, the author represents each part of the neural network with different colors, predicting their interconnections based on the importance of the nodes in the network.

The general processing flow is as follows:

[Figure: processing flow of the project]

If you are interested in this type of 3D neural network visualization, you can find the corresponding open-source project link at the end of the article.

Author Introduction

Stefan Sietzen, currently residing in Vienna, was previously a freelancer in the field of 3D visual effects.

He is currently a master's student at Vienna University of Technology with a strong interest in visual computing; this 3D neural network is one of the projects he worked on during his master's studies.

Development process: https://vimeo.com/stefsietz

Open-sourced 3D neural network project: https://github.com/julrog/nn_vis

Reference links:
https://www.reddit.com/r/MachineLearning/comments/leq2kf/d_convolution_neural_network_visualization_made/
https://mp.weixin.qq.com/s/tmx59J75wuRii4RuOT8TTg
https://vimeo.com/stefsietz
http://portfolio.stefansietzen.at/
http://visuality.at/vis2/detail.html
