Convolutional Neural Networks: Understanding the Digit Zero

Cover Image: Airbnb Headquarters, Illustrated in March 2020

Recently, while exploring artificial intelligence, I noticed that much of the available material shows how to get results by following a series of programming steps, yet many people still treat the process as a black box. It is often said that we do not know why it works; all we know is that the computer appears to be intelligent.

I am not satisfied with this answer. Even after watching numerous YouTube videos on Convolutional Neural Networks, I still could not find material that clearly explains how artificial intelligence really works.

Books that discuss the impact of artificial intelligence on humanity only at a very high level, without touching the code at all, are of limited help to someone like me who wants to build something hands-on.

This is why I want to write some articles that combine my thoughts on the cultural aspects of artificial intelligence with hands-on code.

The Simplest Question

To answer the question of how artificial intelligence recognizes things, we can start with the Hello World example of the AI field (Note 1): the MNIST dataset, which is used to train a computer to recognize handwritten digits from 0 to 9.

MNIST Dataset

A significant difference between artificial intelligence and traditional programming is the proportion of data to code. In traditional programming, roughly 90% of a program is code, while the constants embedded in it may account for less than 10% of its storage. In artificial intelligence, the code is relatively small and the bulk is data. For example, GloVe has only about 100 KB of code, but its training corpus consists of 6 billion words of text, about 1 GB of training data. The ratio of code to data is reversed: roughly 99.99% is data and about 0.01% is code.
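To make the proportion concrete, here is a quick back-of-the-envelope check. It takes the rough GloVe sizes mentioned above at face value (about 100 KB of code and about 1 GB of training data) and treats them as decimal units, which is my own assumption; the exact percentages shift slightly with the sizes you plug in.

```python
# Rough sizes taken from the paragraph above; decimal units are an assumption.
code_bytes = 100 * 1000      # about 100 KB of code
data_bytes = 1000 ** 3       # about 1 GB of training data

total_bytes = code_bytes + data_bytes
code_share = code_bytes / total_bytes * 100   # percentage of the total
data_share = data_bytes / total_bytes * 100

print(f"code: {code_share:.2f}%")   # code: 0.01%
print(f"data: {data_share:.2f}%")   # data: 99.99%
```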

Many datasets in the field of artificial intelligence are named after the institutions that provided the data. MNIST (Modified National Institute of Standards and Technology) is one such foundational dataset. It contains 60,000 images of handwritten digits for training, each paired with the corresponding number (treated as the correct answer), plus 10,000 pairs in the same format for testing accuracy.

The way these images were obtained is quite clever: they were sampled from handwriting forms filled in by U.S. Census Bureau employees and high school students. The idea resembles the machine-readable cards used in college entrance exams, where students handwrite their exam numbers and shade the corresponding ovals beside the numbers with a 2B pencil. In both cases the form already records which digit was meant, so we get a natural correspondence between handwritten images and correct answers. These handwritten digits and their correct answers were then packaged for machine learning researchers, challenging them to achieve ever higher recognition rates.

Let’s Code and See the Data Prepared for Us

The following four lines of Python code use TensorFlow to download this 11.49 MB dataset from the internet to your local machine and print one sample from it.

# Import the famous TensorFlow. Install it via pip3 install tensorflow
import tensorflow as tf
# Download the MNIST dataset from the internet to local ~/.keras/datasets/mnist.npz
# x_train, y_train are the 60,000 training images and answers
# x_test, y_test are the 10,000 testing images and answers
# The training set is like routine exercises, while the test set is the actual exam questions.
# To prevent cheating, the computer cannot see the exam papers during training.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Print the image of the 1001st digit
print(x_train[1000])
# Print the "correct answer". The result should be 0
print(y_train[1000])

If we are curious about what the data looks like inside, we can print it directly and get the following (you can scroll left and right to see the full picture).

[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0  36 146 254 255 251  95   6   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0   0   0   3  97 234 254 254 232 254 254  35   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0   0  89 140 254 254 174  67  33 200 254 190   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0 108 253 254 235  51   1   0   0  12 254 253  56   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0  12 216 254 244  55   0   0   0   0   6 213 254  57   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0  25 254 254 132   0   0   0   0   0   0 168 254  57   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0  45 254 243  34   0   0   0   0   0   0 168 254  57   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0 128 254 157   0   0   0   0   0   0   0 168 254  57   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0  19 228 254 105   0   0   0   0   0   0   7 228 254  57   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0  58 254 254  87   0   0   0   0   0   0  10 254 246  47   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0  58 254 254   9   0   0   0   0   0   0  10 254 210   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0  58 254 254   9   0   0   0   0   0   0 105 254  91   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   5 219 254   9   0   0   0   0   0  24 230 254  24   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0 216 254   9   0   0   0   0   0  84 254 251  23   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0 216 254  36   0   0   0   0  22 208 251  94   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0 129 254 120   0   0   0   3 140 254 229   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0  83 254 222  17   0   0  91 254 236  53   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0  18 235 254 134  21 119 237 254 124   0   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0  53 249 254 234 252 254 172   3   0   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0   0 116 237 254 254 133  20   0   0   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0] 
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]]

It turns out that the data is an array of 60,000 elements. Each element is a matrix of 28 rows with 28 numbers per row, and each number is an integer between 0 and 255 representing a grayscale level: 0 is pure black, 255 is pure white, and the values in between are shades of gray, for 256 levels in total.
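This structure can be checked in code rather than by eye. The following is a minimal sketch, assuming NumPy is installed; `summarize` is a hypothetical helper of my own, and the small random array only stands in for `x_train` so that the snippet runs without downloading anything.

```python
import numpy as np

def summarize(images):
    """Return basic facts about a stack of grayscale images."""
    images = np.asarray(images)
    count, rows, cols = images.shape
    return {
        "count": count,
        "rows": rows,
        "cols": cols,
        "pixels_per_image": rows * cols,
        "dtype": str(images.dtype),
        "min": int(images.min()),
        "max": int(images.max()),
    }

# Stand-in for x_train: 5 random 28x28 images with values in 0..255.
rng = np.random.default_rng(seed=0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

summary = summarize(fake_images)
print(summary)
```

With the real data, `summarize(x_train)` should report 60,000 images of 28 rows and 28 columns, 784 pixels per image, stored as unsigned 8-bit integers.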

Can you guess which digit the above 28 x 28 = 784 numbers represent?

Using our well-trained human neural networks, we can judge that the arrangement of these 784 numbers in 28 rows and 28 columns should represent the digit 0. Checking the correct answer provided in the training set confirms that it is also 0.
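Even before plotting, the shape can be made visible in the terminal by marking every pixel above a brightness threshold. This is only a small sketch: `to_ascii` and the threshold of 128 are my own choices for illustration, and the tiny 5 x 5 grid stands in for the 28 x 28 matrix. Passing `x_train[1000]` instead should work the same way.

```python
def to_ascii(image, threshold=128):
    """Turn a 2D grid of 0..255 grayscale values into text art.

    Pixels at or above the threshold become '#', the rest become '.'.
    """
    lines = []
    for row in image:
        lines.append("".join("#" if value >= threshold else "." for value in row))
    return "\n".join(lines)

# A hand-made 5x5 ring, standing in for the 28x28 digit.
tiny_zero = [
    [0, 200, 254, 200, 0],
    [200, 0, 0, 0, 200],
    [254, 0, 0, 0, 254],
    [200, 0, 0, 0, 200],
    [0, 200, 254, 200, 0],
]

art = to_ascii(tiny_zero)
print(art)
```

Bright pixels trace the pen stroke, so the ring of `#` characters is the same circle our eyes pick out of the numbers.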

Next, we will display this image using three lines of Python statements:

# Import the plotting tool matplotlib. Install it via pip3 install matplotlib
import matplotlib.pyplot as plt
# Directly give this 28x28 matrix to Matplotlib to plot the grayscale image
plt.imshow(x_train[1000], cmap="gray")
# Display it
plt.show()

If the raw numbers above resemble the world as seen through the eyes of Tank from The Matrix, the image below restores them to a world visible to the human eye. This is much clearer: it is indeed a 0, no doubt about it.

Image: The 28 x 28 grayscale rendering of x_train[1000], a handwritten 0

In the MNIST training set, there are 5,923 handwritten samples of 0 in many different forms. Later, we will train the computer to recognize this digit with just a few lines of code, achieving an accuracy of up to 99%.
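The number of samples per digit can be counted directly from the labels. Here is a minimal sketch using only the standard library; `count_labels` is a hypothetical helper, and the short list stands in for `y_train` so the snippet runs without downloading MNIST.

```python
from collections import Counter

def count_labels(labels):
    """Count how many times each digit appears in a sequence of labels."""
    return Counter(int(label) for label in labels)

# Stand-in for y_train: a handful of made-up labels.
sample_labels = [5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 0]

counts = count_labels(sample_labels)
for digit in sorted(counts):
    print(digit, counts[digit])
```

With the real labels, `count_labels(y_train)[0]` should give the 5,923 quoted above.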

Our journey starts from this 0. I hope to answer the question, “How does the computer know this is a 0?”

Reflection Question

I hope you, like me, will gaze affectionately at this little circle and ask yourself in all sincerity: How do I know this is a 0?

Carefully explore every corner of your neural network, questioning your brain about how it knows in less than 0.01 seconds that this is a 0 and not a 1 or a 4. Is it because I see a circle? But how do I know this is a circle? What exactly happens in my brain before I recognize it as a circle?

Note 1:

The Hello World in C language is printf("Hello World");

The Hello World in Hadoop is Word Count;

And the Hello World of artificial intelligence must be the MNIST digit recognition.
