The hottest technology right now is definitely artificial intelligence.
The underlying model of artificial intelligence is the “neural network”. Many complex applications (such as pattern recognition and automatic control) and advanced models (like deep learning) are based on it. To learn artificial intelligence, one must start with it.
So what is a neural network? There seems to be a lack of simple explanations online.
A couple of days ago, I read Michael Nielsen’s open-source textbook “Neural Networks and Deep Learning” and unexpectedly found that the explanations in it are very easy to understand. Below, I will introduce what a neural network is based on this book.
1. Perceptron
Historically, scientists have always hoped to simulate the human brain and create machines that can think. Why can humans think? Scientists found that the reason lies in the neural network of the human body.
- External stimuli are transformed into electrical signals through nerve endings and transmitted to nerve cells (also called neurons).
- Countless neurons form the nerve center.
- The nerve center integrates the various signals and makes a judgment.
- The human body reacts to external stimuli based on the instructions from the nerve center.
Since the basis of thinking is the neuron, if we can create “artificial neurons”, we can assemble them into an artificial neural network to simulate thinking. In the 1950s and 1960s, the earliest model of the “artificial neuron” was proposed, called the “perceptron”, and it is still in use today.
The circle in the above image represents a perceptron. It accepts multiple inputs (x1, x2, x3, …) and produces a single output, much like nerve endings perceive various changes in the external environment and ultimately produce an electrical signal.
To simplify the model, we assume that each input has only two possibilities: 1 or 0. If all inputs are 1, it indicates that all conditions are met, and the output is 1; if all inputs are 0, it indicates that no conditions are met, and the output is 0.
2. Example of Perceptron
Let’s look at an example. The city is holding an annual game and animation exhibition, and Xiao Ming is unsure whether to visit on the weekend.
He decides to consider three factors.
- Weather: Is it sunny on the weekend?
- Companionship: Can he find someone to go with?
- Price: Is the ticket affordable?
This constitutes a perceptron. The three factors above are the external inputs, and the final decision is the output of the perceptron. If all three factors are Yes (represented by 1), the output is 1 (go visit); if all are No (represented by 0), the output is 0 (do not go visit).
3. Weights and Thresholds
At this point, you might ask: If some factors are true while others are not, what will the output be? For example, if the weather is good, the ticket is cheap, but Xiao Ming cannot find a companion, should he still go visit?
In reality, the various factors are rarely of equal importance: some are decisive, while others are secondary. We can therefore assign each of these factors a weight to represent its importance.
- Weather: weight of 8
- Companionship: weight of 4
- Price: weight of 4
The weights above indicate that weather is a decisive factor while companionship and price are secondary factors.
If all three factors are 1, the weighted sum will be 8 + 4 + 4 = 16. If the weather and price factors are 1 and the companionship factor is 0, the weighted sum becomes 8 + 0 + 4 = 12.
At this point, a threshold also needs to be specified. If the weighted sum exceeds the threshold, the perceptron outputs 1; otherwise, it outputs 0. Assuming the threshold is 8, then 12 > 8, and Xiao Ming decides to go. The value of the threshold reflects the strength of the desire: a lower threshold indicates a stronger desire to go, while a higher threshold indicates a weaker one.
The decision-making process above can be mathematically expressed as follows.
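In the standard perceptron notation used in Nielsen's book, the rule reads:

output = 0  if ∑ wj xj ≤ threshold
output = 1  if ∑ wj xj > threshold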
In the formula above, x represents various external factors, and w represents the corresponding weights.
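As a minimal sketch (the function and variable names below are illustrative, not from the original article), Xiao Ming's decision can be computed in a few lines of Python:

```python
def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# weather = 1, companionship = 0, price = 1, with weights 8, 4, 4 and threshold 8
print(perceptron([1, 0, 1], [8, 4, 4], 8))  # 8 + 0 + 4 = 12 > 8, so it prints 1 (go visit)
```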
4. Decision Model
A single perceptron forms a simple decision model that can already be used. Real-world decision models, however, are much more complex: they are multi-layer networks made up of many perceptrons.
In the image above, the bottom-layer perceptrons receive the external inputs, make their judgments, and then send their signals upward as inputs to the next layer of perceptrons, until the final result is obtained. (Note: each perceptron still has only one output, but that output can be sent to multiple targets.)
In this image, the signals all travel in one direction: the output of a lower-layer perceptron is always the input of an upper-layer perceptron. In reality, signals can also be passed around in a loop, where A sends to B, B sends to C, and C sends back to A; such a network is called a “recurrent neural network”, which this article does not cover. A rough sketch of the layered, one-directional flow follows.
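In this sketch the layer sizes, weights, and thresholds are made up purely for illustration:

```python
def perceptron(inputs, weights, threshold):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

def two_layer_network(x):
    # Bottom layer: two perceptrons each look at the same external inputs.
    h1 = perceptron(x, [8, 4, 4], 8)
    h2 = perceptron(x, [2, 6, 6], 7)
    # Top layer: one perceptron takes the bottom layer's outputs as its inputs.
    return perceptron([h1, h2], [5, 5], 4)

print(two_layer_network([1, 0, 1]))  # signals flow upward layer by layer; prints 1
```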
5. Vectorization
To facilitate subsequent discussions, we need to perform some mathematical processing on the model above.
- The external factors x1, x2, x3 are written as a vector (x1, x2, x3), abbreviated as x.
- The weights w1, w2, w3 are also written as a vector (w1, w2, w3), abbreviated as w.
- Define the operation w⋅x = ∑ wj xj, i.e. the dot product of w and x, equal to the sum of the products of the factors and their weights.
- Define b as the negative of the threshold: b = -threshold.
The perceptron model becomes as follows.
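With these definitions, the decision rule above can be rewritten as:

output = 0  if w⋅x + b ≤ 0
output = 1  if w⋅x + b > 0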
6. The Operation Process of Neural Networks
Building a neural network requires meeting three conditions.
- Inputs and outputs
- Weights (w) and thresholds (b)
- The structure of the multi-layer perceptrons
In other words, a diagram like the one shown above needs to be drawn up in advance.
Among these, the most difficult part is determining the weights (w) and thresholds (b). So far we have assigned these two values subjectively, but in reality they are very hard to estimate directly, and a method is needed to find them.
That method is trial and error. Keeping all other parameters unchanged, w (or b) is varied by a small amount, denoted Δw (or Δb), and the resulting change in the output is observed. This is repeated until the set of w and b that produces the most accurate output is found. This process is called model training.
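A minimal sketch of this trial-and-error idea (real libraries use gradient-based training instead; the tiny dataset and step size below are invented for illustration):

```python
import random

def predict(x, w, b):
    """Perceptron output: 1 if w·x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def accuracy(data, w, b):
    return sum(predict(x, w, b) == y for x, y in data) / len(data)

# A tiny dataset of known answers: (inputs, expected output)
data = [([1, 1, 1], 1), ([1, 0, 1], 1), ([0, 1, 1], 0), ([0, 0, 0], 0)]

w, b = [0.0, 0.0, 0.0], 0.0
best = accuracy(data, w, b)
for _ in range(1000):
    i = random.randrange(len(w) + 1)
    delta = random.uniform(-0.5, 0.5)      # a small change Δw (or Δb)
    new_w, new_b = list(w), b
    if i < len(w):
        new_w[i] += delta
    else:
        new_b += delta
    score = accuracy(data, new_w, new_b)
    if score >= best:                      # keep the change only if it does not hurt
        w, b, best = new_w, new_b, score

print(w, b, best)
```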
Thus, the operation process of the neural network is as follows.
- Determine the inputs and outputs
- Find one or more algorithms that can derive the outputs from the inputs
- Find a dataset of known answers to train the model and estimate w and b
- Once new data arrives, feed it into the model to get a result, while continuing to correct w and b
As can be seen, the whole process requires a massive amount of computation. That is why neural networks have only recently become practically useful: an ordinary CPU is not enough, and GPUs specialized for machine learning are needed to do the computation.
7. Example of Neural Networks
Let’s explain neural networks through the example of automatic license plate recognition.
Automatic license plate recognition means that cameras on highways capture photos of license plates, and computers recognize the numbers in the photos.
In this example, the license plate photo is the input and the license plate number is the output, with the clarity of the photo acting as the weight (w). One or more image-comparison algorithms then serve as the perceptron. The algorithm produces a result as a probability, for example a 75% probability that the character is the digit 1. A threshold (b) (say, 85% confidence) also needs to be set; results below it are considered invalid.
A set of already recognized license plate photos serves as the training dataset. By continuously adjusting various parameters, the combination that yields the highest accuracy is found. In the future, when new photos are obtained, results can be directly provided.
8. Continuity of Outputs
The model above has a problem: by assumption, the output has only two possible values, 0 and 1. However, the model requires that small changes in w or b cause small changes in the output. An output that only jumps between 0 and 1 is too insensitive for this, and the correctness of training cannot be ensured. The “output” must therefore be transformed into a continuous function.
This requires a bit of simple mathematical modification.
First, let the calculation result of the perceptron, w⋅x + b, be denoted as z.
z = w⋅x + b
Then, calculate the following expression and denote the result as σ(z).
σ(z) = 1 / (1 + e^(-z))
This is because if z approaches positive infinity z → +∞ (indicating a strong match of the perceptron), then σ(z) → 1; if z approaches negative infinity z → -∞ (indicating a strong mismatch of the perceptron), then σ(z) → 0. In other words, as long as σ(z) is used as the output result, the output will become a continuous function.
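A quick sketch of this squashing behaviour (the helper name `sigmoid` is just for illustration):

```python
import math

def sigmoid(z):
    """σ(z) = 1 / (1 + e^(-z)); maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))
# prints roughly: -10 → 0.0, -1 → 0.2689, 0 → 0.5, 1 → 0.7311, 10 → 1.0
```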
The original output curve is a step function that jumps abruptly from 0 to 1 at the threshold. After the change, it becomes a smooth S-shaped curve.
In fact, it can also be proven that Δσ satisfies the following formula.
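Written out (this is the first-order approximation given in Nielsen's book):

Δσ ≈ ∑j (∂σ/∂wj) Δwj + (∂σ/∂b) Δb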
That is, Δσ is linearly related to Δw and Δb, with the rate of change given by the partial derivatives. This makes it possible to compute accurate values of w and b.
This article is reprinted from Ruan Yifeng’s blog.
– End –