What Are Artificial Neural Networks?

This article is from the 22nd issue of “Banyue Tan” in 2024.

The 2024 Nobel Prize in Physics unexpectedly honored “foundational discoveries and inventions that enable machine learning with artificial neural networks.” What exactly are artificial neural networks? And can their importance really be compared with that of the fundamental physical sciences?

Let us start from decades ago…

Starting from Mathematics

In 1943, neurophysiologist Warren McCulloch and logician Walter Pitts proposed a system that uses mathematical modeling to simulate how brain neurons process information, an idea that later grew into the “multilayer perceptron.”

Wang Yuguang, an associate professor at the Institute of Natural Sciences and the School of Mathematical Sciences at Shanghai Jiao Tong University, explained that the multilayer perceptron can be viewed as a simplified artificial neural network. It can have many layers, each containing numerous neurons, with every neuron acting as a small information processor. The system works much like a composite function in mathematics: each layer can be seen as a function, and the network as a whole is the composition of those functions.
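To make the composite-function analogy concrete, here is a minimal sketch in Python (the layer sizes, random weights, and tanh activation are illustrative assumptions, not details from the article):

```python
import numpy as np

def layer(W, b, activation):
    """One layer of a multilayer perceptron: an affine map followed by a nonlinearity."""
    return lambda x: activation(W @ x + b)

rng = np.random.default_rng(0)

# Three layers, each an ordinary function; the network is their composition f3(f2(f1(x))).
f1 = layer(rng.standard_normal((8, 4)), np.zeros(8), np.tanh)
f2 = layer(rng.standard_normal((8, 8)), np.zeros(8), np.tanh)
f3 = layer(rng.standard_normal((3, 8)), np.zeros(3), lambda z: z)  # linear output layer

x = rng.standard_normal(4)   # a 4-dimensional input
y = f3(f2(f1(x)))            # the forward pass is literally function composition
print(y.shape)               # (3,) -- e.g. scores for 3 categories
```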

Wang Yuguang showcases the artificial intelligence model developed by his team. Photo by Xu Dongyuan

At that time, it was already understood that multilayer perceptrons could learn fairly general mappings. For example, when a person sees a picture, they can identify its category; a multilayer perceptron can, in principle, learn this same mapping from images to categories, provided the number of layers and neurons is large enough and there is enough data for effective training.

However, training early artificial neural networks was challenging: as networks grew, the number of parameters increased dramatically, and there was no efficient way to adjust them all. It wasn’t until Geoffrey Hinton and his collaborators developed the backpropagation algorithm, built on the chain rule of calculus, that artificial neural networks could automatically adjust their weights from massive amounts of training data, and scientists glimpsed the dawn of a major upgrade.
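To illustrate how the chain rule drives these automatic weight adjustments, here is a minimal sketch of backpropagation for a single neuron (the data point, learning rate, and initial weights are invented for the example):

```python
import numpy as np

# A single neuron y = tanh(w*x + b), fitted to one invented data point.
x, target = 0.5, 0.8
w, b, lr = 0.1, 0.0, 0.5

for step in range(100):
    z = w * x + b
    y = np.tanh(z)
    loss = 0.5 * (y - target) ** 2

    # Backpropagation: the chain rule carries the error from the loss back to each weight.
    dloss_dy = y - target        # d(loss)/dy
    dy_dz = 1.0 - y ** 2         # derivative of tanh
    grad_w = dloss_dy * dy_dz * x   # dz/dw = x
    grad_b = dloss_dy * dy_dz       # dz/db = 1

    # Gradient descent: the weights adjust themselves using the gradients.
    w -= lr * grad_w
    b -= lr * grad_b

print(round(float(loss), 6))  # the loss shrinks toward 0 as the neuron fits the target
```

In a real multilayer network the same rule is applied layer by layer, from the output back toward the input, which is exactly why it is called “back” propagation.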

But will the road ahead always be smooth?

Towards Deep Learning

At the end of the last century, scarce computing power and other practical constraints pushed artificial neural networks into a period of dormancy. It wasn’t until the early 21st century that the field began to flourish again: NVIDIA’s development of GPUs greatly enhanced the parallel computing capability of computers, and the spread of the internet supplied the vast data that further accelerated the training of artificial neural networks.

A landmark turning point came in 2007, when Stanford University professor Fei-Fei Li led a team to organize and construct ImageNet, a large-scale image dataset that grew to more than 10 million labeled images. This foundational work set the standard for verifying the effectiveness of algorithms in subsequent image-recognition research, and to encourage wider participation in that verification, the ImageNet Challenge was born, built on a subset of 1,000 categories.

This challenge became a catalyst for the revolution in artificial neural networks. Plain multilayer perceptrons performed poorly in the competition and gradually faded from the historical stage, passing the baton to AlexNet in 2012. That was the moment the now-familiar term “deep learning” began to shine.

The key to AlexNet was its stack of convolutional layers, which formed a deep convolutional neural network that, together with design choices such as the ReLU activation function, effectively alleviated the vanishing gradient problem: error gradients shrinking, or even disappearing, layer by layer, leaving earlier layers too little gradient information to learn from. That problem had been a major obstacle for the previous generation of artificial neural networks. The success of convolutional neural networks made far more efficient deep learning possible.
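To show what a convolutional layer actually computes, here is a minimal sketch of a 2D convolution followed by a ReLU (the hand-written edge-detecting kernel and tiny image are illustrative assumptions):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution as used in deep learning: slide the kernel
    over the image and sum the elementwise products at each position."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """ReLU passes gradients through unchanged where it is active,
    which helps keep them from vanishing layer by layer."""
    return np.maximum(0.0, x)

image = np.zeros((6, 6))
image[:, 3:] = 1.0                        # a dark-to-bright vertical edge
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])        # responds strongly at such edges

feature_map = relu(conv2d(image, kernel))
print(feature_map)                        # nonzero only where the edge lies
```

Stacking many such layers, each learning its own kernels from data, is what turns raw pixels into increasingly abstract features.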

Diverse Models Flourish

In the past decade, neural network models have flourished. The best known may be AlphaGo, which defeated a Go world champion, and AlphaFold, which predicts protein structures, both developed by DeepMind. DeepMind is headquartered in London, where graduates of Cambridge and Oxford collaborate across disciplines, sparking many innovative ideas. Scientists at the University of Cambridge are currently attempting to develop diffusion models based on graph neural networks for protein sequence design, with Chinese scientists participating as well.

It is worth mentioning that AlphaFold’s developers won the 2024 Nobel Prize in Chemistry, and on November 11 DeepMind released AlphaFold3 as open source, allowing scientists to download the software code for non-commercial use.

Another noteworthy route is the much-discussed large language models. Natural language processing, which evolved from computational linguistics, has advanced rapidly since merging with artificial neural networks, especially after the introduction of the Transformer architecture laid the groundwork for a series of new models. The most familiar example is OpenAI’s ChatGPT, which can be considered a milestone in artificial intelligence applications.
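To give a flavor of the Transformer’s core operation, here is a minimal sketch of scaled dot-product self-attention (the sequence length, embedding size, and random inputs are illustrative assumptions; real models add learned projections, multiple heads, and many stacked layers):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: every position gathers information
    from every other position, weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row is a probability distribution
    return weights @ V                  # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                 # 5 tokens, 8-dimensional embeddings
X = rng.standard_normal((seq_len, d_model))

# Self-attention: queries, keys, and values all come from the same sequence.
out = attention(X, X, X)
print(out.shape)                        # (5, 8): one updated vector per token
```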

Author: Dong Xue, Xu Dongyuan
Original title: “Are Artificial Neural Networks Really ‘Divine’?”
