Understanding the Nobel Prize (Part 1) | Discussing Machine Learning

01
Two Nobel Prize Winners

John J. Hopfield (left), born in 1933 in Chicago, Illinois, received his PhD from Cornell University in 1958, and is currently a professor at Princeton University.
Geoffrey E. Hinton (right), born in 1947 in London, England, received his PhD from the University of Edinburgh in 1978, and is currently a professor at the University of Toronto, Canada.
02
Three Basic Methods of Machine Learning

Machine learning is an important branch of artificial intelligence and computer science. It refers to algorithms that build models from sample data and use these models to make predictions or decisions without being explicitly programmed for the task. The basic methods of machine learning are supervised learning, unsupervised learning, and reinforcement learning.
01
Supervised Learning

Supervised learning applies existing knowledge to new data, using human-labeled training samples to predict future events. The linear discriminant analysis proposed by British mathematician Ronald Fisher in 1936 is one of the earliest supervised learning algorithms. In the 1950s, Bayesian classifiers based on Bayesian decision theory began to be used for classification problems. In 1958, American cognitive psychologist Frank Rosenblatt invented the perceptron algorithm, which is considered a precursor to artificial neural networks. In 1967, American information theorist Thomas Cover and computer scientist Peter Hart proposed the k-nearest neighbors algorithm, based on the idea of template matching. In the 1980s and 1990s, decision trees and neural network algorithms began to rise. In 1995, two important algorithms were born: support vector machines and AdaBoost. Support vector machines became a mainstay for handling linear and nonlinear classification problems, while AdaBoost can combine many weak classifiers of other types into a stronger one. From 1995 to 1997, German computer scientists Sepp Hochreiter and Juergen Schmidhuber proposed the long short-term memory (LSTM) algorithm, which can partially address the vanishing gradient problem; in 2013, LSTM was successfully combined with deep recurrent neural networks for applications in speech recognition. In 2001, American statistician Leo Breiman proposed the random forest algorithm. Random forests are classifiers built from many randomly constructed decision trees, offering significant advantages when handling large and high-dimensional datasets.
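
To make the supervised learning setting concrete, here is a minimal sketch of a k-nearest neighbors classifier in Python; the toy data points and the function name are our own illustration and do not come from the original article.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest labeled neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)    # distance to every training sample
    nearest = np.argsort(dists)[:k]                # indices of the k closest samples
    return np.bincount(y_train[nearest]).argmax()  # most common label wins

# Toy labeled data: two clusters with labels 0 and 1
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.1, 0.0])))  # -> 0
print(knn_predict(X_train, y_train, np.array([1.0, 0.9])))  # -> 1
```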

Common applications of supervised learning include credit scoring, handwriting recognition, speech recognition, information retrieval, financial analysis, and spam detection.

02
Unsupervised Learning

Unsupervised learning is a statistical learning method that discovers the hidden features of data by analyzing unlabeled data. It includes two main types of algorithms: clustering and dimensionality reduction. In 1963, American Air Force researcher Joe Ward proposed the earliest clustering algorithm, hierarchical clustering based on variance analysis. In 1967, American mathematician James MacQueen proposed the k-means algorithm, the best-known clustering algorithm, which has led to numerous improved algorithms and successful applications. In 1977, statistician Arthur Dempster proposed the expectation-maximization (EM) algorithm, which is used for maximum likelihood estimation and for clustering problems. In 1995, Yizong Cheng, a professor at the University of Cincinnati, generalized the mean shift algorithm, which became widely used in computer vision and image processing. In 2000, computer scientist Jianbo Shi popularized the spectral clustering algorithm, which transforms clustering into an optimal graph-cut problem. The earliest dimensionality reduction algorithm was principal component analysis, proposed by British mathematician and biostatistician Karl Pearson in 1901, more than 40 years before the birth of the first real computer. For nearly 100 years thereafter, however, no significant achievements in dimensionality reduction appeared in the field of machine learning. In 1998, German computer scientist Bernhard Schölkopf proposed kernel principal component analysis based on kernel methods, enabling nonlinear dimensionality reduction of high-dimensional data. After 2000, manifold learning became a hot topic; it focuses on mapping high-dimensional data to a low-dimensional space so that the low-dimensional data preserves certain essential structural features of the original high-dimensional data. New algorithms such as locally linear embedding, Laplacian eigenmaps, and locality preserving projections emerged from manifold learning. The t-distributed stochastic neighbor embedding (t-SNE) algorithm, introduced in 2008, is the youngest member of the dimensionality reduction family.
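
As a small illustration of the clustering idea, the following is a bare-bones sketch of the k-means algorithm in Python; the toy points, iteration count, and random seed are chosen only for this example.

```python
import numpy as np

def kmeans(X, k=2, iters=100, seed=0):
    """Minimal k-means: alternate between assigning points and updating centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(iters):
        # Assign each point to its nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Move each centroid to the mean of its assigned points (keep it if the cluster is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):                # stop when centroids settle
            break
        centers = new_centers
    return labels, centers

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, centers = kmeans(X, k=2)
print(labels)   # e.g. [0 0 1 1] (cluster numbering may differ between runs)
```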

Common applications of unsupervised learning include anti-money laundering, customer segmentation, advertisement recommendations, and sales trend forecasting.

03
Reinforcement Learning

Reinforcement learning originates from behaviorist theory in psychology: it studies how an agent should act, under environmental stimuli of reward or punishment, to maximize its expected return, allowing the agent to learn by itself within the environment. As early as 1954, American computer scientist Marvin Minsky proposed the concept and terminology of “reinforcement learning.” In 1957, American applied mathematician Richard Bellman proposed dynamic programming to solve the optimal control problem of Markov decision processes, using a trial-and-error iterative solving mechanism similar to that of reinforcement learning. In 1965, King-Sun Fu, a professor at Purdue University in the United States, proposed the concept of “intelligent control” while studying cybernetics, identifying “trial and error” as the core mechanism of reinforcement learning. The earliest reinforcement learning algorithm was temporal difference learning, proposed by computer scientist Richard Sutton in 1988; it can acquire information directly from actual experience without needing full knowledge of the environment and can update its estimates in real time without waiting for complete reward feedback. In 1989, Q-learning, proposed by British computer scientist Chris Watkins, further expanded the applications of reinforcement learning by making it independent of a model of the problem, and it has become one of the most widely used reinforcement learning methods. However, for nearly 20 years reinforcement learning developed slowly, overshadowed by the brilliance of supervised learning. After 2010, reinforcement learning was combined with neural networks to produce deep reinforcement learning algorithms, opening a period of rapid development. In 2013, DeepMind, later acquired by Google, published a paper on using reinforcement learning to play Atari games. In 2015, the AlphaGo program developed by DeepMind defeated the professional 2-dan Go player Fan Hui, becoming the first computer Go program to defeat a professional player without handicaps. In 2016, AlphaGo defeated top professional Go player Lee Sedol 4:1 in a five-game match.
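
To illustrate the trial-and-error mechanism, here is a minimal sketch of tabular Q-learning on a hypothetical five-state corridor, where the agent is rewarded only for reaching the rightmost state; the environment, reward, and hyperparameters are invented for this example.

```python
import numpy as np

# Hypothetical corridor: states 0..4, actions 0 = left, 1 = right, reward 1 at state 4
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

def step(s, a):
    """Move left or right along the corridor; the episode ends at the rightmost state."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise take the best-known action
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        # Q-learning update: nudge Q(s, a) toward the bootstrapped target
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # greedy policy for states 0-3 should be "right" (1); state 4 is terminal
```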

Common applications of reinforcement learning include autonomous driving, machine translation, healthcare, news customization, advertising marketing, and robot control.

03
Development History of Deep Learning

To understand the groundbreaking contributions of the two award-winning scientists to the development of machine learning, let us review the history of deep learning and see where the names of Hopfield and Hinton appear in it.

Deep learning is a branch of machine learning that performs representation learning on data by simulating the brain’s neural network structure. It originated from the study of the mechanisms of the human brain. Neurophysiologists David Hubel and Torsten Wiesel, who won the Nobel Prize in Physiology or Medicine in 1981, discovered that the human visual system processes information hierarchically, and that human perception of high-level features is built from combinations of low-level features. For example, face recognition proceeds from the pixels taken in through the pupil, to the judgment of shapes, to the increasingly abstract concept of a face, and finally to recognizing it as a face; from low levels to high levels, the feature representations become ever more abstract and conceptual. This discovery implies that the brain is a deep architecture and that cognition is a deep process, while deep learning likewise forms more abstract high-level features by combining low-level features. The development of deep learning can be divided into three stages: perceptrons, neural networks, and deep learning.

In 1943, American neurophysiologist Warren S. McCulloch and logician Walter Pitts proposed the concept of artificial neural networks and constructed a mathematical model of the artificial neuron, known as the MCP model, thus opening the era of artificial neural network research. In 1949, Canadian psychologist Donald Hebb described the basic principle of synaptic plasticity, explaining from a neuroscientific perspective how brain neurons change during learning; Hebb’s theory is the biological basis of artificial neural networks. In 1958, Rosenblatt invented the perceptron algorithm at the Cornell Aeronautical Laboratory, the world’s first neural network learning algorithm with a complete algorithmic description. The perceptron is a simple single-layer neural network that can distinguish basic shapes such as triangles. Limited by the computer hardware of the time, however, the perceptron could not be widely applied. In 1969, Minsky and Seymour Papert proved that perceptrons cannot solve even simple linearly inseparable problems such as XOR, and perceptron research declined in the 1970s.
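
The following is a minimal sketch of the perceptron learning rule, using tiny hand-made truth-table data of our own: it converges on the linearly separable AND function but can never fully separate XOR, which is exactly the limitation Minsky and Papert pointed out.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Single-layer perceptron: adjust weights whenever a sample is misclassified."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (yi - pred) * xi       # nudge the decision boundary toward the error
            b += lr * (yi - pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])   # linearly separable: the perceptron learns it
y_xor = np.array([0, 1, 1, 0])   # not linearly separable: no single line works

for name, y in [("AND", y_and), ("XOR", y_xor)]:
    w, b = train_perceptron(X, y)
    preds = (X @ w + b > 0).astype(int)
    print(name, "predictions:", preds, "targets:", y)
```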

In 1959, while studying the visual nervous system of cats, Hubel and Wiesel discovered two types of cells in the primary visual cortex: simple cells, which perceive light information, and complex cells, which perceive motion information. Inspired by this, in 1980 Japanese computer scientist Kunihiko Fukushima proposed a network model called the “Neocognitron.” The network is divided into multiple layers, each consisting of one type of neuron, and the two types of neurons alternate within the network, used respectively for extracting and combining pattern features. These two types of neurons later evolved into convolutional layers and pooling layers. However, the neurons in this network were designed by hand and could not be adjusted according to the results of computation, so the model could only recognize a limited number of simple digits and lacked the ability to learn.

In 1982, American physicist John J. Hopfield, drawing on statistical physics, proposed the Hopfield neural network, a model with limited memory capacity, and was the first to demonstrate the stability of neural networks designed according to Hebb’s rule. In the same year, Finnish computer scientist Teuvo Kohonen proposed the self-organizing map network, which simulates the signal-processing mechanisms of brain neurons and is used for data analysis and exploration, with its first application in speech analysis. Kohonen’s key invention was a system model consisting of a competitive neural network that implements the winner-take-all function and a subsystem that implements plasticity control. In 1987, American scientists Stephen Grossberg and Gail Carpenter proposed the adaptive resonance theory network, which achieves analogical learning by allowing known and unknown information to resonate. However, these neural networks suffered from low learning efficiency, required constant refinement of their designs, and had small memory capacities, which restricted their range of practical applications.
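
To give a flavor of the associative memory realized by a Hopfield network, here is a small sketch of our own making: one binary pattern is stored with a Hebbian weight rule, and a corrupted copy is restored by repeatedly updating neurons one at a time. The pattern and the update schedule are illustrative choices, not parameters from Hopfield’s paper.

```python
import numpy as np

# One stored pattern of +1/-1 neuron states (a toy example)
pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])
n = len(pattern)

# Hebbian storage: weight w_ij is proportional to s_i * s_j, with no self-connections
W = np.outer(pattern, pattern) / n
np.fill_diagonal(W, 0)

# Start from a corrupted copy with two flipped neurons
state = pattern.copy()
state[0] *= -1
state[3] *= -1

# Asynchronous updates: each neuron aligns with the sign of its local field
for _ in range(5):                 # a few sweeps are enough for this toy network
    for i in range(n):
        state[i] = 1 if W[i] @ state >= 0 else -1

print(np.array_equal(state, pattern))   # True: the stored memory is recalled
```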

In 1986, American psychologist David Rumelhart, computer scientist Ronald Williams, and British-born cognitive psychologist and computer scientist Geoffrey E. Hinton jointly proposed the backpropagation algorithm (BP algorithm). The BP algorithm uses the chain rule to propagate the difference between the network’s output and the true value back to the weights of every layer, allowing each layer to be trained like a perceptron. The BP algorithm largely solved the problem of adaptability and autonomous learning in neural networks. In 1989, Yann LeCun, a French computer scientist at Bell Labs, achieved the first practical application of neural networks: he combined convolutional neural networks with the BP algorithm to propose the LeNet network. In the 1990s, the United States Postal Service used the LeNet network to automatically read postal codes on envelopes. However, neural networks based on the BP algorithm were prone to getting stuck in local optima, and the problem worsened as the number of network layers increased, which limited the development of neural networks.
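
As a rough illustration of how the chain rule carries the output error back through the layers, here is a sketch of a tiny two-layer network trained with backpropagation on the XOR problem that a single perceptron cannot solve; the architecture, learning rate, and iteration count are chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

# A tiny 2-3-1 network with sigmoid units
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for _ in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)       # hidden activations
    out = sigmoid(h @ W2 + b2)     # network output

    # Backward pass: the chain rule propagates the output error to every weight
    d_out = (out - y) * out * (1 - out)     # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)      # error signal at the hidden layer

    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # with successful training, this approaches [0, 1, 1, 0]
```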

In 2006, Hinton proposed the deep learning approach, which effectively reduces training difficulty through unsupervised, layer-wise pre-training, thereby alleviating the difficulty BP neural networks have in reaching a global optimum. In 2012, Hinton’s research team won the ImageNet image classification competition using deep learning, outperforming the runner-up by more than 10 percentage points and causing a sensation in the field of computer vision that triggered a wave of interest in deep learning. In 2013, MIT Technology Review listed deep learning as the top technological breakthrough of the year. Today, deep learning is widely used in search engines, speech recognition, machine translation, natural language processing, autonomous driving, facial recognition, and many other areas, and it is one of the hottest research directions in artificial intelligence.

“The work of the award winners has already produced tremendous benefits. In the field of physics, we apply artificial neural networks to a wide range of areas, such as developing new materials with specific properties,” said Ellen Moons, chair of the Nobel Committee for Physics.

· END ·

The authors of this article are Wang Nan, a postdoctoral researcher at the China Association for Science and Technology Innovation Strategy Research Institute, and Wang Guoqiang, a researcher at the same institute.

This article is excerpted from “Algorithm Development in the Intelligent Era,” with modifications made for publication on this WeChat public account. All images in this article are from the Royal Swedish Academy of Sciences.

Stay Tuned for the 2024 Nobel Prize Series Articles
