The Evolution of Machine Learning: A Grand History

Machine Learning

A grand history of artificial intelligence development

The victory of AlphaGo, the success of autonomous driving, the breakthrough in pattern recognition, and the rapid development of artificial intelligence have repeatedly stirred our nerves. As the core of artificial intelligence, machine learning has also received much attention in the rapid development of AI, shining brightly.

Today, the applications of machine learning have spread across various branches of artificial intelligence, such as expert systems, automated reasoning, natural language understanding, pattern recognition, computer vision, intelligent robotics, and more.

However, perhaps we never thought that the origin of machine learning and even artificial intelligence is an exploration of philosophical issues such as human consciousness, self, and mind. Throughout its development, it has also integrated knowledge from disciplines such as statistics, neuroscience, information theory, control theory, and computational complexity theory.

Overall, the development of machine learning is a significant branch in the history of artificial intelligence development. The stories within are full of twists and turns, surprising and impressive, and deeply moving.

Countless stories of talented individuals are interspersed within; in the introduction below, you will see the appearances of legendary figures as we narrate along the timeline of ML development:

The Enthusiastic Period of Foundation Laying

From the early 1950s to the mid-1960s

In 1949, Hebb took the first step toward machine learning, basing his work on neuropsychology; his proposal later became known as the Hebb learning rule. The Hebb learning rule is an unsupervised learning rule: its result is that the network extracts statistical features from the training set and groups inputs according to their similarity. This closely matches how humans observe and understand the world, which largely amounts to classifying things by their statistical features.

The Hebb learning rule can be written as Δw_ij = η · x_i · y_j, where η is the learning rate, x_i is the input, y_j is the output, and Δw_ij is the change in the connection weight.

From the formula above, it can be seen that the weight adjustment amount is proportional to the product of input and output, indicating that frequently occurring patterns will significantly impact the weight vector. In this case, the Hebb learning rule needs to preset weight saturation values to prevent unbounded growth of weights when inputs and outputs are consistently positive or negative.
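
A minimal sketch of this update in Python, assuming a single linear unit; the learning rate, saturation cap, and data are illustrative, not from Hebb's original work:

```python
import numpy as np

# A single linear unit trained with the Hebb rule: the weight change is
# proportional to the product of input and output, and the weights are
# clipped at a preset saturation value so they cannot grow without bound.
# (Learning rate, cap, and data are illustrative assumptions.)
rng = np.random.default_rng(0)
eta, w_max = 0.1, 1.0                        # learning rate and saturation cap
w = rng.normal(scale=0.1, size=3)            # small random initial weights

for _ in range(100):
    x = rng.choice([-1.0, 1.0], size=3)      # a bipolar input pattern
    y = np.dot(w, x)                         # the unit's output
    w = np.clip(w + eta * x * y, -w_max, w_max)   # Hebbian update with saturation

print(w)
```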

The Hebb learning rule is consistent with the mechanism of “conditioned reflex” and has been confirmed by the theory of neural cells. For example, Pavlov’s conditioned reflex experiment: each time before feeding the dog, a bell is rung, and over time, the dog associates the sound of the bell with food. Later, if the bell rings without food, the dog will still salivate.

In 1950, Alan Turing created the Turing Test to determine whether a computer is intelligent. The test states that if a machine can carry on a conversation with a human (via telecommunication) without being distinguished from a human, then the machine can be said to be intelligent. This simplification allowed Turing to argue convincingly that a "thinking machine" is possible.

On June 8, 2014, a chatbot program called Eugene Goostman convinced a panel of human judges that it was a 13-year-old boy, becoming the first computer program reported to have passed the Turing Test. This is considered a milestone event in the development of artificial intelligence.

In 1952, the IBM scientist Arthur Samuel developed a checkers-playing program. The program could observe the current position and learn an implicit model that gave better guidance for later moves. Samuel found that the longer the program played, the better its guidance became.

With this program, Samuel refuted the claim that machines cannot go beyond what they are explicitly programmed to do, or learn the way humans do. He coined the term "machine learning" and defined it as "the field of study that gives computers the ability to learn without being explicitly programmed."

In 1957, Rosenblatt, drawing on the neuroscience of perception, proposed a second model of the artificial neuron, one very similar to today's machine learning models. It was a very exciting discovery at the time and more broadly applicable than Hebb's idea. Based on this model, Rosenblatt designed the first computer neural network, the perceptron, which simulates the functioning of the human brain.

Three years later, Widrow first applied the Delta learning rule to the training of the perceptron; the method later became known as least mean squares (the least-squares method). Combining the two produced a good linear classifier.
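
The sketch below illustrates the delta (Widrow-Hoff / LMS) update on toy, linearly separable data; the dataset, learning rate, and decision threshold are illustrative assumptions, not from the original work:

```python
import numpy as np

# A perceptron-style linear unit trained with the Widrow-Hoff (delta / LMS)
# rule: w <- w + eta * (target - output) * x, which minimizes squared error.
# The data below are synthetic and linearly separable (an assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
t = (X[:, 0] + X[:, 1] > 0).astype(float)    # targets: which side of a line

eta = 0.05
w, b = np.zeros(2), 0.0
for _ in range(50):                          # a few passes over the data
    for x, target in zip(X, t):
        y = np.dot(w, x) + b                 # raw linear output (no thresholding in the update)
        w += eta * (target - y) * x          # delta-rule weight update
        b += eta * (target - y)

pred = (X @ w + b > 0.5).astype(float)       # threshold at 0.5 since targets are 0/1
print("training accuracy:", (pred == t).mean())
```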

In 1967, the nearest neighbor algorithm appeared, allowing computers to perform simple pattern recognition. The core idea of the kNN algorithm is that if most of a sample's k nearest neighbors in feature space belong to a particular category, then the sample also belongs to that category and shares the properties of the samples in it; in other words, the method decides a sample's category using only the categories of its nearest one or few samples.
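
A minimal kNN sketch, assuming Euclidean distance, k = 3, and a tiny two-class dataset invented for illustration:

```python
import numpy as np
from collections import Counter

# Minimal kNN sketch: a query point takes the majority label of its k
# nearest training samples in feature space; there is no training phase.
# (Distance metric, k, and the toy data are illustrative choices.)
def knn_predict(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                      # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))   # -> 1
```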

The advantages of kNN are that it is easy to understand and implement, requires no parameter estimation and no training, is suitable for classifying rare events, and is especially good for multi-class problems (including multi-label cases, where objects carry multiple class labels), sometimes performing even better than SVM.

In 2002, Han et al. used a greedy method to implement a weight-adjustable k-nearest neighbor method, WAkNN (weighted adjusted k nearest neighbor), for document classification in order to improve classification results; in 2004, Li et al. proposed that, since categories contain different numbers of documents, the number of nearest neighbors taking part in classification should be chosen according to the number of documents of each category in the training set.

In 1969, Marvin Minsky brought the perceptron boom to an end. He posed the famous XOR problem, a case in which the perceptron's data are linearly inseparable.
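
The toy sketch below, using the standard perceptron learning rule, illustrates why: no single linear boundary separates the four XOR points, so training never reaches perfect accuracy (the data and iteration count are illustrative):

```python
import numpy as np

# The four XOR points cannot be split by a single straight line, so a
# single-layer perceptron trained with the perceptron rule never reaches
# 100% accuracy on them, no matter how long it runs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                       # XOR labels

w, b = np.zeros(2), 0.0
for _ in range(1000):                            # perceptron learning rule
    for x, t in zip(X, y):
        pred = int(np.dot(w, x) + b > 0)
        w = w + (t - pred) * x
        b = b + (t - pred)

pred = (X @ w + b > 0).astype(int)
print("predictions:", pred, "accuracy:", (pred == y).mean())   # stays below 1.0
```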

Minsky also combined artificial intelligence with robotics, developing Robot C, one of the world's earliest robots capable of simulating human activities, and raising robotics to a new level. Another major initiative of Minsky's was founding the famous Thinking Machines, Inc. to develop intelligent computers.

After that, research on neural networks went into hibernation until the 1980s. Although the idea behind BP neural networks, the reverse mode of automatic differentiation, had already been proposed by Linnainmaa in 1970, it did not attract sufficient attention at the time.

The Stagnant and Calm Period

From the mid-1960s to the late 1970s

From the mid-1960s to the late 1970s, the pace of machine learning development nearly came to a standstill. Although Winston's structural learning system and Hayes-Roth's logic-based inductive learning system made significant progress during this period, they could only learn single concepts and were never put to practical use. In addition, neural network learning machines failed to meet expectations because of theoretical flaws and entered a low tide.

The research goal during this period was to simulate the human concept learning process and use logical structures or graphical structures as internal descriptions for machines. Machines could use symbols to describe concepts (symbolic concept acquisition) and propose various hypotheses about learning concepts.

Indeed, during this period, the entire AI field faced a bottleneck. The limited memory and processing speed of computers at that time were insufficient to solve any practical AI problems. Researchers quickly realized that asking programs to have a child-level understanding of the world was too high a demand: in 1970, no one could create such a vast database, nor did anyone know how a program could learn such rich information.

The Revitalization Period of Renewed Hope

From the late 1970s to the mid-1980s

Beginning in the late 1970s, the focus shifted from learning single concepts to learning multiple concepts, exploring different learning strategies and various learning methods. During this period, machine learning returned to the spotlight after a long period of dormancy.

In 1980, the First International Workshop on Machine Learning was held at Carnegie Mellon University (CMU), marking the global rise of machine learning research. Afterward, machine inductive learning entered practical application.

After some setbacks, Werbos explicitly proposed the multi-layer perceptron (MLP) in 1981 together with the neural network back-propagation (BP) algorithm for training it. BP, of course, remains a key ingredient of today's neural network architectures. With these new ideas, research on neural networks accelerated again.

From 1985 to 1986, neural network researchers (Rumelhart, Hinton, and Williams, among others) successively proposed the idea of training MLPs with BP.
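
A brief sketch of this combination using scikit-learn's MLPClassifier on synthetic two-moons data; the hidden-layer size and dataset are illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# An MLP trained with backpropagation: one hidden layer lets the network
# solve a problem (two interleaved moons) that no single-layer perceptron
# can separate linearly. Data and layer size are illustrative choices.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X_tr, y_tr)                       # gradients computed by backpropagation
print("test accuracy:", mlp.score(X_te, y_te))
```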

A very famous ML algorithm was proposed by Quinlan in 1986: the decision tree algorithm, more precisely the ID3 algorithm. It was another spark in mainstream machine learning. Moreover, in contrast to black-box neural network models, an ID3 decision tree can be delivered as software whose simple rules and clear structure readily find real-life applications.

A decision tree for deciding whether to play tennis based on weather conditions

A decision tree is a predictive model that represents a mapping between object attributes and object values. Each internal node of the tree tests an attribute, each outgoing branch corresponds to a possible value of that attribute, and each leaf node gives the value predicted for objects described by the path from the root to that leaf. A decision tree has a single output; if multiple outputs are needed, separate decision trees can be built to handle them. In data mining, decision trees are a frequently used technique for analyzing data and can also be used for prediction.
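
As a small illustration of the quantity ID3 is built on, the sketch below computes information gain on toy play-tennis-style data; the attribute values and labels are invented for illustration:

```python
import numpy as np
from collections import Counter

# ID3 grows the tree by repeatedly splitting on the attribute whose values
# reduce the entropy of the labels the most (highest information gain).
def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(attribute_values, labels):
    total = entropy(labels)
    weighted = 0.0
    for v in set(attribute_values):
        subset = [l for a, l in zip(attribute_values, labels) if a == v]
        weighted += len(subset) / len(labels) * entropy(subset)
    return total - weighted

outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast", "sunny", "rain"]
play    = ["no",    "no",    "yes",      "yes",  "no",   "yes",      "yes",   "yes"]
print(information_gain(outlook, play))   # the attribute with the highest gain becomes the root
```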

The Maturing Period of Modern Machine Learning

From the early 1990s to the early 21st century

In 1990, Schapire constructed the first polynomial-time Boosting algorithm, giving a constructive answer to the question of whether weak learners can be combined into a strong one. A year later, Freund proposed a more efficient Boosting algorithm. Both algorithms, however, shared a practical flaw: they required prior knowledge of a lower bound on the accuracy of the weak learner.

In 1995, Freund and Schapire improved the Boosting algorithm, proposing the AdaBoost (Adaptive Boosting) algorithm, which is nearly as efficient as the Boosting algorithm proposed by Freund in 1991 but does not require any prior knowledge about weak learners, making it easier to apply to practical problems.

The Boosting method improves the accuracy of a weak classification algorithm by constructing a series of prediction functions and then combining them, in a certain way, into a single prediction function. It is a framework algorithm that operates mainly on the sample set: it obtains subsets of the samples and then uses the weak classification algorithm to train a series of base classifiers on those subsets.
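
A short, hedged sketch of this framework using scikit-learn's AdaBoostClassifier on synthetic data; the dataset and number of boosting rounds are illustrative, and the default weak learner is a depth-1 decision stump:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# AdaBoost: a sequence of weak learners (by default, depth-1 decision
# stumps) is trained on successively reweighted samples and combined
# into one stronger predictor. Data and settings are illustrative.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```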

In the same year, another significant breakthrough occurred in machine learning: the support vector machine (SVM), proposed by Vapnik and Cortes with a wealth of theoretical and empirical support. From this point on, the machine learning community split into a neural network camp and a support vector machine camp.

The competition between the two camps was not easy for the neural network side: neural networks lagged behind kernelized SVMs until around 2000, and support vector machines achieved good results on many tasks that earlier neural network models could not solve. Moreover, SVMs can incorporate prior knowledge through convex optimization and kernel design, yielding models with precise theoretical guarantees, which gave a strong push to many disciplines and led to highly efficient theoretical and practical improvements.
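
A short kernel-SVM sketch with scikit-learn on synthetic two-moons data; the dataset and hyperparameters are illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A kernel SVM: the RBF kernel implicitly maps the data into a
# high-dimensional feature space where a maximum-margin linear separator
# is found by convex optimization, without ever computing the mapping.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)   # not linearly separable
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```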

Support vector machines, Boosting, maximum entropy methods (such as logistic regression, LR), and the like are models whose structure can generally be viewed as having one hidden layer (SVM, Boosting) or none at all (LR). These models achieved great success both in theoretical analysis and in applications.

Another ensemble of decision trees was proposed by Breiman in 2001: each tree is grown on a random subset of the instances, and each node split considers only a random subset of the features. Because of this randomness the model was named random forest (RF), and it has been shown both theoretically and empirically to resist overfitting.

Whereas AdaBoost shows weaknesses on noisy data and outlier instances, random forests are much more robust to these problems. Random forests have proved successful in a wide range of tasks, including competitions on platforms such as DataCastle and Kaggle.
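
A brief random forest sketch with scikit-learn on synthetic data; the dataset and settings are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Random forest: many trees, each grown on a bootstrap sample and
# restricted to a random subset of features at every split, vote
# together, which damps overfitting. Dataset and settings are illustrative.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
print("cross-validated accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```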

The Flourishing Development Period

From the early 21st century to the present

The development of machine learning can be divided into two phases: shallow learning and deep learning. Shallow learning began with the invention of the back-propagation algorithm for artificial neural networks in the 1980s, which led to a boom in statistical machine learning algorithms. Although the artificial neural network algorithms of that period were called multilayer perceptrons (MLPs), multi-layer networks were so difficult to train that in practice they were shallow models with only one hidden layer.

Hinton, the leading figure in neural network research, proposed the deep learning approach in 2006, significantly enhancing the capabilities of neural networks and mounting a challenge to support vector machines. That year, Hinton and his student Salakhutdinov published a paper in the prestigious journal Science that set off a wave of deep learning in both academia and industry.

The paper carried two main messages: 1) artificial neural networks with many hidden layers have excellent feature-learning ability, and the features they learn give a more essential characterization of the data, which helps visualization and classification; 2) the difficulty of training deep neural networks can be effectively overcome by "layer-wise pre-training", which in this paper was achieved through unsupervised learning.
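
The sketch below is a simplified PyTorch illustration of greedy layer-wise pre-training using small autoencoders rather than the stacked RBMs of the original paper; the layer sizes, data, and training settings are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Greedy layer-wise pre-training (autoencoder variant): each layer is
# trained unsupervised to reconstruct the output of the layers below it,
# and the pre-trained stack can then be fine-tuned with backpropagation.
torch.manual_seed(0)
X = torch.rand(1024, 64)                              # toy unlabeled data

dims = [64, 32, 16]                                   # layer widths (assumed)
encoders = []
inputs = X
for d_in, d_out in zip(dims[:-1], dims[1:]):
    enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
    dec = nn.Linear(d_out, d_in)                      # throwaway decoder for this layer
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(200):                              # minimize reconstruction error
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(inputs)), inputs)
        loss.backward()
        opt.step()
    encoders.append(enc)
    inputs = enc(inputs).detach()                     # inputs for the next layer

# The pre-trained encoders provide the deep network's initial weights;
# a supervised output layer would be added before end-to-end fine-tuning.
pretrained_stack = nn.Sequential(*encoders)
print(pretrained_stack(X).shape)                      # -> torch.Size([1024, 16])
```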

LeNet, the deep learning network built by Hinton's student Yann LeCun, has been widely applied in ATMs and banks around the world. Yann LeCun and Andrew Ng also argue that convolutional neural networks allow artificial neural networks to be trained quickly: because the same filters are shared across every position in an image, the networks need very little memory, which makes them well suited to building scalable deep networks and therefore excellent recognition models.
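
Below is a LeNet-style sketch in PyTorch, not LeCun's exact LeNet-5, showing how shared convolutional filters keep the parameter count small no matter where in the image they are applied; the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# A LeNet-style convolutional network: the same small filters are slid
# over every position of the image (weight sharing), so the number of
# convolutional parameters does not grow with image size.
class SmallConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(16 * 4 * 4, 120), nn.ReLU(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

x = torch.rand(8, 1, 28, 28)        # a batch of 28x28 grayscale images
print(SmallConvNet()(x).shape)      # -> torch.Size([8, 10])
```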

In 2015, to commemorate the 60th anniversary of the concept of artificial intelligence, LeCun, Bengio, and Hinton released a joint review of deep learning.

Deep learning enables computational models with multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have brought significant improvements in many areas, including state-of-the-art speech recognition, visual object recognition, and object detection, as well as many other fields such as drug discovery and genomics. Deep learning can discover intricate structure in big data. It uses the BP algorithm to indicate how a machine should adjust the internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional networks have brought breakthroughs in processing images, video, speech, and audio, while recurrent networks have shown remarkable performance on sequential data such as text and speech.

The most popular methods in the current statistical learning field mainly include deep learning and SVM (support vector machine), which are representative methods of statistical learning. It can be considered that both neural networks and support vector machines originated from the perceptron.

Neural networks and support vector machines have always been in a “competitive” relationship. SVM utilizes the expansion theorem of kernel functions, requiring no explicit expression of non-linear mapping; since it establishes linear learning machines in high-dimensional feature spaces, it not only does not significantly increase computational complexity compared to linear models but also avoids the “curse of dimensionality” to some extent. In contrast, earlier neural network algorithms were prone to overfitting, requiring numerous empirical parameters to be set, and the training speed was relatively slow, often not outperforming other methods when the number of layers was low (less than or equal to 3).

Neural network models now appear capable of taking on more challenging tasks, such as object recognition, speech recognition, and natural language processing. However, this does not mean the end of other machine learning methods. Despite the rapid growth in deep learning success stories, training these models is costly and tuning their hyperparameters is cumbersome. Meanwhile, the simplicity of SVM keeps it among the most widely used machine learning methods.

Artificial intelligence and machine learning are young disciplines, born in the mid-20th century, that have had a significant impact on how people produce and live and have also sparked intense philosophical debate. Overall, however, the development of machine learning is not very different from the development of other things in general, and it too can be viewed through the lens of philosophy.

The development of machine learning has not been smooth sailing; it has risen in a spiral, with achievements and setbacks side by side. It is the accumulated work of countless researchers that has produced today's unprecedented prosperity in artificial intelligence, a process of quantitative change leading to qualitative change, driven by both internal and external factors.

Looking back, we are all likely to be awed by this grand history.

Source: DataCastle
