The Rise and Fall of Neural Networks in AI

The Intellectual

Image Source: Freepik
Written by Zhang Tianrong

As J. Robert Oppenheimer, the physicist who led the Manhattan Project, once said, “We are not only scientists, we are also human beings.” Where there are people, there are conflicts, and the scientific community is no exception. People often say, “Science knows no borders, but scientists have their own countries.” Even setting aside such political entanglements, scientists belong to different academic circles, each holding its own views and claims, which leads to ongoing academic debate. Generally speaking, free debate benefits academic progress, but it can also breed misunderstandings and biases that harm individual scientists and disturb the normal development of science. Today, I will tell such a story from the history of AI…

1. Rosenblatt’s Perceptron

In July 1958, the U.S. Office of Naval Research announced an extraordinary invention, claiming to showcase the “first machine capable of human thought,” as shown in Figure 1.

The demonstrator fed a series of punched cards into a room-sized, 5-ton computer (an IBM 704). After 50 trials, the computer learned to distinguish cards marked on the left from cards marked on the right. In other words, the machine could learn to “classify,” just as a child learns to tell cats from dogs under parental guidance. Classification remains a central task in AI research.


Figure 1: Rosenblatt and the Perceptron Mark-1

What the U.S. Navy demonstrated was the “Perceptron.” According to its creator, Dr. Frank Rosenblatt, it was an electronic device built on the principles of biological “neural networks,” with the capability to learn. Rosenblatt detailed and extended the method in his 1962 book, “Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms” [1]. That year, Rosenblatt gained international recognition for the Perceptron. The New York Times called it a revolution under the headline “New Navy Device Learns By Doing,” and The New Yorker likewise praised the advance.

At that time, Rosenblatt was a research psychologist and project engineer at the Cornell Aeronautical Laboratory in Buffalo, New York. After demonstrating the Perceptron, he later became an associate professor of neurobiology and behavior in the Department of Biological Sciences at Cornell University.

Rosenblatt’s Perceptron project was inspired by the formal neurons of McCulloch and Pitts and was designed as a simple single-layer neural network for image recognition, with a machine-learning mechanism added. Figure 2 shows the logical diagram of the Perceptron. The project received substantial funding from the U.S. Navy; the first two years were devoted to software research. Rosenblatt then built and demonstrated the only hardware version, the Mark-1. It consisted of 400 photocells (a 20×20 matrix of light-sensitive units) that converted incoming optical signals (such as English characters) into electrical signals, which were connected via physical cables to a layer of neurons that classified the letters. The synaptic weights of the Mark-1 were encoded by potentiometers, with electric motors changing the weights during learning.

It is understandable that under the technological conditions of that time, the specific implementation of this machine was quite challenging, which is why it caused a sensation and widespread attention.


Figure 2: Conceptual Diagram of the Perceptron Design (1958)

Rosenblatt had high hopes for his Perceptron and was very optimistic about neural network research in artificial intelligence, believing breakthroughs were imminent. The once low-profile scientist suddenly became famous, invited to talks and parties everywhere, which naturally drew the attention of the AI giants of the day.

Rosenblatt’s work caught the attention of MIT’s Marvin Minsky. In 1956, two years before the Perceptron’s debut, Minsky, along with McCarthy and others, had organized the Dartmouth Conference, which established the name “artificial intelligence” and discussed the field’s direction. Based on his own research on neural networks, Minsky grew skeptical of Rosenblatt’s claims. Doubt is a normal part of science, and the two frequently debated the feasibility of the Perceptron at academic conferences. At one meeting they had a heated argument, and their conflict became public. Those debates were said to be very intense; colleagues and students later recalled being “stunned watching from the sidelines” and “shocked by their argument.” Minsky attacked the value and prospects of the Perceptron head-on, arguing that its practical value was very limited and that it could never serve as a primary method for solving the problems of artificial intelligence.

“Rosenblatt believed he could make computers read and understand language, while Marvin Minsky pointed out that this was impossible because the Perceptron’s functions were too simple,” recalled a research student from that time.

Later, in 1969, Minsky and another mathematics professor at MIT, Seymour Papert, published a book titled “Perceptrons” [2], which theoretically proved the Perceptron’s flaws and included personal attacks on Rosenblatt: “Most of the content of Rosenblatt’s papers… has no scientific value.”

The strong criticism of Rosenblatt’s work in “Perceptrons” essentially sealed the Perceptron’s fate. In 1969, the same year the book appeared, Minsky received the Turing Award, the highest honor in computer science.

Minsky was an authoritative figure in the field, and such a blunt negative verdict on the Perceptron was devastating to the proud Rosenblatt. Two years after the book appeared, Rosenblatt drowned while sailing alone on his 43rd birthday, leaving his name, his Perceptron, and his regrets and dreams forever in the history of AI.

The book “Perceptrons” not only dealt a blow to Rosenblatt and the Perceptron itself; it nearly stifled neural network research altogether, contributing to a downturn in artificial intelligence that lasted nearly a decade.

2. Symbolism and Connectionism

In fact, Rosenblatt and Minsky had much in common in background and experience: they were close in age, both born into Jewish families in New York, and they even attended the same high school at the same time, both alumni of the Bronx High School of Science. Indeed, that school produced eight Nobel Prize winners and a Nobel laureate in economics, along with countless other notable figures. And such a prestigious school produced two schoolmates who became rivals in academia! One cannot help recalling the famous line from a classical Chinese poem: “Born of the same root, why such haste to torment each other?”

However, after Rosenblatt’s death, Minsky removed the personally insulting sentences from later printings of “Perceptrons” and added the handwritten dedication “In memory of Frank Rosenblatt,” expressing, in some measure, his mourning for his prematurely deceased colleague and fellow alumnus.

Moreover, their debate also represented the academic dispute between symbolism and connectionism in artificial intelligence at the time [4].

Marvin Minsky (1927–2016), born in New York City, was a pioneer of artificial intelligence and an early explorer of neural networks. As an undergraduate at Harvard University, he developed early electronic learning networks. As a graduate student at Princeton University, he built the first neural network learning machine, SNARC. His doctoral thesis, “Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain-Model Problem,” was essentially a paper on neural networks. Minsky’s graduate work thus laid foundations for research on artificial neural networks and properly belongs to connectionism.

In 1956, together with John McCarthy, Claude Shannon, and others, Minsky organized the Dartmouth Conference, which coined the term “artificial intelligence”; he is therefore considered one of the founding figures of AI. The Dartmouth Conference also marked a victory for symbolism, and Minsky and McCarthy are regarded as its typical representatives. Their intention at the time was to oppose the connectionism of early cybernetics: they believed the goal of artificial intelligence was to implement rules in computers through programs and to rely on logical reasoning. From the mid-1960s to the early 1990s, the symbolic approach prevailed.

It is evident that Minsky later shifted towards symbolism and tried to downplay his relationship with connectionism, which may have been one of the reasons for his strong criticism of the Perceptron.

From 1958 until his death, Minsky taught at MIT as a professor of electrical engineering and computer science.

At MIT, he co-founded the Artificial Intelligence Laboratory (the predecessor of the MIT Computer Science and Artificial Intelligence Laboratory). He also had several inventions to his name, such as the confocal microscope (1957) and the head-mounted display (1963).

On January 24, 2016, Minsky died from a brain hemorrhage at the age of 88.

His rival, Frank Rosenblatt (1928–1971), one year his junior, was a psychologist.

Rosenblatt was born into a Jewish family on Long Island, New York. After graduating from the Bronx High School of Science in 1946, he entered Cornell University, where he earned his bachelor’s degree in 1950 and his doctorate in 1956. He then went to the Cornell Aeronautical Laboratory in Buffalo, New York, serving successively as research psychologist, senior psychologist, and head of the cognitive systems section. It was there that he carried out his early work on the Perceptron.

In 1966, Rosenblatt joined the newly established Department of Biological Sciences at Cornell University as an associate professor of neurobiology and behavior. He developed a strong interest in whether learned behavior could be transferred from trained rats to naive ones by injecting brain extracts, publishing numerous articles on the topic in the following years.

Rosenblatt was also interested in astronomy. He spent $3,000 on a telescope, but it was too large to install anywhere he lived, so he bought a large house in Brooktondale, near Ithaca, and invited several of his graduate students to live there. By day the team worked on Tobermory (Rosenblatt’s later speech-recognition perceptron), and by night they labored in Rosenblatt’s yard to build an observatory.

Rosenblatt was multi-talented, with wide-ranging interests: by day he dissected bats in the lab to study the learning mechanisms of animal brains, and by night he gazed at the sky from the makeshift observatory in his backyard, probing the mystery of extraterrestrial life. In personality he was shy and introverted, not at all flamboyant.

The Perceptron remained Rosenblatt’s passion. Unfortunately, he did not survive the winter of artificial intelligence, drowning while sailing on his 43rd birthday. In 2004, the IEEE established the Frank Rosenblatt Award to honor outstanding contributions to biologically and linguistically motivated computational paradigms, commemorating this remarkable scientist.


Figure 3: Articles and Books Related to the Perceptron at the Time

The 1956 Dartmouth Conference set off the first wave of artificial intelligence, which lasted until the early 1970s, with symbolism’s model-based reasoning as its core feature. Mainstream research was dominated by Minsky at MIT, Simon and Newell at Carnegie Mellon University, and McCarthy at Stanford University. The experts of the symbolic circle had by then virtually monopolized AI, commanding most of the funding and the large computer systems.

The main characteristic of the symbolists is that they placed little emphasis on the connection between machine intelligence and the outside world. They created an independent reasoning space inside the computer, viewing artificial intelligence as the science of machine thinking, with the goal of endowing machines with capabilities for logic and abstraction.

In contrast, Rosenblatt, as a psychologist, was more interested in human physiology and psychological behavior, thus leaning towards connectionism. Naturally, he was keen to simulate the neural transmission mechanisms of the human brain using the concept of neural networks, leading to the research and invention of the Perceptron.

The Perceptron’s success in the media also ignited enthusiasm among connectionist researchers. But Minsky and Papert’s declaration in their 1969 book that they had proven neural networks ineffective poured cold water on these scientists, and interest in connectionism fell sharply. Although the book’s impact may have exceeded Minsky’s intentions, the consequences were clear: neural networks were abandoned and funding was severely cut. In fact, not only did connectionism decline; criticism of symbolism also mounted, projects in both camps were frozen, and federal funding for AI research dried up. Artificial intelligence came to be dismissed as a mere plaything, and the field entered the first winter of its development.

3. Perceptrons and Neural Networks

Returning to Rosenblatt’s Perceptron [5]: it is essentially the prototype of the modern neural network, and its scientific value is evidenced by the rapid development of AI today. Of course, as a first-generation artificial intelligence machine, the Perceptron inevitably had flaws, and Rosenblatt never had the opportunity to extend its learning algorithm to multi-layer neural networks. Neural networks range from simple to complex, as shown in Figure 4. The Perceptron is the simplest case, a single-layer network (left in Figure 4), while modern deep networks can have many hidden layers and millions of parameters (right in Figure 4).


Figure 4: The Perceptron and Complex Neural Networks

However, Minsky argued that the Perceptron’s flaw was fatal: it cannot represent functions that are not “linearly separable.” As an example he cited a logic gate, the XOR gate, which the Perceptron cannot implement. A brief explanation follows.

The simple model of the Perceptron neuron is shown in the left diagram of Figure 4: multiple inputs and one output. The output is obtained by taking the inner product of the input vector and the weight vector and passing it through an activation function, yielding a scalar result.
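This neuron model can be sketched in a few lines of code. The function and variable names below are illustrative, not from the original text:

```python
def step(z):
    """Step activation: splits outputs into two classes, 0 or 1."""
    return 1 if z >= 0 else 0

def perceptron_output(inputs, weights, bias):
    # inner product of the input vector and the weight vector, plus a bias
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(z)
```

With weights [1, 1] and bias -1.5, for example, this single neuron computes the logical AND of two binary inputs.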

Why can neural networks classify? One reason is due to the contribution of the activation function. For example, the simplest activation function is a step function, outputting either 0 or 1, meaning that this function achieves classification: dividing the results into two categories.

When should the output be 0, and when 1? That is determined by the input values. For example, one could ask three questions to decide whether an animal is a cat or a dog: Are the ears up or down? Does the muzzle protrude? Are the whiskers long or short? The simplest decision rule: if the answer to all three questions is “yes,” output “cat”; otherwise, “dog.” The step function can also be replaced by a smooth function, shown as the red line in the lower right corner of the left diagram in Figure 4. Such a function is easier to differentiate during optimization, and its output can be read as the probability that the animal is a cat rather than a dog.
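A common choice for such a smooth function is the sigmoid (the specific formula below is a standard choice, not stated in the original text). Unlike the step function, it is differentiable everywhere and its output lies strictly between 0 and 1, which is why it can be read as a probability:

```python
import math

def sigmoid(z):
    # smooth, differentiable replacement for the step function;
    # output lies in (0, 1) and can be read as a class probability
    return 1.0 / (1.0 + math.exp(-z))
```

At the borderline input z = 0 the output is exactly 0.5 (undecided), while large positive or negative inputs push it toward 1 or 0.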

Why can neural networks learn? Because each input carries a weight, and these parameters are the core of the network. During training, the network adjusts the weights to minimize its error on a given task; this weight-updating process is what “machine learning” refers to. The minimization can be carried out by various optimization algorithms: the original Perceptron used a simple error-correction rule, while modern networks typically rely on “gradient descent.”

As the formula above the left diagram in Figure 4 shows, the weighted sum computed for the output defines a hyperplane in n-dimensional input space. The essence of Perceptron “classification,” then, is that this hyperplane divides the space into two parts. For a network with two inputs, this means dividing the plane in two with a straight line, as in the linearly separable case of Figure 5b.


Figure 5: Perceptron Classification, Linearly Separable and Non-Separable

However, if the input samples are linearly non-separable (right side of Figure 5b), the Perceptron cannot simulate this situation. This is the flaw Minsky pointed out regarding the Perceptron.

Figure 6 shows several basic logic gates. A single-layer Perceptron can realize three of them: AND, NAND, and OR. But it cannot realize the XOR function, because XOR is not linearly separable.
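This claim can be checked directly with a small brute-force search (an illustrative sketch: the grid resolution is arbitrary, and failure on a grid is only a demonstration; the real proof is geometric, since no line separates XOR's outputs):

```python
from itertools import product

def fits(gate, w1, w2, b):
    # does a single threshold unit with these weights reproduce the gate?
    return all((1 if w1 * x1 + w2 * x2 + b >= 0 else 0) == t
               for (x1, x2), t in gate.items())

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

grid = [i / 4 for i in range(-8, 9)]   # candidate values -2.0 ... 2.0
and_ok = any(fits(AND, *p) for p in product(grid, repeat=3))
xor_ok = any(fits(XOR, *p) for p in product(grid, repeat=3))
```

Here `and_ok` comes out True (e.g. w1 = w2 = 1, b = -1.5 works), while `xor_ok` is False: no weight setting in the grid reproduces XOR.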


Figure 6: Logic Gates; the first three are linearly separable, while XOR is non-linearly separable.

To solve problems that are not linearly separable, one can use a multi-layer network of functional neurons. The layer of neurons between the input and output layers is called the hidden layer; both hidden-layer and output-layer neurons are functional neurons with activation functions. In the left diagram of Figure 7, the Perceptron has no hidden layer, and its decision computation generates only one straight line, so it cannot solve the XOR problem. Adding a hidden layer with a non-linear activation function solves it: the non-linearity of the hidden layer’s output is what overcomes linear non-separability. Adding one hidden layer is akin to adding a spatial dimension. As the right diagram of Figure 7 shows, a single-hidden-layer network can generate two straight lines for its decision computation and thus separate the XOR cases.
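One hand-set construction of this kind (an illustrative choice of weights, not the only one) builds XOR as AND(OR, NAND), using two hidden threshold units:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    # hidden layer: two threshold units, each drawing one decision line
    h_or = step(x1 + x2 - 0.5)          # hidden unit 1 computes OR
    h_nand = step(1.5 - x1 - x2)        # hidden unit 2 computes NAND
    # output layer: AND of the two hidden units gives XOR
    return step(h_or + h_nand - 1.5)
```

Each hidden unit contributes one of the two straight lines in Figure 7; the output unit combines them, which a single-layer perceptron cannot do.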


Figure 7: Adding a Hidden Layer to Solve the Perceptron’s XOR Problem

For multi-layer neural networks there is also a “universal approximation theorem”: a feedforward network using sigmoid (S-shaped) activation functions can approximate any continuous function to arbitrary accuracy, and in fact a single hidden layer already suffices.
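Stated a little more precisely (this formulation, for a single hidden layer with sigmoidal activation σ, is an added gloss on the theorem, not from the original text): for any continuous function f on [0, 1]^n and any ε > 0, there exist a width N and parameters v_i, w_i, b_i such that

```latex
\left|\, f(x) \;-\; \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right) \right| \;<\; \varepsilon
\qquad \text{for all } x \in [0,1]^n .
```

Depth is thus not required for expressive power in principle; in practice, deeper networks often reach the same accuracy with far fewer units.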

In summary, from the 1980s and 1990s onward, connectionism re-emerged and neural network research returned to the mainstream. Many believe Rosenblatt’s ideas have been vindicated. The naive Perceptron had its flaws, but its basic principles sparked the modern AI revolution. Deep learning and neural networks are changing our society, and understanding the Perceptron and the rise and fall of neural networks helps us better understand AI and its future.

References:

[1] Rosenblatt, Frank (1962). “A Description of the Tobermory Perceptron.” Cognitive Research Program. Report No. 4. Collected Technical Papers, Vol. 2. Edited by Frank Rosenblatt. Ithaca, NY: Cornell University.

[2] Minsky, M. L. and Papert, S. A. 1969. Perceptrons. Cambridge, MA: MIT Press.

[3] https://en.wikipedia.org/wiki/Frank_Rosenblatt

[4] Popular Science China: The Three Major Schools of Artificial Intelligence

https://www.kepuchina.cn/zt/salon/tsrgzn/201901/t20190123_924578.shtml

[5] Wikipedia: Perceptron

https://zh.wikipedia.org/wiki/%E6%84%9F%E7%9F%A5%E5%99%A8
