Written by Zhang Tianrong
As the physicist and Manhattan Project leader Oppenheimer said, “We are not just scientists; we are also human.” Where there are people, there are communities, and the scientific world is no exception. It is often said that “science knows no borders, but scientists have homelands.” Even setting such political entanglements aside, scientists form different academic circles, each holding its own views and claims, which leads to ongoing academic debate. Generally speaking, free debate benefits academic progress, but it can also breed misunderstanding and bias, harming individual scientists and hindering the normal development of science. Today, I will tell such a story from the history of AI…
1. Rosenblatt’s Perceptron
In July 1958, the U.S. Office of Naval Research announced an extraordinary invention, claiming to showcase “the first machine capable of human thought”; see Figure 1.
The demonstrator fed a series of punched cards into an electronic device connected to an IBM 704, a room-sized computer weighing 5 tons. After 50 trials, the computer learned to distinguish cards marked on the left from cards marked on the right. In other words, the machine could learn to “classify,” just as a child learns to distinguish cats from dogs under a parent’s guidance. Classification is an important function in artificial intelligence research.
Figure 1: Rosenblatt and the Perceptron Mark-1
What the U.S. Navy demonstrated was the “Perceptron.” According to its creator, Dr. Frank Rosenblatt, it was an electronic device built on the principles of biological “neural networks” and endowed with learning capability. Rosenblatt detailed and extended the method in his 1962 book, “Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms”[1]. The Perceptron brought Rosenblatt international recognition at the time: The New York Times called it a revolution under the headline “New Navy Device Learns by Doing,” and The New Yorker likewise praised the advance.
At that time, Rosenblatt was a research psychologist and project engineer at the Cornell Aeronautical Laboratory in Buffalo, New York. A year after demonstrating the Perceptron, he became an associate professor of neurobiology and behavior in the Department of Biological Sciences at Cornell University.
Rosenblatt’s Perceptron project was inspired by the formal neurons of McCulloch and Pitts: a simple single-layer neural network designed specifically for image recognition, with a machine-learning mechanism added. Figure 2 is a logical schematic of the Perceptron. The project received substantial funding from the U.S. Navy, beginning with two years of software research. Rosenblatt then built and demonstrated the only hardware version of the Perceptron, the Mark-1. It consisted of 400 photoelectric cells (a 20×20 matrix of light-sensitive units) that converted input optical signals (such as English characters) into electrical signals, which were connected by physical cables to a layer of neurons that classified the letters. The synaptic weights of the Mark-1 were encoded with potentiometers, and motors changed the weights during learning.
It is conceivable that, given the technological conditions at the time, the specific implementation of this machine was quite difficult, which is why it generated a sensation and widespread attention.

Figure 2: Conceptual Design of the Perceptron (1958)
Rosenblatt had high hopes for his Perceptron and was very optimistic about the research into artificial intelligence neural networks, believing that breakthroughs were imminent. The once low-profile scientist suddenly became famous, attending various lectures and parties, which naturally attracted the attention of the AI giants of that time.
Rosenblatt’s work caught the attention of MIT’s Professor Marvin Minsky. Two years before the Perceptron debuted, in 1956, Minsky, together with McCarthy and others, had organized the Dartmouth Conference, which established the name “artificial intelligence” and discussed its direction of development. Drawing on his own research into neural networks, Minsky had serious doubts about Rosenblatt’s claims. Skepticism in science is normal, and the two often publicly debated the feasibility of the Perceptron at academic conferences. At one conference they had a heated argument, and the conflict became fully public. Those debates were reportedly very intense; colleagues and students later recalled being “stunned by the spectacle” and “shocked by their argument.” Minsky attacked the value and future of the Perceptron head-on, arguing that its practical value was very limited and that it could never serve as a primary approach to solving artificial intelligence problems.
“Rosenblatt believed he could make computers read and understand language, while Marvin Minsky pointed out that this was impossible because the Perceptron’s functions were too simple,” recalled a graduate student from that time.
Later, in 1969, Minsky and another MIT mathematics professor, Seymour Papert, published a book titled “Perceptrons”[2], which proved the Perceptron’s limitations mathematically and also contained a personal attack on Rosenblatt: “Most of the content of the papers written by Rosenblatt… has no scientific value.”
The harsh criticism of Rosenblatt’s work in “Perceptrons” essentially sealed the Perceptron’s fate. The following year, Minsky received the Turing Award, the highest honor in computer science.
Minsky was an authoritative figure in the field at the time, and such a blunt negative evaluation of the Perceptron was devastating to the inherently proud Rosenblatt. Not long afterward, Rosenblatt drowned while sailing alone on his 43rd birthday, leaving his name, his Perceptron, and his regrets and dreams forever in the history of artificial intelligence.
The book “Perceptrons” not only struck a blow to Rosenblatt and brought the Perceptron’s run to an end; it also nearly stifled neural-network research altogether, contributing to a downturn in artificial intelligence that lasted a decade.
2. Symbolism and Connectionism
In fact, Rosenblatt[3] and Minsky had much in common in background and experience: they were close in age, both born into Jewish families in New York, and they even attended the same high school, the Bronx High School of Science, which has produced eight Nobel laureates and a winner of the Nobel Memorial Prize in Economics, along with countless other notable figures. Two alumni of such a prestigious school became rivals in the academic world! One cannot help recalling the famous line: “Why the rush to harm each other?”
However, after Rosenblatt’s death, Minsky removed the personal attacks from the reprint of “Perceptrons” and added the handwritten dedication “In memory of Frank Rosenblatt,” expressing a degree of mourning for his prematurely deceased fellow alumnus.
Additionally, the debate between the two also represented the academic conflict between symbolism and connectionism in artificial intelligence at that time[4].
Marvin Minsky (1927–2016) was born in New York City and was a pioneer of artificial intelligence. While studying at Harvard University, he developed early electronic learning networks. During his graduate studies at Princeton University, he built the first neural-network learning machine, SNARC. His doctoral thesis, “Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain-Model Problem,” was essentially a paper on neural networks. Minsky’s graduate work thus helped lay the foundation for research on artificial neural networks and should be classified as connectionism.
In 1956, together with John McCarthy, Claude Shannon, and others, he initiated the Dartmouth College Conference, which coined the term “artificial intelligence,” making him one of the founding fathers of AI. The Dartmouth Conference was also a victory for symbolism, and Minsky and McCarthy are regarded as its typical representatives. Their intention at the time was to oppose the connectionism of early cybernetics: they believed the goal of artificial intelligence was to implement rules in computers through programming and to use logical reasoning, in opposition to connectionist AI. From the mid-1960s to the early 1990s, the symbolic approach prevailed.
It is evident that Minsky later turned to symbolism and also tried to downplay his relationship with connectionism, perhaps one reason for his strong criticism of the Perceptron.
From 1958 until his death, Minsky taught at MIT as a professor of electrical engineering and computer science. At MIT, he co-founded the Artificial Intelligence Laboratory (the predecessor of the MIT Computer Science and Artificial Intelligence Laboratory). He also had several inventions to his name, such as the confocal scanning microscope (1957) and the head-mounted display (1963).
On January 24, 2016, Minsky passed away due to a brain hemorrhage at the age of 88.
Minsky’s rival, Frank Rosenblatt (1928-1971), was one year younger and a psychologist.
Rosenblatt was born into a Jewish family on Long Island, New York. After graduating from Bronx Science in 1946, he entered Cornell University, where he earned his bachelor’s degree in 1950 and his doctorate in 1956. He then went to the Cornell Aeronautical Laboratory in Buffalo, New York, where he served successively as research psychologist, senior psychologist, and head of the cognitive systems section. It was there that he carried out his early work on the Perceptron.
In 1966, Rosenblatt joined the newly established Department of Biological Sciences at Cornell University as an associate professor of neurobiology and behavior. He developed a keen interest in transferring learned behaviors from trained mice to small mice by injecting brain extracts, publishing numerous articles on this subject in the following years.
Rosenblatt was also interested in astronomy. He spent $3,000 on a telescope but had nowhere to house it, so he bought a large house in Brooktondale, near Ithaca, New York, and invited several of his graduate students to live there. By day the team worked on the Tobermory perceptron; by night they labored in Rosenblatt’s yard, building an observatory.
Rosenblatt was a versatile person with diverse interests, dissecting bats in the lab during the day to study the learning mechanisms of animal brains, and gazing at the sky at his makeshift observatory in his backyard at night, attempting to explore the mysteries of extraterrestrial life. In terms of personality, Rosenblatt was shy and introverted, not flamboyant.
The Perceptron remained Rosenblatt’s passion, but he did not survive the winter of artificial intelligence, drowning on his 43rd birthday while sailing alone. In 2004, the IEEE established the IEEE Frank Rosenblatt Award, honoring outstanding contributions to biologically and linguistically motivated computational paradigms, in memory of this outstanding scientist.

Figure 3: Articles and Books Related to the Perceptron at the Time
The Dartmouth Conference in 1956 initiated the first wave of artificial intelligence, which lasted into the early 1970s, characterized by the modeling and reasoning methods representative of symbolism as its core feature. The mainstream of this research was composed of Minsky from MIT, Simon and Newell from Carnegie Mellon University, and McCarthy from Stanford University. At that time, the experts in the symbolic circle had essentially established a monopoly on the issues of artificial intelligence and gained access to most funding and large computer systems.
The main characteristic of the symbolists was their lack of emphasis on the relationship between machine intelligence and the world, only opening up an independent reasoning space within the computer, viewing artificial intelligence as the science of machine thinking, with the goal of endowing machines with logical and abstract capabilities.
In contrast, Rosenblatt, as a psychologist, was more interested in human physiology and psychological behavior, thus leaning towards connectionism. Naturally, he was keen on simulating the neural transmission mechanisms of the human brain using the concept of neural networks, which led to the research and invention of the Perceptron.
The success of the Perceptron in the media also sparked enthusiasm among connectionist researchers. However, Minsky and Papert’s claim in their 1969 book that they had proven the inefficacy of neural networks poured cold water on these scientists, significantly reducing the enthusiasm for connectionism. Although the impact of this book may have exceeded Minsky and others’ intentions, the consequences were clear: neural networks were abandoned, and funding was drastically cut. In fact, not only did connectionism decline, but criticism of symbolism also increased, leading to a freeze on both symbolic and connectionist projects, and federal funding for artificial intelligence research dried up. Artificial intelligence was treated merely as a parlor game, entering the first winter of its developmental journey.
3. Perceptrons and Neural Networks
Let’s return to Rosenblatt’s Perceptron[5]. It is essentially the prototype of modern neural networks, and today’s rapid development of AI is evidence enough of its scientific value. Of course, as a first-generation artificial intelligence machine, the Perceptron inevitably had various defects, and at the time Rosenblatt had not managed to extend its learning algorithm to multilayer neural networks. Neural networks range from simple to complex, as shown in Figure 4. The Perceptron is merely the simplest single-layer neural network (left in Figure 4), whereas modern deep networks can have hundreds of (hidden) layers and millions of parameters, as shown on the right in Figure 4.

Figure 4: Perceptron and Complex Neural Networks
However, Minsky believed the Perceptron’s flaws were fatal, because it could not represent functions that are not “linearly separable.” He cited a logic gate as an example: the XOR gate, which the Perceptron cannot realize. A brief explanation follows.
The simple model of a Perceptron neuron is shown in the left diagram of Figure 4: multiple inputs and a single output. The output is obtained by taking the inner product of the input vector and the weight vector and passing the result through an activation function, producing a scalar.
Why can neural networks classify? One reason is the activation function. The simplest activation function is a step function, which outputs 0 or 1; such a function implements classification by dividing results into two categories.
When does it output 0 and when 1? That depends on the input values. For example, one might ask three questions to decide between a cat and a dog: Are the ears up or down? Is the snout protruding? Are the whiskers long or short? The simplest decision rule is: if all three answers are “yes,” output “cat”; otherwise, “dog.” The step function can also be replaced by a smooth function, shown as the red curve in the lower right corner of the left diagram in Figure 4, which makes differentiation easier during optimization; the output can then be read as the probability of “cat” versus “dog.”
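The neuron and its activations just described can be sketched in a few lines of Python. The weights, bias, and cat/dog features below are illustrative values chosen for this example, not parameters of Rosenblatt's machine:

```python
import math

def step(z):
    """Step activation: outputs 1 or 0, i.e. a hard two-way classification."""
    return 1 if z >= 0 else 0

def sigmoid(z):
    """Smooth S-shaped activation: output can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=step):
    """Inner product of input and weight vectors, then an activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Toy cat/dog decision: three yes/no features (ears up? snout short? whiskers long?).
# With these illustrative weights, the neuron fires only when all three are "yes".
w, b = [1.0, 1.0, 1.0], -2.5
print(neuron([1, 1, 1], w, b))  # 1 -> "cat"
print(neuron([1, 0, 1], w, b))  # 0 -> "dog"
```

Swapping `step` for `sigmoid` in the `activation` argument turns the hard decision into a smooth score between 0 and 1.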
Why do neural networks have the ability to learn? Because each input carries a weight, and these parameters are the core of the neural network. During training, the network adjusts the weights to minimize its error on a given task; this process of updating weights is what is called “machine learning.” The minimization can be carried out with various optimization algorithms, such as gradient descent; the original Perceptron used a closely related error-correction rule.
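Such a weight-update loop can be sketched with the classic perceptron error-correction rule. The learning rate, epoch count, and AND-gate training data here are assumptions for illustration:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Perceptron error-correction rule: nudge the weights and bias
    toward each misclassified example until training data fit."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0
            err = target - out          # 0 if correct, +1 or -1 if wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn logical AND, which is linearly separable
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_data)
print([1 if sum(xi * wi for xi, wi in zip(x, w)) + b >= 0 else 0
       for x, _ in and_data])  # [0, 0, 0, 1]
```

The update moves the separating hyperplane a little toward every example it currently misclassifies, which for linearly separable data is guaranteed to converge.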
As shown in the formula above the left diagram in Figure 4, the weighted sum computed for the output defines a hyperplane in n-dimensional space, namely the set of points where w1x1 + w2x2 + … + wnxn + b = 0. The essence of the Perceptron’s “classification” is therefore that this hyperplane divides the space into two parts. For a network with two inputs, a straight line divides the plane into two parts, as in the linearly separable case of Figure 5b.
Figure 5: Perceptron Classification, Linearly Separable and Non-Separable
However, if the input samples are not linearly separable (right side of Figure 5b), no such dividing line exists, and the Perceptron fails. This is the defect Minsky pointed out.
Figure 6 shows several basic logic gates. A single-layer Perceptron can realize three of them: logical AND, logical NAND, and logical OR; but it cannot realize the logical XOR function, because XOR is not linearly separable.
Figure 6: Logic Gates; the first three are linearly separable, XOR is non-linearly separable
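The claim can be checked directly: training a single-layer perceptron on each gate's truth table succeeds for AND, OR, and NAND but never for XOR. This is a minimal sketch with an assumed learning rate and epoch budget:

```python
def fit(samples, epochs=50, lr=0.1):
    """Single-layer perceptron trained with the error-correction rule;
    returns its predictions on the training inputs after training."""
    w, b = [0.0, 0.0], 0.0
    def predict(x):
        return 1 if x[0] * w[0] + x[1] * w[1] + b >= 0 else 0
    for _ in range(epochs):
        for x, t in samples:
            err = t - predict(x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return [predict(x) for x, _ in samples]

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
gates = {
    "AND":  [0, 0, 0, 1],
    "OR":   [0, 1, 1, 1],
    "NAND": [1, 1, 1, 0],
    "XOR":  [0, 1, 1, 0],   # not linearly separable
}
for name, targets in gates.items():
    learned = fit(list(zip(X, targets)))
    print(name, "learned" if learned == targets else "FAILED")
```

XOR must fail: getting all four cases right would require a separating line, which does not exist, so the weights cycle without ever fitting the data.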
To solve problems that are not linearly separable, one can use a multi-layer neural network. The layer of neurons between the input layer and the output layer is called a hidden layer, and both hidden-layer and output-layer neurons are functional neurons with activation functions. In the left diagram of Figure 7, the Perceptron has no hidden layer, and its decision boundary is a single straight line, which cannot separate the XOR cases. Adding a hidden layer with a non-linear activation function solves the problem: as shown on the right of Figure 7, a single-hidden-layer network can generate two dividing lines and thereby separate XOR.
Figure 7: Adding a Hidden Layer to Solve the XOR Problem of the Perceptron
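The hidden-layer construction can be shown concretely: XOR(a, b) = AND(OR(a, b), NAND(a, b)), so a network whose hidden layer computes OR and NAND solves the problem. The weights below are hand-chosen for illustration rather than learned:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    """Two-layer network: the hidden layer computes OR and NAND of the
    inputs, and the output neuron ANDs them together, yielding XOR."""
    h1 = step(x1 + x2 - 0.5)     # hidden neuron 1: OR
    h2 = step(-x1 - x2 + 1.5)    # hidden neuron 2: NAND
    return step(h1 + h2 - 1.5)   # output neuron: AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Each hidden neuron contributes one of the two dividing lines; the output neuron keeps only the region between them.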
For multilayer neural networks there is also a “universal approximation theorem”: a feedforward network with a single hidden layer of sigmoid (S-shaped) activation functions can approximate any continuous function to any desired precision, given enough hidden units.
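The intuition behind the theorem can be illustrated with its basic building block: the difference of two steep sigmoids approximates an indicator “bump” over an interval, and a weighted sum of such bumps can trace out any continuous curve. The steepness k and the interval endpoints below are arbitrary illustrative choices:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, a, b, k=50.0):
    """Difference of two steep sigmoids: close to 1 for x inside [a, b],
    close to 0 outside. Two hidden sigmoid units suffice to build it."""
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))

print(round(bump(0.5, 0.2, 0.8), 3))  # ≈ 1 inside the interval
print(round(bump(1.5, 0.2, 0.8), 3))  # ≈ 0 outside the interval
```

Summing scaled copies of `bump` over a fine grid of intervals is exactly what a wide one-hidden-layer sigmoid network can do, which is why it can match any continuous target function arbitrarily well.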
In summary, starting from the 1980s and 1990s, connectionism re-emerged, and research on neural networks returned to the mainstream. Many believe that Rosenblatt’s theories have been proven correct. The simple Perceptron has its flaws, but its basic principles sparked the modern artificial intelligence revolution. Deep learning and neural networks are changing our society, and understanding the Perceptron and the rise and fall of neural networks helps us better recognize AI and the future of AI development.
References:
[1] Rosenblatt, Frank (1962). “A Description of the Tobermory Perceptron.” Cognitive Research Program, Report No. 4. Collected Technical Papers, Vol. 2. Edited by Frank Rosenblatt. Ithaca, NY: Cornell University.
[2] Minsky, M. L. and Papert, S. A. (1969). Perceptrons. Cambridge, MA: MIT Press.
[3] Wikipedia: Frank Rosenblatt
https://en.wikipedia.org/wiki/Frank_Rosenblatt
[4] Popular Science China: The Three Major Schools of Artificial Intelligence
https://www.kepuchina.cn/zt/salon/tsrgzn/201901/t20190123_924578.shtml
[5] Wikipedia: Perceptron
https://zh.wikipedia.org/wiki/%E6%84%9F%E7%9F%A5%E5%99%A8