Produced by Big Data Digest
Compiled by: Andy
The backpropagation algorithm is a cornerstone of deep learning and plays a central role in optimizing model parameters.
This algorithm was popularized by Geoffrey Hinton, known as the father of deep learning. In 1986, he and his co-authors published a paper titled “Learning representations by back-propagating errors” (Rumelhart, Hinton & Williams, Nature, 1986), which has been cited nearly 16,000 times to date, making it a seminal paper in neural network research.
However, the academic community has mixed feelings about treating this paper as the definitive work on backpropagation; some critics even find it rather dull.
The author of this article, Takashi J OZAKI, a data scientist at Google’s Business Insight team, believes that the significance of the 1986 paper is not only in proposing backpropagation but also in marking a major turning point where “neural networks separated from psychology and physiology and shifted towards the field of machine learning.”
He begins with this paper to detail his views on the backpropagation algorithm.
The following text is a first-person narrative by the author discussing the significance of the backpropagation algorithm. Enjoy!
Simple Structure of Backpropagation
Recently, @tmaehara said on Twitter:
I really don’t understand what the innovation of the original paper on backpropagation is… the original paper just seems to differentiate with the chain rule and then apply the gradient method…
Original paper:
http://elderlab.yorku.ca/~elder/teaching/cosc6390psyc6225/readings/hinton%201986.pdf
The paper “Learning representations by back-propagating errors” (Rumelhart, Hinton & Williams, Nature, 1986) has been cited nearly 16,000 times on Google Scholar, placing it at the pinnacle of neural network research.
It introduced the then-unfamiliar backpropagation method into Rosenblatt’s perceptron, which on its own could only perform linear separation, and thereby made nonlinear separation possible. It can fairly be called a fundamental breakthrough.
Today, the neural networks we call deep learning are trained almost entirely by backpropagation. One could say that without this paper, there would be no current deep learning craze.
In other words, the current AI boom arguably began with this small paper over 30 years ago.
However, if you actually read this paper, you will find, as @tmaehara pointed out, that it is indeed rather dull. Like many neural network textbooks, it simply derives the gradients with the chain rule and then optimizes with a gradient method.
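To make “chain rule plus gradient method” concrete, here is a minimal sketch in roughly the paper’s notation (squared error $E$, total input $x_j$ and output $y_j$ of unit $j$, weight $w_{ji}$ from unit $i$ to unit $j$):

$$
E = \tfrac{1}{2}\sum_{c}\sum_{j}\left(y_{j,c} - d_{j,c}\right)^{2},\qquad
x_j = \sum_i y_i\,w_{ji},\qquad
y_j = \frac{1}{1+e^{-x_j}}
$$

$$
\frac{\partial E}{\partial x_j} = \frac{\partial E}{\partial y_j}\,y_j(1-y_j),\qquad
\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial x_j}\,y_i,\qquad
\frac{\partial E}{\partial y_i} = \sum_j \frac{\partial E}{\partial x_j}\,w_{ji}
$$

The last expression is the “back-propagation” of the error: the gradient for a lower layer is assembled from the gradients of the layer above, and the weights are then updated by plain gradient descent, $\Delta w = -\varepsilon\,\partial E/\partial w$. That really is about all there is to it, which is precisely why the paper feels so plain on a first reading.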
At this point, you might be very puzzled, “Why does this dull paper occupy the pinnacle?”
“Who Invented Backpropagation?”
Schmidhuber wrote in his criticism that similar ideas had been proposed in other fields as early as the 1950s to 60s, and even in the 70s, there were algorithm implementations in FORTRAN. However, Rumelhart and Hinton’s team did not even cite these studies, showing a lack of basic respect. This criticism is indeed somewhat persuasive.
However, when tracing the trajectory from the earliest perceptron through backpropagation to deep learning, we must realize that it unfolded in a world completely different from what we think of as neural networks today.
From this perspective, we can see that the significance of the 1986 paper is not merely in proposing backpropagation but also in marking a “major turning point where neural networks separated from psychology and physiology and shifted towards the field of machine learning.” Below, I will briefly explain based on some knowledge I have acquired.
- Neural networks of the past belonged to psychology and physiology.
- The 1986 Nature paper signifies the separation of neural networks from psychology and physiology.
- Thus, they shifted towards “pattern recognition” and “machine learning”, later evolving into deep learning.
- Today, neural networks still continue to use the term “Neural”.
Neural Networks of the Past Belonged to Psychology and Physiology
Many young people may not know this, but when neural networks were mentioned in the past, they were associated with the research of psychologists and physiologists and with the underlying theory of “connectionism”. Reading the Wikipedia entry on connectionism makes it clear that neural networks were originally established on the premise of “simulating the signal processing of the human brain”.
Its methodology follows cognitive science in treating the human body as a black box to some extent, and uses psychological models to explain the nature of human brain functions. Simply put, this is the psychological side supporting neural networks.
On the other hand, there is also a physiological side, “computational neuroscience”, which studies how such models relate to the actual human brain. This, too, becomes clear from its Wikipedia entry, which explains how physiological models attempt to account for human brain functions. If connectionism and neural networks represent an abstract conceptual model, then computational neuroscience tries to argue the extent to which the human brain does (or does not) resemble neural networks.
Computational neuroscience does not only consider models like neural networks; it also studies topics such as what information neuronal firing patterns represent, which can be categorized as “interpretation of neural activity”. It should be noted that computational neuroscience is a vast field in its own right. In short, this is the physiological side supporting neural networks.
In either case, there is to some extent an assumption that “neural networks = an imitation of the human brain”. In my view, at least until around 2000, many people still believed that neural networks and the human brain were inseparable. My own impression is that neural networks originated in physiology and psychology and were only later reinterpreted from the perspective of engineering and computer science.
For instance, the perceptron, the prototype of neural networks, was initially conceived as a learning model built from formal neurons of the human brain (especially the cerebellum). Its structure does resemble the physiological structure of the actual cerebellum, and it is still recorded as an important example of connectionism and computational neuroscience in many places, such as Wikipedia.
By the way, the now-famous top conference NIPS, which has become an annual summit for machine learning, was primarily a conference on computational neuroscience in the past.
For example, looking at the 1990 NIPS proceedings, they would be unimaginable today: most of the papers were about real brain research. In fact, when I was a rookie researcher at RIKEN BSI (RIKEN Brain Science Institute), one of the research themes in our lab was “synchronization of neuronal firing activity”, so I know this area to some extent, and modeling papers on neuronal firing activity do indeed appear in the NIPS proceedings of that era. It is evident that neural networks of the past had a strong flavor of psychology (cognitive science) and physiology.
Consequently, when researching neural networks in those days, there was always an unspoken constraint that “whatever new method you propose, it must be consistent with psychology or physiology”.
Of course, this is just my personal speculation. However, even when I was still a rookie researcher in neuroscience, I had this mindset: while reading various computational-model papers on human cognitive functions, I would think, “Hmm, this paper has a physiological basis, so it is reliable.” Even the great Zeki of visual neuroscience wrote harshly in his classic book that David Marr’s model of visual information processing lacks a physiological basis and is therefore useless, which shows how much weight was placed on physiological evidence at the time.
Additionally, if you read the classic book from my youth, “The Computational Theory of the Brain”, you will deeply understand the atmosphere of that time. However, this book is now out of print and hard to find.
The 1986 Nature Paper Signifies the Separation of Neural Networks from Psychology and Physiology
At a time when neural networks and psychology and physiology were inseparable, the aforementioned 1986 Nature paper emerged, pulling neural networks out from that context.
Those who have some understanding of neuroscience may know that even within neuroscience, discussion of whether “feedback signals from higher-order to lower-order cortical areas” might play a role equivalent to backpropagation did not become common until around 2000. Rewind 14 years from there to 1986, and anyone discussing backpropagation in that context would likely have been seen as odd.
Thus, at a time when backpropagation “had no physiological basis no matter how you looked at it”, the paper deliberately set physiological evidence aside, introduced the backpropagation mechanism as a paradigm shift, and demonstrated its powerful practicality. That, in my opinion, is the greatest significance of the 1986 Nature paper.
In the concluding part, the Nature paper provided the following summary:
This learning procedure does not seem to be a plausible model of how the brain learns. However, applying it to a variety of tasks shows that interesting internal representations can be constructed by gradient descent in weight space, which suggests that it is worth looking for more physiologically plausible ways of doing gradient descent in neural networks.
Here it frankly states, “Although backpropagation lacks physiological basis, it is very useful.” From another perspective, this might be the first step for neural networks to break free from the shackles of needing psychological or physiological evidence.
Moreover, this important viewpoint was put forward not by “outsiders” from applied mathematics or computer engineering, but by Rumelhart and Hinton, pillars of connectionism, which makes it all the more significant. Although Hinton is now regarded as one of the three founders of deep learning and a giant of neural networks and machine learning, a look at Wikipedia shows that he was originally a cognitive psychologist (as was Rumelhart).
Of course, this does not mean that neural networks completely separated from psychology and physiology afterward. At least looking at the literature, it seems that until around 2000, they were still included in connectionism. However, it is clear that the trend is for neural networks to no longer necessarily rely on psychology and physiology, and in the future, they will further separate from psychology, physiology, and connectionism.
Shifting Towards “Pattern Recognition” and “Machine Learning,” Later Evolving into Deep Learning
Although I was a computer science undergraduate, I remember that in 1999 the term “machine learning” was not yet in common use; “pattern recognition” was used far more often. As a check, I looked at the Wikipedia entry for “Pattern Recognition” and found that most of its citations are indeed from around the year 2000, which matches my impression at the time.
By the way, this is a classic textbook I used in my undergraduate courses (“Easy to Understand Pattern Recognition”). Looking back, it seems that the sequel to this book is more famous, but this book still covers the basics of pattern recognition and is a good introductory book.
This book also discusses neural networks and backpropagation. Additionally, when neural networks were mentioned in neuroscience and other textbooks, they would definitely cover the similarities and differences between the cerebellum and perceptrons, as well as the history of neural networks, but this book did not spend much time on this aspect.
Moreover, like many other famous textbooks, it primarily explains how to derive backpropagation based on the chain rule and how to implement it. By the way, when I first started my work as a data scientist, I took this book off the shelf, looked at the gradient method formula, and implemented the backpropagation algorithm from scratch.
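For reference, the whole algorithm really does fit in a few dozen lines of code. Below is a minimal, purely illustrative sketch (in NumPy; the network size and learning rate are arbitrary choices, and this is not anyone’s original implementation): a two-layer sigmoid network trained by backpropagation on XOR, the classic task that a single-layer perceptron cannot separate.

```python
# Minimal illustrative backpropagation sketch (NumPy): a 2-layer sigmoid
# network learning XOR. Hyperparameters are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
d = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

eps = 0.5  # learning rate

for step in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)   # hidden activations
    y = sigmoid(h @ W2 + b2)   # network outputs

    # Backward pass: the chain rule applied layer by layer
    dE_dy = y - d                             # dE/dy for E = 0.5 * sum((y - d)^2)
    delta2 = dE_dy * y * (1.0 - y)            # dE/dx at the output layer
    delta1 = (delta2 @ W2.T) * h * (1.0 - h)  # error propagated back to hidden layer

    # Gradient-descent weight updates
    W2 -= eps * h.T @ delta2
    b2 -= eps * delta2.sum(axis=0, keepdims=True)
    W1 -= eps * X.T @ delta1
    b1 -= eps * delta1.sum(axis=0, keepdims=True)

print(np.round(y, 3))  # should end up close to [[0], [1], [1], [0]]
```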
By around 2000, neural networks were regarded as one kind of “pattern recognition” that had left psychology and physiology behind. Later, with the rise of “machine learning”, they came to be seen as one kind of statistical learning model; then, through Hinton’s 2006 Science paper and the 2012 ImageNet Challenge, neural networks and their ultimate form, deep learning, gradually secured their position as the king of machine learning.
Today, neural network research features a plethora of novel network architectures (a chaotic scramble of “the network I came up with is the best”), a constant stream of cutting-edge optimization methods, previously unimaginable advances in large-scale parallel processing, and work that treats networks both as objects of mathematical analysis and as tools for discovering new properties of the things they are applied to. Neural networks have become a grand unification of the best of the various schools of engineering and computer science.
By the way, the Japanese version of the Wikipedia entry for “Neural Networks” also contains similar descriptions.
A neural network (NN) is a mathematical model that aims to express on a computer some of the characteristics found in brain function. Although the research originated in modeling the biological brain, it has gradually diverged from brain models as perspectives in neuroscience have changed. To distinguish them from biological and neuroscientific neural networks, they are also called artificial neural networks.
Artificial Neural Networks (ANN) or connectionist systems are computational systems roughly inspired by the biological neural networks that constitute animal brains. The initial purpose of artificial neural networks was to solve problems like the human brain. However, over time, the focus has shifted to performing specific tasks, leading to deviations from biology.
Here it explicitly expresses a view similar to “neural networks have deviated from the human brain itself”. Thirty-two years after the 1986 Nature paper, neural networks have gradually distanced themselves from the origins of connectionism, beginning to be recognized as the king of machine learning.
Although I have repeated this statement many times, it is worth repeating: “The greatest significance of the 1986 Nature paper was to create the opportunity for neural networks to separate from psychology and physiology and shift towards machine learning.” This is the conclusion I have drawn after reviewing the history of neural networks.
Today, Neural Networks Continue to Use the Term “Neural”
By the way, the NIPS conference, which was once the top discussion venue for neural information processing, has recently undergone a massive transformation in its description on the English Wikipedia.
The conference had 5,000 registered participants in 2016 and 8,000 in 2017, making it the largest conference in the field of artificial intelligence. In addition to machine learning and neuroscience, NIPS also covers other fields, including cognitive science, psychology, computer vision, statistical linguistics, and information theory.
Although the “neural” in the NIPS acronym has become something of a historical remnant, the post-2012 revival of deep learning, together with advances in high-speed computing and big data, has produced remarkable results in areas such as speech recognition, image object recognition, and language translation.
It is stated bluntly that the “neural” in the NIPS acronym has become a historical remnant. In fact, when I attended NIPS in Lake Tahoe in 2013, while there were still some studies related to neural activity signal data analysis, it seemed that the term “neural” still had some presence, but more research was related to deep learning.
Since then, this shift towards deep learning has only become more pronounced.
World champion-level performance in Go is based on a neural network architecture inspired by the hierarchical structure of the visual cortex (the ConvNet) and on reinforcement learning inspired by the basal ganglia (temporal difference learning).
Moreover, “despite this, there is still significant application potential for brain-inspired neural networks (and their comprehensive use).” I agree with this point. Furthermore, I believe that because of this, even if imitating the human brain is no longer the goal, neural networks continue to use the term “neural”.
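For readers unfamiliar with the term, temporal difference learning itself is also remarkably simple: a state’s estimated value is nudged toward the reward just received plus the estimated value of the next state, and that gap is the “temporal difference”. The following is a generic tabular TD(0) sketch on a five-state random walk, purely as an illustration of the idea (not AlphaGo’s actual training procedure; the environment and parameters are arbitrary choices).

```python
# Minimal illustrative tabular TD(0) sketch: estimate state values on a
# simple 5-state random walk with reward 1 for exiting on the right.
import random

n_states = 5             # states 0..4; each episode starts in the middle
V = [0.0] * n_states     # value estimates
alpha, gamma = 0.1, 1.0  # learning rate and discount factor

for episode in range(5000):
    s = n_states // 2
    while True:
        s_next = s + random.choice([-1, 1])
        if s_next < 0:                      # fell off the left end
            reward, done, v_next = 0.0, True, 0.0
        elif s_next >= n_states:            # fell off the right end
            reward, done, v_next = 1.0, True, 0.0
        else:
            reward, done, v_next = 0.0, False, V[s_next]
        # TD(0) update: move V[s] toward reward + gamma * V[s_next]
        V[s] += alpha * (reward + gamma * v_next - V[s])
        if done:
            break
        s = s_next

print([round(v, 2) for v in V])  # roughly [1/6, 2/6, 3/6, 4/6, 5/6]
```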
In Conclusion
Regarding the situation before 2000, much of it is derived from my experiences in both neuroscience and computer science, as well as various references in the literature. Of course, I did not personally experience these events. If you find any errors in the explanations or relationships of events in the article, I would be very happy if you could point them out m(_ _)m.
References
*1: Although I think Hinton’s 2006 Science paper and the 2012 ImageNet Challenge are also indispensable.
*2: However, there are quite a few books that I no longer have on hand.
*3: However, this article has a somewhat malicious tone, so please handle it with care.
*4: The cerebellar perceptron theory was proposed by the genius David Marr, who died at 35, and is often told as a famous success story.
*5: For example, this one: https://papers.nips.cc/paper/421-analog-computation-at-a-critical-point-a-novel-function-for-neuronal-oscillations.pdf
*6: It is indeed about the 2½-D sketch model.
*7: Especially the functional neuroanatomy of areas known to have hierarchical structures such as various sensory areas.
*8: Lamme and Roelfsema (Trends Neurosci., 2000) is something I repeatedly read during my undergraduate thesis: http://www.kylemathewson.com/wp-content/uploads/2010/03/LammeRoelfsema-2000-TiN-Reentrant-Vision.pdf
*9: For example, review papers on the relationship between neuronal activity in cortical areas and feedback projections between hierarchical areas mostly date from around 2000: https://www.ncbi.nlm.nih.gov/pubmed/?term=review+%5Bptyp%5D+feedback+%5Bti%5D+neuron+cortex
*10: This was the main text for a certain study group.
*11: However, it does not seem to be as easy to understand as the title suggests…
*12: However, this is only mentioned in the context of a coffee break, where the history of Minsky’s criticism of the perceptron and Rumelhart’s proposal for BP is briefly discussed.
*13: For example, the fact that Vapnik first proposed SVM in 1963 and that the extension to non-linear SVM was made in 1992 indicates that the seeds of “machine learning” as an engineering field had actually sprouted long before.
Related reports:
https://tjo.hatenablog.com/entry/2018/10/23/080000