The Father of Recurrent Neural Networks: Building Unsupervised General Neural Network AI

Recommended by New Intelligence

Source: Authorized Reprint from InfoQ

Translator: He Wuyu

[New Intelligence Overview] Jürgen Schmidhuber, scientific director of the Swiss AI lab IDSIA, led the team that in 1997 proposed the Long Short-Term Memory (LSTM) recurrent neural network, an architecture that made it practical for recurrent networks to learn dependencies across long time lags, earning him the title of “Father of Recurrent Neural Networks”. In an interview with InfoQ, Schmidhuber shared his views on trends in deep learning and artificial intelligence. His long-standing goal has been to “create an AI smarter than myself”. His new startup is likewise dedicated to general neural network AI, and to achieve this goal, Schmidhuber believes, more than ordinary deep learning is needed.


Machine learning has recently become a buzzword in the media. The journal “Science” published a cover article titled “Human-level concept learning through probabilistic program induction,” and shortly thereafter “Nature” reported on AlphaGo, the AI program that defeated the European Go champion.
Many people are now discussing the potential of artificial intelligence, raising questions such as “Can machines learn like humans?” and “Will AI surpass human intelligence?”

To explore these questions, InfoQ interviewed Professor Jürgen Schmidhuber, scientific director of the Swiss AI lab IDSIA, who shared his views on the latest trends and developments in deep learning and artificial intelligence.


Professor Jürgen Schmidhuber is the scientific director of the Swiss AI lab IDSIA and teaches at the University of Lugano and the University of Applied Sciences and Arts of Southern Switzerland (SUPSI). He obtained his bachelor’s degree (1987) and doctorate (1991) in computer science from the Technical University of Munich.
Since 1987 he has led research on self-improving general problem solvers, and since 1991 he has been a pioneer of deep learning neural networks. The recurrent neural networks developed by his research teams at IDSIA and the Technical University of Munich were the first to win official international pattern recognition competitions. These techniques have revolutionized connected handwriting recognition, speech recognition, machine translation, and image captioning, and are now used by Google, Microsoft, IBM, Baidu, and many other companies. DeepMind, too, has been strongly influenced by his former PhD students.

Since 2009, Professor Schmidhuber has been a member of the European Academy of Sciences and Arts. He has received many awards, including the Helmholtz Award from the International Neural Networks Society in 2013 and the Neural Networks Pioneer Award from the Institute of Electrical and Electronics Engineers in 2016. In 2014, he co-founded the AI company NNAISENSE, aiming to create the first practical general AI.

Deep Learning Is Old Wine in New Bottles

What is deep learning? What is its history?

Schmidhuber: It is old wine in new bottles. It mainly concerns deep neural networks with multiple subsequent processing layers. With today’s faster computers, such networks have completely changed pattern recognition and machine learning. The term “deep learning” was first introduced into the field of machine learning by Dechter in 1986, and then by Aizenberg et al. in 2000 into the field of artificial neural networks.
The father of deep learning is the Ukrainian mathematician Ivakhnenko. In 1965, he and Lapa published the first general, working learning algorithm for supervised deep feedforward multilayer perceptrons. By 1971 he had already described an eight-layer network (deep even by today’s standards) and trained it with a method that was still popular in the new millennium. He was far ahead of his time: computers back then were a billion times slower than they are now.
What do you think about the paper on human-level concept learning in “Science” magazine, which achieved “one-shot learning” through a Bayesian program learning framework?
Schmidhuber: That paper is interesting. However, we can also achieve fast one-shot learning through standard transfer learning: first “slowly” pretrain a deep neural network on many different visual training sets, so that its first (say) 10 layers become a fairly general visual preprocessor; then freeze those layers and retrain only the top layer on the new images with a high learning rate. This approach has worked well for years.
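A minimal PyTorch sketch of that recipe follows. The pretrained backbone, the class count, and the learning rate are illustrative assumptions rather than details from the interview; freezing everything except the final layer stands in for “freeze the first 10 layers, retrain the top layer”.

```python
# Transfer-learning sketch: freeze a pretrained backbone, retrain only the head.
import torch
import torch.nn as nn
from torchvision import models

# "Slow" pretraining is stood in for by an ImageNet-pretrained network
# (torchvision >= 0.13; older versions use pretrained=True instead).
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the early layers so they act as a fixed, fairly general visual preprocessor.
for param in net.parameters():
    param.requires_grad = False

# Replace the final layer for the new task; the fresh layer is trainable by default.
num_new_classes = 5  # hypothetical size of the new task
net.fc = nn.Linear(net.fc.in_features, num_new_classes)

optimizer = torch.optim.SGD(net.fc.parameters(), lr=0.1)  # high LR, head only
loss_fn = nn.CrossEntropyLoss()

# One illustrative update step on a (here random) batch of "new" images.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, num_new_classes, (8,))
optimizer.zero_grad()
loss = loss_fn(net(x), y)
loss.backward()
optimizer.step()
```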
[Photo: Marcus Hutter]
What are the similarities and differences between Bayesian methods and deep learning methods? Which method is more feasible? Why?
Schmidhuber: The ultimate, optimal version of Bayesian machine learning is embodied in the AIXI model, proposed in 2002 by Marcus Hutter, a former postdoc of mine who is now a professor. Any computational problem can be framed as maximizing a reward function.
The AIXI model is based on Solomonoff’s universal induction model M, which mixes all computable probability distributions. If the probabilities with which the world responds to a reinforcement learning agent’s actions are computable (and there is no evidence against this), the agent can use M, instead of the true but unknown distribution, to predict its future sensory inputs and rewards, and act by choosing the action sequence that maximizes M-predicted reward.
This can be regarded as the ultimate statistical approach to artificial intelligence: it shows the mathematical limits of what is possible. However, AIXI’s notion of optimality ignores computation time, which is why we still use less general but more feasible methods, such as deep learning based on more limited local search techniques like gradient descent.
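For reference, Hutter’s AIXI agent can be summarized in one standard formula (a textbook formulation, not quoted from the interview): at time $k$, with horizon $m$, the agent chooses

$$
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \bigl[r_k + \cdots + r_m\bigr] \sum_{q\,:\,U(q,\,a_1 \ldots a_m)\,=\,o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)},
$$

where $U$ is a universal Turing machine, $q$ ranges over its programs with length $\ell(q)$ (so the rightmost sum is the Solomonoff-style prior $M$ over observation-reward histories), and $a_t$, $o_t$, $r_t$ denote actions, observations, and rewards.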

The paper in “Science” magazine states that its results “passed the visual Turing test”. Is the Turing test, proposed more than half a century ago, still valid today?

Schmidhuber: Does the entity I’m chatting with seem human to me? If so, then it has passed my personal Turing test. The main issue with this test is its subjectivity, as Weizenbaum demonstrated decades ago. Some people are always easier to fool than others.

[Photo: Joseph Weizenbaum]

Connection with DeepMind

What do you think about the paper from Google DeepMind in “Nature” magazine about AlphaGo, a program that defeated professional Go players? Is AlphaGo a major breakthrough in this field? What helped AlphaGo achieve such success?

Schmidhuber: I am happy about Google DeepMind’s success, not least because the company is deeply shaped by my former students: two of DeepMind’s first four members came from IDSIA; one is a co-founder and the other was the first employee, and they were among the company’s earliest AI PhDs. Later, several more of my PhD students joined DeepMind, one of whom co-authored a paper on Go with me in 2010.
In the board game Go, the “Markov assumption” holds: in principle, the current input (the board position) contains all the information needed to determine the next best action, with no need to consider earlier states.
That means the game can be handled by traditional reinforcement learning. It is somewhat like what IBM’s Tesauro did over 20 years ago, when he used reinforcement learning to train, from scratch, a backgammon program that could compete with human world champions.
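As a toy illustration of what the Markov assumption buys, here is a minimal tabular Q-learning sketch; the environment and constants are invented for illustration (this is not AlphaGo or TD-Gammon). Because the current state contains everything needed to act, a value can be attached to each (state, action) pair directly, with no memory of earlier history.

```python
# Tabular Q-learning on a toy Markov environment.
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def step(state, action):
    """Hypothetical Markov environment: next state and reward depend only on
    the current state and action, never on earlier history."""
    next_state = (state + 1) % n_states if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

state = 0
for _ in range(10_000):
    # Epsilon-greedy action selection.
    action = random.randrange(n_actions) if random.random() < epsilon \
        else max(range(n_actions), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # Q-learning update: bootstrap from the value of the next state alone.
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state
```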
However, we now benefit from computers that are at least 10,000 times faster per unit cost than back then. Automated Go programs have improved markedly over the past few years. To become a good Go player, DeepMind’s system combined several fairly traditional methods, such as supervised learning from human experts and reinforcement learning combined with Monte Carlo tree search.

Unfortunately, however, the Markov assumption does not hold in real-world scenarios. That is why real-world games (such as soccer) are much harder than chess or Go, and why reinforcement-learning robots with general AI, living in partially observable environments, need more sophisticated learning algorithms, such as reinforcement learning for recurrent neural networks.

Recently, Google DeepMind announced its entry into the healthcare market. What are your thoughts on this?
Schmidhuber: We are very interested in the application of deep learning in the medical field. In fact, in 2012, the IDSIA team developed the first deep learning program to win a medical imaging competition.
Seeing many companies now using deep learning for medical imaging and similar fields makes me very happy. Globally, healthcare spending accounts for over 10% of GDP (over $7 trillion annually), most of which is spent on medical diagnostics.
Automating part of this process could save billions of dollars and enable many people who currently cannot afford it to receive expert-level medical diagnoses. In this context, perhaps the most valuable asset for hospitals is their data—which is why IBM spent $1 billion on a company that collects such data.
What do you think of IBM’s new Watson IoT platform? What potential does AI have in the IoT field? Will “AI as a Service” become a promising trend for AI?

Schmidhuber: The Internet of Things will be much larger in scale than the internet of humans, since machines far outnumber people. And many machines will indeed provide “AI services” to other machines. Advertising commercialized the networking of humans, but the business model for the IoT does not yet seem as clear.

Machines Cannot Yet Learn Like Humans, But They Are Getting There
Some say the future belongs to unsupervised learning. Do you agree?
Schmidhuber: I would say the past also belonged to unsupervised learning, which is about detecting patterns in observations without a teacher; this is essentially adaptive data compression, for instance through predictive coding. I published the first paper on this topic 25 years ago; in fact, that work led in 1991 to the first working “extremely deep learning” system, able to deal with hundreds of subsequent computational layers.
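To make the “compression via prediction” idea concrete, here is a minimal sketch (the order-1 predictor and the toy data are illustrative assumptions, not Schmidhuber’s 1991 method): a predictor guesses the next value, and only the prediction errors need to be stored, which an entropy coder would compress far better than the raw signal.

```python
# Predictive coding as adaptive data compression, in miniature.
import numpy as np

rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=1000))  # correlated, hence predictable, data

# Trivial predictor: "next value equals current value".
predictions = np.concatenate(([0.0], signal[:-1]))
residuals = signal - predictions

# The residuals carry the same information as the signal but with far less
# variance, so they are much cheaper to encode.
print("signal variance:   ", signal.var())
print("residual variance: ", residuals.var())

# Lossless reconstruction from the residuals alone.
reconstructed = np.cumsum(residuals)
assert np.allclose(reconstructed, signal)
```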
Can machines learn like humans?

Schmidhuber: Not yet, but they might be getting there soon.

You can also look at our work on “learning to think”: unsupervised data compression is a core component of adaptive agents based on recurrent neural networks, which can use a predictive world model, itself a recurrent network, to plan better and achieve their goals. We first published on this in 1990 and have made significant progress since.

Are there limitations to artificial intelligence?
Schmidhuber: If we talk about limits, they are essentially the limits of computability identified 85 years ago (in 1931) by Kurt Gödel, the father of theoretical computer science.
[Photo: Einstein with Gödel]

Gödel showed that traditional mathematics is either flawed in some algorithmic sense or contains true statements that cannot be proved by any computational procedure, whether carried out by humans or by AI.

What Is Next?
In your view, what is the ideal division of labor between humans and computers?
Schmidhuber: Humans should be freed from all heavy and tedious work, leaving the rest to computers.
You are well-known for your pioneering work in recurrent neural networks, especially the long short-term memory networks that are widely used in deep learning today. Can you provide a brief background and technical description of long short-term memory networks? In which fields do you think long short-term memory networks are most applicable? Are there real-world examples?
Schmidhuber: Supervised long short-term memory (LSTM) recurrent neural networks are general computers that can learn programs mixing sequential and parallel information processing, and can handle all kinds of sequences, including video and speech.
My lab has been developing this type of network since the early 1990s. Parts of the LSTM architecture are designed precisely so that backpropagated errors can neither vanish nor explode, but flow backwards in a “civilized” manner over thousands of steps or more.
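For reference, the LSTM cell in its now-standard form (the common modern variant, with the forget gate that Gers et al. added in 1999 to the original 1997 design) can be written as:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{(cell state)}\\
h_t &= o_t \odot \tanh(c_t) &&\text{(hidden state)}
\end{aligned}
$$

The additive update of the cell state $c_t$ is the “constant error carousel” that lets backpropagated errors flow across thousands of steps without vanishing or exploding.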
Thus, LSTM variants can learn previously unlearnable “extremely deep learning” tasks that require discovering (and remembering) the significance of events that happened thousands of discrete time steps earlier, whereas standard recurrent neural networks already struggle with time lags of just ten steps. One can even evolve problem-specific LSTM-like topologies.
Around 2007, LSTM networks trained with our connectionist temporal classification (CTC) method began to revolutionize speech recognition, outperforming traditional methods on keyword-spotting tasks.
Later, LSTM networks also helped Google improve image recognition, machine translation, text-to-speech synthesis, syntactic parsing for natural language processing, and many other applications. In 2015, CTC-trained LSTM dramatically improved Google Voice (by 49%), and it now benefits over a billion smartphone users. Microsoft, IBM, and many other well-known companies also make extensive use of LSTM networks.
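A hedged sketch of CTC training in PyTorch follows; the alphabet size, feature dimensions, and random data are all illustrative. An LSTM emits per-frame label distributions over an alphabet plus a blank symbol, and the CTC loss sums over all alignments of the (shorter) target transcript, so no frame-level labels are needed.

```python
# CTC-trained LSTM for sequence labeling (e.g., speech), in miniature.
import torch
import torch.nn as nn

vocab_size = 28            # e.g. 26 letters + space, with index 0 as the CTC blank
T, batch, n_features = 50, 4, 13

lstm = nn.LSTM(n_features, 64)
head = nn.Linear(64, vocab_size)
ctc = nn.CTCLoss(blank=0)

x = torch.randn(T, batch, n_features)        # e.g. acoustic feature frames
h, _ = lstm(x)
log_probs = head(h).log_softmax(dim=-1)      # shape (T, batch, vocab)

targets = torch.randint(1, vocab_size, (batch, 10))   # label sequences (no blanks)
input_lengths = torch.full((batch,), T, dtype=torch.long)
target_lengths = torch.full((batch,), 10, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()   # gradients flow back through the LSTM over all T frames
```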
Your team has won nine international pattern recognition competitions, in areas such as handwriting recognition and traffic sign recognition. How did you achieve this?
Schmidhuber: We are indeed proud of winning so many competitions, which include:
  • MICCAI 2013 Mitosis Detection Challenge

  • ICPR 2012 Breast Cancer Histopathology Mitosis Detection Challenge

  • ISBI 2012 Brain Image Segmentation Challenge

  • IJCNN 2011 Traffic Sign Recognition Challenge

  • ICDAR 2011 Offline Chinese Handwriting Recognition Challenge

  • Online German Traffic Sign Recognition Challenge

  • ICDAR 2009 Arabic Handwriting Recognition Challenge

  • ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Challenge

  • ICDAR 2009 French Handwriting Recognition Challenge

How did our team achieve this? With creativity, persistence, hard work, and dedication.
You also emphasized the importance of extremely deep networks, right?
Schmidhuber: Since depth means computational power and efficiency, we have focused on extremely deep neural networks from the very beginning. For example, until the early 1990s, others were still limited to relatively shallow networks (with fewer than 10 subsequent computational layers), while our approach enabled over 1000 such computational layers.
You could say it was we who made neural networks extremely deep, especially recurrent networks, which are the deepest and most powerful networks of all. At the time almost no researchers were interested in this, but we persisted.
With the decreasing cost of computational power, winning competitions this way was just a matter of time. Nowadays, I am pleased to see other deep learning labs and companies also extensively using our algorithms.
The competitions mentioned above are all about pattern recognition. What methods do you recommend for reinforcement learning and the more general field of unsupervised sequential decision-making?

Schmidhuber: We like our Compressed Network Search, which goes beyond mere pattern recognition and discovers complex neural controllers with over a million synaptic weights. In 2013 it became the first method to use reinforcement learning to learn control policies directly from high-dimensional sensory input.
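The core idea can be sketched as follows; the sizes, the stand-in fitness function, and the simple (1+1) evolution strategy are toy assumptions, not the published method. Instead of searching over every synaptic weight directly, one searches over a small number of frequency coefficients and decodes them (here via an inverse DCT-style basis) into a much larger weight vector.

```python
# Sketch of searching in a compressed weight space.
import numpy as np

n_weights, n_coeffs = 10_000, 20   # many weights, few searchable parameters

def decode(coeffs):
    """Expand low-frequency cosine coefficients into a full weight vector."""
    k = np.arange(len(coeffs))[:, None]
    n = np.arange(n_weights)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_weights))
    return coeffs @ basis

def fitness(weights):
    """Stand-in for 'run the neural controller in its environment'."""
    return -np.sum((weights - 0.5) ** 2)

# Simple (1+1) evolution strategy in the compressed coefficient space.
rng = np.random.default_rng(0)
coeffs = rng.normal(scale=0.1, size=n_coeffs)
best = fitness(decode(coeffs))
for _ in range(200):
    candidate = coeffs + rng.normal(scale=0.05, size=n_coeffs)
    score = fitness(decode(candidate))
    if score > best:
        coeffs, best = candidate, score
```

The design point is that 20 coefficients, not 10,000 weights, form the search space, which is what makes evolutionary search over million-weight controllers tractable.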


What are your latest research interests in deep learning or artificial intelligence?
Schmidhuber: My latest research interest is still the one I formulated in the early 1980s: “to create an AI smarter than myself, so I can retire.” Achieving it requires more than ordinary deep learning.
It requires self-referential general-purpose learning algorithms that improve not only a system’s performance in a given domain, but also the way the system learns, and the way it learns the way it learns, and so on, limited only by the fundamental limits of computability. I have been working on this all-encompassing field since I published my thesis on it in 1987, and now I can see it beginning to turn from fantasy into reality.
As a deep learning startup, NNAISENSE has gained attention since its establishment last year. As the president of this company, can you tell us more about NNAISENSE? What plans do you have for it?
Schmidhuber: NNAISENSE is pronounced like “nascence,” because it is about the nascence of general neural network AI (NNAI), something entirely new. The company has five co-founders, several employees, and a very strong research team.
Our revenue comes from continuously launching cutting-edge applications for the industrial and financial sectors, and we are also in talks with investors. We believe we can achieve breakthroughs that will change everything, realizing the ideal I set in the 1980s: “to create an AI smarter than myself, so I can retire.”
What developments do you foresee in the field of artificial intelligence in the near future? Where will new killer applications emerge? Will there be bottlenecks?
Schmidhuber: As I pointed out in a Reddit AMA, even existing machine learning and neural network algorithms (with modest extensions) will enable significant advances in many fields, including medical diagnostics and smarter smartphones that understand you better, solve more problems for you, and make you more addicted to them.
I think we are witnessing the ignition phase of this field’s explosive growth. But how can one predict the chaotic details of an explosion? Suppose computational power keeps getting 100 times cheaper per decade; then by 2036, the same money will buy computational power 10,000 times greater than today’s.
That would mean a small portable device with roughly the raw computational capacity of a human brain, or a larger computer with the capacity of all the human brains in a city combined.
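The arithmetic behind that estimate, taking 2016 (the year of this interview) as the baseline, is simply two decades of hundredfold gains:

$$
100^{(2036-2016)/10} \;=\; 100^{2} \;=\; 10{,}000.
$$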
Given such powerful computational capabilities, I expect that massive recurrent neural networks running on proprietary hardware will be able to simultaneously perceive and analyze multimodal data streams from multiple sources (voice, text, video, and other modalities), learning to connect all input information and utilizing the extracted information to achieve various commercial and non-commercial goals.
Based on what has been learned, those recurrent neural networks will continuously and rapidly learn new skills. This should give rise to countless applications, although I can’t even be sure if the term “applications” will still hold meaning by then.
So, what is next?
Schmidhuber: Compared to the intelligence of small children and even some small animals, our best self-learning robots still lag far behind.
But I believe that in a few years, we will be able to build a neural network-based AI (i.e., NNAI) that can become as intelligent as small animals through incremental learning and learn how to plan, reason, and decompose a series of problems into quickly solvable (or already solved) subproblems in an extremely general way. Through our “theory of fun,” it may even have curiosity and creativity, becoming an unsupervised artificial scientist.
Once we have animal-level AI, what will happen?
Schmidhuber: Then achieving human-level AI may not be so difficult: Earth took billions of years to evolve intelligent animals, but it took only a few million years to evolve humans from that foundation. The speed of technological evolution far exceeds that of biological evolution.
In other words, once we achieve animal-level AI, we may achieve human-level AI in a few years or decades. At that time, various applications will truly be limitless, all businesses will change, all civilizations will change, and everything will change.
What will artificial intelligence look like in the long-term future?

Schmidhuber: Superintelligent AI may soon colonize the solar system and, within millions of years, colonize the entire galaxy. The universe will take the next step toward ever more unfathomable complexity.

(Source: InfoQ)
