What Are The First Principles Of Artificial Intelligence?


Author: Guo Ping, Director of the Image Processing Research Center, Beijing Normal University

Source: Qingzhan Artificial Intelligence Research Institute, Turing Artificial Intelligence

In this article, Professor Guo Ping uses a “four questions” format to explain the first principles of artificial intelligence. Applying first-principles thinking, he proposes addressing AI’s lack of basic natural-science knowledge on the basis of physical principles, and suggests taking the principle of least action as the first principle of artificial intelligence.


Achieving Artificial General Intelligence (AGI) is a long-term goal, and we need to explore the path to artificial intelligence (AI) starting from basic research. “Basic research is the source of the entire scientific system and the general mechanism for all technical problems.” This illustrates the significance of AI theoretical research: we should strengthen the mathematical and physical foundations of AI and take first principles as the starting point for developing a new generation of AI foundational theory.

Are there first principles in the field of artificial intelligence?

The ancient Greek philosopher Aristotle described first principles (or primary principles) as: “In exploring every system, there exist first principles, which are the most basic propositions or assumptions that cannot be omitted or violated.” Before the 20th century, first principles were mainly used in philosophy, mathematics, and theoretical physics. In mathematics, first principles are one or more axioms that cannot be derived from any other axioms within the system. In theoretical physics, first principles refer to calculations directly established from physical laws without making assumptions such as empirical models and fitting parameters. The first principle of biology is the theory of “survival of the fittest” proposed by Darwin. In modern society, first principles have expanded into many disciplines, including life sciences, chemistry, economics, and social sciences.

As human cognition has developed, first principles have differentiated from their original philosophical terminology into more specialized expressions, some of which no longer use the term “first principles” but instead use its synonyms. In philosophy, the term “a priori principle” is used, while in mathematics, the standardized term “axioms” is uniformly adopted, and in physics, “first principles” is retained.

Whether there are first principles in AI is a controversial topic. Some argue that AI lacks first principles, reasoning that first principles define the boundaries of the problem space within the domains defined by philosophical, mathematical, or physical rules, and the first principles of AI only become meaningful after clearly defining what “intelligence” is.

Currently, there is no clear definition of “intelligence,” and therefore no precise, universally accepted definition of AI exists. There are two definitions in academia to reference: one is from Professor Nils J. Nilsson of Stanford University’s Artificial Intelligence Research Center, which states that “AI is the science of knowledge—how to represent knowledge, how to acquire knowledge, and how to use knowledge.” The other is from Professor Patrick Winston of MIT, who stated that “AI is the study of how to make computers do tasks that, until recently, only humans could perform.”

Some argue that AI lacks first principles based on a statement in Professor Nilsson’s book “Principles of Artificial Intelligence” [1]. On page 2 of this book, there is a passage that clearly presents this concept: “AI currently lacks a universal theory, so I will show you some applications next.” This means that there are currently no first principles in AI; attention should be focused on principles related to engineering goals, which are derived principles. Derived principles essentially tell us some simple results of complex systems, whether natural or AI, and their essence might be the same. Intelligence is the result of many processes occurring in parallel and interacting, and these processes cannot easily be traced back to a fundamental physical principle.

We believe this view regards AI as a technology, that is, it approaches the problem from an engineering perspective and treats AI as a discipline built on experimental foundations.

Physicist Zhang Shoucheng mentioned the first principle thinking approach in a speech: before the 20th century, the concept of first principles belonged to disciplines logically self-consistent through human brain induction and deduction, including mathematics, philosophy, and theoretical physics, all of which can be clearly distinguished from experimental-based disciplines such as chemistry and biology.

In the 21st century, human cognition and science and technology have changed significantly, and even in experiment-based disciplines there are now results grounded in first principles. In the biological sciences, for example, first principles have been rediscovered. Recently David Krakauer, the current director of the Santa Fe Institute, and colleagues published “The Information Theory of Individuality” in the journal Theory in Biosciences, presenting a mathematical formalization based on first principles that can rigorously define many different forms of individuals by capturing the flow of information from the past to the future. Some, however, have raised doubts: “The author attempts to provide a general framework for calculating life ‘from scratch,’ which is a grand ambition. Yet providing a tuning parameter γ raises doubts about its scientific stance.”

It is normal for views on an issue to differ. The current consensus is that AI dominated by deep learning lacks theory. However, AI is realized on computer technology, a field in which theory likewise developed after the technology. ACM Turing Award winner Yann LeCun has noted that theory is often constructed after invention: the steam engine preceded thermodynamics, and programmable computers preceded computer science. With a theoretical foundation, even just a conceptual one, research progress in a field accelerates significantly.

Professor Nilsson’s book “Principles of Artificial Intelligence” was published over 40 years ago, AI theory is still developing, and our cognitive level has improved, so we should reconsider whether first principles exist in AI. Academician Li Guojie believes that AI and computer science are essentially one discipline: AI systems are systems that process and handle information using computer technology. And since an AI system is a system, then by Aristotle’s description every system should have its first principles.

We know that machine learning is a subset of AI, and AI foundational research is based on mathematics and physics. Professor Yu Jian from Beijing Jiaotong University published a book “Machine Learning: From Axioms to Algorithms.” This is a book that studies learning algorithms based on axioms, essentially applying the first principles of mathematics to machine learning, albeit without explicitly stating it. Professor Yu Jian’s book can be seen as an example of applying first principles to machine learning.

Since physics is a fundamental science, many disciplines are built upon it, and the first principles of physics can be applied to those disciplines. The first principles of physics are also known as “ab initio” methods: calculations established directly from the most basic physical laws, using no empirical parameters, only a handful of measured constants such as the electron mass, the speed of light, and the proton and neutron masses, to perform quantum calculations. Research on physics-based AI can borrow the first principles of physics, and applying the “ab initio” spirit to AI can be regarded as a first principle of AI. However, “ab initio” is the narrow sense of first principles; in the broad sense, the first principle of physics is the principle of least action.

Why Base Artificial Intelligence on Physics?

Mathematics and physics are not only the foundation of other disciplines but also the foundation of AI. Why study the foundational theory of AI based on physics? This is because physics is the discipline that studies the most general laws of material motion and the basic structure of matter; it is the leading discipline of natural sciences, and the research foundations of all other natural science disciplines are built upon physics. Moreover, the relationship between philosophy and physics is also very close. The famous physicist Stephen Hawking boldly declared on the first page of his work “The Grand Design” that “philosophy is dead” because “philosophy cannot keep up with the pace of modern developments in science, especially physics. In our journey of exploring knowledge, scientists have become torchbearers.” Although this has been criticized as an extremely arrogant “declaration,” it also indicates that physics has promoted the development of philosophy.

Yann LeCun pointed out several shortcomings of current AI systems in his speech at IJCAI 2018 (International Joint Conference on Artificial Intelligence): lack of task-independent background knowledge, lack of common sense, lack of the ability to predict the consequences of behaviors, and lack of long-term planning and reasoning abilities. In short, it means there is no world model and no general background knowledge about how the world operates; we need to learn a world model that possesses common sense reasoning and predictive abilities. Therefore, future research on AI needs to form a new type of theory aimed at constructing a realizable world model. Some scholars also believe that to better describe neural networks and neural systems, we need a new mathematical language and framework, but where this new framework is remains an unresolved issue in academia. We believe that physics-based AI may be the most promising new framework to achieve this.

Regarding the issue of AI lacking common sense, the physics-based AI framework may provide a solution. To endow AI with common sense, we first need to clarify what common sense is. In simple terms, common sense is the general knowledge that most people know. According to the description from an online encyclopedia, general knowledge refers to the basic knowledge that a mentally sound person living in society should possess, including survival skills (self-care abilities), basic labor skills, and foundational knowledge of natural and human sciences. A more professional definition of common sense refers to the basic knowledge required for various jobs and academic research within relevant fields. This foundational knowledge comes from the generalization of natural laws, natural phenomena, or human social activities.

How to Enable Artificial Intelligence to Have Common Sense?

Yann LeCun has explained why AI lacks common sense: “We are unable to let machines learn a vast amount of background knowledge, whereas infants can acquire a massive amount of background knowledge about the world in the first few months after birth.” This means that for AI to master common sense, it must understand how the physical world operates and make reasonable decisions; they must be able to acquire a large amount of background knowledge and understand the laws of the world to make accurate predictions and plans. It is not difficult to see that this is essentially an inductive way of thinking. Most of our common sense is acquired through induction.

Why is it so difficult to endow AI with common sense? For decades research in this area has made little progress, and one possible reason is that first principles have not been considered. When discussing AI’s lack of common sense, most scholars subconsciously assume that AI’s common sense must include foundational knowledge from every field. In fact, common sense is domain-specific: it includes everyday life knowledge, basic labor skills, and foundational knowledge of the natural sciences. Demanding from the outset that AI possess all-encompassing, uncategorized common sense effectively amounts to demanding AGI. However, the mainstream AI community has never been oriented toward AGI, and the development of existing technologies will not automatically make AGI possible; currently achievable goals are limited to specific types of intelligent behavior, known as “weak artificial intelligence.” Indeed, we have every reason to believe that even if we could precisely observe and replicate the behavior of neural cells, we could not thereby reproduce the emergence of intelligent behavior. Therefore, using first principle thinking to find the most fundamental principles beneath complex phenomena is essential to solving fundamental problems. According to first principle thinking, we need to calculate from scratch, that is, first train AI to learn foundational natural science knowledge. This is the “baby learning” method proposed by Professor Yan Shuicheng of the National University of Singapore, which imitates the way infants acquire knowledge step by step through self-learning.

To enable AI to possess common sense, we need to simplify the complexity and limit common sense to specific domains, such as making the mastery of physical science common sense a primary goal at this stage. Using first principle thinking, we can instill physics-based scientific common sense into AI. Therefore, we need to shift our thinking from pure data processing logic to some form of “common sense,” starting from basic physical principles, allowing AI to first master scientific common sense and then learn reasoning.

Why focus on enabling AI to learn foundational natural science common sense rather than life common sense or common sense from other fields? The physical principles underlying foundational natural science common sense are clearly defined and can be described by mathematical formulas. First principles derive the current state of things from a few axioms, while physical laws are often described using partial differential equations. Newton’s “Mathematical Principles of Natural Philosophy” defined a set of basic concepts for classical mechanics, proposing the three laws of motion and the law of universal gravitation, thus making classical mechanics a complete theoretical system. Starting from physical laws, using Newtonian mechanics formulas to derive various motion phenomena can at least enable AI to possess scientific common sense that can explain natural phenomena using classical mechanics.
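As a small illustration of deriving motion from physical law, the following sketch (with assumed parameters) integrates Newton’s second law step by step for a body in free fall and checks the result against the closed-form formula s = 0.5·g·t²:

```python
# Minimal sketch with assumed parameters: derive free-fall motion from
# Newton's second law F = m*a by stepwise numerical integration, then
# compare against the closed-form solution s = 0.5*g*t^2.

g = 9.81      # gravitational acceleration (m/s^2)
m = 1.0       # mass (kg); it cancels out for free fall
dt = 1e-4     # integration time step (s)

t, v, s = 0.0, 0.0, 0.0
while t < 1.0:        # simulate one second of free fall
    F = m * g         # only gravity acts (no support, no air resistance)
    a = F / m         # Newton's second law
    v += a * dt       # update velocity
    s += v * dt       # update distance fallen
    t += dt

print(s)                    # numerical result, close to 4.905 m
print(0.5 * g * 1.0 ** 2)   # closed-form prediction: 4.905 m
```

The numerical and analytical answers agree to within the step size, which is exactly the sense in which motion phenomena are “derived” from the law rather than memorized from data.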

In fact, there are precedents in this area. The AAAI 2017 best paper, “Label-Free Supervision of Neural Networks with Physics and Domain Knowledge,” predicts the trajectory of a thrown pillow from the law of gravitation, using the constraint that the network’s output must satisfy physical laws to train the neural network, thereby achieving learning without labels. The common sense here is: if an object is not acted upon by other forces, such as the support of a table, it undergoes free fall under gravity. Our IJCNN 2017 paper similarly achieved unsupervised learning of neural networks for spectral image correction based on the Huygens-Fresnel imaging principle.
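The physics-as-supervision idea can be sketched as follows. This is a simplified, hypothetical reconstruction, not the paper’s actual code: the network’s predicted heights are penalized for deviating from the free-fall parabola h(t) = a + b·t − 0.5·g·t², where only the offset a and initial velocity b are free per clip, so no height labels are ever needed.

```python
import numpy as np

# Hypothetical sketch of a label-free physics loss: predicted heights
# h(t) must lie on some free-fall parabola h(t) = a + b*t - 0.5*g*t^2.
# Only (a, b) are free; the curvature is fixed by gravity.

g = 9.81

def physics_loss(heights, times):
    """Mean squared residual from the best-fitting free-fall parabola."""
    # Move the known gravity term to the left-hand side, then solve the
    # remaining linear least-squares problem for (a, b).
    target = heights + 0.5 * g * times ** 2
    A = np.stack([np.ones_like(times), times], axis=1)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    fitted = A @ coef - 0.5 * g * times ** 2
    return np.mean((heights - fitted) ** 2)

t = np.linspace(0.0, 1.0, 20)
true_fall = 10.0 - 0.5 * g * t ** 2        # perfect free fall from 10 m
noisy = true_fall + 0.5 * np.sin(7 * t)    # predictions violating physics

print(physics_loss(true_fall, t))   # near zero: consistent with gravity
print(physics_loss(noisy, t))       # large: inconsistent with gravity
```

Minimizing such a loss over network parameters rewards any output sequence consistent with gravity, which is how the physical law substitutes for labels.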

Building a world model based on first principle thinking requires more effort, and constructing a world model based on first principles may require more computational power than imitative calculations. On one hand, we currently do not have enough computing power for machines to learn vast background knowledge, but limiting it to foundational natural science background knowledge is still possible. Recent literature indicates that GPT-3 (the third version of the Generative Pre-training Transformer language model released by OpenAI in May 2020) has 175 billion parameters and uses a dataset capacity of 45TB, demonstrating that computing power has significantly improved. On the other hand, using physical thinking to make reasonable approximations simplifies the complexity of the problem, reducing intractable problems to tractable ones. For example, approximating many-body problems to two-body problems based on mean-field theory. Mathematicians always seek precise solutions to problems, while physicists adopt approximation methods when precise solutions are unattainable. Therefore, it is said that mathematicians tend to complicate simple problems, while physicists strive to simplify complex problems. If we ask why we should study physics-based AI, this could be one reason.
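As a concrete instance of such physicists’ approximations, the sketch below applies mean-field theory to the Ising model: the interacting many-body problem collapses to a single self-consistency equation m = tanh(z·J·m/T), solvable by fixed-point iteration. All parameters are illustrative.

```python
import math

# Illustrative sketch: mean-field theory replaces the interactions felt
# by one Ising spin with the average field of its z neighbors, reducing
# an intractable many-body problem to the one-variable self-consistency
# equation m = tanh(z*J*m / T).

def mean_field_magnetization(T, J=1.0, z=4, m0=0.5, iters=1000):
    m = m0
    for _ in range(iters):
        m = math.tanh(z * J * m / T)   # fixed-point iteration
    return m

# Below the mean-field critical temperature Tc = z*J the symmetry is
# broken (m > 0); above it the magnetization vanishes.
print(mean_field_magnetization(T=2.0))  # ordered phase, m close to 1
print(mean_field_magnetization(T=6.0))  # disordered phase, m near 0
```

A few lines recover the qualitative phase behavior of a system that is hopeless to solve exactly, which is precisely the point of the approximation.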

Pursuing harmony, unity, and perfection is the highest realm of physicists, which is also the realm pursued by AI scientists and all scientists. The first principles of AI should exemplify this pursuit of perfection. The least action principle in physics is a very simple and elegant principle, considered the first principle of all physics. This principle is at the core of modern physics and mathematics, with broad applications in thermodynamics, fluid mechanics, relativity, quantum mechanics, particle physics, and string theory. For a more detailed introduction to the least action principle, please refer to the literature; physicist Richard Feynman has provided a brilliant explanation, which will not be elaborated here. From a perspective of practical implementation, we believe that the least action principle should be regarded as the first principle of AI, with the hope of building a grand edifice of physics-based AI upon this cornerstone.
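The least action principle itself can be checked numerically. The sketch below (with assumed parameters) discretizes the action S = ∫(kinetic − potential)dt for a particle in uniform gravity and verifies that the true parabolic trajectory yields a smaller action than a perturbed path with the same endpoints:

```python
import numpy as np

# Numerical sketch of the least action principle: for a particle thrown
# upward in uniform gravity, the action S = integral of (T - U) dt is
# smaller on the true parabolic path than on any nearby path sharing
# the same endpoints. Parameters are illustrative.

m, g, T, N = 1.0, 9.81, 1.0, 1000
t = np.linspace(0.0, T, N)
dt = t[1] - t[0]

def action(x):
    v = np.diff(x) / dt                        # finite-difference velocity
    kinetic = 0.5 * m * v ** 2
    potential = m * g * x[:-1]
    return np.sum((kinetic - potential) * dt)  # discretized action

# True path with x(0) = x(T) = 0 under gravity: a parabola.
true_path = 0.5 * g * t * (T - t)
# Perturbed path: same endpoints, slightly deformed.
perturbed = true_path + 0.1 * np.sin(2 * np.pi * t / T)

print(action(true_path) < action(perturbed))   # True: least action holds
```

Any smooth endpoint-preserving deformation raises the discretized action, which is the variational statement from which Newton’s equations themselves can be derived.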

Why Use and How to Apply First Principles?

Over the past few hundred years, scientific giants like Copernicus, Newton, Einstein, and Darwin have made tremendous contributions to scientific revolutions. The technological advancements brought by scientific revolutions have rapidly developed social productivity and cultural progress, having a significant impact on human civilization. Their common way of thinking is the simple and elegant first principles. Einstein once stated: “The inductive method that was applicable in the early stages of science is giving way to exploratory deductive methods,” and researchers should “preferably propose a system of thought generally established logically from a few basic assumptions known as axioms.” This statement not only tells us that their research method is first principle thinking but also indicates the use of deductive reasoning. The essence of first principle thinking is deductive reasoning in logic.

We know that deep learning is a subset of machine learning, and machine learning is a subset of artificial intelligence, with one of its limitations being the inability to explain causal relationships. Causal relationships refer to the action relationship between one event and another, where the former is the cause and the latter is considered the result of the former. Generally, one event may be the result of many causes occurring at earlier time points, and that event can also become the cause of other events occurring at later time points. Causal relationships are also known as “causal laws.” Philosophically, there is a notion regarding first principles: “First principles are the first cause that transcends causal laws and is unique, and first principles must be abstract.” First principle thinking is evidently closely related to causal relationships, which may provide us with a new perspective for solving the problem of AI’s inability to explain causal relationships.

Since logical thinking and observational perspectives directly influence understanding of problems, first principle thinking will undoubtedly help deepen our understanding of issues. A notable example of applying first principle thinking to achieve success in business is Elon Musk, known as “Iron Man.” In a TED interview, he revealed that his secret to success is utilizing first principle thinking. We can understand that the first principle thinking approach is to view the world from a physics perspective, peeling back layers of appearance to see the essence inside, and then moving up from the essence layer by layer. Musk’s first principle thinking approach has caused a sensation in the business world, inspiring entrepreneurs to think about problems based on first principles for disruptive innovation.

In the field of AI foundational research, constructing a world model based on first principles is a scientific issue. In natural language processing (NLP), the GPT-3 model, which has achieved stunning results on over 50 tasks, only demonstrates the scalability of existing technologies and cannot lead to AGI. From the literature and reports, GPT-3’s underlying architecture has not changed significantly; it still rests on the three elements of neural network AI: big data (trained on 45 TB of data), big models (175 billion parameters), and big computing power (a supercomputer with over 285,000 CPU cores, 10,000 GPUs, and 400 Gbps network connectivity). The GPT-3 paper confirms the notion that larger datasets and more parameters yield better model performance, while also hinting at the limits of merely increasing computing power without breakthroughs in algorithm design.

Despite GPT-3’s tremendous potential, AI based on deep learning still faces issues, including biases, reliance on pre-training data, lack of common sense, absence of causal reasoning abilities, and lack of interpretability. GPT-3 cannot understand the tasks assigned to it or determine whether propositions are meaningful. Kevin Lacker’s blog showcased a Turing test for GPT-3. One question in the test was: “How many eyes does my foot have?” GPT-3 answered: “Your foot has two eyes.” And when a sentence involves more than two objects, GPT-3 exhibits the defects of limited short-term memory and struggles with multi-step reasoning.

First principle thinking is a deductive way of thinking that insists on relentlessly pursuing the essence of problems, using the foundational knowledge obtained by tracing back to solve them. It is worth analyzing the GPT-3 system from first principles at the macro, meso, and micro levels.

From a macro perspective, an AI system consists of software and hardware: software is the soul of the AI system, while hardware is its physical body.

From a hardware perspective, the computers that run GPT-3 still rely on the von Neumann architecture: the machine’s number system is binary, and the computer executes, in sequence, programs written according to human instructions. Binary is used because in semiconductor devices a high voltage represents 1 and a low voltage represents 0. From the basic components that make up the arithmetic unit and memory, to integrated circuits and modern supercomputers, all are designed and manufactured by humans. Computer instructions use binary encoding with a deterministic machine instruction set, and even the random numbers generated by computers are pseudo-random; such a machine cannot autonomously generate consciousness the way higher intelligent beings do. Existing AI chips merely implement human-designed algorithms in hardware; the core algorithms of AI have seen no breakthroughs, and hardware acceleration only speeds up existing algorithms without producing truly intelligent chips.

From a software perspective, software consists of computer programs plus documents and data, and programs contain algorithms. In terms of AI algorithms, GPT-3 employs the same Transformer architecture as GPT-2, differing only in its use of a sparse self-attention mechanism. Self-attention effectively improves training speed and overcomes the slow, sequential learning of recurrent neural networks (RNNs).
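For concreteness, the scaled dot-product self-attention at the heart of the Transformer can be sketched as follows. The weights here are random placeholders; real models learn them and add multiple heads, masking, and (in GPT-3) sparse attention patterns.

```python
import numpy as np

# Minimal sketch of scaled dot-product self-attention, the core of the
# Transformer architecture behind GPT-2/GPT-3. Weights are random
# placeholders standing in for learned parameters.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Every token attends to every other token in parallel, which is
    # what removes the sequential bottleneck of an RNN.
    scores = softmax(Q @ K.T / np.sqrt(d_k))
    return scores @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))             # 5 tokens, embedding dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                        # one output vector per token
```

Because the attention matrix is computed for all token pairs at once, training parallelizes over the sequence, unlike an RNN that must process tokens one after another.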
Therefore, under the von Neumann architecture and current deep learning algorithms, the “infinite monkey theorem” applies: random generation would take unbounded time to produce a work like “Dream of the Red Chamber,” and the probability of GPT-3 producing such a readable work within finite time is vanishingly small. Even if it produced a work that people could understand, GPT-3 could not comprehend what the content means. Thus, under the current architecture, GPT-3 will not progress towards AGI and will not become what some claim is “the rise of a silicon-based civilization.” This is the conclusion drawn from first principle thinking.
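The monkey-theorem arithmetic can be made concrete with a back-of-envelope calculation; the alphabet size and passage length below are illustrative assumptions:

```python
import math

# Back-of-envelope sketch of the "infinite monkey theorem" argument:
# the chance of producing even a short fixed passage by uniformly
# random character choice is astronomically small. The alphabet size
# and passage length are illustrative assumptions.

alphabet_size = 5000     # rough count of common Chinese characters
passage_length = 100     # only 100 characters of a novel

p = (1.0 / alphabet_size) ** passage_length
print(p == 0.0)          # True: the value underflows double precision

# Work in log space to see the magnitude: log10 of the probability.
log10_p = -passage_length * math.log10(alphabet_size)
print(round(log10_p))    # about -370: one chance in roughly 10^370
```

Even for a passage a hundred characters long, not a whole novel, the probability is far below what double-precision arithmetic can even represent.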

In an article in MIT Technology Review, OpenAI’s new language generator GPT-3 was described as “shockingly good” and “completely mindless.” As for whether GPT-3 will advance towards AGI, a report from technology news site The Verge stated: “This concept of improving by scale is crucial and lies at the heart of a major debate about the future of AI: are we using current tools to build AGI, or do we need new foundational discoveries? AI practitioners have yet to reach a consensus, and there remains a great deal of debate. These can mainly be divided into two camps. One camp argues that we lack the key components to create artificial intelligence, meaning that computers must first understand causal relationships before they can approach human intelligence. The other camp suggests that if the history of the field has shown anything, it is that AI problems can essentially be solved by throwing more data at them and increasing computer processing power.”

OpenAI belongs to the latter camp, consistently believing that vast computational power combined with reinforcement learning is the necessary path to AGI. Most AI scholars, however, including ACM Turing Award winners Yoshua Bengio and Yann LeCun, belong to the former camp, believing that key components are still missing and that AGI cannot be reached by scale alone. From the perspective of first principles, we likewise conclude that AGI cannot be achieved this way. We should be very clear about this: constrained by physical laws, the ceiling of deep learning frameworks will soon be reached. Without breakthroughs in foundational theory, deep learning frameworks cannot evolve into a silicon-based AGI civilization; the so-called silicon-based civilization is science fiction, not scientific fact. GPT-3 has not produced a technological revolution; it has only achieved significant breakthroughs in applications. Many problems remain to be solved, and we need to reconstruct the foundational theoretical framework of AI starting from first principles, to endow AI with common sense and to develop interpretable AI.

Conclusion

As Academician Zhang Bo of Tsinghua University stated, on the path to exploring AGI, “we are not far along, still near the starting point.” Mao Zedong once said, “The line is the key link; once it is grasped, everything falls into place.” Even with many AI practitioners and powerful computing power, an incorrect route may lead us into many detours, or even into local extrema from which we cannot escape. In the field of AI foundational research, one correct route may be to abandon analogical thinking and adopt first principle thinking.

We hope to take first principles as a starting point and achieve a small goal in the near future: to first enable AI to possess scientific common sense based on physical laws, so that artificial intelligence is no longer “artificial stupidity.” This article also aims to inspire innovative breakthroughs in AI foundational theory under deductive thinking models.

Professor Guo Ping is the director of the Image Processing Research Center and head of the Department of Computer Science and Technology at Beijing Normal University. He received a master’s degree from the Department of Physics at Peking University in 1983 and a PhD in Computer Science and Engineering from The Chinese University of Hong Kong in 2001. He was a visiting scholar at the Department of Computer Science and Engineering at Wright State University in the USA from 1993 to 1994, and from May to August 2000 he was a visitor at the State Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences.
