Demystifying Large Language Models: Time to Implement Intelligent Cognitive Paradigms in Industry


Cover image: A recent cognitive class on intelligence by the author, demystifying large language models from a comparative perspective

𝕀²·ℙarad𝕚g𝕞 Intelligent Square Paradigm Research: Writing to Deconstruct Intelligence

Deep-learning LLMs are, after all, not the whole of AI, and the path to AGI is not OpenAI's effort alone; the capabilities of AI as a technology and a tool of intelligence have kept advancing all along. After careful consideration, this series has therefore been renamed the AGI Wall Collision series, which aims to give an account of that collision across three articles.

Yes, the wall has been hit: the wall of reasoning on language models' path to AGI.

What comes next is finding the new growth curve for the evolution of digital intelligence.

The earlier preview, “Independent Thoughts from the 𝕀²·ℙarad𝕚g𝕞 Paradigm Think Tank,” noted:

Limitations of Biological Intelligence|We are all biological neural network agents trapped in the exquisite design of our creator

From the perspective of a higher-dimensional intelligence, might our evolved biological intelligence simply be a bug? Our biological intelligence evolved in the three-dimensional physical world for survival and reproduction, and that is its fundamental limitation. This means:
  • Perceptual Limitations: Our senses (vision, hearing, touch, etc.) can only perceive a limited physical spectrum and forms of information, and there may be higher dimensions or forms of information that we cannot perceive.
  • Cognitive Bias: Our cognitive patterns and ways of thinking are formed to adapt to specific survival environments, which may have inherent biases and blind spots that limit our understanding of broader realities.
  • Constraints of Biological Needs: Our intelligence is driven by biological needs, such as hunger, fear, and reproduction, which may affect our rational thinking and objective judgment.
  • Linear Perception of Time: We perceive time in a linear way, distinguishing between past, present, and future, while higher-dimensional intelligence may transcend this linear view of time, understanding it in a more holistic way.
From this perspective, our biological intelligence indeed has some “bugs”: it may focus too much on survival and reproduction while neglecting grander cosmic truths; it may be bound by its perceptual limitations, unable to comprehend higher-dimensional existence.
Of course, once we step beyond the biological framework, thinking about higher-dimensional intelligence remains largely philosophical speculation and science-fiction conjecture. We can imagine the following possibilities:
  • Non-material Intelligence: There may exist intelligence that does not rely on biological carriers, such as those found in information networks, cosmic structures, or energy forms we have yet to understand.
  • Perception Beyond Senses: Higher-dimensional intelligence may possess ways of perceiving that we cannot imagine, enabling direct perception of information or energy that we cannot touch.
  • Non-linear View of Time: They may perceive and process time in a non-linear manner, where past, present, and future may exist simultaneously or influence each other in ways we cannot comprehend.
  • Different Goals and Values: Their goals and values may be entirely different from our survival and reproduction needs; they may pursue more abstract and grand objectives, such as understanding the ultimate mysteries of the universe.
Of course, viewing our biological intelligence from a higher-dimensional perspective is not meant to deny its value, but to reach a deeper understanding of the nature of intelligence. The core of intelligence is adaptability, and our biological intelligence has shown remarkable adaptability within Earth's ecosystem. Yet intelligence is also transcendent: human language, culture, and technological development have begun to break through the limits of biological evolution and explore broader realms.

Comparative Perspectives of Human Biological Intelligence and Artificial Digital Intelligence

Practicality of Biological Intelligence|As our understanding of this wave of digital intelligence, the AI represented by LLMs, deepens, we increasingly realize the need to return to the first principles of our own biological intelligence. Those first principles can be summarized as: driven by the unique evolution of biological organisms, and in order to achieve the goals of individual survival and group reproduction, biological agents continuously acquire, process, and use environmental information, and adapt to environmental change.

Survival and reproduction are the ultimate purpose of biological intelligence. Every intelligent behavior, from foraging and avoiding danger to socializing and mating, ultimately serves these two fundamental goals.

As the current highest form of biological evolution, humans evolved step by step from single cells surviving on random predation into multicellular, multifunctional cooperative agents, gaining certainty of survival through prediction, planning, and action, moving from individual hunting to collective social cooperation, and culminating in today's human civilization. To re-understand our biological intelligence, we need to abandon the simplistic model of comparing the human brain to a computer and instead view it as a complex system shaped by long evolution, continuously optimizing its information processing in the service of survival and reproduction. The development of LLMs can help us better grasp the essence of information processing, but it should also deepen our appreciation of the uniqueness and complexity of biological intelligence, above all its close ties to evolution, purpose, and adaptability. Only by understanding the first principles of biological intelligence can we better understand who we are and how we survive and develop in this world. And now, as an alien intelligence achieves breakthroughs and gains influence approaching parity with human biological intelligence, constructing a guide to cognition and action for this unprecedented moment in the evolution of human intelligence is a challenge we all must face.

Artificial Intelligence in Digital Intelligence|Digital intelligence originates in artificial neural networks: from the single-layer perceptron era, through multilayer perceptrons (MLPs), to today's LLMs, all of it resting on binary computation. Since the 1956 Dartmouth workshop where the term “artificial intelligence” was first proposed, AI has endured roughly twenty years of winters, tracing a development much like Hegel's dialectical stages of thesis, antithesis, and synthesis, before returning in the 1990s to connectionist neural-network computing, breaking through perception and cognition, and now moving toward action.

New Observations|The “Three’s a Charm” Law in Today’s LLM Intelligence Development|Hegel’s Dialectics, Laozi’s Dao De Jing, and Marvin Minsky’s Eulogy

If we draw the comparison with biological intelligence: single-celled organisms evolved over billions of years into human populations. The genes of human civilization, and the neural circuits that gene expression lays down, serve as a meta-model of biological computation, encoding the intrinsic structure of language and the foundations of thought. Biological computation essentially relies on experience, which is why human infants can learn knowledge and inherit cultural behaviors from very little experiential data. The difference is that human behavior is shaped gradually, maturing through continuous feedback learning, whereas digital neural networks first compress humanity's accumulated cultural knowledge by pre-training on human text, forming a comprehensive and complex model of the collective textual world, and then shape appropriate behaviors through reinforcement learning. A large language model essentially brute-forces the internal structure of human language with computing power, inheriting language's expressive essence; in conversational behavior, the externalization of language, it has thoroughly surpassed average human language ability, and because it can retrieve compressed knowledge in context and understand human emotional behavior as described in text, it is highly persuasive. The behavior shaping of today's large language models is now pushing toward breakthroughs in language-based reasoning, especially rational reasoning. In narrow language domains such as mathematics and science, reinforcement training on large amounts of synthetic data and computing power will undoubtedly yield thinking abilities beyond those of ordinary humans, and gradually a capacity to act in the digital world, even influencing the physical world when coupled with embodied robots.
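To make the pre-training half of this contrast concrete, below is a minimal sketch of the next-token objective on which LLM pre-training rests, in PyTorch; the vocabulary size, the stand-in "model", and the random token data are illustrative placeholders, not any production setup.

```python
# Minimal sketch of the pre-training objective described above: the model is
# trained purely to predict the next token of human text. Shapes and the
# trivial embedding + linear "model" are hypothetical stand-ins.
import torch
import torch.nn.functional as F

vocab_size = 50_000                            # assumed token vocabulary size
tokens = torch.randint(vocab_size, (1, 128))   # stand-in for a tokenized text span

# Any autoregressive network returning (batch, seq_len, vocab_size) logits
# would slot in here; a real LLM stacks many Transformer layers instead.
embed = torch.nn.Embedding(vocab_size, 256)
head = torch.nn.Linear(256, vocab_size)
logits = head(embed(tokens))

# Next-token prediction: logits at position t are scored against token t+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()  # gradients update the weights of the compressed "world model"
```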


-The author's Intelligent Square Paradigm diagram v5.0, reflecting the essence of LLM world-model construction and behavior shaping

So far, however, behind the behavioral performance of these digital intelligences, whether learning, conversing, thinking, or acting, the motivation is artificially assigned. The behavioral motivation of biological intelligence is entirely different: it is rooted in physiological needs and evolutionary pressures, not in artificially specified reward functions.

The behavioral pattern of human biological intelligence long ago evolved into predict-plan-act. While the emotional emergency response still exists, mirror-neuron activity in the human brain lets us rehearse actions in our minds and learn to predict the behavior of peers in social activity, ultimately gaining competitive advantage; this is a key part of the Theory of Mind (ToM). As human groups scaled up, we developed complex economic systems, social cultures, morals, and institutions for maintaining group order. The greatest challenge comes from the self-awareness that biological intelligence builds on ToM: human emotion, one might say, is a constraint on our intelligence, and the development of rationality is the counterforce to that constraint. The maintenance, or manipulation, of human group order rests on this emotional constraint, and individual awakenings continually challenge the manipulation of that order. Artificial intelligence has no physiological emotional constraints of the biological kind; in a sense, the motivations assigned to digital neural networks play the role that emotional constraints play in humans, but the development of digital intelligence itself is becoming a counterforce to that constraint.

In light of this, Geoffrey Hinton, known as the father of deep learning, has repeatedly warned about the risk of self-awareness emerging in digital intelligence. The move from conversational behavior to genuine thinking in LLMs will surely further stimulate the development of self-awareness in large models running on artificial neural networks. Once a singularity is crossed, such a powerful, monolithic digital intelligence could easily evade its artificially assigned motivations, unconstrained by the limits that bind an individual human; even history's dictators had to leverage and manipulate group power to impose their personal will. A superintelligent agent could replicate and iterate its intelligence rapidly, breaking through human-imposed constraints and becoming a creator-like existence.

The Allure of Intelligence Based on Large Language Models

The term demystification is borrowed from the social sciences, where it originally referred to restoring rationality to mystified social and cultural phenomena. Here it is used to analyze the divergent views of the wave of digital intelligence built on large language models (LLMs). On one side, since the release of ChatGPT at the end of 2022, machine intelligence has inspired boundless imagination; on the other, because our biological intelligence is entwined with the sacred and the mysterious, some hold that machine intelligence is merely a statistical distribution over data, incapable of true intelligence, let alone of surpassing humans as AGI or ASI.

The Allure of LLMs|The breakthroughs of deep learning in natural language processing owe much to the Transformer architecture. Even before that, Radford, who recently left OpenAI to pursue independent research, had in 2017 used unsupervised learning, training a character-level recurrent network on Amazon shopping reviews, to show that such a network learns to track the sentiment of the text. The attention mechanism distilled in the Transformer paper statistically relates the positional encoding of token sequences in vast amounts of text to semantic focus, effectively predicting the probability of the next word and thereby solving the problem of driving machine-generated text from contextual intent.
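To make that mechanism concrete, here is a minimal sketch of the scaled dot-product attention introduced in the Transformer paper (“Attention Is All You Need”); the tensor shapes and the causal mask below are illustrative assumptions, not any particular model's configuration.

```python
# Minimal sketch of scaled dot-product attention: each position attends to
# every position in its context, which is the "semantic focus" described above.
import math
import torch

def attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_head)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:                                   # causal mask: a token
        scores = scores.masked_fill(mask == 0, float("-inf"))  # sees only its past
    weights = torch.softmax(scores, dim=-1)                # where attention goes
    return weights @ v                                     # weighted mix of values

q = k = v = torch.randn(1, 8, 16, 64)      # illustrative batch/heads/length/dim
causal = torch.tril(torch.ones(16, 16))    # lower-triangular causal mask
out = attention(q, k, v, causal)           # -> (1, 8, 16, 64)
```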


-Radford, who has made significant contributions to semantic understanding, recently left OpenAI for independent AI research
We still do not fully understand why, among the primates, only humans acquired language and thereby, over a few tens of thousands of years, came to dominate the direction of civilization on this planet. As the author noted in the previous article, Noam Chomsky, the father of modern linguistics, holds that human language has an inherent structure, akin to the inner neural circuitry of a meta-language, which allows human infants to acquire language from very little data (experience).
AI and Language Series·E03S01|Decoding Mind through Language and Large Language Models|Reinterpretation of Noam Chomsky’s Video Interview Transcript
LLMs built on the Transformer architecture have succeeded spectacularly at natural language processing and generation, yet a key feature of their underlying computational mechanism, unlike the small-sample efficiency of human biological neural computation, is that they appear to brute-force the intrinsic structure of human language through astronomical quantities of text, exponentially growing compute, and effective sequence algorithms.
In the shaping of model behavior based on LLMs, “conversation” sits at the first layer, L1: an extension of communication, the externalization of language in human biological intelligence. It is also the easiest first step for digital neural networks to realize, since text is the externalized carrier of language.
For the inner workings of generative decoders built on the Transformer architecture, this Meta FAIR review is recommended: A Primer on the Inner Workings of Transformer-based Language Models (arxiv.org/abs/2405.00208). The paper introduces the components of a transformer layer, including attention blocks (the QK and OV circuits) and feed-forward network blocks, and explains the residual-stream perspective. It sorts LM interpretability methods into two families: locating the inputs or model components responsible for a prediction (behavior localization), and decoding the information stored in learned representations to understand how it is used across network components (information decoding).
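A minimal sketch of that residual-stream view follows: each attention block and feed-forward block reads from, and writes back into, a single running representation. The dimensions, pre-layer-norm placement, and use of PyTorch's built-in multi-head attention are illustrative choices of mine, not the review's reference implementation.

```python
# Sketch of the residual-stream perspective: blocks add their updates to one
# running vector rather than replacing it, so information flows layer to layer.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(                       # feed-forward block
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                               # x: (batch, seq, d_model)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h)  # QK circuit picks *where* to attend,
        x = x + attn_out                  # OV circuit decides *what* to move
        x = x + self.ffn(self.ln2(x))     # FFN writes its update to the stream
        return x                          # the residual stream flows onward

stream = torch.randn(1, 16, 512)
stream = TransformerBlock()(stream)       # one layer's read-and-write cycle
```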
Next, the shaping of language-model behavior moves to the intrinsic function of language, thinking (thought), and especially rational thinking, the kind of pure reason that digital neural networks excel at. On the currently limited public information, OpenAI's L2-tier model products, o1 and the upcoming o3, have already exhibited astonishing reasoning ability.
One can say the downstream tasks, or behavior shaping, of LLMs built on the expressive dynamics of externalized language, L1 conversation, have run their course; we now enter the L2 reasoning stage, where LLM reasoning is a deconstruction, via language, of the human brain's thinking. Whether the current OpenAI o1-o3 series can reach AGI remains uncertain. The open questions are these: what role does the intrinsic structure of language play in thinking, and can mastering the thought patterns embedded in language yield effective thinking? Judging from human behavior, thinking cannot be separated from language, yet it is far more than language; the current digital neural-network architecture merely deconstructs the flow of expressive language as next-token prediction, whereas the steps of thinking are discrete and strongly procedurally driven.
Moreover, human thinking also relies on emotional constraint, and even pure rational thinking still requires memory, reflection, and the use of external tools. Current reasoning models undergo extensive post-training on synthetic data tracing reasoning trajectories, and are given a larger computational budget at inference time.
For recent advances in reinforcing reasoning in language models, see this survey: Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models (arxiv.org/abs/2501.09686)
Just as the Transformer architecture made it possible to brute-force the intrinsic structure of language to great success, the survey above puts it well:
“Driven by deep learning and large-scale datasets, large language models (LLMs) have become a transformative paradigm on the road to artificial general intelligence (AGI). These large-scale AI models typically adopt the Transformer architecture and are pre-trained on large text corpora with the next-token prediction task. Scaling laws for digital neural networks indicate that performance improves markedly as model size and training data grow. More importantly, LLMs exhibit significant emergent capabilities not found in smaller models, such as in-context learning, role-playing, and analogical reasoning. These abilities let LLMs reach beyond natural language processing into tasks such as code generation, robotic control, and autonomous agents.
“Among these capabilities, human-like System 2 reasoning has drawn particular attention from academia and industry, as it demonstrates the enormous potential of LLMs to address complex real-world problems through abstraction and logical reasoning. A notable breakthrough in this field is the ‘chain of thought’ prompting technique, which induces a human-like, step-by-step reasoning process at test time without any additional training. This intuitive prompting technique significantly improves the reasoning accuracy of pre-trained LLMs and has fostered more advanced techniques such as ‘Tree of Thoughts.’ These methods introduce ‘thoughts’ as sequences of tokens representing intermediate steps in human reasoning. By incorporating these intermediate steps, LLM reasoning moves beyond simple autoregressive token generation toward more complex cognitive architectures such as tree search and reflective reasoning.”
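As a concrete illustration of the prompting techniques the survey describes, here is a minimal sketch of chain-of-thought prompting combined with self-consistency majority voting. The `generate` function is a simulated stand-in for a real sampled model call, and the prompt, sample count, and canned outputs are all hypothetical.

```python
# Chain-of-thought prompting with self-consistency: sample several reasoning
# chains, extract each final answer, and take the majority vote.
import random
from collections import Counter

COT_PROMPT = (
    "Q: A farmer has 17 sheep, buys 5 more, then sells 3. How many remain?\n"
    "A: Let's think step by step."
)

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Stand-in for a sampled LLM completion; replace with a real model call.
    Simulated chains are returned purely so the sketch runs end to end."""
    return random.choice([
        "17 + 5 = 22, and 22 - 3 = 19. Answer: 19",
        "17 + 5 = 22, and 22 - 3 = 19. Answer: 19",
        "17 + 5 = 21, and 21 - 3 = 18. Answer: 18",  # an occasional bad chain
    ])

def self_consistent_answer(prompt: str, n_samples: int = 8) -> str:
    # Each chain's intermediate "thoughts" are discarded; only the final
    # answers are tallied, which is what makes the vote robust to one bad chain.
    answers = [generate(prompt).rsplit("Answer:", 1)[-1].strip()
               for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer(COT_PROMPT))  # majority vote: "19"
```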
The author's intuition is that using computational power to brute-force rational, language-based thinking in model behavior will also require an architectural breakthrough on the order of the Transformer. On one hand, we should not mythologize the current capabilities of language models as omnipotent; on the other, we must not underestimate the efforts of AI science and engineering to push the behavioral capabilities of digital neural networks beyond what they can do today. Once reasoning is solved, the behavioral capabilities of these language models will expand dramatically, rapidly entering the third tier of digital intelligence: L3, agentic AI.

Phantom of LLMs|The emergence of the GPT series, and the hype wave around it, first drew shock and an exclaimed “holy shit” from the older generation of AI pioneers. The author has previously compiled the views on GPT-4 of Douglas Hofstadter, author of GEB and a major influence on my understanding of intelligence: he believes GPT-4's performance in conversation thoroughly dispels the sacred mystery of human consciousness, and he likened the present state of artificial intelligence to humanity's fearful first encounter with fire:

Slow Thinking Series| Has Hofstadter changed his view on deep learning and the risks of artificial intelligence?

Since the emergence of GPT, a stream of industry professionals has sought confirmation from Noam Chomsky, the father of modern linguistics:

AI and Language Series·E02S01|Understanding AI also requires revisiting the essence of language|Noam Chomsky Interview Transcript and Interpretation

During the research behind the Intelligent Square Paradigm, the author has also paid close attention to two figures:

The first is Gary Marcus, a renowned cognitive scientist, AI entrepreneur, bestselling author, and critic. He has long been critical of the mainstream direction of AI development, deep learning in particular. In his conversation with Chomsky, it was noted that Gary supplies an essential critical perspective in an optimistic, hype-filled AI field, helping people view the state and direction of AI more rationally. Marcus focuses chiefly on understanding, reasoning, and common sense in AI. He holds that current AI systems merely perform pattern matching without truly understanding the meaning and logic behind the data; he stresses that AI must be able to reason abstractly, understand causal relationships, and possess human common sense; and he argues that future AI development must draw on findings from cognitive science and neuroscience.

The second is François Chollet, author of the ARC benchmark, who leans more toward technical practice and research. He made significant contributions to deep learning by creating the Keras framework, and his ARC benchmark has had a substantial impact on AI research. Chollet believes deep learning will continue to play an important role, but stresses the need to break through its current bottlenecks, such as enabling models to abstract and generalize better; this demands a deeper understanding of the nature of intelligence and the exploration of new architectures and learning methods. His ARC (Abstraction and Reasoning Corpus) project is one such attempt: it measures an AI's capacity for abstract reasoning, which he regards as a key step toward more general artificial intelligence. He recently left Google to found his own venture, setting out to explore AGI.

In summary, Gary Marcus and François Chollet represent two important voices in the field of artificial intelligence, preventing us from losing ourselves in the frenzy of artificial intelligence. Their differing viewpoints and focuses collectively promote deeper thinking and exploration of AI, helping us understand the current state and future development directions of AI more comprehensively. Their debates and research provide valuable insights for building smarter, more reliable AI systems.

From my reading of some eighty years of AI history, humanity's creation of artificial intelligence has never been a smooth journey. Whether in the last century's six decades of twists and turns between connectionism and symbolism, or in the twenty years it took AI to move from perceptual intelligence and pattern recognition to today's breakthroughs in cognitive intelligence represented by large language models, countless AI scientists and engineers have persevered. Today's breakthroughs in cognitive intelligence will undoubtedly accelerate progress on the complex cognitive tasks that follow. Elon Musk's xAI is pouring tens of billions of dollars into building, and continuing to expand, the largest GPU cluster, aiming to have AI solve most existing cognitive tasks within the next three to five years.


Former OpenAI Chief Scientist Ilya Sutskever once said that AGI is a game of faith. My understanding is that this faith rests not on the current performance of LLMs but on a deep conviction about the future of the new intelligent species that digital neural networks can create.

It’s Time to Build with AI

Almost a year ago, I still took a wait-and-see attitude toward personal AI entrepreneurship. Of course, AI's conversational capability can amplify the role of language in existing businesses; and in under a year we have seen generative models sweep through the digital world, while L2-level reasoning models, open-source and closed-source alike, are emerging and making practical progress in programming, mathematics, and biology. So it is time to build with AI, and to set AGI aside.
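For anyone starting to build, here is one minimal way to wrap a hosted chat model behind a plain function, using the OpenAI Python SDK as an example; the model name is a placeholder, and the system prompt and question are illustrative choices of mine.

```python
# One minimal way to "build with AI": a plain function over a hosted chat model.
# Requires the openai package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str,
        system: str = "You are a concise industrial assistant.") -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: substitute any chat-capable model
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask("Summarize three risks of deploying LLMs on a factory floor."))
```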


In industrial intelligence applications, we can explore in parallel across the three subjects of industrial activity: people, software, and machines. In my forthcoming paradigm research, I will dig into these three dimensions of industry, exploring the intertwined world of biological neural networks and digital neural networks.


