
Author: Wang Yurun
This article is about 5,000 words long; reading it takes roughly 10 minutes.
Achieving AGI has always been the ultimate vision of artificial intelligence research.
1. What is AGI
AGI (Artificial General Intelligence) refers to an artificial intelligence system that can exhibit a wide and flexible range of intelligent capabilities similar to humans across various tasks. Unlike current narrow AI, AGI is not just optimized for a specific task but possesses the ability to adapt and solve problems in various environments and scenarios. AGI should be capable of autonomous learning, reasoning, planning, and decision-making, with common sense and long-term memory abilities similar to humans, enabling it to tackle tasks that require cross-domain knowledge and complex reasoning.
By breadth of capability, AI is often divided into three stages:
- Narrow AI: AI systems that excel at a specific task, such as image recognition or speech recognition. Most current AI systems fall into this category.
- Broad AI: systems at this stage exhibit strong capabilities across multiple domains but still rely on humans to provide explicit goals and training data.
- General AI (AGI): systems with cognitive abilities truly comparable to humans, capable of independent learning and adaptation on any task.
The concept of AGI was first proposed by AI pioneers such as Alan Turing and John McCarthy, who envisioned an artificial intelligence system with broad cognitive and adaptive capabilities similar to humans. In his 1950 paper "Computing Machinery and Intelligence," Turing introduced the famous "Turing Test" to assess whether machines could exhibit human-level intelligence, an idea that laid the foundation for AGI research.
In 2023, the Google DeepMind team proposed a leveled framework for classifying AI systems, summarized in the table below:

| Performance level | Narrow AI (clearly defined task or task set) | General AI (AGI) |
| --- | --- | --- |
| Level 0: No AI | Narrow Non-AI: calculator software; compilers | General Non-AI: human-in-the-loop computing, e.g., Amazon Mechanical Turk |
| Level 1: Emerging (equal to or somewhat better than an unskilled human) | Emerging Narrow AI: GOFAI; simple rule-based systems, e.g., SHRDLU | Emerging AGI: ChatGPT, Bard, Llama 2 |
| Level 2: Competent (at least the 50th percentile of skilled adults) | Competent Narrow AI: e.g., smart speakers such as Siri or Alexa; toxicity detectors such as Jigsaw | Competent AGI: not yet achieved |
| Level 3: Expert (at least the 90th percentile of skilled adults) | Expert Narrow AI: spelling and grammar checkers such as Grammarly; generative image models such as Imagen or DALL-E 2 | Expert AGI: not yet achieved |
| Level 4: Virtuoso (at least the 99th percentile of skilled adults) | Virtuoso Narrow AI: e.g., Deep Blue, AlphaGo | Virtuoso AGI: not yet achieved |
| Level 5: Superhuman (outperforms 100% of humans) | Superhuman Narrow AI: AlphaFold, AlphaZero, Stockfish | Superhuman AGI (ASI): not yet achieved |
The DeepMind research team emphasizes that these levels are based on AGI’s performance and generality, proposing that any definition of AGI should meet six principles: focusing on capability rather than process, focusing on generality and performance, focusing on cognitive and meta-cognitive tasks, focusing on potential rather than deployment, focusing on ecological validity, and focusing on the pathway to AGI rather than a single endpoint.
1. Focus on capability rather than process. AGI evaluation should be based on its output and effectiveness, rather than its internal workings or mechanisms. For example, if an AGI system can pass exams or solve complex problems, we care about its ability to complete these tasks rather than whether it thinks like a human.
2. Focus on generality and performance. For a system to be considered AGI, it must not only excel in multiple domains (generality) but also reach a certain performance level in these domains.
3. Focus on cognitive and meta-cognitive tasks. Cognitive tasks involve knowledge processing tasks, such as understanding, learning, and memory. Meta-cognitive tasks involve awareness and control of one’s cognitive processes, such as learning new skills or seeking help when facing difficulties. This principle emphasizes that AGI must not only perform specific cognitive tasks but also engage in self-reflection and self-improvement.
4. Focus on potential rather than deployment. When defining AGI, attention should be paid to the system's potential capabilities rather than whether it has been practically deployed or applied. This means that a system demonstrating superhuman capabilities in a laboratory setting can be considered AGI even if it has not been widely deployed.
5. Focus on ecological validity. Ecological validity refers to the degree of similarity between the environment of research or testing and real-world environments; AGI evaluation should be based on its performance in real-world tasks.
6. Focus on the pathway to AGI rather than a single endpoint. This principle recognizes that the development of AGI is a gradual process rather than a single, fixed endpoint. This means we should focus on the different stages and milestones in the development of AGI rather than just the final, fully realized AGI state.
AGI has long been the ultimate vision of artificial intelligence research. Currently, large models still have significant room for improvement in the breadth of tasks they can handle, and their capabilities remain at the Emerging AGI level. While models such as GPT-4, Gemini 1.5, and Claude 3 can already handle multimodal inputs such as text, images, and video, and perform tasks ranging from solving mathematical problems and creating content to writing poetry and answering questions informatively, they still lack the ability to make decisions and execute actions independently. Moreover, many current models still focus on performance in a single domain: for example, Kimi excels at long text inputs but cannot generate images, while Sora performs high-quality text-to-video generation but offers no Q&A functionality.
The maturity of current large models can be ranked as follows: language models > multimodal models > embodied-intelligence models. Language models are relatively mature, able to complete basic reasoning, long-text, and code-generation tasks, though gaps remain in complexity and professional depth. Multimodal models still have significant room for refinement, and high-quality, large-scale datasets for them are only beginning to be built. Embodied-intelligence models remain exploratory: the underlying technical routes are unclear, and data collection, training methods, and evaluation methods are all at an early stage, resulting in low accuracy in practical application scenarios.
To achieve AGI, large models still face many challenges, chief among them understanding natural intelligence, developing fully autonomous models that can adapt, and ensuring safety and reliability in understanding the physical world. First, AGI needs generality that transcends specific domains, whereas current large language models, despite excelling at specific tasks, are trained passively and lack proactive cognition and self-reflection. Second, AGI needs continuous learning, adapting to new environments and tasks without relying on large amounts of annotated data; existing large language models instead rely mainly on one-off offline training rather than dynamic, online learning. Finally, AGI must possess strong reasoning and planning capabilities, solving complex problems through logical reasoning and causal analysis, which remain weak points of current large language models.
2. The Path to AGI Based on LLMs
(1) Scaling Laws and Model Expansion
In developing AGI based on large language models (LLMs), scaling laws play a crucial role, especially in improving model performance, allocating resources, choosing training strategies, and understanding the boundaries of model capability. The core idea of scaling laws is that as scale, meaning the number of parameters, the amount of training data, and the compute budget, increases, model performance improves. In deep learning, researchers have found that as model parameters (such as the number of layers and nodes in a neural network) and training data gradually grow, performance typically improves in a predictable manner. This phenomenon was systematically formulated and validated by OpenAI in its research on the GPT series of models.
The OpenAI team published a paper in 2020 titled "Scaling Laws for Neural Language Models," summarizing the scaling relationships between model parameters, training dataset size, compute budget (total floating-point operations, FLOPs), and network architecture. Specifically, the scaling laws indicate that model error (E) falls as a power law in model size (N), dataset size (D), and compute (C), providing a theoretical basis for effectively scaling LLMs.
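For reference, the single-variable fits reported in that paper take the following power-law form, where L denotes the cross-entropy loss (the "error" E above) and the exponents are the approximate values Kaplan et al. report for their setup, each fit assuming the other two factors are not the bottleneck:

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad \alpha_D \approx 0.095
L(C_{\min}) = \left(\frac{C_c^{\min}}{C_{\min}}\right)^{\alpha_C^{\min}}, \qquad \alpha_C^{\min} \approx 0.050
```

Here N_c, D_c, and C_c^min are fitted constants; the key point is that loss falls smoothly and predictably as each resource grows.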
Predictability of Model Performance. Scaling laws provide a predictive framework: by studying the relationship between model size (number of parameters), training-data scale, and compute budget, researchers can anticipate how performance will grow. In theory, as parameters, training data, and compute increase, a model's capabilities improve in a predictable pattern, allowing the performance of larger models to be estimated from existing resources and empirical data.
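As a minimal sketch of such extrapolation, assuming synthetic loss-versus-compute measurements invented purely for illustration, one can fit a power law in log-log space and project it to a larger budget:

```python
import numpy as np

# Synthetic (compute, loss) pairs -- illustrative numbers only, not
# measurements from any real model.
compute = np.array([1e17, 1e18, 1e19, 1e20])  # training compute in FLOPs
loss = np.array([3.9, 3.4, 3.0, 2.6])         # final validation loss

# A power law L = a * C^b is a straight line in log-log space:
# log L = log a + b * log C, so an ordinary linear fit recovers (b, log a).
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)

# Extrapolate to a budget 100x beyond the largest observed run.
target = 1e22
predicted_loss = np.exp(log_a) * target ** b
print(f"fitted exponent b = {b:.3f}; predicted loss at 1e22 FLOPs = {predicted_loss:.2f}")
```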
Optimizing Resource Allocation and Cost Efficiency. Scaling laws help researchers evaluate the trade-offs between the number of parameters, amount of training data, and computational costs. In AGI research, resource costs (such as computing resources and storage requirements) are enormous, and scaling laws provide guidance on how to balance these factors. For example, if increasing the number of model parameters does not significantly improve performance, then one should not blindly expand parameters but rather focus on optimizing data quality and training methods. This regularity also reveals computational efficiency issues at different scales, helping developers determine at which stage increasing computational resources can yield the most significant performance improvements, further advancing the practice of LLMs in AGI research.
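As a rough sketch of this trade-off, the snippet below combines two widely used heuristics, neither taken from this article: the approximation that training a transformer costs about C ≈ 6·N·D FLOPs, and the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter, to split a fixed compute budget between model size and data:

```python
import math

def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Heuristically split a training-compute budget between N and D.

    Assumes C ~= 6 * N * D (approximate transformer training FLOPs) and
    D ~= tokens_per_param * N (a Chinchilla-style rule of thumb). Both
    are rough heuristics, not exact laws.
    """
    # C = 6 * N * (r * N)  =>  N = sqrt(C / (6 * r)), then D = r * N
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = compute_optimal_split(1e23)
print(f"~{n / 1e9:.1f}B parameters trained on ~{d / 1e12:.2f}T tokens")
```

With a 1e23-FLOP budget this heuristic suggests roughly a 29B-parameter model trained on about 0.58T tokens; plugging in Chinchilla's reported budget recovers numbers close to its actual 70B-parameter, 1.4T-token configuration.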
Guiding Model Design and Architecture Selection. Scaling laws provide a theoretical basis for model architecture design. In the development of AGI, researchers can design models that better conform to scaling laws by observing the scaling patterns of different architectures, thus improving performance and computational efficiency. This has also led to the widespread application of the Transformer architecture, as it has shown a performance improvement pattern consistent with scaling laws, making it the mainstream choice for training LLMs. Additionally, scaling laws can reveal performance differences of models across different tasks, helping developers determine the optimal model size and training data amount required for specific tasks. This provides reliable theoretical support for exploring the applicability and generalization capabilities of AGI in different fields.
Some industry scholars believe that scaling laws are the first principles on the path to AGI and that scaling up models is essentially compression, from which intelligence can emerge. This process faces many challenges, including model efficiency and computational cost, generalization and cognitive ability, data-quality limits, causal reasoning and interpretability, and continuous learning and self-optimization. The most significant of these may be the lack of data: many fields simply do not have rich data, and for an AI system meant to surpass human capabilities, the necessary data may not exist at all. At the same time, under the current technological framework, training ever-larger models requires enormous computational resources, and scaling cannot be extended indefinitely, which limits the applicability of scaling laws.
(2) Self-Play: A New Paradigm for LLMs
Self-Play is a learning method in which a model improves by competing against itself; it has achieved significant success in reinforcement learning, most famously in the training of AlphaGo. Applying Self-Play to LLMs allows a model to enhance its capabilities through interaction and competition with itself, without external supervision. This autonomous approach strengthens exploration and adaptability, enabling better self-optimization in complex environments. By continually playing against itself, a model can accumulate experience across different tasks and scenarios, improving its problem-solving abilities and gradually moving toward the goal of AGI.
In the evolution of LLMs, Self-Play is regarded as a new paradigm that can accelerate the learning and optimization of agents. As the marginal returns of scaling up LLMs begin to diminish, enhancing LLM reasoning through RL self-play and MCTS is becoming the next technical paradigm. Under this paradigm, the scaling laws of the LLM field change: growth in compute still translates into gains in model intelligence, but the emphasis shifts from increasing model parameters to giving models more reinforcement-learning exploration. Through self-play, agents can quickly identify deficiencies in their strategies and adjust them; this rapid iteration enables agents to adapt more quickly to complex environments, enhancing their logical reasoning capabilities.
Moreover, the advantage of Self-Play technology is that it does not rely on external datasets or labels; agents train on data generated from self-play, reducing dependence on external resources. This method can automatically generate reward signals, simplifying the complex processes that require external reward signals in traditional reinforcement learning.
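As a toy illustration of this property, entirely hypothetical and far simpler than LLM-scale self-play, the sketch below has two copies of a regret-matching learner play rock-paper-scissors against each other. Every training signal comes from the game's own payoffs rather than an external dataset or labels, and the average self-play strategy converges toward the Nash equilibrium (1/3, 1/3, 1/3):

```python
import numpy as np

ACTIONS = 3  # rock, paper, scissors
# PAYOFF[a, b] = reward to a player choosing a against an opponent choosing b
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def strategy_from_regret(regret):
    """Regret matching: play in proportion to positive cumulative regret."""
    positive = np.maximum(regret, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(ACTIONS, 1.0 / ACTIONS)

rng = np.random.default_rng(0)
regret = np.zeros((2, ACTIONS))        # cumulative regret for each copy
strategy_sum = np.zeros((2, ACTIONS))  # accumulates the average strategy

for _ in range(100_000):
    strategies = [strategy_from_regret(regret[p]) for p in (0, 1)]
    actions = [rng.choice(ACTIONS, p=s) for s in strategies]
    for p in (0, 1):
        opp = actions[1 - p]
        # Reward signal generated by the game itself: regret of each
        # alternative action relative to the action actually played.
        regret[p] += PAYOFF[:, opp] - PAYOFF[actions[p], opp]
        strategy_sum[p] += strategies[p]

print("average self-play strategy:", strategy_sum[0] / strategy_sum[0].sum())
# -> approximately [0.333, 0.333, 0.333], the Nash equilibrium
```

The same structure, generating data through self-competition and learning from internally computed rewards, is what self-play at LLM scale aims to exploit.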
The latest advancements in Self-Play also include applications in multi-agent environments, where multiple agents compete against each other, further enhancing strategy diversity and complexity. This multi-agent self-play approach not only improves the model’s collaborative capabilities but also enhances the model’s ability to cope with uncertainty and variable environments. Additionally, the multimodal expansion combined with Self-Play is also being explored, such as applying Self-Play in visual-language tasks, enabling models to continuously improve their perception and reasoning capabilities through multimodal interactions, providing new possibilities for achieving more comprehensive AGI.
Author Profile
Wang Yurun, a PhD candidate in Land Space Planning at Peking University, focuses on modeling human mobility and urban complexity, particularly how to analyze urban dynamics and social behavior patterns through large-scale graph learning and causal inference techniques.