In August 2023, the “Interim Measures for the Management of Generative Artificial Intelligence Services” took effect. Issued by the Cyberspace Administration of China together with the National Development and Reform Commission, the Ministry of Education, the Ministry of Science and Technology, the Ministry of Industry and Information Technology, and other departments, the measures call for ensuring the legality and ethics of AI agents when they generate content and execute tasks. AI agents are AI-centered intelligent systems composed of planning, memory, tool, and action modules. They operate collaboratively across cloud, network, edge, and terminal environments and possess perception, judgment, response, and learning capabilities. With large models providing global optimization, the autonomy and efficiency of AI agents can help enterprises raise productivity, and their interactive collaboration can counter organizational entropy, reducing costs while improving efficiency and quality.
Observation 1: Dual Value of Research and Application Sparks AI Agent Wave
AI agents have both scientific research value and industrial application value: building on core AI technologies such as large models, they can perceive their environment precisely, make sound decisions, and act effectively. Internationally, AutoGPT has set off a surge in AI agent research, and many applications have been built on it. Meta and CMU spent two years developing the general-purpose robotic AI agent RoboAgent, which both brings together several research directions and serves as an inspiration for future work. Domestically, Tsinghua University and Alibaba Cloud have jointly conducted research on evolvable AI agents based on large models. This is described as the first systematic academia-industry research on evolvable AI agents and as a significant milestone in the continuous evolution of AI model capabilities. Universities at home and abroad are also collaborating to advance AI agents. For example, researchers from Tsinghua University, Beijing University of Posts and Telecommunications, and Brown University proposed ChatDev, a framework for automated software development based on large models in which AI agents playing different roles communicate and collaborate effectively.
Observation 2: Brain-Hand Dual-Drive Intelligent System Empowers AI Agent Implementation
An AI agent system based on large models can be divided into four parts: planning, memory, tools, and action. These are analogous to the functions of the human “brain” and “hand,” which together make up an intelligent system. At the “brain” level, AI agents use large models for semantic understanding, logical reasoning, and task decomposition and planning. Robotics Transformer 2 (RT-2) is a vision-language-action AI agent model that learns from training data and outputs actions that control robot behavior; it introduces chain-of-thought reasoning to carry out multi-stage semantic and visual reasoning. At the “hand” level, using tools is a prerequisite for action and lets AI agents extend the model’s capabilities. Zhejiang University and Microsoft jointly released HuggingGPT, which integrates hundreds of models from Hugging Face and can handle 24 tasks through four steps: task planning, model selection, task execution, and response generation.
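The four steps described above for HuggingGPT (planning, model selection, execution, and producing the answer) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not HuggingGPT's actual code: the tool registry, the keyword-based planner, and every function name below are hypothetical stand-ins, and a real agent would call a large model where the stubs use simple rules.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: task type -> candidate "models".
# Plain functions stand in for hosted models here.
TOOLS: Dict[str, Dict[str, Callable[[str], str]]] = {
    "uppercase": {"upper-v1": lambda text: text.upper()},
    "word-count": {"counter-v1": lambda text: str(len(text.split()))},
}


def plan_tasks(request: str) -> List[str]:
    """Step 1, task planning. Assumption: keyword rules replace the large model."""
    tasks = []
    if "shout" in request:
        tasks.append("uppercase")
    if "count" in request:
        tasks.append("word-count")
    return tasks


def select_model(task: str) -> str:
    """Step 2, model selection: take the first registered candidate for the task."""
    return next(iter(TOOLS[task]))


def execute(task: str, model: str, payload: str) -> str:
    """Step 3, task execution: run the chosen model on the input."""
    return TOOLS[task][model](payload)


def respond(results: Dict[str, str]) -> str:
    """Step 4, answer production: merge the per-task outputs into one reply."""
    return "; ".join(f"{task}: {out}" for task, out in results.items())


def run_agent(request: str, payload: str) -> str:
    results: Dict[str, str] = {}
    for task in plan_tasks(request):
        model = select_model(task)
        results[task] = execute(task, model, payload)
    return respond(results)


if __name__ == "__main__":
    print(run_agent("shout and count", "hello agent world"))
```

Keeping the planner (“brain”) separate from the tool registry (“hand”) means new tools can be registered without touching the planning logic.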
Observation 3: Four Key Features of AI Agents Showcase Multi-Dimensional Reasoning Dialogue Capabilities
AI agents attract attention because of their autonomy, reactivity, proactivity, and sociality. These four characteristics interweave to form a whole that exhibits highly intelligent behavior. In terms of autonomy, AI agents operate without direct intervention and have a degree of control over their own actions and internal states. For instance, BabyAGI is a streamlined AI agent framework for self-driven task execution: it autonomously creates, organizes, and executes tasks toward a predetermined goal, and it was the first to build task prioritization into its framework. In terms of reactivity, AI agents can respond rapidly to immediate changes and stimuli in the environment. For example, the AI agent JARVIS uses ChatGPT’s reasoning capabilities to apply the best model to a given task, demonstrating considerable flexibility. In terms of proactivity, AI agents do not merely react to the environment but also take the initiative and show goal-oriented behavior. For instance, VOYAGER is a large-model-driven game AI agent that continuously explores the world and autonomously seeks out new tasks. In terms of sociality, the ability of an AI agent to interact with other agents and with humans through an agent communication language is crucial. For example, CAMEL is a multi-agent framework that uses role-playing to enable communication and collaboration among multiple AI agents.
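The create, prioritize, and execute cycle attributed to BabyAGI above can be sketched as a simple task loop. This is an illustrative sketch only, not BabyAGI's implementation: the three helper functions are hypothetical stubs with hard-coded rules where a real framework would consult a large model.

```python
from collections import deque
from typing import Deque, List


def execute_task(task: str) -> str:
    """Stub executor (assumption): a real agent would call a model or a tool."""
    return f"done: {task}"


def create_tasks(task: str, result: str) -> List[str]:
    """Stub task creation: a 'research X' task spawns a 'summarize X' follow-up."""
    if task.startswith("research "):
        topic = task.split(" ", 1)[1]
        return [f"summarize {topic}"]
    return []


def prioritize(tasks: Deque[str]) -> Deque[str]:
    """Stub prioritization: move 'summarize' tasks to the front, keep order otherwise."""
    return deque(sorted(tasks, key=lambda t: 0 if t.startswith("summarize") else 1))


def run(initial_tasks: List[str], max_steps: int = 10) -> List[str]:
    queue: Deque[str] = deque(initial_tasks)
    log: List[str] = []
    steps = 0
    # max_steps bounds the loop, since task creation could otherwise run forever.
    while queue and steps < max_steps:
        task = queue.popleft()
        result = execute_task(task)
        log.append(result)
        queue.extend(create_tasks(task, result))  # create new tasks
        queue = prioritize(queue)                 # reorder the remaining work
        steps += 1
    return log


if __name__ == "__main__":
    for line in run(["research agents", "write report"]):
        print(line)
```

In this run the follow-up task created during the first step is moved ahead of “write report”, which shows how reprioritization changes the execution order without any outside intervention.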
Observation 4: Diverse Applications of Single AI Agents and Collaborative Innovation of Multi-Agent Interactions
AI agent technology has advanced along three lines: single agents, multi-agent systems, and human-agent interaction. The aim is a new intelligent operating ecosystem centered on “human-machine collaboration.” For single agents, task-oriented, innovation-oriented, and lifecycle-oriented deployments have produced diverse applications. Minecraft, a typical simulated survival environment, lets researchers study the potential of individual agents to survive in the real world. For multi-agent systems, the complementary forces of cooperation and competition drive advances in technology and intelligence. ChatEval is a multi-agent referee team that evaluates the quality of text generated by large models through debates among its agents, reaching a level comparable to human evaluators. Human-agent interaction covers both mentor-executor and equal-partner relationships, which together promote the development of AI agents across fields. The Inner Monologue model helps AI agents plan and reason more wisely by collecting feedback from the environment and from humans.
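The feedback-driven planning mentioned above for Inner Monologue can be illustrated with a closed loop in which the agent acts, reads feedback from the environment, and retries or stops accordingly. This is a simplified sketch under assumptions, not the Inner Monologue method itself: the environment callback, the retry policy, and all names are hypothetical.

```python
from typing import Callable, List

# Hypothetical environment signature: (step, attempt index) -> success flag.
Environment = Callable[[str, int], bool]


def run_with_feedback(plan: List[str], environment: Environment,
                      max_retries: int = 2) -> List[str]:
    """Execute each step, recording feedback and retrying failed steps."""
    transcript: List[str] = []
    for step in plan:
        for attempt in range(max_retries + 1):
            ok = environment(step, attempt)
            transcript.append(f"{step}: {'success' if ok else 'failed'}")
            if ok:
                break  # feedback says the step worked, move to the next one
        else:
            # Every attempt failed: record it and stop instead of acting blindly.
            transcript.append(f"{step}: abandoned")
            return transcript
    return transcript


def flaky_env(step: str, attempt: int) -> bool:
    """Stub environment: 'open drawer' fails once, everything else succeeds."""
    return not (step == "open drawer" and attempt == 0)


if __name__ == "__main__":
    for line in run_with_feedback(["open drawer", "pick up cup"], flaky_env):
        print(line)
```

The transcript plays the role of the running record of feedback that the agent would consult when deciding what to do next; a human could supply the same signal in place of the stub environment.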
Observation 5: Challenges and Opportunities Coexist, AI Agents Will Welcome a New Wave of Development
Although AI agents based on large models have achieved remarkable results, they face challenges at both the technical and the application level. At the technical level, the main challenges lie in data processing, algorithms and models, and collaborative interaction. Going forward, AI agents must balance data security against personalization, improve the accuracy of large models, generalize better, and deepen the interactions among agents, forming an AI agent network that enriches overall service capabilities and broadens the range of service applications. At the application level, hallucination is a fundamental challenge: agents whose knowledge boundaries are too narrow are more likely to fabricate and distort facts. Going forward, multiple agents with specialized knowledge and distinct skills should be deployed to play different roles in society. In addition, making AI agents more flexible will let them keep pace with change by updating their knowledge bases in real time.