
Source: Machine Heart
This article is about 2,000 words; suggested reading time: 5 minutes.
Recently, a joint team from Tsinghua University, the University of Hong Kong, Renmin University, and Wallface Intelligence proposed a new self-evolution strategy for intelligent agents: Investigate - Consolidate - Exploit (ICE).
“Learning from history can help us understand the rise and fall of civilizations.” The history of human progress can be seen as a continuous process of self-evolution: absorbing past experience and pushing the boundaries of our capabilities. In this process, we learn from past failures to correct mistakes and draw on past successes to improve efficiency and effectiveness. Such self-evolution is ubiquitous in our lives: from summarizing experience at work to solve problems better, to using patterns to predict the weather more accurately, we are constantly learning and evolving from the past.
Successfully extracting knowledge from past experiences and applying it to future challenges is an important milestone in the path of human evolution. So, in the era of artificial intelligence, can AI agents do the same?
In recent years, language models such as GPT and LLaMA have demonstrated remarkable abilities in solving complex tasks. However, although they can use tools to solve specific tasks, they fundamentally lack insight into their own past successes and failures. They are like a robot built for a single job: it performs excellently on the current task, but cannot draw on past experience when faced with new challenges.
To address this issue, a joint team from Tsinghua University, the University of Hong Kong, Renmin University, and Wallface Intelligence recently proposed a new self-evolution strategy for intelligent agents: Investigate – Consolidate – Exploit (ICE). It aims to enhance the adaptability and flexibility of AI agents through cross-task self-evolution: it not only improves the efficiency and effectiveness of agents on new tasks, but also significantly reduces the capability demands on the agent's foundation model.
The emergence of this strategy undoubtedly opens a new chapter for the self-evolution of agents, also indicating that we are one step closer to achieving full autonomy for agents.

- Paper Title: Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution
- Paper Link: https://arxiv.org/abs/2401.13996


- In the investigation stage, the agent records the entire tree-structured task plan and dynamically monitors the execution status of each sub-goal;
- In the consolidation stage, the agent first removes all failed goal nodes, then, for each successfully completed goal, linearizes the leaf nodes of that goal's sub-tree into a planning chain (workflow);
- In the exploitation stage, these planning chains serve as references when refining new task goals, so that past successful experience can be reused.
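The consolidation step above can be sketched in a few lines of code. This is a minimal illustration under assumed data structures; the names (`PlanNode`, `consolidate_plan`) are hypothetical and do not come from the paper:

```python
# Sketch of plan consolidation: drop failed subtrees, then linearize the
# remaining leaf goals into a workflow. Node structure is an assumption.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PlanNode:
    goal: str
    succeeded: bool
    children: List["PlanNode"] = field(default_factory=list)

def consolidate_plan(node: PlanNode) -> List[str]:
    """Turn a recorded task tree into a linear planning chain (workflow)."""
    if not node.succeeded:          # remove failed goal nodes entirely
        return []
    if not node.children:           # successful leaf: keep as a workflow step
        return [node.goal]
    workflow: List[str] = []
    for child in node.children:     # depth-first traversal keeps execution order
        workflow.extend(consolidate_plan(child))
    return workflow

# Example: a root goal with one failed and two successful sub-goals
plan = PlanNode("write report", True, [
    PlanNode("search web", True),
    PlanNode("use broken tool", False),
    PlanNode("draft summary", True),
])
print(consolidate_plan(plan))  # ['search web', 'draft summary']
```

The key point is that only successful leaf goals survive consolidation, so the stored workflow encodes a proven execution path rather than the full (partly failed) exploration tree.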

- In the investigation stage, the agent dynamically records the tool-invocation chain for each task and performs simple detection and categorization of issues that arise during tool invocation;
- In the consolidation stage, the tool-invocation chain is transformed into an automaton-like pipeline that fixes the order of tool calls and the transitions between them, while removing duplicate calls and adding branching logic to make execution more robust;
- In the exploitation stage, for similar goals, the agent directly executes the pipeline automatically, thereby improving task-completion efficiency.
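The execution-side consolidation can be sketched similarly. This is an assumed simplification, not the paper's implementation: `build_pipeline` deduplicates a recorded call chain into a fixed order, and `run_pipeline` replays it automaton-style with a retry branch on failure:

```python
# Sketch: consolidate a recorded tool-call chain into a fixed pipeline,
# then replay it with simple branching (retry on failure). All names
# are illustrative assumptions.
from typing import Callable, Dict, List

def build_pipeline(call_chain: List[str]) -> List[str]:
    """Fix the invocation order and remove duplicate calls,
    keeping each tool's first occurrence."""
    seen = set()
    pipeline: List[str] = []
    for tool in call_chain:
        if tool not in seen:
            seen.add(tool)
            pipeline.append(tool)
    return pipeline

def run_pipeline(pipeline: List[str],
                 tools: Dict[str, Callable[[], bool]],
                 max_retries: int = 1) -> bool:
    """Execute the pipeline like a small automaton: each tool is a state,
    a failed call branches into a retry, and exhausted retries abort."""
    for tool in pipeline:
        attempts = 0
        while not tools[tool]():    # branching logic: retry on failure
            attempts += 1
            if attempts > max_retries:
                return False
        # success: transition to the next tool in the fixed order
    return True

# Example: a recorded chain containing a duplicate "search" call
chain = ["search", "search", "fetch_page", "summarize"]
pipeline = build_pipeline(chain)
print(pipeline)  # ['search', 'fetch_page', 'summarize']
```

During exploitation, replaying such a pipeline replaces repeated model calls for planning each tool invocation, which is where the efficiency gain comes from.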
- The ICE strategy significantly reduces the number of model invocations, improving efficiency and lowering cost.
- Stored experiences show a high reuse rate under ICE, demonstrating its effectiveness.
- ICE improves the completion rate of sub-tasks while reducing the number of planning revisions.
- With past experience available, the model capability required for task execution drops markedly: GPT-3.5, combined with prior planning and execution experience, achieves results comparable to GPT-4.


Editor: Wenjing
