MetaGPT Open Source Automates Intelligent Agent Workflows

The AIxiv column is a section of Machine Heart that publishes academic and technical content. Over the past few years, it has run more than 2,000 reports covering top laboratories at major universities and companies worldwide, helping to promote academic exchange and dissemination. If you have excellent work to share, you are welcome to submit it or contact us for coverage.

The AFLOW author team comes from the MetaGPT open-source community. The co-first authors of the AFLOW paper are Zhang Jiayi, a doctoral student at the Hong Kong University of Science and Technology (Guangzhou), and Xiang Jinyu, a researcher at DeepWisdom. The corresponding authors are Wu Chenglin, founder and CEO of DeepWisdom (author of the MetaGPT code and corresponding author of the MetaGPT paper), and Luo Yuyu, Assistant Professor at the Hong Kong University of Science and Technology (Guangzhou). The other authors are Yu Zhaoyang, Teng Fengwei, and Cheng Xin from Renmin University of China; Chen Xionghui, a doctoral student at Nanjing University's LAMDA laboratory; Chen Jiaqian and Zheng Bingnan from Fudan University; Zhuge Mingchen, a doctoral student at King Abdullah University of Science and Technology (KAUST) and co-first author of the MetaGPT paper; Hong Sirui, a DeepWisdom researcher and co-first author of the MetaGPT paper; Wang Jinlin; and Liu Bang, Assistant Professor at the University of Montreal and Mila.

For LLM practitioners, getting an LLM application to work means manually building and repeatedly debugging an Agentic Workflow. The process is tedious: modifying similar code again and again, tuning prompts, running tests by hand, and observing the effects. Switching to a different LLM can break the workflow altogether, so labor costs are high. Many companies even hire dedicated Prompt Engineers just to do this work.

Now Agentic Workflows have an automatic optimization tool of their own.

MetaGPT has open-sourced AFLOW, which uses Monte Carlo Tree Search (MCTS) to search Agentic Workflows automatically. It fully automates the construction and optimization of Agentic Workflows for a given task, with no need to hand-write code or debug prompts.

AFLOW optimizes workflows through Monte Carlo Tree Search, achieving GPT-4o-level capabilities at a very low cost

AFLOW goes a step beyond automatic prompt optimization: it takes over the entire process of generating and optimizing Agentic Workflows. It outperforms other automated workflow optimization methods and even surpasses every manually designed workflow baseline it was compared against.

Paper Title: AFlow: Automating Agentic Workflow Generation
Paper Address: https://arxiv.org/abs/2410.10762
Project Address: https://github.com/geekan/MetaGPT/tree/main/examples/aflow

What is the automatic workflow optimization problem?

Existing methods for automatically generating Agentic Workflows struggle to produce effective workflows: they often need manual setup to get started and fail to capture the full diversity of workflows that tasks demand. To overcome these challenges, the researchers proposed the AFLOW framework, which uses Monte Carlo Tree Search (MCTS) to systematically explore and optimize LLM workflows. By representing workflows as code made of nodes and edges, AFLOW captures the complex interactions between LLM calls. By introducing operators, it further narrows the search space and improves search efficiency. Experiments on multiple benchmark datasets show that AFLOW can automatically discover and optimize workflows, significantly improving task performance while reducing reliance on manual intervention.

Animated demonstration of AFLOW: workflows are generated and optimized automatically through iterative selection, expansion, evaluation, and backpropagation

AFLOW first recasts workflow optimization as a search problem. A workflow is represented as a set of code-defined nodes, each performing a specific LLM operation, while the edges between nodes define the logic, dependencies, and execution order of those operations. This representation turns a workflow into a graph structure that can be searched and optimized. Specifically, a workflow $W$ is defined over a sequence of LLM-invoking nodes $\mathcal{N} = \{N_1, N_2, \ldots, N_i, \ldots\}$, where each node $N_i$ has four parameters: model $M$, prompt $P$, temperature $\tau$, and output format $F$ (such as xml, json, markdown, or raw). Nodes are connected by edges $E$, which can be represented with various structures, such as graphs, neural networks, or code.
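To make the node-and-edge representation concrete, here is a minimal Python sketch. It assumes a generic `llm(model, prompt, temperature)` callable; the `Node` class, the `workflow` function, and the model name `"small-model"` are hypothetical illustrations, not MetaGPT's actual classes. Each node carries the four parameters M, P, τ, and F, while the edges are simply the Python control flow that passes one node's output to the next.

```python
from dataclasses import dataclass
from typing import Callable, Literal

# Hypothetical LLM interface: (model, prompt, temperature) -> completion text.
LLM = Callable[[str, str, float], str]


@dataclass(frozen=True)
class Node:
    """One LLM-invoking node, parameterized by (M, P, tau, F)."""
    model: str                                                # M
    prompt: str                                               # P, a template containing {input}
    temperature: float                                        # tau
    output_format: Literal["xml", "json", "markdown", "raw"]  # F

    def run(self, llm: LLM, text: str) -> str:
        instruction = self.prompt.format(input=text)
        if self.output_format != "raw":
            # Ask for the requested output format; "raw" adds no instruction.
            instruction += f"\nRespond in {self.output_format}."
        return llm(self.model, instruction, self.temperature)


def workflow(llm: LLM, problem: str) -> str:
    """Edges as code: a draft node feeds its answer into a review node."""
    draft = Node("small-model", "Solve step by step:\n{input}", 0.7, "raw")
    review = Node("small-model", "Review and correct this solution:\n{input}", 0.0, "raw")
    solution = draft.run(llm, problem)
    return review.run(llm, f"{problem}\n{solution}")
```

Because edges are ordinary code, an optimizer can rewire such a workflow (insert a node, add a loop or a branch) simply by editing the function's source.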

The goal of automated workflow optimization is, given a task $T$ and an evaluation function $G$, to find the workflow $W$ that maximizes $G(W, T)$. This can be framed as a search process in which an algorithm $A$ explores the search space $\mathcal{S}$ to determine the optimal workflow configuration. The search space $\mathcal{S}$ contains all possible configurations of node parameters and edge structures.
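Written out as formulas, the objective described in this paragraph reads as follows. This is a restatement in the notation used here; the candidate sets $\mathcal{M}$, $\mathcal{P}$, $\mathcal{F}$, $\mathcal{E}$ (available models, prompts, output formats, and edge structures) and the range $\tau \in [0, 1]$ are introduced for illustration, and the paper's own notation may differ slightly.

$$
W^{*} = \arg\max_{W \in \mathcal{S}} G(W, T)
$$

$$
\mathcal{S} = \bigl\{ (\mathcal{N}, E) \mid E \in \mathcal{E} \bigr\}, \qquad
\mathcal{N} = \bigl\{ N(M, P, \tau, F) \mid M \in \mathcal{M},\ P \in \mathcal{P},\ \tau \in [0, 1],\ F \in \mathcal{F} \bigr\}
$$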

Node, Operator, and Edge examples. Here we show the optional parameters of Node, common structures of Operator, and common representations of Edge

How does AFLOW automatically optimize workflows?

AFLOW uses Monte Carlo Tree Search (MCTS) to automatically generate and optimize Agentic Workflows. Operators play a crucial role in the framework: they are predefined, reusable combinations of nodes that represent common agent operations (such as review, vote, and generate). Operators serve as building blocks for workflows and are integrated into the search space, so the search can draw on agent operation patterns already known to be effective. Introducing operators significantly improves AFLOW's search efficiency and the quality of the resulting workflows, because it reduces blind exploration of the vast search space.
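As an illustration of an operator, the sketch below combines several generate calls with a majority vote, in the spirit of the generate and vote operations named in this paragraph. It assumes a `generate(problem)` callable that wraps a single LLM node; `generate_and_vote` and `vote` are hypothetical names, and AFLOW's actual operator set and interfaces are defined in the MetaGPT repository.

```python
from collections import Counter
from typing import Callable

# Hypothetical: one LLM-backed generation node, problem -> answer text.
Generate = Callable[[str], str]


def vote(candidates: list[str]) -> str:
    """Majority vote over normalized answers; ties go to the earliest candidate."""
    if not candidates:
        raise ValueError("vote() needs at least one candidate")
    normalized = [c.strip().lower() for c in candidates]
    counts = Counter(normalized)
    best = max(counts.values())
    # Return the first original candidate whose normalized form has the top count.
    return next(c for c, n in zip(candidates, normalized) if counts[n] == best)


def generate_and_vote(generate: Generate, problem: str, samples: int = 5) -> str:
    """An operator as a reusable node combination: N generate calls, then one vote."""
    candidates = [generate(problem) for _ in range(samples)]
    return vote(candidates)
```

Packaging the pattern as one unit means the search can insert or remove it with a single edit, instead of rediscovering the combination node by node.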

Given a task and an evaluation function, AFLOW aims to find the workflow that maximizes task performance. The algorithm starts from an initial template workflow that provides a basic framework, including LLM node calls and the use of operators. It then iterates through the four main steps of MCTS: Selection, Expansion, Evaluation, and Backpropagation.
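The loop below is a simplified skeleton of these four steps, assuming the step implementations are passed in as functions. `WorkflowRecord` and `search` are hypothetical names, and the sketch omits details of the real MetaGPT implementation, such as early stopping.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class WorkflowRecord:
    code: str                      # the workflow, represented as code
    score: float                   # its evaluation score
    parent: Optional[int] = None   # index of the workflow it was derived from
    experience: list = field(default_factory=list)  # (child_code, improved?) pairs


def search(
    initial_code: str,
    select: Callable[[list, random.Random], int],
    expand: Callable[[WorkflowRecord], str],
    evaluate: Callable[[str], float],
    max_rounds: int = 20,
    seed: int = 0,
) -> WorkflowRecord:
    rng = random.Random(seed)
    tree = [WorkflowRecord(initial_code, evaluate(initial_code))]
    for _ in range(max_rounds):
        parent_idx = select(tree, rng)            # 1. Selection
        parent = tree[parent_idx]
        child_code = expand(parent)               # 2. Expansion (LLM optimizer)
        child_score = evaluate(child_code)        # 3. Evaluation
        parent.experience.append(                 # 4. Backpropagation of experience
            (child_code, child_score > parent.score))
        tree.append(WorkflowRecord(child_code, child_score, parent_idx))
    return max(tree, key=lambda record: record.score)
```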

Overall framework of AFLOW: given a search space made up of nodes with flexible prompt parameters, a set of operators, and code representing edges, AFLOW runs an MCTS-based search within this space. Using an MCTS variant designed for workflow optimization, it iterates through soft mixed-probability selection, LLM-based expansion, execution evaluation, and experience backpropagation until it reaches the maximum number of iterations or meets the convergence criterion

Selection phase: AFLOW chooses a node to expand using a soft mixed-probability selection mechanism. The mechanism blends a uniform distribution with a score-weighted distribution to balance exploration against exploitation and avoid getting stuck in local optima. Because it weighs both the candidates' scores and the need for exploration, the chosen node is one that is likely to improve performance while still having exploratory value.
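One way to build such a mixed distribution is to blend a uniform term with a softmax over workflow scores, as sketched below. The weighting `lam`, the sharpness `alpha`, and their default values are illustrative assumptions (with scores assumed to be on a percentage-like scale), not settings taken from the paper.

```python
import math
import random


def mixed_probabilities(scores: list[float], lam: float = 0.2, alpha: float = 0.4) -> list[float]:
    """Blend a uniform distribution (weight lam) with a softmax over scores (weight 1 - lam).

    lam: share of probability spread uniformly (exploration); illustrative default.
    alpha: softmax sharpness (exploitation); illustrative default.
    """
    n = len(scores)
    s_max = max(scores)
    # Subtracting the max score keeps exp() from overflowing.
    weights = [math.exp(alpha * (s - s_max)) for s in scores]
    total = sum(weights)
    return [lam / n + (1 - lam) * w / total for w in weights]


def select(scores: list[float], rng: random.Random) -> int:
    """Sample the index of the workflow to expand next."""
    probs = mixed_probabilities(scores)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]
```

With `lam = 0` this reduces to pure score-weighted sampling; with `lam = 1` every candidate is equally likely, so no workflow's probability ever drops below `lam / n`.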

Expansion phase: AFLOW uses an LLM as the optimizer to generate new workflows. Drawing on the experience accumulated for the selected workflow, the optimizer writes new prompts or modifies the code to change how nodes are connected, producing a new workflow variant. Each variant is a small adjustment to an existing workflow, such as adding, modifying, or deleting nodes and edges.
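A sketch of how an expansion request might be assembled, assuming the optimizer is any `optimizer_llm(prompt)` callable that returns new workflow code. The prompt wording and the function names are hypothetical; they only illustrate feeding the parent's code, score, and past modification outcomes to the optimizer.

```python
from typing import Callable


def build_optimizer_prompt(parent_code: str, parent_score: float,
                           experience: list[tuple[str, bool]]) -> str:
    """Assemble the request for the optimizer LLM (hypothetical wording)."""
    lines = [
        "You are optimizing an agentic workflow written as Python code.",
        f"Current workflow (validation score {parent_score:.2f}):",
        parent_code,
        "Previous modifications tried from this workflow:",
    ]
    if not experience:
        lines.append("- (none yet)")
    for change, improved in experience:
        lines.append(f"- {change}: {'improved' if improved else 'did not improve'}")
    lines.append("Make ONE small change (add, modify, or delete a node, prompt, or edge) "
                 "and return the complete new workflow code.")
    return "\n".join(lines)


def expand(parent_code: str, parent_score: float,
           experience: list[tuple[str, bool]],
           optimizer_llm: Callable[[str], str]) -> str:
    """Ask the optimizer LLM for one new workflow variant."""
    return optimizer_llm(build_optimizer_prompt(parent_code, parent_score, experience))
```

Asking for one small change per round keeps each variant close to its parent, so a score difference can be attributed to a single modification.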

Evaluation phase: AFLOW executes each generated workflow directly to obtain feedback. Because the reasoning tasks studied have well-defined evaluation functions, AFLOW can run a workflow several times on the validation set and compute the mean score and standard deviation, which gives the optimizer more reliable feedback.
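A sketch of this evaluation step, assuming a workflow is any function from a question to an answer and `score_fn` compares an answer with its reference. The function name `evaluate_workflow` and the default of three runs are illustrative choices.

```python
import statistics
from typing import Callable, Sequence


def evaluate_workflow(
    workflow: Callable[[str], str],
    validation_set: Sequence[tuple[str, str]],
    score_fn: Callable[[str, str], float],
    runs: int = 3,
) -> tuple[float, float]:
    """Run the workflow `runs` times over the validation set.

    Returns (mean, sample standard deviation) of the per-run average scores.
    """
    run_scores = []
    for _ in range(runs):
        per_item = [score_fn(workflow(question), gold) for question, gold in validation_set]
        run_scores.append(sum(per_item) / len(per_item))
    std = statistics.stdev(run_scores) if runs > 1 else 0.0
    return statistics.mean(run_scores), std
```

Repeating runs matters because LLM outputs are stochastic: a large standard deviation warns the optimizer that a score gap between two workflows may be noise.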

Backpropagation phase: the workflow's performance information is propagated back into the MCTS tree, updating node scores and guiding future search iterations. This information includes the workflow's execution results and whether the modification succeeded relative to its parent workflow. In this way, AFLOW learns from every iteration and progressively improves its workflows.
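The sketch below shows one way to record this feedback: each expansion stores, on the parent, what was changed and whether the child scored higher, and the record can be rendered as text for later optimizer calls. `Experience`, `TreeNode`, `backpropagate`, and `summarize` are hypothetical names, not AFLOW's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    modification: str   # what the optimizer changed
    parent_score: float
    child_score: float

    @property
    def succeeded(self) -> bool:
        return self.child_score > self.parent_score


@dataclass
class TreeNode:
    score: float
    experiences: list[Experience] = field(default_factory=list)


def backpropagate(parent: TreeNode, modification: str, child_score: float) -> TreeNode:
    """Record the outcome on the parent and return the new child node for the tree."""
    parent.experiences.append(Experience(modification, parent.score, child_score))
    return TreeNode(child_score)


def summarize(node: TreeNode) -> str:
    """Render the parent's experience so later expansions can avoid repeating failures."""
    return "\n".join(
        f"{'SUCCESS' if e.succeeded else 'FAILURE'} "
        f"({e.parent_score:.1f} -> {e.child_score:.1f}): {e.modification}"
        for e in node.experiences
    )
```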

To avoid spending further resources once optimization has plateaued, AFLOW stops iterating when the scores of the top-k workflows have not improved for several consecutive rounds.
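A minimal version of this stopping rule, assuming "not improved" is measured on the mean score of the top-k workflows. `k` and `patience` stand in for the unspecified number of workflows and rounds, and their defaults are arbitrary.

```python
def should_stop(history: list[list[float]], k: int = 3, patience: int = 5) -> bool:
    """history[r] holds every workflow score known after round r.

    Returns True when the mean of the top-k scores has not risen above its
    value from `patience` rounds ago in any of the last `patience` rounds.
    """
    def top_k_mean(scores: list[float]) -> float:
        best = sorted(scores, reverse=True)[:k]
        return sum(best) / len(best)

    if len(history) <= patience:
        return False  # not enough rounds observed yet
    reference = top_k_mean(history[-patience - 1])
    return all(top_k_mean(scores) <= reference for scores in history[-patience:])
```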

The Transformation Brought by AFLOW to Agentic Workflows

Significant performance gains: The authors evaluated AFLOW on six text reasoning benchmarks covering coding (HumanEval, MBPP), mathematics (GSM8K, MATH), and knowledge question answering (HotpotQA, DROP). AFLOW improves on existing manually designed methods by 5.7% on average and outperforms other automated methods by 19.5%. It leads on all six tasks, demonstrating its stability and adaptability across different task types.

Performance comparison with other methods. Different metrics were used for different datasets: solve rate for MATH and GSM8K, F1 score for HotpotQA and DROP, and pass@1 for HumanEval and MBPP. AFLOW (highlighted in yellow) consistently outperformed all automated workflow optimization methods and manually designed methods on all six benchmarks

Significant cost reduction: The biggest change AFLOW brings to the agent field is a large reduction in cost. Workflows that AFLOW finds for smaller models can match GPT-4o's performance at only 4.55% of its inference cost. This means enterprises can obtain large-model results with smaller models, a cost-effective way to scale AI applications.

Cost is the total cost of running the HumanEval test split. "AFLOW (model)" denotes the workflow AFLOW found when that model executed workflows to provide feedback during the search. Legend colors indicate the LLM used to execute the workflows on the test set

Improved automation efficiency: AFLOW replaces the traditional manual debugging approach. Its automated workflow generation and optimization greatly reduce the need for human involvement: instead of spending hours on repeated debugging and tuning, developers let the system search for the best-performing workflow combinations, which significantly shortens the development cycle.

Broad applicability: Experiments show that AFLOW transfers well. It works with a range of mainstream LLMs and adapts to different task requirements, performing strongly across question answering, code generation, and mathematical problem solving, which supports its value as a general optimization framework. Users can apply AFLOW to their own tasks simply by providing a dataset and an evaluation function.
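As an example of the kind of evaluation function a user might supply, here is a SQuAD-style token-level F1 scorer, the metric family reported for HotpotQA and DROP. This is a generic sketch: the exact interface AFLOW expects for custom benchmarks is defined in its repository and may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred, gold = normalize(prediction), normalize(ground_truth)
    common = Counter(pred) & Counter(gold)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Partial credit is the point of F1 here: "Paris France" against the reference "Paris" scores 2/3 rather than 0, which gives the optimizer a smoother signal than exact match.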

Outlook

AFLOW offers an effective method for generating Agentic Workflows and demonstrates substantial reductions in both labor and inference costs. The work is expected to accelerate the deployment of agents across many fields, shifting the construction of Agentic Workflows from manual work by experts to automated construction that even newcomers can carry out.

Usage

The authors have open-sourced the complete code on GitHub. By defining custom benchmarks and datasets, users can quickly search for the workflow with the best performance, or the best performance-cost trade-off, for their own tasks, saving individuals and enterprises considerable time.

AFLOW’s GitHub guide. A step-by-step guide can be followed to configure and run AFLOW for efficient workflow generation and optimization
