The AFLOW author team comes from the MetaGPT open-source community. The co-first authors of the AFLOW paper are Zhang Jiayi, a PhD student at the Hong Kong University of Science and Technology (Guangzhou), and Xiang Jinyu, a researcher at DeepWisdom; the co-corresponding authors are Wu Chenglin, founder and CEO of DeepWisdom and code author of MetaGPT, and Luo Yuyu, Assistant Professor at the Hong Kong University of Science and Technology (Guangzhou). The other authors are Yu Zhaoyang, Teng Fengwei, and Cheng Xin of Renmin University of China; Chen Xionghui, a PhD student in Nanjing University's LAMDA Lab; Chen Jiaqi and Zheng Bingnan of Fudan University; Zhuge Mingchen, a PhD student at King Abdullah University of Science and Technology and co-first author of the MetaGPT paper; Hong Sirui, a DeepWisdom researcher and co-first author of the MetaGPT paper; Wang Jinlin; and Liu Bang, Assistant Professor at the University of Montreal and the Mila lab.
For LLM practitioners, making an LLM application actually work means manually constructing and repeatedly debugging an Agentic Workflow: a tedious cycle of modifying similar code, tweaking prompts, running tests by hand, and inspecting the results. Worse, switching to a different LLM can invalidate all of that effort, driving labor costs up. Many companies even hire dedicated Prompt Engineers for this task.
Now, the Agentic Workflow also has its own automatic optimization tools.
MetaGPT has open-sourced AFLOW, which uses MCTS to automatically search for Agentic Workflows, fully automating their construction and optimization and eliminating the need to hand-write code or debug prompts.
AFLOW optimizes workflows through Monte Carlo Tree Search, achieving GPT-4o level capabilities at a very low cost.
This is a further step beyond automatic prompt optimization: AFLOW fully takes over the generation and optimization of Agentic Workflows, outperforming other workflow-automation efforts and even surpassing every manually designed workflow baseline in the comparison.
- Paper title: AFlow: Automating Agentic Workflow Generation
- Paper URL: https://arxiv.org/abs/2410.10762
- Project URL: https://github.com/geekan/MetaGPT/tree/main/examples/aflow
What is the Automated Workflow Optimization Problem?
Existing approaches to automatically generating Agentic Workflows struggle to produce effective workflows: they often require manual setup and fail to capture the diversity of workflows needed to cover a task comprehensively. To overcome these challenges, the researchers proposed the AFLOW framework. AFLOW uses Monte Carlo Tree Search (MCTS) to systematically explore and optimize LLM workflows, capturing the complex interactions between LLM calls by representing workflows as code-representable nodes and edges. By introducing the concept of operators, AFLOW further narrows the search space and improves search efficiency. Experimental results on multiple benchmark datasets show that AFLOW can automatically discover and optimize workflows, significantly improving task performance while reducing reliance on manual intervention.
Dynamic demonstration of AFLOW. Achieving automated generation and optimization of workflows through iterative selection, expansion, evaluation, and backpropagation.
AFLOW first recasts workflow optimization as a search problem: a workflow is represented as a sequence of code-representable nodes, where each node corresponds to a specific LLM operation and the edges define the logic, dependencies, and execution flow between operations. This representation turns workflows into a searchable and optimizable graph structure. Concretely, a workflow W is defined as a sequence of LLM call nodes, where each node contains four parameters: the model M, the prompt P, the temperature, and the output format F (e.g., xml, json, markdown, raw). Nodes are connected by edges, which can be represented by various structures such as graphs, neural networks, or code.
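The node abstraction above can be sketched as a small data structure. This is a hypothetical simplification for illustration, not MetaGPT's actual implementation; the field names follow the four parameters listed in the text:

```python
from dataclasses import dataclass

@dataclass
class Node:
    """One LLM call in a workflow, following the paper's four node parameters."""
    model: str          # M: which LLM to call
    prompt: str         # P: prompt template for this step
    temperature: float  # sampling temperature
    output_format: str  # F: "xml", "json", "markdown", or "raw"

# A workflow is then a sequence of such nodes, with edges (here implicit in
# the list order) defining the execution flow between them.
solve = Node(model="gpt-4o-mini", prompt="Solve: {problem}",
             temperature=0.7, output_format="raw")
review = Node(model="gpt-4o-mini", prompt="Review this solution: {solution}",
              temperature=0.0, output_format="json")
workflow = [solve, review]
```

Representing edges as plain code (rather than a fixed graph schema) is what lets the optimizer later rewrite control flow freely.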
The goal of automated workflow optimization is to discover, for a given task T and evaluation function G, a workflow W that maximizes G(W, T). This can be framed as a search process in which an algorithm A explores the search space S to find the optimal workflow configuration, where S comprises all possible configurations of node parameters and edge structures.
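Restated as a formula, using the notation of the paragraph above:

```latex
W^* = \arg\max_{W \in S} G(W, T)
```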
Examples of Node, Operator, and Edge. This shows optional parameters of Node, common structures of Operators, and common representations of Edges.
How Does AFLOW Automatically Optimize Workflows?
AFLOW uses Monte Carlo Tree Search (MCTS) to automate the generation and optimization of Agentic Workflows. In the AFLOW framework, Operators play a critical role; they are predefined, reusable combinations of nodes representing common agent operations (such as review, vote, generate). These Operators are integrated into the search space as foundational components for building workflows, ensuring that the exploration process can leverage known effective agent operation patterns. The introduction of Operators significantly enhances the search efficiency of the AFLOW framework and improves workflow optimization results, reducing blind exploration in the vast search space.
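As an illustration, an operator can be thought of as a reusable function over node calls. The sketch below shows a simple majority-vote operator; the function name and interface are hypothetical, not AFLOW's actual API, and a plain callable stands in for an LLM node call:

```python
from collections import Counter

def vote_operator(generate, problem, n=3):
    """A 'vote' operator: invoke a generation step n times and keep the
    majority answer. `generate` is any callable problem -> answer, standing
    in for an LLM node call."""
    answers = [generate(problem) for _ in range(n)]
    majority, _count = Counter(answers).most_common(1)[0]
    return majority

# Toy stand-in for an LLM call: deterministic here, so the vote is unanimous.
result = vote_operator(lambda p: p.upper(), "hello", n=3)
# result == "HELLO"
```

Because operators bundle known-good call patterns like this, the search can compose them instead of rediscovering them node by node.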
The goal of AFLOW is to discover a workflow that maximizes task performance under given tasks and evaluation functions. The AFLOW algorithm begins with an initialized template workflow, which provides a basic workflow framework, including LLM node calls and the use of Operators. Then, the algorithm iteratively executes four main steps of MCTS: Selection, Expansion, Evaluation, and Backpropagation.
Overall framework of AFLOW: By setting up a search space composed of nodes with flexible prompt parameters, a given set of operators, and code representing edges, AFLOW performs MCTS-based searches within this space. Through a variant of MCTS designed for workflow optimization, AFLOW iteratively executes a loop of soft mixed probability selection, LLM-based expansion, evaluation execution, and experience backpropagation until it reaches the maximum number of iterations or meets convergence criteria.
Selection Phase: AFLOW uses a soft mixed probability selection mechanism to choose a node for expansion. This mechanism combines a uniform probability distribution with a score-based weighted probability distribution to balance exploration and exploitation and avoid local optima. During selection, AFLOW weighs both the scores of candidate nodes and the need for exploration, choosing a node that is likely to improve performance and is still worth exploring.
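The soft mixed probability selection can be sketched as mixing a uniform distribution with a score-weighted (softmax) one. The mixing weight `lam` and the sharpness `alpha` below are illustrative assumptions; the paper's exact formula may differ:

```python
import math
import random

def mixed_selection_probs(scores, lam=0.2, alpha=5.0):
    """Mix a uniform distribution (exploration) with a softmax over scores
    (exploitation). lam is the uniform share; alpha sharpens the softmax."""
    n = len(scores)
    exp_s = [math.exp(alpha * s) for s in scores]
    z = sum(exp_s)
    return [lam / n + (1 - lam) * e / z for e in exp_s]

probs = mixed_selection_probs([0.5, 0.7, 0.9])
# Higher-scoring workflows get higher probability, but none drops to zero.
chosen = random.choices(range(3), weights=probs, k=1)[0]
```

Keeping a uniform component means even low-scoring branches retain some chance of selection, which is what prevents the search from collapsing onto an early local optimum.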
Expansion Phase: AFLOW uses an LLM as the optimizer to generate new workflows. The optimizer draws on the experience of the selected workflow to generate new prompts or modify code that changes node connections, producing new workflow variants. These variants are minor adjustments to existing workflows, such as adding, modifying, or deleting nodes and edges.
Evaluation Phase: AFLOW directly executes the generated workflows to obtain feedback. Since reasoning tasks have clear evaluation functions, AFLOW can run a workflow multiple times on a validation set and compute the mean score and standard deviation, giving the optimizer more accurate feedback.
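Evaluating a candidate by repeated validation runs, as described, amounts to computing a mean and standard deviation over scores. A minimal sketch, where `run_workflow` is a hypothetical stand-in for executing the candidate workflow on one input:

```python
import statistics

def evaluate(run_workflow, validation_set, repeats=5):
    """Score a workflow by running it `repeats` times over the validation
    set and reporting the mean accuracy and its standard deviation."""
    scores = []
    for _ in range(repeats):
        correct = sum(run_workflow(x) == y for x, y in validation_set)
        scores.append(correct / len(validation_set))
    return statistics.mean(scores), statistics.pstdev(scores)

# Deterministic toy "workflow", so every repeat scores the same.
val = [(1, 2), (2, 4), (3, 7)]
mean, std = evaluate(lambda x: 2 * x, val)
# mean ≈ 0.667, std == 0.0
```

With a real, stochastic LLM workflow the standard deviation is nonzero, and reporting it lets the optimizer distinguish genuinely better variants from lucky runs.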
Backpropagation Phase: The performance information of each workflow is backpropagated through the MCTS tree to update node scores and guide future search iterations. This information includes the workflow's execution results and whether it improved on its parent workflow. In this way, AFLOW learns from each iteration and gradually improves workflow performance.
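A minimal sketch of the backpropagation bookkeeping. Propagating the best-descendant score up the tree is an illustrative simplification; the real implementation stores richer experience, such as descriptions of which modifications succeeded or failed:

```python
def backpropagate(tree, node_id, score):
    """Record a node's score, then push the best-descendant score up to the
    root so future selection sees which branches have paid off."""
    tree[node_id]["score"] = score
    parent = tree[node_id]["parent"]
    while parent is not None:
        tree[parent]["score"] = max(tree[parent]["score"], score)
        parent = tree[parent]["parent"]

tree = {
    "root":  {"parent": None,   "score": 0.5},
    "child": {"parent": "root", "score": 0.0},
}
backpropagate(tree, "child", 0.8)
# tree["root"]["score"] == 0.8
```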
To avoid unnecessary cost from continuing after optimization has plateaued, AFLOW stops the iterative process when the top-k workflows ranked by score have shown no improvement for several consecutive rounds.
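The convergence criterion can be sketched as checking whether the top-k scores have changed over a window of recent rounds. The values of `k` and `patience` below are illustrative, not the paper's settings:

```python
def has_converged(score_history, k=3, patience=5):
    """Stop when the top-k scores have not improved for `patience` rounds.
    score_history holds one list of candidate scores per round (cumulative)."""
    if len(score_history) <= patience:
        return False

    def top_k(scores):
        return sorted(scores, reverse=True)[:k]

    baseline = top_k(score_history[-patience - 1])
    return all(top_k(s) == baseline for s in score_history[-patience:])

history = [[0.5], [0.5, 0.6], [0.5, 0.6, 0.6]]
done = has_converged(history)
# done is False: too few rounds for the default patience.
```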
Transformations Brought by AFLOW to Agentic Workflows
Significant Performance Advantages: AFLOW was evaluated on six text reasoning tasks covering coding (HumanEval, MBPP), mathematics (GSM8K, MATH), and knowledge Q&A (HotpotQA, DROP). Compared to existing manually designed methods it achieved an average improvement of 5.7%, and a remarkable 19.5% improvement over other automated methods. AFLOW led on all six tasks, demonstrating its stability and adaptability across task types.
Performance comparison with other methods. Different metrics are used per dataset: solve rate for GSM8K and MATH, F1 score for HotpotQA and DROP, and pass@1 for HumanEval and MBPP. AFLOW (highlighted in yellow) consistently outperformed all automated workflow-optimization methods and manually designed methods across all six benchmarks.
Significant Cost Reduction: The greatest transformation AFLOW brings to the Agent field is its dramatic cost reduction. Workflows that AFLOW discovers for smaller models can match GPT-4o's performance at only 4.55% of its inference cost. This breakthrough means that enterprises can get large-model results from smaller models, making large-scale deployment of AI applications economically feasible.
Cost refers to the total expense of running the divided HumanEval test set. AFLOW (model) denotes the model used to execute workflows and obtain feedback during optimization. The colors in the legend represent the different LLMs used to execute workflows on the test datasets.
Increased Efficiency through Automation: AFLOW fundamentally changes the traditional manual-debugging workflow. Through automated workflow generation and optimization, it significantly reduces the need for human involvement. Developers no longer need to spend long hours on repeated debugging and tuning; the system automatically discovers the best workflow combinations, greatly shortening the development cycle.
Wide Applicability: Experimental results show that AFLOW transfers well. It supports various mainstream LLMs and adapts to different task requirements. In tests across Q&A, code generation, and math problem-solving, AFLOW performed excellently, proving its value as a general optimization framework. Moreover, users can apply AFLOW to their own tasks simply by providing a dataset and an evaluation function.
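In practice, the user-supplied pieces are just a dataset and an evaluation function. Below is a hypothetical example of what such an evaluation function might look like for an exact-match Q&A task; this is not AFLOW's actual interface, only a sketch of the kind of callable a user would provide:

```python
def evaluate_answer(predicted: str, expected: str) -> float:
    """Toy evaluation function for exact-match QA: 1.0 if correct, else 0.0.
    Normalizes whitespace and case before comparing."""
    return float(predicted.strip().lower() == expected.strip().lower())

# A user-supplied dataset: each entry pairs an input with its gold answer.
dataset = [
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "2 + 2 = ?", "answer": "4"},
]

score = evaluate_answer("  PARIS ", dataset[0]["answer"])
# score == 1.0
```

Anything that maps a workflow's output to a numeric score works as the function G from the problem definition, which is why the framework generalizes across task types.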
Outlook
AFLOW proposes an effective method for generating Agentic Workflows and demonstrates a remarkable ability to reduce both labor and inference costs. This research is expected to accelerate the deployment of Agents across fields, transforming Agentic Workflow construction from manual work by experts into an automated process accessible even to novices.
Usage
The authors have open-sourced the complete code on GitHub. By supplying custom benchmarks and datasets, users can quickly search for workflow solutions that maximize performance, or balance performance against cost, for their own tasks, saving individuals and enterprises a considerable amount of time.
AFLOW’s GitHub guide. You can refer to the step-by-step guide to configure and run AFLOW for efficient workflow generation and optimization.