MLNLP community is a well-known machine learning and natural language processing community both domestically and internationally, covering NLP master’s and doctoral students, university teachers, and corporate researchers.

The vision of the community is to promote communication and progress between the academic and industrial circles of natural language processing and machine learning at home and abroad, especially for the progress of beginners.

Reprinted from | Machine Heart

The AFLOW author team comes from the MetaGPT open-source community. The first authors of the AFLOW paper are Zhang Jiayi, a PhD student at the Hong Kong University of Science and Technology (Guangzhou), and Xiang Jinyu, a researcher at DeepWisdom. The corresponding authors are Wu Chenglin, founder and CEO of DeepWisdom (MetaGPT code author and paper corresponding author), and Assistant Professor Luo Yuyu of the Hong Kong University of Science and Technology (Guangzhou). The other authors are Yu Zhaoyang, Teng Fengwei, and Cheng Xin from Renmin University of China; PhD student Chen Xionghui from Nanjing University's LAMDA Laboratory; Chen Jiaqi and Zheng Bingnan from Fudan University; PhD student Zhuge Mingchen from King Abdullah University of Science and Technology (co-first author of the MetaGPT paper); DeepWisdom researcher Hong Sirui (co-first author of the MetaGPT paper); Wang Jinlin; and Assistant Professor Liu Bang of the University of Montreal and the MILA Laboratory.

For LLM practitioners, applying LLMs effectively means constructing Agentic Workflows by hand and debugging them again and again. The process is cumbersome: it involves repeatedly modifying similar code, tuning prompts, running tests manually, and inspecting the results. Switching to a different LLM can invalidate all of that earlier work, so labor costs are high. Many companies even hire dedicated Prompt Engineers for this task.

Now Agentic Workflows have their own automatic optimization tool.

MetaGPT has open-sourced AFLOW, which uses Monte Carlo Tree Search (MCTS) to search for Agentic Workflows automatically, so that workflows can be constructed and optimized end to end without manual coding or prompt debugging.

AFLOW optimizes workflows through Monte Carlo Tree Search, achieving GPT-4o level capabilities at a very low cost.

AFLOW extends the line of work on automatic prompt optimization: it takes over the entire generation and optimization process of Agentic Workflows, outperforms other automated workflow methods, and surpasses every manually designed workflow baseline in the comparison.

Paper Title: AFlow: Automating Agentic Workflow Generation
Paper Link: https://arxiv.org/abs/2410.10762
Project Link: https://github.com/geekan/MetaGPT/tree/main/examples/aflow

What is the Automatic Workflow Optimization Problem?

Existing methods for automatically generating Agentic Workflows struggle to produce effective workflows. They often require manual intervention during initial setup and fail to capture the diversity of workflows needed to complete tasks comprehensively. To overcome these challenges, the researchers proposed the AFLOW framework, which uses Monte Carlo Tree Search (MCTS) to systematically explore and optimize LLM workflows. By defining workflows as nodes and edges that can be represented in code, AFLOW captures the complex interactions between LLM calls. By introducing the concept of operators, it further narrows the search space and improves search efficiency. Experimental results on multiple benchmark datasets indicate that AFLOW can automatically discover and optimize workflows, significantly improving task performance while reducing reliance on manual intervention.

Dynamic demonstration of AFLOW. It achieves automated generation and optimization of workflows through iterative selection, expansion, evaluation, and backpropagation.

AFLOW first reformulates workflow optimization as a search problem, in which a workflow is represented as a sequence of nodes expressed in code, each representing a specific LLM operation. The edges between nodes define the logic, dependencies, and execution flow of the operations. This representation turns a workflow into a graph structure that can be searched and optimized. Specifically, the workflow W is defined as a sequence of LLM-invoking nodes, where each node contains four parameters: model M, prompt P, temperature, and output format F (such as xml, json, markdown, raw). Nodes are connected by edges, which can be represented by various structures, such as graphs, neural networks, or code.
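The node-and-edge representation described above can be pictured with a small data structure. This is only an illustrative sketch, not the MetaGPT implementation: the class names Node and Workflow, the field names, and the "example-model" identifier are all hypothetical, and edges are stored here as plain index pairs for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """One LLM call, described by the four parameters named in the text."""
    model: str          # M: which LLM to call (hypothetical identifier)
    prompt: str         # P: the prompt template
    temperature: float  # sampling temperature
    output_format: str  # F: "xml", "json", "markdown", or "raw"


@dataclass
class Workflow:
    """Hypothetical container: nodes plus directed edges between node indices."""
    nodes: List[Node] = field(default_factory=list)
    edges: List[Tuple[int, int]] = field(default_factory=list)

    def add_node(self, node: Node) -> int:
        """Append a node and return its index."""
        self.nodes.append(node)
        return len(self.nodes) - 1

    def connect(self, src: int, dst: int) -> None:
        """Record that the output of node `src` feeds node `dst`."""
        if not (0 <= src < len(self.nodes) and 0 <= dst < len(self.nodes)):
            raise IndexError("edge endpoints must be existing node indices")
        self.edges.append((src, dst))


# Usage: a two-step workflow where a draft answer is passed to a review step.
wf = Workflow()
draft = wf.add_node(Node("example-model", "Solve: {problem}", 0.7, "raw"))
review = wf.add_node(Node("example-model", "Review this answer: {answer}", 0.0, "json"))
wf.connect(draft, review)
```

In AFLOW itself the edges are expressed as code rather than as an explicit edge list; the list form is used here only because it is the shortest way to show the graph structure.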

Given a task T and an evaluation function G, the goal of automated workflow optimization is to discover a workflow W that maximizes G(W, T). This can be formulated as a search process in which an algorithm A explores the search space S to determine the optimal workflow configuration. The search space S includes all possible configurations of node parameters and edge structures.
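Written compactly with the symbols used in this article (W, T, G, and S), the objective can be stated as follows. The starred symbol for the optimal workflow is notation added here for illustration.

```latex
W^{*} = \arg\max_{W \in S} \; G(W, T)
```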

Examples of Node, Operator, and Edge. This shows the optional parameters of Node, common structures of Operator, and common representations of Edge.

How Does AFLOW Automatically Optimize Workflows?

AFLOW uses Monte Carlo Tree Search (MCTS) to automate the generation and optimization of Agentic Workflows. Operators play a crucial role in the framework: they are predefined, reusable combinations of nodes that represent common agent operations (such as review, vote, generate). These operators serve as building blocks for constructing workflows and are integrated into the search space, so the exploration process can draw on agent operation patterns that are already known to work. Introducing operators makes the search markedly more efficient and reduces blind exploration of the vast search space.
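To make the idea of an operator concrete, here is a minimal sketch of two operators composed into one reusable unit. Everything in it is an assumption for illustration: the function names, the LLMCall type, and the fake_llm stub are hypothetical, and a simple majority count stands in for whatever voting logic a real operator would use.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LLMCall = Callable[[str], str]


def generate(llm: LLMCall, problem: str, n: int) -> List[str]:
    """'Generate' operator: sample n candidate answers for one problem."""
    return [llm(f"Solve the problem (attempt {i + 1}): {problem}") for i in range(n)]


def vote(candidates: List[str]) -> str:
    """'Vote' operator: return the most frequent candidate.

    Ties are broken in favor of the candidate seen first.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return Counter(candidates).most_common(1)[0][0]


def generate_then_vote(llm: LLMCall, problem: str, n: int = 3) -> str:
    """A reusable combination of the two operators above."""
    return vote(generate(llm, problem, n))


# Usage with a deterministic fake LLM that answers wrongly on its second attempt.
def fake_llm(prompt: str) -> str:
    return "41" if "attempt 2" in prompt else "42"


answer = generate_then_vote(fake_llm, "6 * 7 = ?")
```

Because the combination is an ordinary function, a search procedure can insert it into a workflow as a single step instead of rediscovering the same pattern node by node.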

The goal of AFLOW is to discover a workflow that maximizes task performance for a given task and evaluation function. The algorithm starts by initializing a template workflow that provides a basic framework, including LLM node calls and the use of operators. It then iterates through the four main steps of MCTS: Selection, Expansion, Evaluation, and Backpropagation.

Overall framework of AFLOW: By setting a search space composed of nodes with flexible prompt parameters, a given set of operators, and code representing edges, AFLOW performs MCTS-based searches within this space. Through a variant of MCTS designed for workflow optimization, AFLOW iteratively executes soft mixed probability selection, LLM-based expansion, evaluation, and experience backpropagation cycles until reaching the maximum iteration count or meeting convergence criteria.

Selection Phase: AFLOW uses a soft mixed probability selection mechanism to choose a node for expansion. The mechanism combines a uniform probability distribution with a score-weighted probability distribution to balance exploration and exploitation and avoid local optima. During selection, AFLOW weighs both the scores of candidate nodes and the need for exploration, so the chosen node is likely to improve performance while still leaving room to explore.
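The mixture of a uniform distribution and a score-weighted distribution can be sketched as below. This is a sketch under stated assumptions rather than the project's code: the function names are hypothetical, the score-weighted part is assumed to be a softmax-style weighting relative to the best score, and the parameter names lam and alpha with their default values are illustrative.

```python
import math
import random
from typing import List, Optional


def mixed_probabilities(scores: List[float], lam: float = 0.2, alpha: float = 0.4) -> List[float]:
    """Blend a uniform distribution with a score-weighted one.

    lam   : share of probability mass spread uniformly (exploration).
    alpha : how sharply the weighted part favors high scores (exploitation).
    Both names and defaults are assumptions made for this sketch.
    """
    if not scores:
        raise ValueError("need at least one candidate score")
    n = len(scores)
    best = max(scores)
    # Exponential weights relative to the best score; the best gets weight 1.
    weights = [math.exp(alpha * (s - best)) for s in scores]
    total = sum(weights)
    return [lam / n + (1.0 - lam) * w / total for w in weights]


def select_index(scores: List[float], rng: Optional[random.Random] = None, **kwargs) -> int:
    """Sample the index of the workflow to expand next."""
    rng = rng or random.Random()
    probs = mixed_probabilities(scores, **kwargs)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


# Usage: three candidate workflows with validation scores.
candidate_scores = [0.6, 0.8, 0.7]
probs = mixed_probabilities(candidate_scores)
chosen = select_index(candidate_scores, rng=random.Random(0))
```

The uniform term guarantees every candidate a floor of lam / n, so a low-scoring workflow can still be picked occasionally, which is what keeps the search from collapsing onto a single branch.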

Expansion Phase: AFLOW uses an LLM as an optimizer to generate new workflows. The optimizer draws on the experience accumulated for the selected workflow to write new prompts or to modify the connections between nodes, producing new workflow variants. Each variant is a slight adjustment of an existing workflow, such as adding, modifying, or deleting nodes and edges.

Evaluation Phase: AFLOW directly executes the generated workflows to obtain feedback. Since reasoning tasks have clear evaluation functions, AFLOW can calculate average scores and standard deviations by running workflows multiple times on the validation set, obtaining more accurate feedback for the optimizer.
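Repeated evaluation on a validation set can be sketched as follows. The function name evaluate, its signature, and the default of five runs are assumptions made for illustration; the workflow is modeled as a plain callable from question to answer, and the toy workflow and exact-match scorer in the usage part are hypothetical.

```python
import statistics
from typing import Callable, List, Sequence, Tuple


def evaluate(
    workflow: Callable[[str], str],
    validation_set: Sequence[Tuple[str, str]],
    score_fn: Callable[[str, str], float],
    runs: int = 5,
) -> Tuple[float, float]:
    """Run the workflow `runs` times over the validation set.

    Returns (mean of per-run scores, standard deviation of per-run scores).
    """
    if not validation_set or runs < 1:
        raise ValueError("need a non-empty validation set and at least one run")
    run_scores: List[float] = []
    for _ in range(runs):
        per_item = [score_fn(workflow(question), gold) for question, gold in validation_set]
        run_scores.append(statistics.fmean(per_item))
    mean = statistics.fmean(run_scores)
    # The sample standard deviation needs at least two runs.
    std = statistics.stdev(run_scores) if runs > 1 else 0.0
    return mean, std


# Usage with a deterministic toy workflow and an exact-match scorer.
toy_answers = {"2+2": "4", "3+3": "6"}
toy_workflow = lambda q: toy_answers.get(q, "")
exact_match = lambda pred, gold: 1.0 if pred == gold else 0.0
validation = [("2+2", "4"), ("3+3", "7")]
mean_score, std_score = evaluate(toy_workflow, validation, exact_match, runs=3)
```

With a real LLM the per-run scores differ because sampling is stochastic, and the standard deviation tells the optimizer how much of a score difference is noise.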

Backpropagation Phase: The performance information of the workflow is backpropagated into the tree structure of MCTS to update node scores and guide future search iterations. This information includes the execution results of the workflow and whether optimization was successful compared to its parent workflow. In this way, AFLOW can learn from each iteration and gradually improve workflow performance.
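One way to picture this step is a tree node that remembers which modifications helped and which did not. The sketch below is hypothetical throughout: the TreeNode class, the backpropagate function, and the choice to store the record as a text line on the parent are assumptions, not the project's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    """A workflow in the search tree, with its score and recorded experience."""
    name: str
    score: float
    parent: Optional["TreeNode"] = None
    experience: List[str] = field(default_factory=list)


def backpropagate(child: TreeNode, modification: str) -> bool:
    """Record on the parent whether `modification` improved the score.

    Returns True when the child outperforms its parent.
    """
    parent = child.parent
    if parent is None:
        raise ValueError("the root workflow has no parent to update")
    improved = child.score > parent.score
    outcome = "success" if improved else "failure"
    parent.experience.append(
        f"{modification}: {parent.score:.3f} -> {child.score:.3f} ({outcome})"
    )
    return improved


# Usage: a child workflow that adds a review step to the root workflow.
root = TreeNode("root", score=0.70)
child = TreeNode("root+review", score=0.75, parent=root)
was_improved = backpropagate(child, "add review step")
```

The recorded lines can later be placed in the optimizer's prompt, so that the LLM avoids repeating modifications that already failed for that workflow.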

To avoid unnecessary costs after optimization reaches its limit, AFLOW will stop the above iterative process when the top k workflows by score have not improved over several consecutive rounds.
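A stopping rule of this kind can be sketched as follows. The class name EarlyStopper, the parameter names k and patience, and the rule that "improvement" means a new score entering the top-k list are assumptions chosen for this illustration.

```python
from typing import List, Tuple


class EarlyStopper:
    """Stop when the top-k scores stay unchanged for `patience` consecutive rounds."""

    def __init__(self, k: int = 3, patience: int = 5) -> None:
        if k < 1 or patience < 1:
            raise ValueError("k and patience must be positive")
        self.k = k
        self.patience = patience
        self.scores: List[float] = []
        self.top: Tuple[float, ...] = ()
        self.stale_rounds = 0

    def update(self, new_score: float) -> bool:
        """Register the score of this round's workflow; return True to stop."""
        self.scores.append(new_score)
        top = tuple(sorted(self.scores, reverse=True)[: self.k])
        if top != self.top:
            # A new score entered the top k, so the search is still improving.
            self.top = top
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
        return self.stale_rounds >= self.patience


# Usage: with k=2 and patience=3, stop after three rounds without a top-2 change.
stopper = EarlyStopper(k=2, patience=3)
decisions = [stopper.update(s) for s in [0.5, 0.6, 0.4, 0.3, 0.55, 0.1, 0.2, 0.3]]
```

In the usage example the score 0.55 enters the top two and resets the counter, so the stop signal only fires after the three low scores that follow it.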

Transformations Brought by AFLOW to Agentic Workflows

Significant Performance Advantages: AFLOW was evaluated on six text reasoning tasks covering code (HumanEval, MBPP), mathematics (GSM8K, MATH), and knowledge Q&A (HotpotQA, DROP). On average it improved on existing manually designed methods by 5.7% and on other automated methods by 19.5%. AFLOW led on all six tasks, demonstrating its stability and adaptability across different task types.

Performance comparison with other methods. To evaluate the performance of this method, we adopted different metrics for different datasets: solve rate for MATH and GSM8K, F1 score for HotpotQA and DROP, and pass@1 for HumanEval and MBPP. Our AFLOW (highlighted in yellow) consistently outperformed all automated workflow optimization and manually designed methods across all six benchmarks.

Significant Cost Reduction: The greatest change AFLOW brings to the Agent domain is a significant reduction in cost. Workflows that AFLOW identifies for smaller models achieve comparable performance at only 4.55% of the inference cost of GPT-4o. This means that enterprises can obtain large-model results with smaller models, which offers a cost-effective path to large-scale deployment of AI applications.

Cost refers to the total expense of running the test split of HumanEval. AFLOW (model) denotes the workflows that AFLOW found when using that model to execute workflows and obtain feedback. The colors in the legend represent the different LLMs used to execute workflows on the test dataset.

Efficiency Improvement through Automation: AFLOW fundamentally changes the traditional manual debugging model. Its automated workflow generation and optimization significantly reduce the need for manual involvement. Developers no longer need to spend large amounts of time on repeated debugging and tuning, because the system discovers effective workflow combinations on its own, greatly shortening the development cycle.

Wide Applicability: Experimental results indicate that AFLOW transfers well. It supports a variety of mainstream LLMs and adapts to different task requirements. In tests across multiple fields, including Q&A, code generation, and mathematical problem-solving, AFLOW performed strongly, demonstrating its value as a general optimization framework. In addition, users can apply AFLOW to their own tasks simply by providing datasets and evaluation functions.

Outlook

AFLOW proposes an effective method for generating Agentic Workflows and demonstrates substantial reductions in both labor and inference costs. This research is expected to accelerate the adoption of agents in various fields, shifting the construction of Agentic Workflows from manual work by experts to automated construction that novices can use.

Usage

The authors have open-sourced the complete code on GitHub. By supplying custom benchmarks and datasets, users can quickly search for the workflow that gives the best performance, or the best balance of performance and cost, on their own tasks, saving individuals and enterprises a great deal of time.

AFLOW’s GitHub guide. You can refer to the step-by-step guide to configure and run AFLOW for efficient workflow generation and optimization.
