Trends in AI Agent Workflows: Full Transcript of Andrew Ng’s Speech

Source | BLUES, Intelligent Gorilla

This article is a transcript of the speech Andrew Ng gave in March this year, titled “Agentic Reasoning”, which explains the trends in AI agent workflows.


This article is for academic/technical sharing only. If there is any infringement, please contact us to delete the article.

Andrew Ng pointed out that as AI technology develops, AI agents are increasingly seen as tools that can significantly enhance software development efficiency and quality.

He reinforced this point by demonstrating how agentic workflows allow AI agents to surpass the limitations of individual models, and how multi-agent systems can collaborate to solve complex problems.

He believes that in the future we will see AI agents playing a greater role in a variety of workflows, and that both the performance and the range of applications of these agents will continue to expand.

Accordingly, people need to learn to reset their expectations for collaborating with AI and to fully leverage the ability of AI agents to iterate rapidly.

Andrew Ng also predicted the expansion of agentic workflows and discussed how humans will need to adapt to new ways of interacting with AI agents.

Andrew Ng introduced four design patterns for AI agent workflows:

  1. Reflection: The LLM examines its own work and proposes ways to improve it.

  2. Tool Use: The LLM is given tools such as web search, code execution, or other capabilities that help it gather information, take action, or process data.

  3. Planning: The LLM comes up with and executes a multi-step plan to achieve a goal (for example, drafting an outline, doing online research, and then writing a draft…).

  4. Multi-Agent Collaboration: Multiple AI agents work together, dividing up tasks and discussing and debating ideas to arrive at better solutions than a single agent would.

Key Conclusions and Supporting Arguments:

Conclusion 1: AI agents used in agentic workflows can yield better results than traditional zero-shot workflows.

Argument:

Andrew Ng was surprised by the results he observed when running these workflows himself.

GPT-3.5 wrapped in an agentic workflow performed even better than GPT-4, despite GPT-4 having a higher success rate with zero-shot prompting.

Conclusion 2: Multi-agent collaboration is an effective strategy for enhancing AI performance.

Argument:

One of the design patterns pairs a coder agent with a critic agent; it is easy to implement and broadly applicable.

Experiments show that multiple agents can generate complex programs by collaboratively coding, testing, and iterating.

Conclusion 3: AI agents will expand the range of tasks that artificial intelligence can perform.

Argument:

Andrew Ng anticipates that, thanks to agentic workflows, the set of tasks AI can accomplish will expand significantly this year.

The different design patterns that already exist (Reflection, Tool Use, Planning, and Multi-Agent Collaboration) show that applications of AI agents are becoming increasingly refined and widespread.

Conclusion 4: Fast iteration is key when using AI agents; a faster model of lower quality may even produce better results than a slower model of higher quality.

Argument:

Fast token generation by the LLM is crucial for agentic workflows, because they iterate over and over.

Even a lower-quality LLM, if it can iterate quickly enough, may produce better results than a higher-quality model that generates tokens more slowly.

Conclusion 5: People need to adapt to waiting patiently for AI agents to complete tasks.

Argument:

People are currently accustomed to the instant responses of search engines, but collaborating with AI agents takes time to produce the best results.

Adapting to this is a necessary step toward working with AI agents efficiently.

Below is the full text of the speech:

“Agentic Reasoning”

I am excited to share my views on AI agents, which I believe is an exciting trend that everyone involved in AI development should pay attention to, and I am also looking forward to what comes next.

So what does this look like? Today, most of us use large language models with a non-agentic workflow: you type in a prompt and it generates an answer. It is somewhat like asking a person to write an essay and saying, “Please sit at the keyboard and type the essay from start to finish without ever using the backspace key.”

Despite how difficult that is, LLMs actually do it quite well. In contrast, an agentic workflow looks more like this: the AI is asked to write an essay outline, decide whether it needs to do any web research and do it if so, then write a first draft, read its own first draft, think about which parts need revision, revise the draft, and repeat this process.

This workflow is more iterative: the AI does some thinking, revises the article, does some more thinking, and loops through this many times. What many people do not realize is that this yields significantly better results.

I myself was quite surprised by how well these agentic workflows perform when I ran them.

To give you a case study, my team analyzed some data using a coding benchmark called HumanEval, which OpenAI released a few years ago.

It consists of coding problems such as: “Given a non-empty list of integers, return the sum of all of the odd elements that are in even positions.” The expected answer is a code snippet that solves the problem.

Today, many of us use zero-shot prompting, meaning we tell the AI to write the code and run it, generating the program in a single pass from the first token to the last. Who codes like that? No human writes code by typing it straight out and then running it. Maybe you can do it, but I cannot. It turns out that if you use GPT-3.5 with zero-shot prompting, its success rate is 48%, while GPT-4 does much better, with a success rate of 67.7%.

But if you wrap an agentic workflow around GPT-3.5, it actually performs better than even GPT-4. And if you wrap this kind of workflow around GPT-4, it also does very well.

You will notice that GPT-3.5 with an agentic workflow actually outperforms GPT-4 with zero-shot prompting. I think this has very important implications, and I believe it will change how all of us build applications. The term “agent” is being tossed around a lot, and many consulting reports talk about agents, the future of AI, and so on.

I want to share, more concretely, the broad design patterns I am seeing in agents. This is a messy, chaotic space with a lot of research and a lot of open-source projects; there is a lot going on, but I will try to categorize it more concretely. Reflection and tool use, I believe, are well-established techniques: many of us are already using them, and they work reliably. When I use them, I can almost always get them to work properly.

Planning and multi-agent collaboration, I think, are more emerging: when I use them, I am sometimes shocked by how well they work, but at least for now I cannot always get them to work reliably.

So let me walk through these four design patterns over the next few slides. If some of you go back and ask your engineers to use them, I believe you will gain a productivity boost quite quickly.

So, regarding reflection, here is an example. Suppose I ask a system, “Please write code for me for a given task.” We have a coder agent, which is just an LLM that you prompt to write code, say a function definition for the task.

An example of self-reflection is if you then prompt the LLM with something like, “Here is a piece of code intended for a task,” feed back to it the exact same code it just generated, and ask it to check the code carefully for correctness and efficiency.

It turns out that the same LLM you prompted to write the code can often spot a problem, say a bug in line 5, and give constructive feedback on how to fix it. If you then take its own feedback and prompt it again, it may generate a second version of the code that works better than the first. This is not guaranteed, but it happens often enough to be worth trying in many applications.
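To make this loop concrete, here is a minimal sketch of the self-reflection pattern. It is not code from the talk; `call_llm` is a hypothetical wrapper around whatever chat-completion API you use, and the prompts are illustrative only.

```python
# A minimal sketch of the self-reflection loop, assuming a hypothetical
# call_llm() wrapper around whatever chat-completion API you use.

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to your LLM provider and return its reply."""
    raise NotImplementedError("wire this to your LLM provider")

def write_code_with_reflection(task: str, rounds: int = 2) -> str:
    # First draft: prompt the LLM to write code for the task.
    code = call_llm(f"Write a Python function for this task:\n{task}")
    for _ in range(rounds):
        # Reflection: feed the exact same code back and ask for a critique.
        critique = call_llm(
            "Here is code intended for the task below. Check it carefully "
            "for correctness and efficiency, and list any concrete problems.\n\n"
            f"Task: {task}\n\nCode:\n{code}"
        )
        # Revision: use the model's own feedback to produce a better version.
        code = call_llm(
            f"Task: {task}\n\nCurrent code:\n{code}\n\n"
            f"Reviewer feedback:\n{critique}\n\n"
            "Rewrite the code to address the feedback. Output only the code."
        )
    return code
```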

To hint at what is to come: reflection is even more powerful if you give it tools. If you let it run unit tests and the code fails them, you can ask it why the tests failed.

Engaging in that kind of dialogue, it can figure out why the code did not pass the unit tests, try changing something, and ultimately come up with a solution.
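Here is a similarly hedged sketch of reflection driven by unit-test feedback. `call_llm` and `run_unit_tests` are hypothetical helpers you would wire to your own LLM provider and test sandbox; nothing here comes from the talk itself.

```python
# A sketch of reflection combined with a tool: run unit tests on the generated
# code and, when they fail, feed the failure report back to the model.
# call_llm() and run_unit_tests() are hypothetical helpers, not APIs from the talk.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

def run_unit_tests(code: str) -> str:
    """Placeholder: run the tests against `code` in a sandbox and return an
    empty string on success, or the failure output otherwise."""
    raise NotImplementedError("wire this to your test harness")

def code_until_tests_pass(task: str, max_attempts: int = 3) -> str:
    code = call_llm(f"Write Python code for this task:\n{task}")
    for _ in range(max_attempts):
        failures = run_unit_tests(code)
        if not failures:          # all tests passed, stop iterating
            break
        # Ask the model why the tests failed and for a corrected version.
        code = call_llm(
            f"Task: {task}\n\nYour code:\n{code}\n\n"
            f"The unit tests failed with:\n{failures}\n\n"
            "Explain why they failed, then output a corrected version of the code."
        )
    return code
```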

By the way, for those of you who want to learn more about these techniques, I am very excited about each of them, and at the bottom of the slides there is a little recommended-reading section that points to further reference material.

The agent system I have described so far is a single coder agent that you prompt to do all of this itself.

A natural evolution of this idea is that instead of a single coder agent, you can have two agents, one a coder agent and the other a critic agent. These might be the same underlying LLM, but you prompt them in different ways.

To one we say, “You are an expert coder; write code.”

To the other we say, “You are an expert code reviewer; review this code.”

This workflow is actually quite easy to implement.

I think this is a very general technique, and for many workflows it will give you a significant boost in LLM performance.
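As a rough illustration of how easy the two-agent version is to set up, here is a sketch in which the coder and the critic are the same underlying model given different role prompts. Again, `call_llm` is an assumed wrapper, not an API from the talk, and the role prompts are illustrative.

```python
# A sketch of the two-agent variant: the coder and the critic are the same
# underlying model, just given different role prompts. call_llm() is again a
# hypothetical wrapper that takes a system role and a user prompt.

CODER_ROLE = "You are an expert coder. Write clean, correct Python code."
CRITIC_ROLE = "You are an expert code reviewer. Find bugs and inefficiencies."

def call_llm(role: str, prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

def coder_critic_loop(task: str, rounds: int = 2) -> str:
    # The coder agent produces a first draft.
    code = call_llm(CODER_ROLE, f"Write code for this task:\n{task}")
    for _ in range(rounds):
        # The critic agent reviews the draft.
        review = call_llm(CRITIC_ROLE, f"Review this code for the task '{task}':\n{code}")
        # The coder agent revises based on the review.
        code = call_llm(
            CODER_ROLE,
            f"Task: {task}\n\nCode:\n{code}\n\nReview:\n{review}\n\n"
            "Revise the code to address the review.",
        )
    return code
```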

Second Design Pattern: Tool Use

Some of you may have seen LLM systems using tools; on the left is a screenshot of Copilot, and on the right is something extracted from GPT-4.

With today's LLMs, if you ask a question such as “What is the best coffee machine?”, the model can do a web search to answer it.

For certain other questions, the LLM will generate code and run it.

It turns out there are many different tools that many different people are using for analysis, for gathering information, for taking action, and for personal productivity.

It also turns out that a lot of the early work on tool use came from the computer vision community, because LLMs on their own could not do anything with images.

So the only option was to have the LLM generate a function call that could operate on the image, such as generating an image or running object detection. If you look at the literature, it is interesting how much of the work on tool use seems to have originated in vision, because until models like GPT-4V and LLaVA appeared, LLMs were blind to images. This is tool use, and it expands what an LLM can do.
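A minimal sketch of the tool-use pattern might look like the following: the model is told which tools exist, asked to reply with a JSON tool call, and the harness executes it. `call_llm` and `web_search` are hypothetical placeholders rather than real library functions.

```python
# A minimal sketch of the tool-use pattern: the model is told which tools
# exist and asked to answer with a JSON tool call, which the harness then
# executes. call_llm() and web_search() are hypothetical placeholders.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

def web_search(query: str) -> str:
    raise NotImplementedError("wire this to your search API")

def answer_with_tools(question: str) -> str:
    # Ask the model whether it wants to call the tool, and with what query.
    decision = call_llm(
        "You can call one tool: web_search(query). "
        'Reply only with JSON, e.g. {"tool": "web_search", "query": "..."} '
        'or {"tool": null} if no tool is needed.\n'
        f"Question: {question}"
    )
    call = json.loads(decision)
    if call.get("tool") == "web_search":
        results = web_search(call["query"])
        # Give the tool output back to the model so it can compose an answer.
        return call_llm(
            f"Question: {question}\n\nSearch results:\n{results}\n\n"
            "Answer the question using the search results."
        )
    return call_llm(question)
```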

Third Design Pattern: Planning

As for planning, for those of you who have not played much with planning algorithms: many people talk about having a ChatGPT moment, where you think, wow, I have never seen anything like this. I think if you have not yet used planning algorithms, many of you will have a similar AI-agent moment, where you cannot believe an AI agent could do that.

I have run live demos in which something failed and the AI agent re-planned its way around the failure. I have actually had quite a few of those moments: wow, I cannot believe my AI system just did that autonomously.

Here is an example adapted from the HuggingGPT paper. You say: please generate an image in which a girl is reading a book, and her pose is the same as the boy in the example image (a .jpeg file); then please describe the new image with your voice.

With an agentic workflow today, the AI can decide: the first thing I need to do is determine the boy's pose; then find the right model, perhaps on Hugging Face, to extract the pose. Next, find a pose-to-image model to synthesize a picture of the girl in that pose, then use an image-to-text model to describe the new image, and finally use text-to-speech to read the description aloud.
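A hedged sketch of that planning loop might look like this: a planner LLM lays out an ordered list of single-purpose models and a harness runs them in sequence. `call_llm` and `run_model` are hypothetical stand-ins, not real Hugging Face APIs.

```python
# A sketch of the planning pattern in the style described above: a planner
# LLM lays out an ordered list of single-purpose models, and a harness runs
# them in sequence. call_llm() and run_model() are hypothetical stand-ins,
# not real Hugging Face APIs.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

def run_model(name: str, data):
    """Placeholder: invoke the named model (pose detection, pose-to-image,
    image-to-text, text-to-speech, ...) on `data` and return its output."""
    raise NotImplementedError("wire this to your model-hosting service")

AVAILABLE = ["pose-detection", "pose-to-image", "image-to-text", "text-to-speech"]

def plan_and_execute(request: str, initial_input):
    # Planning step: the LLM chooses which models to call and in what order.
    plan = json.loads(call_llm(
        f"Available models: {AVAILABLE}. Return a JSON list of model names, "
        f"in the order they should run, to satisfy this request: {request!r}"
    ))
    data = initial_input
    for step in plan:
        # Execution step: run each chosen model, feeding outputs forward.
        data = run_model(step, data)
    return data
```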

Today we actually have agents that can do this. I would not say they work reliably; they are a bit finicky and do not always work, but when they do work it is actually quite amazing. And with that agentic loop, you can sometimes recover from earlier failures.

I find myself using research agents for some of my work: when I want to research a topic but do not feel like spending a long time Googling it myself, I send the task to a research agent and come back a few minutes later to see what it has found. Sometimes it works and sometimes it does not, but it has already become part of my personal workflow.

Fourth Design Pattern: Multi-Agent Collaboration

The final design pattern, multi-agent collaboration.

One of the interesting things is that it works much better than you might think.

On the left is a screenshot from a paper on ChatDev, which is completely open, in fact open source. Many of you have seen flashy social media demos of coding agents; ChatDev is open source and runs on my laptop.

ChatDev is an example of a multi-agent system in which you prompt a single LLM to sometimes play the role of the CEO of a software company, sometimes a product manager, and sometimes a tester.

By prompting the LLM and telling it “you are now the CEO” or “you are now the software engineer,” you build up a group of agents that collaborate through an extended conversation. If you tell them, “please develop a game,” for example a simple Gomoku game, they will actually spend a few minutes writing code, testing it, iterating, and then produce a surprisingly complex program.
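As a toy illustration of the idea, here is a hedged sketch of role-prompted agents sharing one conversation log, loosely inspired by ChatDev rather than taken from its actual implementation; `call_llm` is again an assumed wrapper, and the roles and prompts are illustrative.

```python
# A toy sketch of role-prompted multi-agent collaboration, loosely inspired
# by the ChatDev idea described above (not its actual implementation): one
# underlying model is prompted into different roles that share a conversation
# log. call_llm() is the same kind of hypothetical wrapper used earlier.

def call_llm(role: str, prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

ROLES = [
    ("CEO", "You are the CEO of a software company. State and refine the requirements."),
    ("Programmer", "You are a software engineer. Write code that meets the requirements."),
    ("Tester", "You are a tester. Point out bugs, missing features, and needed fixes."),
]

def build_app(brief: str, rounds: int = 3) -> str:
    transcript = [f"Project brief: {brief}"]
    for _ in range(rounds):
        for name, system in ROLES:
            # Each agent sees the whole conversation so far and adds its turn.
            reply = call_llm(system, "\n\n".join(transcript))
            transcript.append(f"{name}: {reply}")
    # The latest Programmer turn holds the most recent version of the code.
    return "\n\n".join(transcript)
```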

Sometimes it does not work, and sometimes it is amazing, but this technology is really getting better. And this is just one such design pattern. It also turns out that multi-agent debate, where you have different agents, for example ChatGPT and Gemini, debate each other, leads to better performance. So having multiple agents work together is proving to be a powerful design pattern.

So to summarize, these are the design patterns I am seeing. I believe that if we use these patterns in our work, many of us can get a practical boost quite quickly.

I believe that agentic reasoning design patterns will be important. This is my last slide.

I anticipate that, thanks to agentic workflows, the set of tasks that artificial intelligence can accomplish will expand significantly this year.

One thing that is actually hard for people to adapt to is that when we prompt an LLM, we expect an immediate response.

In fact, a decade ago, when I was discussing this at Google, we called it "big box search" (typing in a long prompt), and one of the reasons I failed to push it through successfully was that when you do a web search, you expect a response within half a second, right? That is just human nature.

We like that instant gratification, that instant feedback.

But for many agent workflows, I think we need to learn to delegate a task to an AI agent and wait patiently, for minutes or even hours, for a response.

It is a bit like how I see many novice managers delegate something to someone and then check in five minutes later, right?

That is not productive. I think this is genuinely difficult.

We need to learn to do the same with some of our AI agents: delegate the task and give it time.

Another important trend is that fast token generation matters a lot, because with these agentic workflows we iterate over and over again.

The LLM is generating tokens for an LLM to read, so being able to generate tokens much faster than any person could read them is great.

I think that generating more tokens quickly, even from a slightly lower-quality LLM, may yield better results than generating tokens slowly from a better LLM.

This may be a bit controversial, but the reason is that it lets you go around this loop more times. It is somewhat like the results I showed earlier, with GPT-3.5 and an agentic workflow, on that first slide of results.

To be honest, I am really looking forward to Claude 5, Claude 4, GPT-5, and Gemini 2.0 and all the other amazing models that many are building.

Part of me feels that if you are waiting to run your application zero-shot on GPT-5, you may be able to get closer to that level of performance on some applications sooner than you think by using agentic reasoning on an earlier model. I think this is an important trend, and honestly, the road to AGI feels like a journey rather than a destination.

But I believe that these types of agent workflows can help us take a small step forward on this long journey. Thank you.

Additional Information:

How to Understand AI Agents

An AI agent generally refers to a software entity that can simulate certain aspects of human intelligence in order to perform specific tasks or achieve goals. These agents can perceive their environment, make decisions based on it, and take actions, thereby completing assigned tasks or solving problems.

To understand AI agents, we can compare them to an assistant capable of independently executing tasks. For example, consider the coding agent mentioned by Andrew Ng. Suppose you are a programmer who needs to write a function to process data.

If there is no AI agent, you need to think, code, test, and debug by yourself. With an AI agent, you may only need to describe the result you want, and the AI agent will automatically generate the code and may even test and optimize it.

For multi-agent systems, imagine a team composed of several specialized AI agents, each with different roles and capabilities. Here are some examples:

  1. Software Development Multi-Agent System: You might have one AI agent responsible for writing code (coding agent), another responsible for checking code quality (review agent), and another that may focus on writing test cases (testing agent). These agents can work together, communicate with each other, and jointly develop a fully functional software application.

  2. Customer Service Bots: One AI agent is responsible for answering calls, analyzing customer issues, and assigning them to the most suitable service department. Another AI agent may specialize in resolving specific types of problems, such as technical support or order processing. These agents can work together based on customer needs to provide effective customer service.

  3. Personal Assistant AI: One AI agent helps you manage your calendar and schedule meetings; another agent helps you search for information on the web, while a third AI agent may be responsible for running simulations and predictions to help you make better business decisions.

By using AI agents, we can automate complex processes, increase efficiency, and allow systems to perform tasks that typically require a lot of time and expertise in an automated manner. The advancements in AI agents also mean that they can learn and improve their performance, becoming more precise and efficient over time.

END
