Agent vs. GPT-5: Andrew Ng’s Insights on Four Agent Design Paradigms


Professor Andrew Ng recently shared his views on Agents at the Sequoia AI Summit. Some media outlets have already reported on it, but in chasing timeliness they relied on machine translation, sacrificing accuracy and adding unnecessary reading barriers.

The Agent Universe has reorganized and translated a version that preserves Professor Ng’s original meaning and adds some personal commentary, in the hope that even readers new to the field can follow it without difficulty.

However, my abilities are limited, so if you have any questions or suggestions, feel free to join our Agent enthusiast community for discussion. Here are the highlights from the talk 👇


Nowadays, when we use AI tools like ChatGPT, we typically type in a prompt and get back an answer. That is a bit like handing someone a topic and telling them to write the article in one sitting: ‘Sit at the computer and type from beginning to end until you finish.’

In contrast, using an Agentic Workflow (a term that is hard to translate elegantly; think of it as an agent-style workflow built on large language models) is like telling that person to first write an outline, look up some information online if needed, write a first draft, reflect on how to revise it, then edit it, and iterate several times. Many people do not realize how much this improves results; in fact, I often work this way myself, and the results are quite impressive.
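To make that workflow concrete, here is a minimal sketch in Python. The `call_llm` function is a hypothetical stand-in for whatever chat-completion API you use, and the prompts are illustrative only, not from the talk.

```python
# Minimal sketch of the outline -> draft -> reflect -> revise workflow described above.

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: replace with a real LLM API call."""
    raise NotImplementedError("wire this to your model provider")

def agentic_write(topic: str, rounds: int = 2) -> str:
    outline = call_llm(f"Write an outline for an essay on: {topic}")
    draft = call_llm(f"Using this outline, write a first draft:\n{outline}")
    for _ in range(rounds):  # reflect and revise several times
        critique = call_llm(f"Reflect on this draft and list concrete revisions:\n{draft}")
        draft = call_llm(f"Revise the draft according to these notes:\n{critique}\n\nDraft:\n{draft}")
    return draft
```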


Our team ran a case study on HumanEval (a benchmark released by OpenAI for evaluating code-generation models), and the model made some mistakes. For example, I posed this problem: ‘Given a list of numbers, find the numbers at odd positions and return the sum of all the odd ones,’ and the AI gave an incorrect answer.
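For reference, here is a plain-Python reading of the problem as stated above (summing the odd values found at odd, 1-based positions). The exact HumanEval wording may differ; this is only meant to show how small the task is.

```python
def sum_of_odds_at_odd_positions(numbers: list[int]) -> int:
    """Sum the odd values found at odd (1-based) positions of the list."""
    return sum(x for i, x in enumerate(numbers, start=1)
               if i % 2 == 1 and x % 2 == 1)

# Positions 1, 3, 5 hold 3, 7, 2; the odd values among them are 3 and 7, so the sum is 10.
assert sum_of_odds_at_odd_positions([3, 4, 7, 8, 2]) == 10
```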


We usually write prompts in a Zero-shot way (giving the large model no example inputs or label hints and asking it to respond directly), which here means asking the AI to write and run the code in a single pass (not a wise approach).

Our research results show that GPT-3.5 + Zero-shot reaches 48% accuracy, while GPT-4 + Zero-shot reaches 67%. However, if you use GPT-3.5 + Agentic Workflow, you get results that surpass GPT-4 + Zero-shot! That is why Agents are so important for building AI applications.


(Now we get to the main topic.) Many scholars and experts have discussed Agents from various angles, but I specifically want to share four design patterns that I see as broadly recognized. Many teams and open-source projects are trying all kinds of approaches; this is simply how I categorize them based on my own understanding.

Reflection and Tool Use are relatively classic and widely used methods, while Planning and Multi-agent are newer and more promising approaches.


The first is Reflection (the AI checking and iterating on its own output). For example, when an AI system built with Reflection writes some code, the system feeds that code back to the model with a prompt like ‘check the correctness of this code and tell me how to modify it.’ The AI may point out bugs, and the process repeats, letting the AI iterate on its own work. The quality of any single revision is not guaranteed, but the overall result tends to be better.
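A minimal sketch of that Reflection loop, again with a hypothetical `call_llm` placeholder; the prompts and the naive stopping rule are illustrative assumptions, not the exact setup from the study.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: replace with a real LLM API call."""
    raise NotImplementedError

def write_code_with_reflection(task: str, rounds: int = 3) -> str:
    code = call_llm(f"Write Python code for this task:\n{task}")
    for _ in range(rounds):
        feedback = call_llm(
            "Check the correctness of this code and tell me how to modify it:\n" + code
        )
        if "no changes needed" in feedback.lower():  # naive stopping rule
            break
        code = call_llm(f"Apply this feedback and return the full revised code.\n"
                        f"Feedback:\n{feedback}\n\nCode:\n{code}")
    return code
```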


(At the bottom of each PPT page, Professor Ng recommended several related papers for further reading)

The above example describes a Single-agent case (as opposed to Multi-agent), but you can also use two Agents: one writes the code and the other reviews and debugs it.


These two Agents can use the same LLM or different ones; this Reflection method is applicable in many scenarios.
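One way to picture the two-agent variant; the role prompts and the `call_llm` signature are assumptions made for illustration (and, as noted above, the two roles could be backed by the same model or by different ones).

```python
def call_llm(system: str, prompt: str) -> str:
    """Hypothetical placeholder: replace with a real chat API call that takes a system prompt."""
    raise NotImplementedError

CODER = "You are a programmer. Write or revise code as asked."
CRITIC = "You are a code reviewer. Find bugs and suggest concrete fixes."

def coder_critic_loop(task: str, rounds: int = 2) -> str:
    code = call_llm(CODER, f"Write code for: {task}")
    for _ in range(rounds):  # critic reviews, coder revises
        review = call_llm(CRITIC, f"Review this code:\n{code}")
        code = call_llm(CODER, f"Revise the code using this review:\n{review}\n\nCode:\n{code}")
    return code
```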

The second is Tool Use (this should feel familiar if you often use GPT-4 or some of the Chinese AI chat products): the large language model’s capabilities are greatly extended by calling external tools and plugins.


(This part is discussed less often.) A common use today is having Copilot perform web searches, or calling a code-execution plugin to help solve certain logic problems.
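A rough sketch of the Tool Use pattern: the model is told which tools exist, decides whether to call one, and the application executes it and feeds the result back. The tool names, the JSON hand-off, and `call_llm` below are assumptions for illustration; real products use their provider’s function-calling or plugin API.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: replace with a real LLM API call."""
    raise NotImplementedError

def web_search(query: str) -> str:   # hypothetical search tool
    raise NotImplementedError

def run_python(code: str) -> str:    # hypothetical code-execution tool
    raise NotImplementedError

TOOLS = {"web_search": web_search, "run_python": run_python}

def answer_with_tools(question: str) -> str:
    decision = call_llm(
        "Available tools: web_search(query), run_python(code). "
        'Reply as JSON {"tool": ..., "arg": ...}, or {"tool": null} if no tool is needed.\n'
        f"Question: {question}"
    )
    choice = json.loads(decision)
    if choice.get("tool"):
        result = TOOLS[choice["tool"]](choice["arg"])  # run the chosen tool
        return call_llm(f"Question: {question}\nTool result: {result}\nNow answer the question.")
    return call_llm(question)
```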

The third is Planning, a very impressive design: the user inputs a task, and the AI breaks it into steps, selects tools, makes the calls, executes them, and outputs the result. In some demos I ran, errors came up along the way, yet the Agent worked around the failures and completed the task on its own.

Here’s an example adapted from the HuggingGPT paper: I need to generate an image of a girl reading a book, with her posture the same as the boy in a provided image, and then describe the new image in words.


The Agent’s approach is to first extract the boy’s pose from the image (perhaps by calling a pose-detection model on Hugging Face), then find a model that can generate an image of a girl in the same pose, and finally describe the generated image in text.
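In code, the Planning pattern roughly looks like the sketch below: the model first produces a step-by-step plan naming which model to call at each step, and the application executes the steps in order, passing each result to the next step. The step names and the `MODELS` registry are hypothetical placeholders; HuggingGPT itself selects real models hosted on Hugging Face.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: replace with a real LLM API call."""
    raise NotImplementedError

# Hypothetical wrappers around task-specific models (e.g. hosted on Hugging Face).
MODELS = {
    "pose-detection": lambda x: "pose.json",                  # extract the boy's pose
    "pose-to-image": lambda x: "girl_reading.png",            # generate a girl in the same pose
    "image-to-text": lambda x: "A girl reading a book ...",   # describe the new image
}

def plan_and_execute(task: str) -> str:
    plan = json.loads(call_llm(
        "Break the task into steps. Reply as a JSON list of objects like "
        '{"model": "<pose-detection | pose-to-image | image-to-text>", "input": ...}.\n'
        f"Task: {task}"
    ))
    result = None
    for step in plan:  # run each step with the chosen model
        result = MODELS[step["model"]](step.get("input") or result)
    return result
```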

The Agent’s results are not guaranteed to be perfect, but the approach is often efficient. For instance, I used to spend a lot of time on Google searches; now I just hand a question to the Agent and check back later for its reply.

The last one is Multi-agent, i.e. multi-agent collaboration (Professor Ng’s example here comes from ChatDev, an open-source project from Tsinghua University).


Each Agent is assigned a different identity, such as CEO, product manager, or programmer, and they cooperate and communicate with one another. For example, if you ask them to develop a simple game, they will spend a few minutes writing code and testing it. It does not always work well, but it is very promising and imaginative, since it simulates how people actually work together. Multi-agent systems can go beyond executing single tasks and grow into complex systems.
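A very small sketch of that role-playing idea; the roles and the simple hand-off order below mimic the description above, while the prompts and `call_llm` are assumptions rather than how ChatDev is actually implemented.

```python
def call_llm(system: str, prompt: str) -> str:
    """Hypothetical placeholder: replace with a real chat API call that takes a system prompt."""
    raise NotImplementedError

ROLES = {
    "CEO": "You set the goal and the acceptance criteria for the product.",
    "Product manager": "You turn the goal into a concrete feature spec.",
    "Programmer": "You implement the spec as working code.",
    "Tester": "You check the code against the spec and report issues.",
}

def build_product(idea: str) -> dict:
    artifacts, context = {}, idea
    for role, system in ROLES.items():  # each agent works on the previous agent's output
        context = call_llm(system, f"Previous output:\n{context}\n\nDo your part of the job.")
        artifacts[role] = context
    return artifacts
```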

In conclusion, I believe that in the future, thanks to Agentic Workflow, AI will be able to create more impressive applications. However, currently, waiting for Agent responses takes a considerable amount of time, so faster token generation speed is crucial (Professor Ng shared a story here, expressing that human nature desires instant gratification).


One important point: if you are waiting for more powerful models like GPT-5, you can often achieve similar or even better results right now by using Agents. This may be controversial, but Agents are indeed an important trend.

Finally, Professor Ng closed on a broader note 👇

“Path to AGI feels like a journey rather than a destination, but I think agentic workflow could help us take a small step forward on this very long journey.”

