Simplifying Complexity: Principles for Building Efficient and Reliable AI Agents

Definition of AI Agent

When it comes to agents, many people assume they are a product of LLMs, but that is not the case.

The modern definition of AI agents has gradually formed alongside the development of AI since the 1950s. Its roots can be traced back to earlier philosophical thoughts and scientific explorations.

In 1950, Alan Turing introduced the concept of a “highly intelligent organism” in his paper “Computing Machinery and Intelligence” and proposed the famous Turing test.

American scholar Marvin Minsky, a co-founder of the MIT Artificial Intelligence Laboratory, later formally proposed the concept of “agents” in his 1986 book The Society of Mind, arguing that AI agents should have the ability to perceive the world, reason, and execute actions.

Various definitions of the term “AI agent” have been proposed in academia and industry. Among them, OpenAI defines AI agents as “systems driven by large language models as their brain, capable of autonomous understanding, perception, planning, memory, and tool usage, able to automate the execution of complex tasks.”

AI agent systems are divided into two categories: Workflows and Agents.

  • Workflows: systems that orchestrate LLMs and tools through predefined code paths. A workflow is more like a carefully designed process in which each step is clear and controllable.

  • Agents: systems in which an LLM dynamically directs its own process and tool usage. An agent is more like an autonomous decision-maker that can flexibly adjust its actions based on environmental feedback.

The key difference between the two lies in whether the LLM can dynamically control its processes and tool usage.
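The distinction above can be sketched in a few lines. This is a toy illustration, not a real implementation: `call_llm` is a hypothetical stub standing in for an actual model API, and the tool names are made up.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM stub; a real system would call a model API here."""
    return "search" if "decide" in prompt else "draft about " + prompt

def workflow(task: str) -> str:
    # Workflow: the code path is fixed; the LLM only fills in each step.
    outline = call_llm(f"outline: {task}")
    draft = call_llm(f"expand: {outline}")
    return draft

def agent(task: str, tools: dict) -> str:
    # Agent: the LLM itself decides which tool to invoke next.
    choice = call_llm(f"decide next tool for: {task}")
    return tools[choice](task)

tools = {"search": lambda t: f"searched: {t}"}
print(workflow("quarterly report"))      # fixed two-step pipeline
print(agent("quarterly report", tools))  # LLM-chosen tool
```

In the workflow, control flow lives in the code; in the agent, control flow is delegated to the model's output.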

What is the relationship between AI agents and LLMs?

The relationship between AI agents and LLMs (Large Language Models) can be understood simply: large models are the premise and foundation of AI agents. If we liken an AI agent to an organism, the LLM is its brain, while the agent as a whole also has the hands and feet needed to work and execute.

For example, you have an AI chef—an AI agent.

  • If you only use an AI large model, it can only provide you with a recipe, telling you what ingredients and steps are needed to make a dish.

  • However, an AI agent can not only provide the recipe but also help you choose the most suitable ingredients based on your taste preferences and nutritional needs, automatically place orders, monitor the cooking process, and ensure quality and taste, ultimately serving you a dish that looks, smells, and tastes great.

Current LLMs have known issues: they can hallucinate, their outputs are not always truthful or reliable, and their knowledge of current events is limited, which makes them seem inadequate for complex tasks. AI agents, however, can integrate autonomous verification and decision-making processes that compensate for these shortcomings, ensuring actions are accurate and efficient and making the whole system more reliable when facing complex tasks.

How AI Agents Work

The core decision-making mechanism of AI agents revolves around dynamic adaptation and continuous optimization. With the help of large language models (LLMs), an agent can flexibly choose and execute suitable actions based on real-time changes it perceives in the environment, and then evaluate and judge the results of those actions. The process iterates: each round builds on the agent's understanding of the environment and the feedback from the previous execution, gradually converging on the set goal. This mode of operation keeps agents efficient, flexible, and adaptive in complex, changing environments, continuously driving tasks toward completion.

Simplified Decision-Making Process of AI Agents

  1. Perception: The ability of agents to collect information from the environment and extract relevant knowledge. For example, in a smart customer service scenario, perceiving the user’s problem description, language style, and the context of the questions.

  2. Planning: The decision-making process made to achieve a certain goal. For example, planning the response approach based on the user’s question, determining the tools or knowledge resources that need to be invoked, etc.

  3. Action: The actions taken based on the environment and planning. It could be answering the user’s question, performing specific tasks (such as querying databases, invoking external APIs, etc.), or interacting with other agents.
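The three steps above can be sketched as a minimal perceive-plan-act cycle. This is a toy stand-in for illustration; the function names and the customer-service rules are assumptions, not a real agent runtime.

```python
def perceive(environment: dict) -> dict:
    # Perception: extract the relevant facts from the raw environment.
    return {"question": environment["user_input"].lower()}

def plan(observation: dict) -> str:
    # Planning: pick a strategy based on what was perceived.
    if "refund" in observation["question"]:
        return "route_to_billing"
    return "answer_directly"

def act(decision: str) -> str:
    # Action: carry out the chosen strategy.
    actions = {
        "route_to_billing": "Forwarding you to the billing team.",
        "answer_directly": "Here is the answer to your question.",
    }
    return actions[decision]

env = {"user_input": "I want a REFUND for my order"}
print(act(plan(perceive(env))))  # Forwarding you to the billing team.
```

A real agent would replace the `plan` rules with an LLM call, but the loop shape stays the same.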

Core Modules in AI Agent Engineering Implementation

The engineering implementation of AI agents is the foundation of their intelligent behavior. It typically includes key components such as perception, planning, memory, tool usage, and action, which work together to achieve efficient intelligent behavior.


Common Core Modules in AI Agents

Planning: Using logic and algorithms to process and analyze information to make decisions. For example, determining the type of problem, key information, and possible solutions through reasoning when dealing with complex issues.

Memory: Includes short-term memory and long-term memory, used to store and retrieve information. Short-term memory helps the agent remember key information during the current interaction, while long-term memory is used to accumulate knowledge and experience for reference in subsequent tasks.

Tools: Various tools and resources that agents can invoke, such as calculators, search engines, databases, etc., to extend their functionality. For example, when performing data analysis, invoking data analysis tools to process and visualize data.

Action: Executing specific operations, such as sending messages, executing code, controlling devices, etc., to achieve goals.
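One way to see how these modules fit together is a small sketch wiring planning, short-term and long-term memory, tools, and action into a single class. `ToyAgent` and its rule-based planner are illustrative assumptions, not any real framework's API.

```python
class ToyAgent:
    def __init__(self, tools: dict):
        self.short_term = []   # memory for the current interaction
        self.long_term = {}    # accumulated knowledge across tasks
        self.tools = tools     # invocable capabilities

    def plan(self, query: str) -> str:
        # Planning: a real agent would reason with an LLM here;
        # this stub just picks a tool by a simple rule.
        return "calculator" if any(c.isdigit() for c in query) else "echo"

    def act(self, query: str) -> str:
        # Action: invoke the chosen tool, then update both memories.
        tool = self.plan(query)
        result = self.tools[tool](query)
        self.short_term.append((query, result))  # remember this turn
        self.long_term[query] = result           # persist for later tasks
        return result

agent = ToyAgent({
    "calculator": lambda q: str(sum(int(t) for t in q.split() if t.isdigit())),
    "echo": lambda q: q,
})
print(agent.act("add 2 and 40"))  # 42
```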


Composition of AI Agent Technology Stack

AI agents are not just large models that can chat; they are more like intelligent agents with a certain degree of autonomy. They need to manage their own state (e.g., conversation history and memory), invoke various tools, and execute safely. This makes the technology stack of AI agents significantly different from that of traditional LLMs.

Let us analyze the key components of the AI agent technology stack from bottom to top:


Model Serving: The Brain of AI

  • Core: LLM. This is the core driving force of AI agents.

  • Service Method: Models are served through inference engines, typically as paid APIs or self-deployed endpoints.

  • Main Players:

    • Closed-source Models: OpenAI and Anthropic lead the way.

    • Open-source Models: Providers like Together.AI, Fireworks, and Groq are beginning to emerge, offering services based on models like Llama 3.

    • Local Deployment: vLLM has become the mainstream choice for production-level GPU services, while Ollama and LM Studio are popular among individual enthusiasts.

Storage: The Foundation of Memory

  • Core: Persistent state, such as conversation history, memory, and external data.

  • Key Technologies:

    • Vector Databases: Chroma, Weaviate, Pinecone, Qdrant, and Milvus are used to store the agent’s “external memory” to handle large capacity data.

    • Traditional Databases: Postgres now also supports vector search through the pgvector extension.

  • Why is it Important? Agents are stateful and need to store and retrieve information over the long term.
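The retrieval idea behind these vector stores can be shown with a toy in-memory version: store (vector, text) pairs and return the nearest neighbors by cosine similarity. `ToyVectorStore` and the hand-made two-dimensional embeddings are illustrative assumptions; real systems like Chroma or pgvector use model-generated embeddings and indexed search.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class ToyVectorStore:
    def __init__(self):
        self.items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.items.append((vector, text))

    def query(self, vector, k=1):
        # Return the k stored texts most similar to the query vector.
        ranked = sorted(self.items, key=lambda it: cosine(vector, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.add([1.0, 0.0], "User prefers spicy food")
store.add([0.0, 1.0], "User is allergic to peanuts")
print(store.query([0.9, 0.1]))  # ['User prefers spicy food']
```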

Tools and Libraries (Memory, Tool Libraries, Sandboxes): Capability Expansion

  • Core: Tools (or “functions”) that enable agents to perform various tasks.

  • Invocation Method: Specifying the functions and parameters to be invoked via structured outputs generated by the LLM (e.g., JSON objects).

  • Safe Execution: Ensuring the safety of tool execution using sandboxes (e.g., Modal and E2B).

  • Tool Ecosystem:

    • General Tool Libraries: Such as Composio.

    • Specialized Tools: Browserbase (web browsing), Exa (web search), etc.

  • Why is it Important? Tools expand the capability boundaries of agents, enabling them to complete more complex tasks.
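The invocation pattern described above can be sketched in a few lines: the model emits a JSON object naming a function and its arguments, and the host code parses and dispatches it. The JSON string and `get_weather` tool below are stand-ins for real model output and a real tool; the exact schema varies by provider.

```python
import json

def get_weather(city: str) -> str:
    # Stub tool; a real one would call a weather API.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(llm_output: str) -> str:
    call = json.loads(llm_output)      # parse the structured output
    fn = TOOLS[call["name"]]           # look up the named tool
    return fn(**call["arguments"])     # invoke with the given arguments

fake_llm_output = '{"name": "get_weather", "arguments": {"city": "Hangzhou"}}'
print(dispatch(fake_llm_output))  # Sunny in Hangzhou
```

In production, the dispatch step is also where a sandbox would confine the tool's execution.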

Agent Frameworks: The Command Center for Orchestrating Intelligence

  • Core: Responsible for orchestrating LLM calls and managing agent states.

  • Key Features:

    • State Management: How to save and load agent states, such as conversation history and memory.

    • Context Window: How to compile state information into the LLM’s context window.

    • Cross-Agent Communication: How to achieve collaboration between multiple agents.

    • Memory Management: How to cope with the limited context window of LLMs and manage long-term memory.

    • Open-source Model Support: How to enable agents to better utilize open-source models.

  • Popular Frameworks: Llama Index, CrewAI, AutoGen, Letta, LangGraph, etc.

  • Why is it Important? Frameworks determine how agents operate and their efficiency.

Agent Hosting and Serving: Future Trends

  • Core: Deploying agents as services accessible via APIs.

  • Current Pain Points: State management, safe tool execution, and scaling deployment are challenges.

  • Future Outlook: Standardized agent APIs will emerge, making agent deployment more convenient.

  • Why is it Important? This will transition agents from prototypes to true applications.

Misconceptions about AI Agent Frameworks: Returning to the Essence of LLM APIs

When building LLM applications, the principle of “Simplicity First” should be followed: keep it simple whenever possible, only adding complexity when necessary.

Only when simple solutions cannot meet the needs should more complex agent systems be considered, because agent systems typically trade higher latency and cost for better task performance.

So when should they be used?

  • When to use “Workflows”? When tasks are very clear and can be broken down into a series of fixed steps, like work on an assembly line, using “workflows” is sufficient.

  • When to use “Agents”? When tasks require significant flexibility and the model needs to make decisions on its own, like a commander needing to adapt, then “agents” are more suitable.

In practice, for most applications, optimizing a single LLM call and combining it with retrieval and in-context examples is usually sufficient.

Just as humans typically start with the simplest method when solving a problem, it is not always necessary to reach for complex tools.

Developers are advised to start by directly invoking the large language model’s API:

“Many patterns can be implemented with just a few lines of code. If you do need to use a framework, ensure you understand the underlying code. Misassumptions about underlying principles are common sources of customer errors.”

Principles for AI Agents Practice (Simplicity is Key, Less is More)

“Simplicity” is the core concept summarizing the development path of building effective agents from simplicity to complexity:

  • Enhanced LLM: The foundational building block, equipped with retrieval, tool usage, memory, etc.;

  • Workflow Models:

    • Prompt Chaining

    • Routing

    • Parallelization

    • Orchestrator-Workers

    • Evaluator-Optimizer;

  • Autonomous Agent Systems capable of independently planning and executing complex tasks.

The foundation of AI agent systems is enhanced LLMs, which are equipped with retrieval, tools, and memory capabilities. Starting from the most basic building block of enhanced LLMs, gradually increase complexity, transitioning from simple combinatorial workflows to autonomous agent groups.

Enhanced LLMs (the cornerstone of applications)

The simplest method is undoubtedly enhanced language models.

By extending the capabilities of large language models through retrieval, tools, and memory mechanisms, large language models can proactively utilize these capabilities—generating their own search queries, selecting appropriate tools, and deciding what information to retain.


Two key points for enhancing large language models:

  • Customizing functionalities based on specific use cases

  • Ensuring that large language models provide a simple, well-documented interface.

Intelligent Workflows (Smart Advancement of Applications)

If a task can be easily and cleanly broken down into a series of fixed subtasks, workflows are a great fit. Decomposing a complex task into simpler subtasks makes each LLM call easier to handle, trading some latency for higher accuracy.

Prompt Chaining

This method breaks a task into a series of steps, where each large language model call processes the output of the previous call, allowing programmatic checks (“gates”) between steps to ensure the process stays on track.


Applicable scenarios for prompt chaining workflows:

  • Sequential Processing: Scenarios where tasks are broken down into multiple sequentially executed subtasks
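A prompt chain with a gate might look like the following sketch. `call_llm` is a hypothetical stub, and the gate here is a simple length check; real gates would validate format, content, or safety.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical LLM stub: just uppercases its input.
    return prompt.upper()

def gate(text: str) -> bool:
    # Programmatic check between steps: non-empty and within budget.
    return 0 < len(text) < 500

def chain(task: str) -> str:
    step1 = call_llm(f"summarize: {task}")
    if not gate(step1):
        raise ValueError("gate failed after step 1")
    step2 = call_llm(f"translate: {step1}")
    return step2

print(chain("agent design notes"))
```

Each step sees only the previous step's output, which is what keeps the chain simple to reason about and to debug.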

Routing

This method classifies an input and directs it to a specialized downstream task. When the tasks you need to handle are complex but fall into several distinct categories, each best handled differently, a routing workflow is a good fit.


Applicable scenarios for routing workflows:

  • Directing different types of customer service inquiries (general questions, refund requests, technical support) to different downstream processes, prompts, and tools.

  • Routing simple/common questions to smaller models (e.g., qwen2-1.5B), routing difficult/uncommon questions to more powerful models (e.g., qwen2-7B), to optimize cost and speed.
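The cost-optimization example above can be sketched as a classifier in front of two handlers. The word-count heuristic and the handler labels are assumptions for illustration; a real router would classify with an LLM or a trained model.

```python
def classify(query: str) -> str:
    # Stub classifier: treats long queries as "hard".
    return "hard" if len(query.split()) > 8 else "easy"

HANDLERS = {
    "easy": lambda q: f"[small model] {q}",   # cheap, fast model
    "hard": lambda q: f"[large model] {q}",   # powerful, expensive model
}

def route(query: str) -> str:
    return HANDLERS[classify(query)](query)

print(route("reset my password"))
print(route("explain why my distributed training job deadlocks after the first epoch"))
```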

Parallelization

This mode takes advantage of the fact that multiple LLM calls can run simultaneously, with their results integrated programmatically. A parallelization workflow lets large language models collaborate in two ways:

  • Segmented Processing: Breaking down tasks into multiple independent subtasks and running these subtasks simultaneously.

  • Multiple Voting: Executing the same task multiple times to obtain diversified output results.


Parallelization workflows are effective when subtasks can run in parallel for speed, or when a problem benefits from multiple angles or attempts to improve the reliability of results. For complex tasks involving multiple considerations, this mode usually achieves better outcomes.

Applicable scenarios for parallelization:

  • Segmented Processing:

    • Building safety mechanisms, where one model instance handles user queries while another filters inappropriate content or requests. This method typically performs better than having the same model handle both safety and core responses simultaneously.

    • Automating the evaluation of model performance, where each model call is responsible for assessing different performance metrics of the model under a given prompt.

  • Multiple Voting:

    • Conducting vulnerability assessments of code, where multiple different prompts review the code and flag issues when found.

    • Assessing whether content is inappropriate, where multiple prompts evaluate from different angles or use different voting thresholds to balance false positives and negatives.
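Both styles, segmented processing and voting, can be sketched with a thread pool. `review_for` is a stub reviewer (a keyword check standing in for an LLM call), and the aspect names are made up for the code-review example above.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def review_for(aspect: str, code: str) -> str:
    # Stub reviewer: flags the code if the risky keyword appears in it.
    return "flag" if aspect in code else "ok"

def sectioned(code: str) -> dict:
    # Segmented processing: independent subtasks run in parallel.
    aspects = ["eval", "exec", "pickle"]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda a: (a, review_for(a, code)), aspects)
    return dict(results)

def vote(code: str, n: int = 3) -> str:
    # Multiple voting: run the same check n times; majority wins.
    with ThreadPoolExecutor() as pool:
        votes = list(pool.map(lambda _: review_for("eval", code), range(n)))
    return Counter(votes).most_common(1)[0][0]

snippet = "result = eval(user_input)"
print(sectioned(snippet))  # {'eval': 'flag', 'exec': 'ok', 'pickle': 'ok'}
print(vote(snippet))       # flag
```

With a real model, voting becomes useful precisely because individual calls are nondeterministic; the stub here is deterministic only to keep the sketch testable.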

Orchestrator-Workers Mode

This mode involves a central large language model dynamically breaking down tasks and assigning these subtasks to different worker models, ultimately integrating all worker results. This workflow is particularly suitable for complex scenarios where subtasks cannot be predetermined (for example, in programming, the number of files to be modified and the content of each file modification often depends on the specific task).


Although structurally similar to parallelization, the key difference lies in its flexibility: subtasks are not preset but are dynamically determined by the orchestrator based on the specific input.

Useful scenarios for the orchestrator-workers mode:

  • Complex modifications needed for multiple files in coding products.

  • Searching tasks that require collecting and analyzing information from multiple sources to find relevant content.
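The coding example above can be sketched as follows: the orchestrator decomposes the task at run time, workers handle subtasks, and a synthesis step merges the results. All model calls are stand-in functions; the file-splitting rule is an assumption for illustration.

```python
def orchestrate(task: str) -> list:
    # Dynamic decomposition: subtasks depend on the input, not a preset list.
    return [f"edit {name}" for name in task.split(",")]

def worker(subtask: str) -> str:
    # Stub worker: a real one would be its own LLM call.
    return f"done: {subtask}"

def run(task: str) -> str:
    subtasks = orchestrate(task)             # decided per input
    results = [worker(s) for s in subtasks]  # could run in parallel
    return "; ".join(results)                # synthesis step

print(run("auth.py,routes.py"))  # done: edit auth.py; done: edit routes.py
```

Note that `orchestrate` produces a different plan for every input, which is exactly what distinguishes this mode from plain parallelization.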

Evaluator-Optimizer Mode

This mode involves one large language model responsible for generating responses while another large language model call provides evaluation and feedback in a loop. This workflow is particularly effective when there are clear evaluation criteria and iterative improvements can provide measurable value.


Indicators that this workflow is a good fit:

  • When human feedback could demonstrably improve the responses of large language models;

  • When a large language model can itself provide such feedback.

Examples of Evaluator-Optimizer:

  • Literary translation, where the translation model may initially fail to capture subtle nuances, but the evaluator model can provide useful reviews.

  • Complex search tasks that require multiple rounds of searching and analysis to gather comprehensive information, where evaluators can decide whether further searches are needed.
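The generate-evaluate loop can be sketched as below. Both "models" are stubs: the generator appends a marker when it receives feedback, and the evaluator accepts only revised drafts; a real pair would be two LLM calls with real critique text.

```python
def generate(prompt: str, feedback: str = "") -> str:
    # Stub generator: improves its draft when it receives feedback.
    return prompt + (" (revised)" if feedback else " (draft)")

def evaluate(text: str) -> str:
    # Stub evaluator: accepts only revised drafts.
    return "accept" if "revised" in text else "needs work: add detail"

def refine(prompt: str, max_rounds: int = 3) -> str:
    feedback = ""
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        verdict = evaluate(draft)
        if verdict == "accept":
            return draft
        feedback = verdict  # feed the critique back into generation
    return draft            # return best effort if the budget runs out

print(refine("translate the poem"))  # translate the poem (revised)
```

The retry budget (`max_rounds`) matters: without it, a generator that never satisfies the evaluator would loop forever.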

Autonomous Decision-Making (Intelligent Agents in Applications)

As LLMs mature in key capabilities such as understanding complex inputs, reasoning and planning, reliably using tools, and recovering from mistakes, agents begin to emerge in production. Agents can handle open-ended problems without predefined steps and can make decisions autonomously based on environmental feedback.

During task execution, agents often need to obtain actual execution results from the environment to judge the next actions and check task progress.

Therefore, systems are usually designed with a pause function at specific nodes or when difficulties arise to introduce human intervention or feedback. Tasks typically end automatically upon completion, but to prevent infinite loops or resource exhaustion, stop conditions are often set, such as limiting the maximum number of iterations.
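The safeguards just described, an iteration cap, completion detection, and a pause point for human review, can be sketched in a minimal loop. The environment, policy, and `pause_at` hook are toy assumptions, not a real agent runtime.

```python
def policy(state: int) -> str:
    # A real agent would ask an LLM to choose the next action here,
    # based on feedback from the environment.
    return "finish" if state >= 3 else "step"

def execute(action: str, state: int) -> int:
    # Environment: applying an action yields a new observable state.
    return state + 1 if action == "step" else state

def run_agent(max_iterations: int = 10, pause_at: int = 2) -> int:
    state = 0
    for i in range(max_iterations):      # stop condition: iteration cap
        if i == pause_at:
            pass  # hook: surface progress for human review/feedback here
        action = policy(state)
        if action == "finish":           # stop condition: task complete
            return state
        state = execute(action, state)   # act, then observe the result
    return state                         # budget exhausted: return as-is

print(run_agent())  # 3
```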


When faced with complex problems where steps are not fixed and execution paths cannot be pre-planned, agents can play to their strengths. With their ability to act autonomously, they are particularly suited to handling large-scale tasks efficiently in controlled environments.


Since agents typically accomplish tasks through iterative reasoning and execution, users need to build trust in their decision-making abilities.

Conclusion

The future of AI agent development lies in “Less is More”, with three core principles for building effective agents: Simplicity, Transparency, and Thoughtful Design.

  1. Maintain Design Simplicity: Avoid over-complication.

  2. Prioritize Transparency: Clearly show the planning steps of the agent.

  3. Thoughtfully Design Agent-Computer Interfaces (ACI): Invest in comprehensive tool documentation and testing.

In the field of LLMs, success does not depend on building the most complex system, but rather on building the system that best meets the needs. We should start with simple prompts, optimize through comprehensive assessments, and only consider introducing multi-step agent systems when simple solutions are insufficient to meet needs.

When deploying agent systems in production environments, do not hesitate to reduce abstraction layers and build from basic components, so that the resulting agent systems are efficient, concise, and powerful.

Finally, in today’s rapidly developing AI technology landscape, these principles will help us build truly reliable, maintainable, and user-trusted agent systems. While pursuing technological innovation, remember one thing: the best solutions are often the simplest solutions.
