Building Agentic AI: A Comprehensive Guide from Basics to Advanced Applications

If you work in the AI field, you have probably been hearing the term AI Agent a lot recently. In this article, we will delve into what it means when we refer to Agents in the context of large language models (LLMs) and artificial intelligence (AI).

Before we dive deeper, one thing to remember is that the term Agent existed long before today’s high-performance LLMs. We could even say that AI Agents have existed for a long time, just not centered around today’s generative LLMs. What has changed is that they have become more powerful and complex. In short, the reason you hear more discussions about Agents now is not that they are a brand new technology, but that they have become very, very interesting.

What is an AI Agent?

At a fundamental level, today’s AI Agents are semi-autonomous or fully autonomous systems that use LLMs as their “brains” to make critical decisions and solve complex tasks. You can think of them as automated decision engines where you, the user, only need to pose your query. They operate within an environment of available tools, using those tools to complete tasks for you, allowing you to sit back and relax while they handle the problems.

Agents autonomously direct their own processes and executions, choosing which tools to use based on the current task. These tools can include web search engines, databases, APIs, etc., enabling Agents to interact with the real world.

History of AI Agents

AI Agents have actually existed for a long time. You can even see in a recent article published by Microsoft about AI Agents [1] that the authors mention they started researching AI Agents as early as 2005. However, in recent years, particularly thanks to the capabilities of the latest LLMs, the form and function of AI Agents have changed significantly. Now, we can use LLMs as core components for planning, reasoning, and execution.

With that said, I want to highlight a few milestones of AI Agents in recent years, which you can consider the starting point for discussing today’s AI Agents (2025). Of course, this is my personal reflection on the past few years. But let’s rewind the clock to before the release of ChatGPT. In 2022, two papers can be seen as the beginning of modern AI Agents that use LLMs as core decision-making components:

  • MRKL Systems: pronounced “miracle systems,” this paper focuses primarily on the limitations of language models, exploring why we get so many false responses. In simple terms, the paper points out a fact we now understand well: language models do not know everything; they are designed to generate language. We can understand it this way: we cannot expect someone to know our birthday unless we tell them. This paper proposed a method for giving language models access to external knowledge bases that they can query to extract relevant information. [2]

  • ReAct: Published slightly later than the MRKL Systems paper, this work introduced a component critical to today’s Agents. ReAct stands for “Reason and Act,” and it proposes a clever prompt structure that enables LLMs to think through a problem, reason about solutions, choose appropriate tools, and act on them. In simple terms: instead of just posing a question, you also tell the model what resources it can use and ask it to devise a plan to resolve the query. In short, this paper introduced a way of prompting that makes the reasoning and action process of LLMs more reliable. [3]


Note: The actual ReAct prompts recommended in the paper are more complex, containing instructions on how to generate thoughts, reason, etc.

In my view, these two papers highlighted two very important findings and features that contributed to today’s AI Agents: good instructions and external tools. Coupled with thousands of people starting to experiment with these LLMs, we have entered a world where increasingly complex AI Agents are being built (and not only with the ReAct prompting method).

Next, let’s look at the core components that make up today’s AI Agents.

Core Components of AI Agents

While not every AI Agent must include all of these components, when we build Agents, we will at least include the following components and processes: LLMs, access tools (via function calls), some degree of memory, and reasoning.

Let’s delve into their respective roles:

  • LLM: You can think of the LLM as the “brain” of the entire operation. While it may not be responsible for every step, when we refer to the Agents of 2025, generative models play an important coordinating role in operations. In simple terms, for an Agent asked to help plan your day, it is the LLM that decides to first check your calendar and then look at the weather.

  • Tools: One important feature of Agents is that they interact with the environment through different tools. These tools can be seen as “add-ons” that make Agents more efficient. These tools allow Agents to transcend the fixed training knowledge of LLMs by providing highly relevant real-time data (e.g., personal databases) and capabilities (e.g., sending emails), broadening their application scope. Through function calls, LLMs can directly interact with a predefined set of tools, thereby expanding the operational range and efficiency of the Agents.

  • Memory: Agents typically possess some form of memory (including short-term and long-term memory), allowing them to store logs of reasoning processes, conversation history, or information gathered across different execution steps. We need memory to support ongoing conversations with the Agents and those dialogues we wish to return to later. Memory can be used to personalize experiences or plan future decisions.

  • Observation and Reasoning: LLMs are core components of problem-solving, task decomposition, planning, and path selection. They allow Agents to reason through a problem, breaking it down into smaller steps (if needed), and decide how and when to use available resources/tools to provide the best solution. However, not every Agent is the same; sometimes we explicitly incorporate reasoning as part of the process when building Agents.

One important point is that AI Agents can have various design patterns, and these components can be used to different degrees. The Agents we see today exist on a continuum, and their autonomy or “Agent behavior” largely depends on how much decision-making power is delegated to the LLM. In simple terms: some Agents are designed to be more independent, while others rely on more external input and control.


How Do AI Agents Work?

Most of the AI Agents we see today use LLMs as core decision-makers/operational coordinators. The degree of autonomy of LLMs certainly varies, and we will discuss this further in the “Future Outlook” section of this article. But first, let’s start with the basics and discuss how an AI Agent that primarily relies on LLMs for most of its decisions works.

I’ve noticed that recently when people discuss LLMs and Agents, there seems to be a lot of “magic” happening behind the scenes. So, here, I will do my best to explain what actually happens behind an AI Agent that has access to certain tools.

Defining Prompts

At the core of any system that uses LLMs is an instruction (prompt) that sets the core purpose for the LLM. The ReAct paper illustrates this well, highlighting a more elaborate prompt that defines an Agent which reasons, generates thoughts, and makes observations. For example, we might give the LLM an instruction like: “You are a helpful assistant who can access my database to answer my queries.”
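As a minimal sketch, this instruction typically becomes the system message of the conversation. The snippet below uses the OpenAI Python SDK purely for illustration (the model name and the user query are assumptions for this walkthrough):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The system message sets the Agent's core purpose; the user message is the query.
messages = [
    {"role": "system", "content": (
        "You are a helpful assistant who can access my database "
        "to answer my queries."
    )},
    {"role": "user", "content": "Hey, how do I use Ollama with DeepSeek R1?"},
]
```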

Providing Tools

Next, we need to provide a list of tools for the LLM. This is one of the most common ways to create AI Agents today, although it is not always necessary; we can still create systems with Agent functionalities without tools and function calls. Most model providers today support “function calling,” which allows us to set up interactions with the LLM, listing the tools it may use at any time to resolve queries.

When we provide tools to the LLM, we need to tell the LLM some information. The LLM uses this information to decide when to use the tools:

  • Name: For instance, a tool might be called technical_documentation_search.

  • Description: This is the most important information the model uses during reasoning to determine which tool to use. For example, for the technical_documentation_search tool, we might provide a description: “This is very useful when you need to search internal technical documentation for answers.”

  • Expected Input: Remember, the tools are external to the LLM. The LLM knows their names and descriptions, but ultimately, the task of a generative language model is to generate language. So what can it do? It can do what it excels at: generate a message that names the function (tool) to run and the inputs needed to run it. Therefore, when we provide a list of tools, we also need to provide this information. For example, for our technical_documentation_search tool, we can tell the LLM that it needs query: str as input.

If you’re interested in how this works in practice, you can check OpenAI’s function definition documentation:

https://platform.openai.com/docs/guides/function-calling
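Putting the name, description, and expected input together, here is a hedged sketch of how the technical_documentation_search tool could be declared following the OpenAI function-calling schema linked above (the exact wrapper fields can differ between providers and SDK versions):

```python
tools = [
    {
        "type": "function",
        "function": {
            # Name: how the LLM will refer to the tool.
            "name": "technical_documentation_search",
            # Description: what the LLM uses during reasoning to decide when to call it.
            "description": (
                "This is very useful when you need to search internal "
                "technical documentation for answers."
            ),
            # Expected input: a single string parameter called `query`.
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"],
            },
        },
    }
]
```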

Using Tools

So, we have the LLM: it knows it can access some tools, knows how to run them, and knows what they are for. However, the LLM cannot actually execute anything itself, such as running a Python script or searching your documents. What it can do is produce a message explaining which tool it intends to run and the inputs it wishes to use to run that tool.

Let’s illustrate with the following scenario:

  • We have an AI Agent using LLM.

  • We provided the technical_documentation_search tool, with the expected input query: str and the description “This is very useful when you need to search internal technical documentation for answers.”

  • The user asks: “Hey, how do I use Ollama with DeepSeek R1?”

In this scenario, what actually happens is:

The LLM generates a reply simplified to: “Run the tool technical_documentation_search, query = ‘Using Ollama and DeepSeek R1.'”

In reality, the LLM never leaves its own “world”: it simply instructs our AI Agent application to reach out to an external resource on its behalf.
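In code, that “reply” arrives as a structured tool call rather than plain text, and it is our application, not the LLM, that executes it. A rough sketch, continuing from the snippets above (the search_technical_docs function is a hypothetical stand-in for whatever the tool actually does):

```python
import json

def search_technical_docs(query: str) -> str:
    """Hypothetical stand-in for the technical_documentation_search tool."""
    # In practice this could be a RAG pipeline, a database lookup, etc.
    return "Relevant passages from the internal documentation about Ollama and DeepSeek R1."

# Ask the LLM, telling it which tools it may use.
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
assistant_message = response.choices[0].message

# The LLM does not run anything itself; it only names a tool and its inputs.
tool_result = None
for tool_call in assistant_message.tool_calls or []:
    if tool_call.function.name == "technical_documentation_search":
        args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
        tool_result = search_technical_docs(args["query"])
```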

Observing Tool Responses

If everything goes well, by this point your AI Agent has run a tool. Remember, this tool can be anything. For example, our technical_documentation_search tool might itself be a RAG (Retrieval Augmented Generation) application that uses another LLM to generate responses to queries. The key point is that we have run the tool with the query “Using Ollama and DeepSeek R1,” and the response is “You can pull the DeepSeek R1 model by running ollama pull deepseek-r1 and run it with ollama run deepseek-r1,” or something similar. But this is not the end, because the original LLM at the core of our AI Agent has not yet seen this response.

When the tool runs, its results are returned to the Agent’s LLM. Usually this is provided as a chat message whose role marks it as a tool (function) response, so the LLM knows that what it sees is not from the user but from the tool it decided to run. The LLM then observes the results from the tool (or multiple tools) and provides the final answer to the user.
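Continuing the same sketch, the tool’s output is appended to the conversation attributed to the tool (the OpenAI SDK uses the role “tool” for this; other providers name it slightly differently), and the LLM is called one more time to write the final answer:

```python
# Record the assistant's tool-call message, then the tool's result.
messages.append(assistant_message)
messages.append({
    "role": "tool",                  # marks this as a tool response, not a user turn
    "tool_call_id": tool_call.id,    # ties the result back to the specific call
    "content": tool_result,
})

# The LLM now observes the tool output and produces the final answer for the user.
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```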

Congratulations!

At this point, you have understood the basics of what constitutes an AI Agent, especially one that relies on tools and function calling. I like to compare it this way: the LLM acts as the core coordinator of the AI Agent, like a wizard holding a spellbook but without a wand. The LLM knows which spells exist and what they do, but all it can do is speak them aloud; the tools still have to run outside of the LLM.


What is Agentic AI?

First, Agentic is an adjective.

There are many new terms to adapt to, which can be confusing. In reality, though, the distinction between Agentic AI and AI Agents is easier to understand than it sounds. AI Agents are themselves a form of Agentic AI, but AI Agents typically refer to an end application designed for a specific task. For example, an AI Agent might be a document search assistant or a personal assistant capable of accessing your email and WeChat.

However, when we say Agentic AI, we usually refer to a system that, by design, incorporates components such as decision-making LLMs, reasoning steps, possibly some tools, self-reflection, and other Agentic components. To be considered Agentic, it does not need to possess all these components but typically exhibits characteristics of some of them.

Tools for Building AI Agents

Building AI Agents requires integrating multiple components and tools, especially to create a system capable of autonomous or semi-autonomous decision-making, interaction, and task execution. Although advanced Agents may be very complex, even the simplest Agents need some basic elements. Here are some resources that can help you get started building AI Agents:

1. Language Model Providers

The foundation of an AI Agent is the LLM, which provides the Agent’s reasoning capability. LLMs enable Agents to understand different inputs and plan their actions effectively. It is also crucial to choose an LLM with built-in support for function calling, so that we can connect it to external tools and APIs. Common LLM providers include:

  • OpenAI: GPT-4, o3-mini

  • Alibaba: Qwen-2.5

  • DeepSeek: DeepSeek-R1

  • Meta: Llama 3

  • Google: Gemini 2.0 Pro, Gemini 2.0 Flash

  • Mistral: Mistral Large, Mistral Small 3

  • Using open-source models from Hugging Face or Ollama

2. Memory and Storage

Agents need some form of persistent memory to retain context. Memory can be categorized into two types:

  • Short-term Memory: Used to track ongoing conversations or tasks.

  • Long-term Memory: Used to remember past conversations, personalized information, and experiences.

Currently, there are many different implementations of both short-term and long-term memory for Agents, and as the technology advances we may see more variations. For short-term memory, for example, we can help the LLM manage its context-length limits by providing a “conversation summary.” For long-term memory, we might back conversations up in a database. This may expand the role of vector databases like Weaviate: AI Agents can retrieve the most relevant pieces of previous conversations and use them as long-term memory.
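As a rough illustration of the two kinds of memory (all class and method names here are hypothetical, not taken from any particular framework):

```python
class AgentMemory:
    """Toy sketch: short-term memory as a rolling message buffer plus a summary,
    long-term memory as a searchable store (a vector database in practice)."""

    def __init__(self, max_messages: int = 20):
        self.messages: list[dict] = []   # short-term: the ongoing conversation
        self.summary: str = ""           # compressed context once the buffer fills up
        self.long_term: list[str] = []   # stand-in for a database of past conversations
        self.max_messages = max_messages

    def add(self, message: dict) -> None:
        self.messages.append(message)
        if len(self.messages) > self.max_messages:
            # A real system would have an LLM write this summary; here we just concatenate.
            dropped = self.messages[: -self.max_messages]
            self.summary += " " + " ".join(m["content"] for m in dropped)
            self.messages = self.messages[-self.max_messages:]

    def remember(self, text: str) -> None:
        # Persist something worth keeping across sessions (long-term memory).
        self.long_term.append(text)

    def recall(self, query: str) -> list[str]:
        # Naive keyword match; a vector database would do semantic search instead.
        return [t for t in self.long_term if query.lower() in t.lower()]
```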

3. AI Agent Orchestration Frameworks

Orchestration frameworks act like intelligent commanders, coordinating all components of AI Agents and managing multiple Agents in a Multi-Agent setting. They abstract away much of the complexity, handle error/retry loops, and ensure that language models, external tools/APIs, and memory systems work together smoothly.

Currently, several frameworks simplify the development of AI Agents:

  • LangGraph: Provides a structured framework for defining, coordinating, and executing multiple Agents.

  • LlamaIndex: Enables the creation of complex Agent systems with varying degrees of complexity.

  • CrewAI: A multi-Agent framework for orchestrating autonomous AI Agents with specific roles, tools, and goals.

  • Hugging Face smolagents: A library that allows you to run powerful Agents in just a few lines of code.

  • Haystack: An end-to-end framework for building LLM-powered AI applications such as Agents.

  • OpenAI Swarm: An educational framework exploring ergonomic lightweight multi-Agent orchestration.

4. Tools and APIs

The capabilities of Agents depend on the tools they can access. By connecting various APIs and tools, Agents can interact with the environment, performing tasks such as web browsing, data retrieval, database querying, data extraction and analysis, code execution, and more.

Frameworks like LlamaIndex provide ready-made tool integrations, such as data loaders for PDFs, websites, and databases, as well as application integrations like Slack and Google Drive. Similarly, LangChain offers a wide range of tools for Agents to use. Additionally, developers can build custom tools as needed, introducing new functionality by wrapping APIs. Recent research, such as “Querying Databases via Function Calls” [4], even hints at the potential of function calls for database queries.
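In practice, a custom tool is often nothing more than a small function that wraps an API, plus a schema like the one shown earlier so the LLM knows when to call it. A hedged sketch (the endpoint URL and field names below are placeholders, not a real service):

```python
import requests

def get_open_tickets(project: str) -> str:
    """Hypothetical custom tool: wraps an internal issue-tracker API."""
    resp = requests.get(
        "https://issues.example.com/api/tickets",   # placeholder endpoint
        params={"project": project, "status": "open"},
        timeout=10,
    )
    resp.raise_for_status()
    tickets = resp.json()
    # Return plain text so the LLM can read and summarize the result.
    return "\n".join(f"#{t['id']}: {t['title']}" for t in tickets)
```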

Overall, building AI Agents is like assembling a puzzle. You start with a good language model, add the right tools and APIs, then incorporate memory features to let the Agent remember important things. You can use orchestration frameworks to simplify the process and bring all the parts together, ensuring that each part fits perfectly.

The Future of AI Agents: Challenges and Opportunities

One great aspect of AI Agents and Agentic AI is that they are still evolving. Although we haven’t covered every challenge here, nor other core concerns of running AI Agents in production, such as observability, there are a few things worth emphasizing, especially regarding the future of AI Agents.

For instance, you may have noticed that unless we intentionally design our Agentic applications, we might find a lot (perhaps too much) reliance on LLMs to make the right decisions. If the Agent can access search tools or knowledge bases, that may not be a problem. But what if that tool could access your bank account, and the Agent could now buy you a very expensive one-way ticket to Hawaii?

One debate I’ve been following recently is whether AI Agents are better used as “research assistants” or as “executors of our intentions.” This is a simple but important debate, and the answer may change as LLMs continue to improve and as better regulations and constraints emerge in the AI field.

Levels of Autonomy and Human Intervention

Now you understand how a basic AI Agent operates. However, it is not necessary (or even advisable) to let the LLM coordinate every operation. We are beginning to see more and more Agents delegate processes to simpler, more deterministic systems. In some cases, these processes are even delegated to humans. For example, we may see more scenarios requiring human approval before an operation is carried out.

We even see tools like Gorilla implementing Agents with an “undo” feature, allowing humans to decide whether to roll back an operation, thus adding human intervention (Human in the Loop) to the entire process.[5]

Multimodal AI Agents

Multimodal refers to the ability to use more than one modality, that is, to go beyond language (text) to images, video, audio, and so on. The technology for this is largely already in place. Therefore, we may see more and more AI Agents capable of interacting with various media, either through their tools or, if they use multimodal LLMs, inherently. Imagine an AI Agent to which you can say, “Create a cute puppy video and send it to my email!”

The Role of Vector Databases

Another interesting topic is how far the role of vector databases in AI may expand. Currently, we mainly see vector databases as knowledge sources accessible to AI Agents. However, it is easy to imagine a future where we use vector databases and other types of databases as memory resources for Agent interactions.

Examples and Use Cases of AI Agents

AI Agents are reshaping the way we work, and this change is already visible across multiple industries. AI Agents shine the most when we need a perfect combination of conversation and action. By automating repetitive tasks, they not only improve work efficiency but also enhance the overall user experience. Here are some real-world examples of AI Agents:

1. AI Research Assistant

AI research assistants can simplify the process of analyzing large amounts of data, identifying trends, and generating hypotheses. Today, we already see people in academia and the workplace using ChatGPT as an assistant to help them gather information, develop ideas, and take the first steps in many tasks. One could say that ChatGPT itself is a research assistant Agent. These types of Agents are sometimes referred to as Agentic RAG: AI Agents with access to multiple RAG tools, each querying a different knowledge base.

2. AI Customer Service

AI customer service Agents provide 24/7 support, handling queries, troubleshooting, and providing personalized interactions. They reduce wait times, allowing humans to handle more complex tasks. They can serve as research assistants for customers, quickly providing answers, and also completing tasks for customers.

3. Marketing and Sales Agents

These Agents optimize marketing campaigns and sales processes by analyzing customer data, personalizing promotions, and automating repetitive tasks (such as lead qualification and email follow-ups).

4. Code Assistant Agents

These Agents assist developers by suggesting code, debugging errors, resolving tickets/issues, or even building new features. This allows developers to save time and focus on creative problem-solving. Tools like Cursor and Copilot are examples of this.

Conclusion

This article outlines what AI Agents mean in 2025 and briefly introduces how they work. We reviewed the components that help understand AI Agents, such as prompts, tools, observing tool responses, and reasoning toward final answers. Finally, we looked forward to the future of AI Agents, discussing current shortcomings and the progress we can expect.

https://weaviate.io/blog/ai-agents

[1] https://news.microsoft.com/source/features/ai/ai-agents-what-they-are-and-how-theyll-change-the-way-we-work/

[2] https://arxiv.org/abs/2205.00445

[3] https://arxiv.org/pdf/2210.03629

[4] https://arxiv.org/abs/2502.00032

[5] https://github.com/ShishirPatil/gorilla
