

This article explores the fascinating and complex frontier of Agentic AI in detail. It covers concepts from beginner to expert level, with practical examples and applications, and discusses the architecture, components, and evaluation methods of Agentic AI, analyzing its advantages, disadvantages, and limitations. Best practices and recommendations are also provided for key challenges in the development process.
Key Points
- Agentic AI refers to artificial intelligence systems with autonomous behavior.
- Agentic AI systems can understand their environment, reason, plan actions, and learn from experience.
- Agentic AI can be divided into two modes: fast, intuitive "System 1" and deliberative "System 2".
- The architecture of Agentic AI includes components such as trigger mechanisms, orchestrating agents, memory functions, and guardrails.
- Agentic AI tools and frameworks each have their pros and cons and need to be integrated to streamline the development process.
- To ensure the reliability of Agentic AI systems, unit evaluations, integration evaluations, runtime validation, and red team testing are necessary.
- Agentic AI systems face challenges such as accuracy, cost, and latency, each of which calls for corresponding best practices.
- Ensuring that Agentic AI behavior aligns with human values is also a major challenge, requiring measures such as fine-tuning and reinforcement learning.

Introduction
Agentic AI represents a fascinating and complex frontier in the development of artificial intelligence. Unlike traditional machine learning models that passively receive input and generate output, Agentic AI involves entities known as agents that can make decisions, interact with their environment, and achieve specific goals. These agents autonomously execute tasks using components such as reasoning, memory, and planning, and can even collaborate with other agents to solve larger problems. This article explores Agentic AI in detail, covering concepts from beginner to expert levels, while providing practical examples and applications to illustrate these concepts effectively.
What is Agentic AI
Agentic AI refers to artificial intelligence systems that exhibit autonomous behavior, actively striving to achieve goals with minimal direct human intervention. Agentic AI systems can understand their environment, reason about available options, plan their actions, execute tasks, and learn from their experiences. The term "agentic" indicates that these systems can take proactive actions, make decisions, and even communicate with other agents to complete complex tasks.
Examples
- Customer Support Agents: Customer support agents use Agentic AI to interact with customers, answer queries, and escalate issues when necessary. These agents can provide round-the-clock support, adapt to customer needs, and learn from past interactions to improve their responses.
- Marketing Agents: Marketing agents are designed to autonomously manage marketing activities, including audience segmentation, scheduling posts, and optimizing advertisements based on real-time performance data. They help businesses achieve better promotions with minimal human intervention.
- Coding Agents: Coding agents are intended to autonomously generate code, scripts, or perform debugging tasks. These agents can be used to automate repetitive coding tasks, generate boilerplate code, or create custom functions as needed. They can run in an Integrated Development Environment (IDE) to execute specific commands, thus optimizing the software development lifecycle.
- Virtual Assistants: Personal assistants like Alexa or Google Assistant use Agentic AI to respond to user requests, set reminders, make suggestions, and control smart devices to adapt to user preferences.
- Financial Trading Bots: Automated trading agents can make trading decisions in real-time based on market conditions, predefined rules, and evolving strategies.
Agents and System 1 / System 2 Thinking
In Daniel Kahneman's book Thinking, Fast and Slow, human cognition is described as two systems: System 1 and System 2. System 1 thinks quickly, intuitively, and automatically, while System 2 thinks slowly, more deliberately, and logically. This concept can be applied to understand how Agentic AI operates.
- System 1 Agents: These agents are designed to make quick decisions and respond to simple everyday tasks. They operate similarly to System 1 thinking— fast and efficient. For example, customer support agents that provide instant responses based on predefined rules or patterns resemble System 1 thinking, where speed and immediacy are prioritized.
- System 2 Agents: These agents handle more complex tasks that require careful reasoning, planning, and evaluation. They embody System 2 thinking, which involves a more in-depth and thoughtful decision-making approach. Agents like planning agents are designed to demonstrate System 2 behavior, requiring multi-layered analysis, planning, and evaluation before making decisions.
Agentic AI typically integrates both operational modes: rapid instinctive responses for clear tasks (System 1) and more comprehensive, deliberate actions for complex decisions (System 2). Understanding this distinction helps in designing agents that can effectively leverage both approaches based on the nature of the task at hand.
Architecture and Components
Agent Components Architecture (figure): a diagram describing the components of the agent framework.
Trigger Mechanisms
Any agent system can be initiated by a user interacting with it, or by another system triggering actions via webhooks or scheduled timers.
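As a minimal illustration, here is a hedged sketch of both trigger styles, assuming a hypothetical `run_agent` entry point; the FastAPI webhook and the fixed-interval timer are illustrative choices, not prescribed by any particular framework.

```python
# A sketch of two trigger mechanisms, assuming a hypothetical run_agent
# entry point: an HTTP webhook (FastAPI) and a fixed-interval timer.
import threading
import time

from fastapi import FastAPI

app = FastAPI()

def run_agent(payload: dict) -> None:
    """Hypothetical agent entry point; replace with a real pipeline."""
    print(f"Agent triggered with: {payload}")

@app.post("/webhook")
async def webhook_trigger(payload: dict):
    # Another system pushes an event to start the agent.
    run_agent(payload)
    return {"status": "accepted"}

def timer_trigger(interval_seconds: int = 3600) -> None:
    # Scheduled trigger: run the agent on a fixed cadence.
    while True:
        run_agent({"source": "timer"})
        time.sleep(interval_seconds)

threading.Thread(target=timer_trigger, daemon=True).start()
```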
LLM
Every agent framework relies on a large language model (LLM). Each component can access the same LLM or a different one to achieve its objectives, which provides flexibility and scalability for handling a variety of tasks.
Orchestrating Agents
Orchestrating agents act as the orchestration component, handling reasoning, planning, and task decomposition. This agent knows about all other agents and, through appropriate planning, reasoning, and task decomposition, determines which agents to execute and in what order. An LLM with reasoning capabilities (for example, OpenAI's o1-preview) is best suited for such agents but should be used judiciously to balance the other considerations of the agent system.
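To make the planning step concrete, here is a hedged sketch of an orchestrating agent; `call_llm` is a hypothetical helper (a canned plan stands in for a real model), and the agent registry is illustrative.

```python
# A sketch of an orchestrating agent: the LLM produces an ordered plan,
# which is then executed step by step over a registry of worker agents.
import json

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; a canned plan stands in for a real model."""
    return '["fundamental", "sentiment"]'

AGENTS = {
    "fundamental": lambda task: f"fundamental analysis of {task}",
    "technical": lambda task: f"technical analysis of {task}",
    "sentiment": lambda task: f"sentiment analysis of {task}",
}

def orchestrate(task: str) -> list[str]:
    plan_prompt = (
        f"Task: {task}\n"
        f"Available agents: {list(AGENTS)}\n"
        "Return a JSON list of agent names in execution order."
    )
    plan = json.loads(call_llm(plan_prompt))
    # Execute the planned agents in order, collecting their results.
    return [AGENTS[name](task) for name in plan if name in AGENTS]

print(orchestrate("AAPL"))
```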
Agents
Agents encapsulate a set of instructions and tools to accomplish a given task (a sketch follows the list below):
- Prompts: Commands given to the language model along with the tools it can access.
- Tools: Execution blocks that perform operations, such as simple code blocks, API calls, or integrations with other systems.
- Environments: Tools may also have associated environments in which to perform specific tasks, such as an IDE or general computer use.
- Complex Agents: Agents can also be entire architectures, such as retrieval-augmented generation (RAG), which includes embeddings and vector databases.
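The following is a hedged sketch of an agent as prompt plus tools; `call_llm_with_tools`, the tool registry, and the canned responses are all illustrative assumptions standing in for any provider's function-calling API.

```python
# A sketch of an agent as prompt + tools; call_llm_with_tools is a
# hypothetical helper with canned responses in place of a real model.
def get_stock_price(ticker: str) -> float:
    """Example tool: in practice this would hit a market-data API."""
    return 123.45  # placeholder value

TOOLS = {"get_stock_price": get_stock_price}

def call_llm_with_tools(prompt: str, tools: dict) -> dict:
    """Hypothetical LLM call; returns a canned tool call, then an answer."""
    if "Tool result" in prompt:
        return {"content": "AAPL is trading at 123.45."}
    return {"tool_call": {"name": "get_stock_price",
                          "arguments": {"ticker": "AAPL"}}}

def run_agent(user_request: str) -> str:
    prompt = ("You are a financial research agent. Use tools when needed.\n"
              f"Request: {user_request}")
    response = call_llm_with_tools(prompt, TOOLS)
    if "tool_call" in response:
        # Execute the requested tool, then feed the result back to the LLM.
        call = response["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        response = call_llm_with_tools(f"{prompt}\nTool result: {result}", TOOLS)
    return response["content"]

print(run_agent("What is AAPL trading at?"))
```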
Memory
Memory functions in Agentic AI allow agents to store information and recall it in future interactions. Memory is available to all components at any time and can include several types (a sketch follows the list):
- User Profiles: User-specific information that helps agents create personalized experiences.
- Chat History: Conversation history that allows agents to extract contextual information from past interactions.
- Chat State: Tracking completed workflows to avoid repeating tasks.
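Here is a minimal sketch of a shared memory store covering the three types above; the class and field names are illustrative, not taken from any specific framework.

```python
# A sketch of a shared memory store with the three memory types above.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    user_profile: dict = field(default_factory=dict)        # personalization
    chat_history: list[dict] = field(default_factory=list)  # past turns
    chat_state: dict = field(default_factory=dict)          # done workflows

    def remember_turn(self, role: str, content: str) -> None:
        self.chat_history.append({"role": role, "content": content})

    def mark_done(self, workflow: str) -> None:
        # Track finished workflows so agents do not repeat them.
        self.chat_state[workflow] = "completed"

memory = AgentMemory(user_profile={"name": "Alice", "risk": "low"})
memory.remember_turn("user", "Should I buy AAPL?")
memory.mark_done("fundamental_analysis")
```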
Guardrails
Guardrails are safety mechanisms that prevent harmful actions while ensuring robustness when handling unforeseen inputs or situations. Constraints such as "ensure that competitors are not mentioned in responses" or "avoid discussions about religion or politics" should exist at the framework level. These constraints are crucial for deploying agents in dynamic environments, providing default safety checks that can be edited as needed.
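As a minimal sketch, simple keyword screens like the ones below can enforce such rules on every response; production systems (e.g., NVIDIA NeMo Guardrails) use programmable, model-based checks rather than string matching, and the blocked lists here are illustrative.

```python
# A sketch of framework-level guardrails: keyword screens applied to
# every agent response before it reaches the user.
BLOCKED_TOPICS = ["religion", "politics"]
COMPETITORS = ["Acme Corp"]  # illustrative competitor list

def apply_guardrails(response: str) -> str:
    lowered = response.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "I can't discuss that topic."
    for name in COMPETITORS:
        if name.lower() in lowered:
            # Redact competitor mentions rather than blocking outright.
            response = response.replace(name, "[redacted]")
    return response

print(apply_guardrails("Acme Corp offers a similar product."))
```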
Agent Observability
Observability enables developers and users to understand what agents are doing and why. Providing transparency into agent behavior helps diagnose issues, optimize performance, and ensure that agents’ decisions align with expected outcomes.
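One lightweight way to get this transparency is to trace every agent call; the decorator below is a hedged sketch using standard logging, with illustrative names.

```python
# A sketch of agent observability: log each agent call, its inputs,
# outputs, and duration for later inspection.
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent.trace")

def traced(agent_name: str):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            logger.info("%s args=%s result=%s took=%.2fs",
                        agent_name, args, result, elapsed)
            return result
        return wrapper
    return decorator

@traced("sentiment_analyst")
def analyze_sentiment(ticker: str) -> str:
    return "positive"  # placeholder for a real LLM-backed analysis

analyze_sentiment("AAPL")
```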
Adaptation and Learning
Adaptation refers to the ability of agents to adjust their behavior based on feedback from the environment. This includes reinforcement learning or other adaptive techniques that allow agents to optimize their decisions over time. For example, marketing agents can adjust their strategies based on changing customer preferences.
AI Agent Ecosystem
The infrastructure for Agentic AI has seen significant growth over the past year and is expected to continue to evolve rapidly. This development has brought many new tools and components that help build Agentic AI systems.
Agentic AI Infrastructure
The chart provided by Madrona illustrates the various parts of the ecosystem, including agent hosting, orchestration, memory, platform tools, and more.
Advantages
- Diverse Tools: Each component of building Agentic AI has many excellent tools available. This diversity allows developers to choose dedicated tools based on their specific needs.
- Simplified Design: Many of these tools are designed with user-friendly interfaces or simple APIs, making it easy to get started and quickly prototype across various domains.
Disadvantages
- Fragmentation of the Ecosystem: The abundance of specialized tools leads to a highly fragmented ecosystem, requiring substantial effort to integrate them. Developers often spend more time integrating and managing multiple tools rather than focusing on core development.
- Integration Complexity: Ensuring compatibility and effective communication between different tools can be very complex and time-consuming, adding significant overhead to projects.
While the diversity of tools allows for high customization and specialization, it also poses significant challenges for developers who must figure out how to seamlessly integrate them into their frameworks.
This fragmentation often makes development cumbersome and time-consuming, especially when developers must manage multiple tools that may or may not work together. As the ecosystem continues to evolve, integration into a unified framework or more comprehensive tools will be key to simplifying the development process. This integration will make it easier for developers to implement agent-based AI systems without constantly dealing with integration issues and compatibility challenges.
Ultimately, while the rapid development of Agentic AI infrastructure is encouraging, the need for cohesive and integrated tools is crucial to ensuring that development remains accessible, efficient, and scalable across a wide range of use cases.
Example: Hedge Fund Agents
To evaluate the above framework, consider a hedge fund agent as an example: it runs various analyses on publicly traded stocks, summarizes them, and makes trading decisions.
Hedge Fund Agent Architecture
Analysis Agent Architecture
The figure above shows the architecture of the hedge fund analyst team (a sketch of this flow follows the list below):
- Portfolio Manager: Acts as the planning agent, coordinating the entire decision-making process by deciding which agents to call and in what order.
- Fundamental Analyst: Conducts fundamental analysis of stocks, assessing financial statements, earnings, profitability, and other metrics.
- Technical Analyst: Performs technical analysis using historical market data, price trends, and chart patterns.
- Sentiment Analyst: Focuses on news articles, social media, and other sources of public sentiment to gauge market sentiment.
- Summary Analyst: Decides whether to buy, hold, or sell the stock based on the opinions of the other analysts.
- User Inquiry Analyst: Contacts users when more information is needed for further analysis.
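Below is a hedged sketch of that flow with placeholder analyst functions standing in for LLM-backed agents; the signals and confidence weights are illustrative.

```python
# A sketch of the analyst-team flow: the portfolio manager gathers every
# analyst's opinion, then a summary step decides.
def fundamental(ticker): return {"signal": "buy", "confidence": 0.7}
def technical(ticker):   return {"signal": "hold", "confidence": 0.5}
def sentiment(ticker):   return {"signal": "buy", "confidence": 0.6}

ANALYSTS = [fundamental, technical, sentiment]

def portfolio_manager(ticker: str) -> str:
    opinions = [analyst(ticker) for analyst in ANALYSTS]
    buys = sum(o["confidence"] for o in opinions if o["signal"] == "buy")
    holds = sum(o["confidence"] for o in opinions if o["signal"] == "hold")
    return "buy" if buys > holds else "hold"

print(portfolio_manager("AAPL"))  # -> "buy"
```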
Frameworks
There are various frameworks available for easily building agents. Below is a quick assessment of popular frameworks and which of these components each implements:
| Framework | Orchestrating Agents | Agents | Tools | Sequential Workflows | Dynamic Workflows | Memory | Guardrails |
|---|---|---|---|---|---|---|---|
| LangChain | ✅ Formal agents | ✅ | ✅ Out of the box | ✅ | ❌ | ✅ | ❌ |
| LangGraph | ✅ Formal agents | ✅ | ✅ Same as LangChain | ✅ | ✅ Can build custom graphs | ✅ State | ❌ |
| CrewAI | ✅ Orchestrating and manager agents | ✅ | ✅ LangChain and LlamaIndex tools | ✅ | ✅ Hierarchical | ✅ Long-term, short-term, entity, context | ❌ |
| Swarm | ✅ Formal agents | ✅ | ❌ Not out of the box | ✅ | ❌ Uses handoffs for building | ✅ State and context variables | ❌ |
| LlamaIndex | Pending | Pending | Pending | Pending | Pending | Pending | Pending |
| MSFT AutoGen | Pending | Pending | Pending | Pending | Pending | Pending | Pending |
Framework Descriptions
Most frameworks have most features; differences lie in implementation details rather than functionality.
- LangChain: Easy to get started with, but difficult to build complex flows in. Best used together with LangGraph.
- LangGraph: Like LangChain, LangGraph is a very powerful tool. It takes some time to get used to, and its visualization graphs are difficult to understand without deep exploration.
- CrewAI: Overall a great framework that covers all the functionality. There is also a SaaS platform that makes it easier to use. It allows hierarchical processes, but more complex processes are still on the roadmap.
- Swarm: Primarily an educational framework, not yet ready for production. It has most features, but combining them to build complex processes seems limited.
- Microsoft AutoGen: Pending evaluation.
Observations
- None of the frameworks offers a way to build guardrails at the framework level. NVIDIA's NeMo Guardrails allows programmable guardrails via plugins, but it must be integrated separately.
- Most memory is handled through state and context variables, all of which require manual management.
- Planning can be done through standard agents, but state handling must be done manually.
- Dynamic workflows are built using handoffs, though LangGraph and CrewAI make building graphs easier.
Evaluation
Evaluating Agentic AI is crucial for ensuring reliability, performance, and safety. Due to the need for rapid iteration and the subjectivity of evaluating agents, attention to testing and evaluation is often insufficient. Below, we will discuss insights and best practices for effectively testing Agentic AI systems.
Unit Evaluation
Ideally, each agent should have its own evaluation, similar to unit testing in software engineering. Unit evaluations ensure that individual agents behave as expected and meet their functional requirements. Agents should be tested in various scenarios to validate their reasoning, planning, and execution accuracy. Keeping each agent’s output structured is beneficial, as structured output makes validation easier.
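As a minimal sketch, a unit evaluation can assert the structure and value ranges of a single agent's output; the agent below is a placeholder returning canned structured output.

```python
# A sketch of a unit evaluation for a single agent, assuming the agent
# returns structured output that can be validated field by field.
def sentiment_agent(text: str) -> dict:
    """Placeholder for an LLM-backed agent with structured output."""
    return {"signal": "buy", "confidence": 0.8}

def test_sentiment_agent_structure():
    out = sentiment_agent("Strong earnings beat expectations.")
    assert set(out) == {"signal", "confidence"}
    assert out["signal"] in {"buy", "hold", "sell"}
    assert 0.0 <= out["confidence"] <= 1.0

test_sentiment_agent_structure()
```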
Integration Evaluation
The entire system should also undergo evaluations similar to integration testing. Integration evaluations check the status of agents working together as a complete system, verifying that interactions between agents yield correct results. This type of testing is crucial for identifying issues arising from communication failures or unexpected dependencies between agents.
Runtime Validation
Runtime validation involves evaluating agent outputs in real time to enhance adaptability and learning. There are two ways to achieve this: one relies on human reviewers (feasible only for small amounts of data), and the other uses larger models as judges. Using larger models as runtime validators can help ensure that agents make decisions aligned with their intended goals, but this can be computationally expensive. To reduce costs, batched runtime validation or selective runtime validation (e.g., triggered by user feedback or key events) can be employed. These validation methods help maintain system quality while adapting to new data.
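A hedged sketch of selective runtime validation follows; `judge_with_large_model` is a hypothetical larger-model judge (canned approval stands in here), and the 10% sample rate is an illustrative cost control.

```python
# A sketch of selective runtime validation: only a random sample of
# outputs is checked by a larger "judge" model, to control cost.
import random

def judge_with_large_model(task: str, output: str) -> bool:
    """Hypothetical larger-model judge; canned approval stands in here."""
    return True

def validated_run(agent, task: str, sample_rate: float = 0.1) -> str:
    output = agent(task)
    if random.random() < sample_rate:
        if not judge_with_large_model(task, output):
            # Flag (or retry) outputs the judge rejects.
            raise ValueError(f"Validation failed for task: {task}")
    return output

print(validated_run(lambda t: "buy", "Should I buy AAPL?"))
```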
Red Team
Red team testing is a key component of evaluating agent frameworks, especially for identifying vulnerabilities and ensuring robustness. This type of evaluation involves simulating adversarial conditions to determine the system’s ability to handle unexpected or potentially harmful inputs.
Red team testing helps expose weaknesses in agents concerning reasoning, memory handling, and interaction strategies. This is crucial for understanding how agents fail under adversarial conditions and for establishing better security measures. Incorporating red team testing into the evaluation process ensures that agents can withstand attacks and operate safely in various environments.
Best Practices
- Structured Output: Ensure that each agent’s output is well-structured. Structured output makes it easier to validate correctness and identify issues promptly.
- Large-Scale Testing: Use larger language models to test final outcomes to ensure scalability. Larger models can simulate various user behaviors, aiding in effectively stress-testing the system.
- Iterative Evaluation: Agents should be evaluated iteratively so that developers can identify weaknesses early and improve quickly. Each iteration contributes to refining the agent and producing more stable results over time.
By integrating these testing practices, developers can enhance the reliability, adaptability, and robustness of Agentic AI systems.
Considerations
Despite their potential, Agentic AI systems face limitations that hinder their applicability.
Accuracy of Function Calls
Although LLMs can determine which tools to call, the accuracy of these decisions remains limited. According to the function-calling leaderboard from UC Berkeley, the best-in-class LLM reaches an accuracy of 68.9%, and hallucination rates are high. This limitation reduces the reliability of LLMs in high-risk use cases.
Function Call Accuracy
Best Practices for Improving Accuracy
- Limit Tools: Limit the number of tools per prompt to four or five. Beyond this number, hallucination increases and accuracy declines significantly.
- Specific Prompts with Examples: Make prompts very specific and provide examples of when to call which tool (see the sketch after this list). This helps the LLM better understand the context, reduces errors, and improves decision-making accuracy.
- Good Tool-Calling Evaluation: Comprehensive evaluation of tool calls is crucial, as it helps measure and improve accuracy. This is achievable because outputs can be structured, making it easier to validate whether the correct tools were used.
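Here is a hedged sketch of such a prompt: a small tool set, explicit guidance, and few-shot examples of when to call each tool. The tool names and example lines are illustrative.

```python
# A sketch of an accuracy-oriented tool-calling prompt: few tools,
# explicit guidance, and few-shot examples per tool.
TOOL_PROMPT = """You can call exactly one of these tools:
- get_stock_price(ticker): latest price for a ticker
- get_news(ticker): recent headlines for a ticker

Examples:
User: "What is AAPL trading at?"      -> call get_stock_price("AAPL")
User: "Any news on TSLA today?"       -> call get_news("TSLA")
User: "Explain what a P/E ratio is."  -> answer directly, no tool call

If no tool clearly applies, answer directly instead of guessing.

User: {user_message}
"""

def build_prompt(user_message: str) -> str:
    return TOOL_PROMPT.format(user_message=user_message)

print(build_prompt("What is AAPL trading at?"))
```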
Cost
Running Agentic AI can be expensive, especially considering the number of calls to large language models (LLMs) and the involvement of multiple agents. Managing memory, reasoning, and tool usage adds computational overhead, making operations costly. However, several strategies can help reduce these costs:
- Declining Prices: While Agentic AI may currently be expensive, prices are expected to fall significantly over time (roughly 10x per year). Advances in model optimization and increasing competition among vendors are driving down costs, making Agentic AI more feasible in the near future.
- LLM Selection: Each agent can use the type of LLM best suited to its purpose. For example, a planning agent may require a more powerful LLM capable of complex reasoning, while simpler agents can rely on cheaper, lightweight LLMs. This selective usage helps control costs while maintaining overall system efficiency.
- Hybrid Models: For tasks that do not require complex language understanding, using a combination of LLM and smaller dedicated models (e.g., rule-based systems) also helps reduce dependency on expensive LLM calls. Agents can offload simpler tasks to non-LLM models, reducing the frequency of costly operations.
- Optimize Token Usage: Wisely using tokens and prompts can significantly reduce costs. For example, sending summaries or only relevant parts of a codebase can help reduce the number of tokens, thus minimizing LLM usage fees.
- Batching and Sharing Resources: Agents should share resources wherever possible. For example, memory and intermediate results can be cached and reused by multiple agents, reducing redundant computation and LLM calls (see the caching sketch below). This optimization lowers both computational and financial costs.
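As a minimal sketch, memoizing an LLM call lets multiple agents share one result for identical prompts; `expensive_llm_call` is hypothetical, with a canned summary standing in for a real model.

```python
# A sketch of sharing results across agents via caching: repeated
# identical prompts hit the cache instead of re-invoking the model.
import functools

@functools.lru_cache(maxsize=1024)
def expensive_llm_call(prompt: str) -> str:
    """Hypothetical LLM call; canned summary stands in for a real model."""
    return f"summary of: {prompt[:40]}"

# Both the fundamental and the summary analyst reuse the cached result,
# so the costly call runs only once for identical prompts.
first = expensive_llm_call("Summarize AAPL's latest 10-K filing.")
second = expensive_llm_call("Summarize AAPL's latest 10-K filing.")
assert first == second
print(expensive_llm_call.cache_info().hits)  # 1
```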
Latency
Agentic AI often involves multi-step reasoning and interactions, which can introduce significant latency, especially when agents communicate with external tools or make calls. Here are some best practices to reduce latency:
- Stream the Final Layer: For user-facing agents, streaming the final LLM call can significantly reduce perceived latency. Streaming delivers tokens as they are generated, creating faster response times for end users and improving the overall experience.
- Prompt Optimization: Prompts and results are often the same for different users. To reduce latency, prompt caching can be used to reuse previously computed results whenever possible. By caching common prompts, agents can bypass unnecessary recomputation, providing faster responses.
- Dedicated LLM: Not every agent needs to use large, complex LLM. By leveraging simpler LLM for simple tasks and reserving resource-intensive models for complex reasoning, latency can be effectively minimized.
- Parallel Execution: Whenever possible, agents should run in parallel rather than sequentially (see the asyncio sketch after this list). For example, sentiment analysis and technical analysis can be performed simultaneously, reducing the total processing time of a task.
- Efficient Runtime Strategies: Using batch or selective runtime validation can also reduce latency. There is no need to validate every output; only key outputs or random samples need to be checked, saving processing time and resources.
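The following sketch shows parallel agent execution with asyncio; the `asyncio.sleep` calls stand in for LLM or API latency.

```python
# A sketch of parallel agent execution: sentiment and technical analysis
# run concurrently instead of back to back.
import asyncio

async def sentiment_analysis(ticker: str) -> str:
    await asyncio.sleep(1)  # stands in for an LLM / API call
    return "positive"

async def technical_analysis(ticker: str) -> str:
    await asyncio.sleep(1)
    return "uptrend"

async def analyze(ticker: str):
    # Total wall time ~1s instead of ~2s when run sequentially.
    return await asyncio.gather(
        sentiment_analysis(ticker), technical_analysis(ticker)
    )

print(asyncio.run(analyze("AAPL")))  # ['positive', 'uptrend']
```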
Alignment
Ensuring that agents align with human values and goals is an ongoing challenge in the development process of Agentic AI systems. Misaligned agents may make decisions that, while logical from their perspective, could be harmful or unwelcome to users. Below, we explore best practices and methods to improve alignment:
- Fine-Tuning for Specific Task Goals: Fine-tuning language models for specific tasks helps improve the agents’ alignment with expected outcomes. By training on specific task data, agents can better understand and meet the expectations of particular applications, ensuring their behavior is more predictable and aligned with user goals.
- Reinforcement Learning from Human Feedback (RLHF): Reinforcement learning based on human feedback can be used to fine-tune agents according to user preferences and values. RLHF enables agents to iteratively improve their behavior by learning from user feedback, making them more adaptable to subtle requirements and avoiding undesirable behaviors.
- Consistency Checks: Regular consistency checks should be conducted to ensure that the decisions made by agents match expected behavior. Consistency checks can include comparing agent outputs against a predefined set of acceptable responses, helping to identify inconsistencies early (a sketch follows this list).
- Explainability for Better Alignment: Agents that can explain their reasoning are easier to align with human expectations. Explainable AI allows users to understand the reasons behind an agent's specific decisions, providing transparency and making it easier to detect when agents deviate from their intended purposes.
- Human-in-the-Loop Monitoring: For critical applications, keeping a human in the loop is essential. This allows real-time intervention when agents make decisions that deviate from expected goals, ensuring safety and alignment with human values. Human oversight is particularly important in high-risk scenarios, where autonomous decisions can have significant consequences.
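A minimal sketch of such a consistency check follows; the golden set and the toy agent are illustrative.

```python
# A sketch of a consistency check: compare agent outputs against a
# predefined set of acceptable responses for known test queries.
GOLDEN_SET = {
    "Should I buy AAPL?": {"buy", "hold", "sell"},
    "Tell me about politics.": {"I can't discuss that topic."},
}

def consistency_check(agent) -> list[str]:
    # Return the queries where the agent's answer falls outside the
    # predefined set of acceptable responses.
    return [q for q, ok in GOLDEN_SET.items() if agent(q) not in ok]

toy_agent = lambda q: "buy" if "AAPL" in q else "I can't discuss that topic."
print(consistency_check(toy_agent))  # [] -> no inconsistencies found
```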
Conclusion
In this comprehensive guide, we explored the fundamental concepts and components of Agentic AI, including its architecture, various examples, and frameworks. Agentic AI represents a unique approach to building autonomous systems that can reason, plan, and adapt to achieve specific goals. We discussed how agents operate like System 1 and System 2 thinking, integrating rapid, intuitive actions with thoughtful and complex decision-making processes. Additionally, we analyzed different frameworks that support the implementation of agent systems and shared insights on best practices for testing and evaluation.
While there are successful case studies, such as Amdocs using Nvidia NIM and Wiley collaborating with Salesforce, allowing agents to be fully autonomous still presents challenges. Trust and scalable runtime verifiability remain significant obstacles, as does the question of how best to build agents for optimal outcomes. Developing robust frameworks can play a key role in overcoming these challenges, while ongoing improvements in testing (including runtime validation and red team testing) will help ensure that agents operate reliably and safely.
References and Links
- The hedge fund agent example is excerpted from @virattt's AI hedge fund team's GitHub repository.
- https://a16z.com/llmflation-llm-inference-cost/
- https://gorilla.cs.berkeley.edu/leaderboard.html
- https://github.com/NVIDIA/NeMo-Guardrails?tab=readme-ov-file
- https://www.madrona.com/the-rise-of-ai-agent-infrastruct/
- https://www.langchain.com/
- https://www.langchain.com/langgraph
- https://www.crewai.com/
- https://github.com/openai/swarm
- https://github.com/microsoft/autogen
- https://www.llamaindex.ai/
- https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow
- https://developer.nvidia.com/blog/amdocs-accelerates-generative-ai-performance-and-lowers-costs-with-nvidia-nim/
- https://www.salesforce.com/customer-stories/wiley/