This open-source agent series introduces the open-source agent frameworks currently available, such as CrewAI, AutoGen, LangChain, phidata, and Swarm, and discusses their strengths, weaknesses, features, results, and usage.
Interested readers can follow the public account “XiaozhiAGI” for continuous updates on cutting-edge AI technologies and products, such as RAG, agents, agentic workflows, and AGI.
Phidata
Phidata is a framework for building multimodal agents. With phidata you can:
- Build multimodal agents with memory, knowledge, tools, and reasoning capabilities.
- Create teams of agents that can collaboratively solve problems.
- Interact with your agents via a beautiful agent UI.

1. Installation
pip install -U phidata
2. Features
- Multimodal support: Supports various data formats such as text, images, audio, and video.
- Collaborative agent teams: Allows multiple agents to work together to accomplish complex tasks.
- Elegant user interface: Provides an intuitive UI for smooth interaction between users and agents.
- Structured output: Agent responses are presented in a structured format, enhancing usability and readability.
- Monitoring and debugging features: Built-in tools for monitoring and debugging help developers track agent performance in real time, ensuring efficient and smooth application operation.
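To make the structured-output feature more concrete, here is a minimal standard-library sketch of the general idea: the model is asked to answer in JSON, and the application validates that answer into a typed object before using it. The `MovieIdea` type and the `parse_structured` helper are hypothetical names chosen for illustration; they are not part of phidata, and phidata's own mechanism may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class MovieIdea:
    """Hypothetical target type for a structured agent response."""
    title: str
    genre: str
    characters: list[str]


def parse_structured(raw: str) -> MovieIdea:
    """Validate a JSON string (as an LLM might return it) into a MovieIdea."""
    data = json.loads(raw)
    required = ("title", "genre", "characters")
    missing = [field for field in required if field not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return MovieIdea(
        title=str(data["title"]),
        genre=str(data["genre"]),
        characters=[str(name) for name in data["characters"]],
    )
```

Downstream code can then rely on attributes such as `idea.title` instead of re-parsing free text, and a malformed response fails loudly at the parsing step.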
3. Innovations
Phidata extends LLMs by combining memory, knowledge, and tools, so that agents can not only understand and respond to user requests but also carry out concrete operations such as data queries and API calls. It also provides a debugging UI and monitoring logs.
4. Sample References
4.1. Web Search
web_search.py
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo

web_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    show_tool_calls=True,
    markdown=True,
)
web_agent.print_response("Tell me about OpenAI Sora?", stream=True)
Dependency Installation
pip install phidata openai duckduckgo-search
API Key Setup
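Linux/macOS:
export OPENAI_API_KEY=sk-xxxx
Windows (PowerShell):
$env:OPENAI_API_KEY = "sk-xxxx"
Alternatively, on any platform, set it from Python before creating the agent:
import os
os.environ['OPENAI_API_KEY'] = 'sk-xxxx'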
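A missing key usually only surfaces once the first model call fails, so it can help to check for it up front. The following is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper name, not a phidata function.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of the environment variable `name`, or fail with a clear message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it (or set os.environ) before running the agent."
        )
    return key
```

Calling `require_api_key()` at the top of a script such as web_search.py makes the script stop immediately with a readable error instead of failing partway through a request.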
Run the Code
python web_search.py
4.2. Multimodal Q&A
Add an image and ask questions about its contents.
image_agent.py
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    markdown=True,
)
agent.print_response(
    "Tell me about this image and give me the latest news about it.",
    images=["https://upload.wikimedia.org/wikipedia/commons/b/bf/Krakow_-_Kosciol_Mariacki.jpg"],
    stream=True,
)
Run the Code
python image_agent.py
4.3. RAG Q&A
rag_agent.py
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.embedder.openai import OpenAIEmbedder
from phi.knowledge.pdf import PDFUrlKnowledgeBase
from phi.vectordb.lancedb import LanceDb, SearchType

# Create a knowledge base from a PDF
knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
    # Use LanceDB as the vector database
    vector_db=LanceDb(
        table_name="recipes",
        uri="tmp/lancedb",
        search_type=SearchType.vector,
        embedder=OpenAIEmbedder(model="text-embedding-3-small"),
    ),
)
# Comment out after first run as the knowledge base is loaded
knowledge_base.load()

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    # Add the knowledge base to the agent
    knowledge=knowledge_base,
    show_tool_calls=True,
    markdown=True,
)
agent.print_response("How do I make chicken and galangal in coconut milk soup", stream=True)
Run the Code
# Installation: pip install lancedb tantivy pypdf sqlalchemy
python rag_agent.py
4.4. Agent-UI and Multi-Agent Collaboration
playground.py
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.storage.agent.sqlite import SqlAgentStorage
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools
from phi.playground import Playground, serve_playground_app

web_agent = Agent(
    name="Web Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    storage=SqlAgentStorage(table_name="web_agent", db_file="agents.db"),
    add_history_to_messages=True,
    markdown=True,
)

finance_agent = Agent(
    name="Finance Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True, company_news=True)],
    instructions=["Use tables to display data"],
    storage=SqlAgentStorage(table_name="finance_agent", db_file="agents.db"),
    add_history_to_messages=True,
    markdown=True,
)

app = Playground(agents=[finance_agent, web_agent]).get_app()

if __name__ == "__main__":
    serve_playground_app("playground:app", reload=True)
First, authenticate with phidata, either by running the following command in a terminal:
phi auth
or by registering and logging in at https://www.phidata.app/ and exporting your PHI API key:
export PHI_API_KEY=phi-***
Then install the dependencies and run the code:
# Installation: pip install 'fastapi[standard]' sqlalchemy
python playground.py
Open the provided link or visit http://phidata.app/playground, select the localhost:7777 endpoint, and start chatting with the agents.

Many more examples are available in the official GitHub repository:
https://github.com/phidatahq/phidata
5. Role of Phidata
Enhanced Memory Capability
Phidata enables LLMs to hold long-running conversations by storing chat history in a database, so that an agent remembers earlier interactions and context and can respond coherently and personally across turns. This makes it well suited to applications that require multi-turn conversations.
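The idea of database-backed memory can be sketched with the standard `sqlite3` module. This is only an illustration under assumed names: the `ChatHistory` class and its `messages` table are hypothetical and do not reflect phidata's actual storage schema.

```python
import sqlite3


class ChatHistory:
    """Toy per-session chat memory backed by SQLite (hypothetical schema)."""

    def __init__(self, db_file: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_file)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        """Append one message to a session."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def recent(self, session_id: str, limit: int = 6) -> list[dict[str, str]]:
        """Return the last `limit` messages of a session in chronological order."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        # Rows arrive newest first; reverse them before building the prompt.
        return [{"role": role, "content": content} for role, content in reversed(rows)]


history = ChatHistory()
history.add("s1", "user", "My name is Ana.")
history.add("s1", "assistant", "Nice to meet you, Ana.")
history.add("s2", "user", "A message from another session.")
history.add("s1", "user", "What is my name?")
messages = history.recent("s1", limit=2)
```

Before each model call, the messages returned by `recent` would be prepended to the prompt, which is what lets the model answer questions that depend on earlier turns.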
Providing Rich Knowledge
Phidata stores key information in a vector database and retrieves the relevant parts at query time, giving LLMs the business knowledge and context they need to handle more complex queries and tasks. This lets agents take on specialized, domain-specific work such as financial analysis and medical consultation.
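The retrieval step behind a vector database can be illustrated with a toy example. The sketch below swaps in word counts for real embeddings and a plain list for the database, so it only shows the ranking principle (cosine similarity between a query vector and document vectors); `embed`, `cosine`, and `search` are hypothetical helper names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:top_k]


docs = [
    "Tom kha gai is a chicken and galangal soup with coconut milk",
    "Pad thai is a stir-fried noodle dish with tamarind",
    "Green curry uses green chili paste and coconut milk",
]
best = search("chicken galangal coconut soup", docs)
```

The retrieved passages are then added to the prompt so the model answers from the stored knowledge rather than from its training data alone.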
Integration of Various Tools
Phidata integrates a range of tools that let LLMs perform concrete operations such as pulling data from APIs, sending emails, and querying databases. These tools allow agents to carry out tasks automatically, for example generating reports or retrieving data, which improves work efficiency.
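Tool use generally follows one pattern: the model emits a tool name plus arguments, and the application looks up and runs the matching function. The sketch below shows that dispatch step in plain Python; the `tool` decorator, the `TOOLS` registry, and the two example tools are hypothetical and are not phidata's API.

```python
from typing import Callable

# Registry mapping tool names to plain Python functions.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function so it can be called by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def add_numbers(a: float, b: float) -> str:
    """Add two numbers and return the result as text."""
    return str(a + b)


@tool
def word_count(text: str) -> str:
    """Count whitespace-separated words in a text."""
    return str(len(text.split()))


def run_tool_call(call: dict) -> str:
    """Execute a tool call of the form {"name": ..., "arguments": {...}}."""
    name = call["name"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**call.get("arguments", {}))
```

The string returned by `run_tool_call` would be sent back to the model as the tool result, and an unknown tool name is reported as text rather than raised, so the model has a chance to recover.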
Support for Multimodal Interaction
Phidata supports processing multiple data types, including text, images, audio, and video. Multimodal support lets agents handle more diverse tasks, such as image recognition and voice processing, which broadens their range of applications and enriches the user experience.
Facilitating Agent Collaboration
Phidata allows multiple agents to be created and to collaborate as a team on tasks that are too complex for a single agent. This enables agents to handle more involved business scenarios, such as project management and team decision-making, and improves the overall effectiveness of the system.
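One simple way to picture agent collaboration is a coordinator that hands each task to the most suitable specialist. The sketch below uses keyword overlap as the routing rule, which is an assumption made for illustration; in practice an LLM typically makes this decision. `ToyAgent` and `route` are hypothetical names, not phidata classes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyAgent:
    """A stand-in for a specialist agent."""
    name: str
    keywords: tuple[str, ...]
    handle: Callable[[str], str]


def route(task: str, team: list[ToyAgent], fallback: ToyAgent) -> ToyAgent:
    """Pick the team member whose keywords overlap most with the task."""
    words = set(task.lower().split())
    best, best_score = fallback, 0
    for agent in team:
        score = len(words.intersection(agent.keywords))
        if score > best_score:
            best, best_score = agent, score
    return best


finance = ToyAgent("Finance Agent", ("stock", "price", "analyst"), lambda t: f"[finance] {t}")
web = ToyAgent("Web Agent", ("news", "search", "latest"), lambda t: f"[web] {t}")
general = ToyAgent("General Agent", (), lambda t: f"[general] {t}")

chosen = route("latest news about sora", [finance, web], fallback=general)
answer = chosen.handle("latest news about sora")
```

Tasks that match no specialist fall through to the general agent, which keeps the team usable even for requests nobody anticipated.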
6. Limitations of Phidata
Dependency on External Services
Phidata needs to be integrated with a framework such as Streamlit, FastAPI, or Django to build a complete AI application. This reliance on external services can increase system complexity and maintenance costs, especially in scenarios that require rapid deployment and flexible configuration.
Difficulty in Getting Started
Although Phidata provides detailed documentation and guides, understanding and configuring its components can still be challenging for beginners, which may discourage or slow down developers without relevant experience.
Functionality and Stability
As a relatively new tool, Phidata may not match more mature AI development tools in functionality and stability. These gaps may affect its performance in real applications until it is further improved and optimized.
7. Conclusion
Phidata significantly enhances the capabilities of large language models by adding memory, knowledge, and tools, improving both their understanding of context and their ability to act. However, developers should be aware of its dependency on external services, its learning curve, and its current limits in functionality and stability. Overall, Phidata provides a powerful framework for building efficient, intelligent AI assistants, with broad application prospects.