Artificial intelligence has become a working component in many industries, and developers and businesses need practical ways to build on it. Phidata is an open-source framework for building multi-modal agents: agents that combine a language model with memory, knowledge, tools, and reasoning to solve real-world problems.
What Does Phidata Do? 
Phidata's core goal is to let users create multi-modal agents that have memory and knowledge and can also use tools and reason as needed. In other words, Phidata is not just a chatbot framework; it is a toolkit for building teams of agents that can handle complex tasks.
With just a simple command, you can install it:
pip install -U phidata
Main Features
Phidata's features are organized as modules that users can choose and combine freely for different application scenarios. They include:
- Simple and Elegant: Clean code, easy to use, and quick to get started.
- Powerful and Flexible: Supports various tools and can execute complex tasks based on instructions.
- Default Multi-Modal: Agents support text, image, audio, and video inputs.
- Agent Team Collaboration: Multiple agents can work together to complete complex tasks.
- Aesthetic UI: Provides a user interface for interacting with agents.
- Structured Output: Output results support structured formats for easier processing.
- Built-in Reasoning Capabilities: Reasoning functions help agents solve problems step by step.
- Monitoring and Debugging Features: Built-in monitoring and debugging features for tracking and optimizing agent performance.
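To make the "Structured Output" item above more concrete, here is a minimal sketch of what consuming a structured reply can look like. It uses only the Python standard library, not Phidata's own API. The `StockSummary` schema, the `parse_summary` helper, and the sample reply are all hypothetical and exist only for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class StockSummary:
    """Hypothetical schema for a structured agent reply (not a Phidata type)."""
    ticker: str
    recommendation: str
    sources: List[str]


def parse_summary(raw: str) -> StockSummary:
    """Parse a JSON reply into a StockSummary, rejecting missing or extra fields."""
    data = json.loads(raw)
    expected = {"ticker", "recommendation", "sources"}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError(f"expected a JSON object with fields {sorted(expected)}")
    return StockSummary(
        ticker=str(data["ticker"]),
        recommendation=str(data["recommendation"]),
        sources=[str(source) for source in data["sources"]],
    )


# Made-up reply text for illustration; this is not real model output.
reply = '{"ticker": "XYZ", "recommendation": "hold", "sources": ["https://example.com"]}'
summary = parse_summary(reply)
print(summary.ticker, summary.recommendation, len(summary.sources))
```

The point of a fixed schema is that downstream code can read `summary.ticker` directly instead of searching free-form text, and a malformed reply fails loudly at the parsing step.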
How to Use Phidata?
With Phidata, you can create a search agent in just a few lines of code:
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
web_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    show_tool_calls=True,
    markdown=True,
)
web_agent.print_response("Tell me about OpenAI Sora?", stream=True)
To run this search agent, install the required libraries, set your OpenAI API key, and run the script (saved here as web_search.py):
pip install phidata openai duckduckgo-search
export OPENAI_API_KEY=sk-xxxx
python web_search.py
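A missing API key is a common reason the script fails on first run. As a small sketch, a check like the one below could be placed at the top of the script so the problem is reported before any agent is created. The `require_env` helper is a hypothetical name introduced here, not part of Phidata, and it relies only on the standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before constructing the agent turns a confusing authentication error into an immediate, readable one.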
Multi-Modal Agents
Phidata’s multi-modal agents naturally support processing data in various formats, such as text, images, audio, and video. For example, you can create an agent that understands images and calls tools as needed:
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    markdown=True,
)
agent.print_response(
    "Tell me about this image and give me the latest news about it.",
    images=["https://upload.wikimedia.org/wikipedia/commons/b/bf/Krakow_-_Kosciol_Mariacki.jpg"],
    stream=True,
)
Agent Team Collaboration
Phidata allows you to create a team of agents that can work together to accomplish complex tasks:
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools
web_agent = Agent(
    name="Web Agent",
    role="Search the web for information",
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    show_tool_calls=True,
    markdown=True,
)
finance_agent = Agent(
    name="Finance Agent",
    role="Get financial data",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True)],
    instructions=["Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)
agent_team = Agent(
    team=[web_agent, finance_agent],
    model=OpenAIChat(id="gpt-4o"),
    instructions=["Always include sources", "Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)
agent_team.print_response("Summarize analyst recommendations and share the latest news for NVDA", stream=True)
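To build intuition for what "working together" means in the team example above, here is a toy sketch of delegation in plain Python. It is not how Phidata routes work internally: simple keyword matching stands in for whatever the leading model decides, and the `Member` class, the `delegate` function, and the stub handlers are all hypothetical. No model or network call is made.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Member:
    """Hypothetical stand-in for a team member with a narrow role."""
    name: str
    keywords: List[str]
    handle: Callable[[str], str]


def delegate(task: str, members: List[Member]) -> Dict[str, str]:
    """Send the task to every member whose keywords appear in it."""
    lowered = task.lower()
    results: Dict[str, str] = {}
    for member in members:
        if any(keyword in lowered for keyword in member.keywords):
            results[member.name] = member.handle(task)
    return results


# Stub handlers only describe what a real member would do.
team = [
    Member("Web Agent", ["news", "search"], lambda t: f"[web stub] would search for: {t}"),
    Member("Finance Agent", ["analyst", "price", "stock"], lambda t: f"[finance stub] would fetch data for: {t}"),
]

task = "Summarize analyst recommendations and share the latest news for NVDA"
for name, output in delegate(task, team).items():
    print(f"{name}: {output}")
```

In this sketch the sample task reaches both members because it mentions analysts and news. The design idea it illustrates is that each member keeps a narrow role and tool set, while a coordinator splits the request and combines the results.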
Beautiful Agent UI
Phidata's value is not limited to the backend. It also provides an interactive interface that lets users chat with agents and interact with data.
Phidata also ships many examples and debugging tools that help users learn the framework quickly and apply it in real projects.
In addition to Phidata, other notable platforms for building conversational agents are worth mentioning:
- Rasa: Designed specifically for building conversational agents, suitable for applications that require deep customization and control.
- Dialogflow: A powerful conversational design platform that provides voice and text dialog capabilities.
- IBM Watson Assistant: Offers cloud platform support with natural language understanding and multi-modal interaction capabilities.
The open-source Phidata project offers a practical approach to building multi-modal agents, with strengths in functionality, flexibility, and usability. Whether you are a developer, data scientist, or product manager, Phidata is a strong option for building agents. I hope this article encourages more developers to explore and use it.