Phidata: A Framework for Multi-Modal Agents

Phidata is a framework for building multi-modal agents. With Phidata, you can:

  • Build multi-modal agents with memory, knowledge, tools, and reasoning.

  • Build teams of agents that collaborate to solve problems.

  • Chat with your agents using a beautiful Agent UI.

16,200 stars, 2,200 forks, 28 issues, 82 contributors. License: MPL-2.0. Language: Python.

Code: https://github.com/phidatahq/phidata

Homepage: https://docs.phidata.com/

Main Features

  • Simple and Elegant

  • Powerful and Flexible

  • Defaults to Multi-Modal

  • Multi-Agent Orchestration

  • Beautiful Agent UI to chat with your agents

  • Built-in Agentic RAG

  • Structured Output

  • Reasoning Agents

  • Built-in Monitoring and Debugging Features

Installation and Usage

Simple and Elegant

Phidata agents are simple and elegant, resulting in minimal and aesthetically pleasing code.

For example, you can create a web search agent in about 10 lines of code. Create a file web_search.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo

web_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    show_tool_calls=True,
    markdown=True,
)

web_agent.print_response("Tell me about OpenAI Sora?", stream=True)

Install the libraries, export your OPENAI_API_KEY, and run the agent:

pip install phidata openai duckduckgo-search
export OPENAI_API_KEY=sk-xxxx
python web_search.py

Powerful and Flexible

Phidata agents can use various tools and complete complex tasks according to instructions.

For example, you can create a finance agent that queries financial data. Create a file finance_agent.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.yfinance import YFinanceTools

finance_agent = Agent(
    name="Finance Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True, company_news=True)],
    instructions=["Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)

finance_agent.print_response("Summarize analyst recommendations for NVDA", stream=True)

Install the libraries and run the agent:

pip install yfinance
python finance_agent.py

Defaults to Multi-Modal

Phidata agents support text, images, audio, and video.

For example, you can create an image agent that understands images and calls tools as needed. Create a file image_agent.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    markdown=True,
)

agent.print_response(
    "Tell me about this image and give me the latest news about it.",
    images=["https://upload.wikimedia.org/wikipedia/commons/b/bf/Krakow_-_Kosciol_Mariacki.jpg"],
    stream=True,
)

Run the agent:

python image_agent.py

Multi-Agent Orchestration

Phidata agents can work together as a team to accomplish complex tasks. Create a file agent_team.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools

web_agent = Agent(
    name="Web Agent",
    role="Search the web for information",
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    show_tool_calls=True,
    markdown=True,
)

finance_agent = Agent(
    name="Finance Agent",
    role="Get financial data",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True)],
    instructions=["Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)

agent_team = Agent(
    team=[web_agent, finance_agent],
    model=OpenAIChat(id="gpt-4o"),
    instructions=["Always include sources", "Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)

agent_team.print_response("Summarize analyst recommendations and share the latest news for NVDA", stream=True)

Run the Agent Team:

python agent_team.py

Beautiful Agent UI to Chat with Your Agents

Phidata provides a beautiful UI for interacting with your agents. Let’s try it out.

Note

Phidata does not store any data; all agent data is stored locally in a SQLite database.

Create a file playground.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.storage.agent.sqlite import SqlAgentStorage
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools
from phi.playground import Playground, serve_playground_app

web_agent = Agent(
    name="Web Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    storage=SqlAgentStorage(table_name="web_agent", db_file="agents.db"),
    add_history_to_messages=True,
    markdown=True,
)

finance_agent = Agent(
    name="Finance Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True, company_news=True)],
    instructions=["Use tables to display data"],
    storage=SqlAgentStorage(table_name="finance_agent", db_file="agents.db"),
    add_history_to_messages=True,
    markdown=True,
)

app = Playground(agents=[finance_agent, web_agent]).get_app()

if __name__ == "__main__":
    serve_playground_app("playground:app", reload=True)

Authenticate with Phidata by running the following command:

phi auth

Or export the PHI_API_KEY for your workspace from phidata.app:

export PHI_API_KEY=phi-***

Install dependencies and run Agent Playground:

pip install 'fastapi[standard]' sqlalchemy
python playground.py

  • Open the link provided or navigate to http://phidata.app/playground

  • Select the localhost:7777 endpoint and start chatting with your agents!
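
Because the agent data lives in a plain SQLite file, it can be inspected with Python's standard library. The following is a minimal sketch, assuming playground.py has already been run so that agents.db exists in the working directory. The helper name list_tables is illustrative and not part of Phidata, and the sketch makes no assumption about the columns Phidata writes.

```python
import sqlite3
from contextlib import closing


def list_tables(db_file: str) -> list[str]:
    """Return the names of all tables in a SQLite database file, sorted."""
    # closing() releases the connection even if the query fails.
    with closing(sqlite3.connect(db_file)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumption: playground.py has been run at least once in this directory.
    print(list_tables("agents.db"))
```

Note that sqlite3.connect creates an empty file if the path does not exist, so an empty list may simply mean the playground has not stored anything yet.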

Agentic RAG

Phidata's authors describe themselves as the first to pioneer Agentic RAG, using their Auto-RAG paradigm. With Agentic RAG (or Auto-RAG), an agent searches its knowledge base (a vector database) for the specific information it needs to complete a task, instead of always inserting “context” into the prompt.

This saves tokens and improves response quality. Create a file rag_agent.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.embedder.openai import OpenAIEmbedder
from phi.knowledge.pdf import PDFUrlKnowledgeBase
from phi.vectordb.lancedb import LanceDb, SearchType

# Create a knowledge base from a PDF
knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
    # Use LanceDB as the vector database
    vector_db=LanceDb(
        table_name="recipes",
        uri="tmp/lancedb",
        search_type=SearchType.vector,
        embedder=OpenAIEmbedder(model="text-embedding-3-small"),
    ),
)

# Comment out after first run as the knowledge base is loaded
knowledge_base.load()

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    # Add the knowledge base to the agent
    knowledge=knowledge_base,
    show_tool_calls=True,
    markdown=True,
)

agent.print_response("How do I make chicken and galangal in coconut milk soup", stream=True)

Install libraries and run the agent:

pip install lancedb tantivy pypdf sqlalchemy
python rag_agent.py
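
The token saving described above can be illustrated without any Phidata code. The following is a toy sketch under loud assumptions: DOCS, search_knowledge, and both prompt builders are hypothetical names, keyword overlap stands in for vector search, and nothing here reflects how Phidata implements Auto-RAG. It only compares a prompt that always carries the whole knowledge base with one that carries just what a search returns.

```python
import re

# Toy knowledge base (hypothetical); a real setup would use a vector database.
DOCS = [
    "Tom kha gai is a chicken and galangal soup made with coconut milk.",
    "Pad thai is a stir-fried rice noodle dish with tamarind sauce.",
    "Green curry paste is made from green chillies, lemongrass and galangal.",
]


def _tokens(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search_knowledge(query: str, limit: int = 1) -> list[str]:
    """Return up to `limit` documents sharing the most words with the query."""
    query_words = _tokens(query)
    ranked = sorted(DOCS, key=lambda doc: len(query_words & _tokens(doc)), reverse=True)
    # Drop documents that share no words with the query at all.
    return [doc for doc in ranked[:limit] if query_words & _tokens(doc)]


def always_insert_prompt(question: str) -> str:
    """Naive approach: the whole knowledge base goes into every prompt."""
    return "Context:\n" + "\n".join(DOCS) + f"\n\nQuestion: {question}"


def search_on_demand_prompt(question: str) -> str:
    """On-demand approach: only the search hits are added to the prompt.

    In a real agent the model decides when to call the search tool;
    here the call is made directly to keep the sketch short.
    """
    hits = search_knowledge(question)
    return "Context:\n" + "\n".join(hits) + f"\n\nQuestion: {question}"


question = "How do I make chicken and galangal in coconut milk soup"
print(len(always_insert_prompt(question)), "characters with everything inserted")
print(len(search_on_demand_prompt(question)), "characters with search on demand")
```

With three short documents the difference is small, but it grows with the size of the knowledge base, which is where searching on demand pays off.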

Structured Output

Agents can return their output in a structured format, as Pydantic models.

Create a file structured_output.py:

from typing import List
from pydantic import BaseModel, Field
from phi.agent import Agent
from phi.model.openai import OpenAIChat

# Define a Pydantic model to enforce the structure of the output
class MovieScript(BaseModel):
    setting: str = Field(..., description="Provide a nice setting for a blockbuster movie.")
    ending: str = Field(..., description="Ending of the movie. If not available, provide a happy ending.")
    genre: str = Field(..., description="Genre of the movie. If not available, select action, thriller or romantic comedy.")
    name: str = Field(..., description="Give a name to this movie")
    characters: List[str] = Field(..., description="Name of characters for this movie.")
    storyline: str = Field(..., description="3 sentence storyline for the movie. Make it exciting!")

# Agent that uses JSON mode
json_mode_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    description="You write movie scripts.",
    response_model=MovieScript,
)

# Agent that uses structured outputs
structured_output_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    description="You write movie scripts.",
    response_model=MovieScript,
    structured_outputs=True,
)

json_mode_agent.print_response("New York")
structured_output_agent.print_response("New York")

Run the file structured_output.py:

python structured_output.py

The output is an object of the MovieScript class, as follows:

MovieScript(
    setting='A bustling and vibrant New York City',
    ending='The protagonist saves the city and reconciles with their estranged family.',
    genre='action',
    name='City Pulse',
    characters=['Alex Mercer', 'Nina Castillo', 'Detective Mike Johnson'],
    storyline='In the heart of New York City, a former cop turned vigilante, Alex Mercer, teams up with a street-smart activist, Nina Castillo, to take down a corrupt political figure who threatens to destroy the city. As they navigate through the intricate web of power and deception, they uncover shocking truths that push them to the brink of their abilities. With time running out, they must race against the clock to save New York and confront their own demons.'
)
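
To make "enforcing the structure" concrete, the sketch below uses Pydantic on its own, with no agent involved, to show what happens when a JSON reply does or does not match the model. It assumes Pydantic is installed (the file above already imports it). The MovieScript here is a trimmed three-field version for brevity, and parse_script is a hypothetical helper, not a Phidata function.

```python
import json
from typing import List

from pydantic import BaseModel, Field, ValidationError


# Trimmed version of the MovieScript model, for illustration only.
class MovieScript(BaseModel):
    name: str = Field(..., description="Give a name to this movie")
    genre: str = Field(..., description="Genre of the movie.")
    characters: List[str] = Field(..., description="Name of characters for this movie.")


def parse_script(raw_json: str) -> MovieScript:
    """Parse a JSON reply and validate it against the MovieScript schema."""
    return MovieScript(**json.loads(raw_json))


# A reply that matches the schema becomes a typed object.
good = '{"name": "City Pulse", "genre": "action", "characters": ["Alex Mercer"]}'
script = parse_script(good)
print(script.name, script.characters)

# A reply with a missing required field is rejected.
bad = '{"name": "City Pulse", "genre": "action"}'  # "characters" is missing
try:
    parse_script(bad)
except ValidationError as exc:
    print("rejected:", len(exc.errors()), "error(s)")
```

Validation of this kind is what lets downstream code access fields such as script.characters without checking the shape of the reply by hand.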

Reasoning Agents (Experimental)

Reasoning helps agents solve problems step by step, backtracking and correcting as needed. Create a file reasoning_agent.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat

task = ("Three missionaries and three cannibals need to cross a river. "
        "They have a boat that can carry up to two people at a time. "
        "If, at any time, the cannibals outnumber the missionaries on either side of the river, the cannibals will eat the missionaries. "
        "How can all six people get across the river safely? Provide a step-by-step solution and show the solutions as an ascii diagram")

reasoning_agent = Agent(model=OpenAIChat(id="gpt-4o"), reasoning=True, markdown=True, structured_outputs=True)
reasoning_agent.print_response(task, stream=True, show_full_reasoning=True)

Run the Reasoning Agent:

python reasoning_agent.py

Warning

Reasoning is an experimental feature and will break ~20% of the time. It is not a replacement for o1.

This is an experiment driven by curiosity, combining chain-of-thought (CoT) and tool usage. For this initial version, set your expectations very low. For example: it will not be able to count the ‘r’s in ‘strawberry’.
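
Since the reasoning agent can fail, it helps to have a reference answer for the puzzle in the task. The following is a small, independent sketch that solves it by breadth-first search over river states. It does not use Phidata, and the names (is_safe, solve, BOAT_LOADS) are chosen for illustration.

```python
from collections import deque

TOTAL = 3  # number of missionaries, and also of cannibals
# Possible boat loads as (missionaries, cannibals); the boat holds one or two people.
BOAT_LOADS = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]


def is_safe(m_left: int, c_left: int) -> bool:
    """A bank is safe if it has no missionaries or they are not outnumbered."""
    m_right, c_right = TOTAL - m_left, TOTAL - c_left
    left_ok = m_left == 0 or m_left >= c_left
    right_ok = m_right == 0 or m_right >= c_right
    return left_ok and right_ok


def solve():
    """Breadth-first search over (missionaries_left, cannibals_left, boat_on_left)."""
    start, goal = (TOTAL, TOTAL, True), (0, 0, False)
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        m_left, c_left, boat_left = state
        direction = -1 if boat_left else 1  # leaving the left bank removes people from it
        for dm, dc in BOAT_LOADS:
            nm, nc = m_left + direction * dm, c_left + direction * dc
            nxt = (nm, nc, not boat_left)
            in_range = 0 <= nm <= TOTAL and 0 <= nc <= TOTAL
            if in_range and is_safe(nm, nc) and nxt not in parents:
                parents[nxt] = state
                queue.append(nxt)
    return None


if __name__ == "__main__":
    for step, (m_left, c_left, boat_left) in enumerate(solve()):
        side = "left" if boat_left else "right"
        print(f"{step:2d}: left bank M={m_left} C={c_left}, boat on the {side}")
```

Because breadth-first search explores states in order of distance from the start, the path it returns is a shortest one: 11 crossings. An agent's answer can be checked against it step by step.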
