Unlocking Efficient Work: Building Multimodal Assistants with Phidata

Exploring the World of Multimodal Agents: Introduction to the Phidata Framework

As AI technology matures, multimodal agents are being applied more and more widely. Phidata is a framework for building multimodal agents with memory, knowledge, tools, and reasoning capabilities. This article covers Phidata's features, typical use cases, and usage, with a worked example for each.


1. What is Phidata?

Phidata is a framework for building multimodal agents. With it, you can:

  • Build multimodal agents with memory, knowledge, tools, and reasoning capabilities.
  • Create teams of agents that can work together to solve problems.
  • Interact with agents through an aesthetically pleasing user interface.

2. Installing Phidata

To get started with Phidata, install the package with pip:

pip install -U phidata

3. Key Features of Phidata

The following features keep agent development simple across a range of use cases:

  • Simple and Elegant: Agents are defined in a few lines of clear code.
  • Flexible: Agents can use a variety of tools to perform complex tasks.
  • Multimodal by Default: Supports text, image, audio, and video inputs.
  • Multi-Agent Collaboration: Multiple agents can form a team to complete complex tasks together.
  • Agent User Interface: A polished UI lets users chat with agents directly.
  • Agentic RAG: Built-in Retrieval-Augmented Generation (RAG) capabilities for information retrieval and processing.
  • Structured Output: Agents can return outputs in structured formats for subsequent processing and utilization.
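To make the "Structured Output" item concrete: the idea is that an agent's reply conforms to a known schema, so downstream code can consume it without parsing free text. The sketch below uses only the Python standard library and is not Phidata's API. The StockSummary schema, the parse_structured_reply helper, and the sample reply (including its placeholder price) are all hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class StockSummary:
    """Hypothetical schema a structured reply is expected to follow."""
    ticker: str
    recommendation: str
    price: float


def parse_structured_reply(raw: str) -> StockSummary:
    """Parse a JSON reply and check that every schema field is present."""
    data = json.loads(raw)
    expected = {f.name for f in fields(StockSummary)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"reply is missing fields: {sorted(missing)}")
    return StockSummary(
        ticker=str(data["ticker"]),
        recommendation=str(data["recommendation"]),
        price=float(data["price"]),
    )


# Placeholder reply for illustration only; not real market data.
sample_reply = '{"ticker": "NVDA", "recommendation": "buy", "price": 123.45}'
summary = parse_structured_reply(sample_reply)
print(summary.ticker, summary.recommendation, summary.price)
```

Because the result is a typed object rather than a block of text, later steps can read summary.price directly and fail early when a field is missing.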

4. Examples of Using Phidata

In this section, we will demonstrate how to use Phidata to build agents through various examples.

4.1 Creating a Web Search Agent

A web search agent takes about ten lines of code. Create a file named web_search.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo

web_agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    show_tool_calls=True,
    markdown=True,
)
web_agent.print_response("Tell me about OpenAI Sora?", stream=True)

Next, install the necessary libraries and run the agent:

pip install phidata openai duckduckgo-search
export OPENAI_API_KEY=sk-xxxx
python web_search.py
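The examples in this article read the OpenAI key from the OPENAI_API_KEY environment variable, so a missing key only surfaces once the agent makes its first request. As a small optional pre-flight check, the sketch below reports which required variables are unset. The missing_env_vars helper is hypothetical and is not part of Phidata.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Report the status instead of exiting, so this is safe to run anywhere.
missing = missing_env_vars(["OPENAI_API_KEY"])
print("missing variables:", ", ".join(missing) if missing else "none")
```

The same helper can check other variables used later in the article, such as PHI_API_KEY, by adding them to the list.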

4.2 Creating a Finance Agent

We can build a finance agent to query financial data. Create a file named finance_agent.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.yfinance import YFinanceTools

finance_agent = Agent(
    name="Finance Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True, company_news=True)],
    instructions=["Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)
finance_agent.print_response("Summarize analyst recommendations for NVDA", stream=True)

Install the required libraries and run the agent:

pip install yfinance
python finance_agent.py

4.3 Multimodal Agent Example

Phidata agents can handle image inputs. Create a file named image_agent.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    markdown=True,
)

agent.print_response(
    "Tell me about this image and give me the latest news about it.",
    images=["https://upload.wikimedia.org/wikipedia/commons/b/bf/Krakow_-_Kosciol_Mariacki.jpg"],
    stream=True,
)

Run this agent:

python image_agent.py

4.4 Multi-Agent Collaboration

Multiple agents can work together as a team. Create a file named agent_team.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools

web_agent = Agent(
    name="Web Agent",
    role="Search the web for information",
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    show_tool_calls=True,
    markdown=True,
)

finance_agent = Agent(
    name="Finance Agent",
    role="Get financial data",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True)],
    instructions=["Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)

agent_team = Agent(
    team=[web_agent, finance_agent],
    model=OpenAIChat(id="gpt-4o"),
    instructions=["Always include sources", "Use tables to display data"],
    show_tool_calls=True,
    markdown=True,
)

agent_team.print_response("Summarize analyst recommendations and share the latest news for NVDA", stream=True)

Run the agent team:

python agent_team.py

5. Chatting with Agents Using an Aesthetic User Interface

Phidata provides a UI for chatting with agents. Create a file named playground.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.storage.agent.sqlite import SqlAgentStorage
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools
from phi.playground import Playground, serve_playground_app

web_agent = Agent(
    name="Web Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[DuckDuckGo()],
    instructions=["Always include sources"],
    storage=SqlAgentStorage(table_name="web_agent", db_file="agents.db"),
    add_history_to_messages=True,
    markdown=True,
)

finance_agent = Agent(
    name="Finance Agent",
    model=OpenAIChat(id="gpt-4o"),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True, company_news=True)],
    instructions=["Use tables to display data"],
    storage=SqlAgentStorage(table_name="finance_agent", db_file="agents.db"),
    add_history_to_messages=True,
    markdown=True,
)

app = Playground(agents=[finance_agent, web_agent]).get_app()

if __name__ == "__main__":
    serve_playground_app("playground:app", reload=True)

Authenticate and run the Agent Playground:

phi auth
# or
export PHI_API_KEY=phi-***
pip install 'fastapi[standard]' sqlalchemy
python playground.py

Open the provided link or visit http://phidata.app/playground, select the localhost:7777 endpoint, and start chatting with the agents.

6. Experimental Reasoning Agent

Reasoning helps agents solve problems step by step. Create a file named reasoning_agent.py:

from phi.agent import Agent
from phi.model.openai import OpenAIChat

task = (
    "Three missionaries and three cannibals need to cross a river. "
    "They have a boat that can carry up to two people at a time. "
    "If, at any time, the cannibals outnumber the missionaries on either side of the river, the cannibals will eat the missionaries. "
    "How can all six people get across the river safely? Provide a step-by-step solution and show the solutions as an ascii diagram"
)

reasoning_agent = Agent(model=OpenAIChat(id="gpt-4o"), reasoning=True, markdown=True, structured_outputs=True)
reasoning_agent.print_response(task, stream=True, show_full_reasoning=True)

Run the reasoning agent:

python reasoning_agent.py
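Since the reasoning feature is experimental, it helps to check the agent's answer independently. The sketch below is a plain breadth-first search over puzzle states, written with the standard library only. It is not part of Phidata, and the names solve, is_safe, and check_plan are hypothetical. A state is (missionaries on the left bank, cannibals on the left bank, whether the boat is on the left).

```python
from collections import deque

TOTAL = 3
# Possible boat loads as (missionaries, cannibals); the boat holds 1 or 2 people.
MOVES = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]


def is_safe(m_left, c_left):
    """True if counts are in range and no bank has missionaries outnumbered."""
    if not (0 <= m_left <= TOTAL and 0 <= c_left <= TOTAL):
        return False
    m_right, c_right = TOTAL - m_left, TOTAL - c_left
    left_ok = m_left == 0 or m_left >= c_left
    right_ok = m_right == 0 or m_right >= c_right
    return left_ok and right_ok


def solve():
    """Return a shortest list of states from start to goal, or None."""
    start, goal = (TOTAL, TOTAL, True), (0, 0, False)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        m, c, boat_left = state
        sign = -1 if boat_left else 1  # people leave the bank the boat is on
        for dm, dc in MOVES:
            nxt = (m + sign * dm, c + sign * dc, not boat_left)
            if is_safe(nxt[0], nxt[1]) and nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def check_plan(crossings):
    """Validate a proposed plan given as a list of (missionaries, cannibals) loads."""
    m, c, boat_left = TOTAL, TOTAL, True
    for dm, dc in crossings:
        if (dm, dc) not in MOVES:
            return False
        sign = -1 if boat_left else 1
        m, c, boat_left = m + sign * dm, c + sign * dc, not boat_left
        if not is_safe(m, c):
            return False
    return (m, c, boat_left) == (0, 0, False)


path = solve()
print("crossings:", len(path) - 1)
for m, c, boat_left in path:
    print(f"left bank: {m}M {c}C, boat on {'left' if boat_left else 'right'}")
```

The search finds the classic 11-crossing solution. To test the agent's output, transcribe its steps as a list of boat loads and pass them to check_plan.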

7. Monitoring and Debugging

Phidata has built-in monitoring and debugging features for tracking agent sessions and viewing debug logs. The example below turns on session monitoring with monitoring=True:

from phi.agent import Agent

agent = Agent(markdown=True, monitoring=True)
agent.print_response("Share a 2 sentence horror story")

8. Summary and Comparison with Similar Projects

Phidata combines a chat UI, multimodal input handling, and flexible tool use in one agent framework. Developers can build agents quickly and combine them into teams to tackle more complex tasks.

Similar open-source projects to Phidata include:

  • Haystack: A framework specialized for building question-answering systems, supporting multiple backends.
  • Rasa: Focused on intelligent dialogue systems, allowing the creation of complex dialogue management strategies.
  • LangChain: A framework for building applications and agents on top of large language models, with support for external knowledge bases.

Each of these projects has its own strengths and suits different use cases, so developers can choose based on their specific needs.

Phidata's flexibility makes it a promising option for building multimodal agents, and it is well worth exploring.
