Building Data Analysis Agents with LangChain, CrewAI, and AutoGen


A data analysis agent can carry out analysis tasks automatically, execute code, and respond adaptively to data queries. LangChain, CrewAI, and AutoGen are three popular frameworks for building such AI agents. This article uses each of them to build a simple data analysis agent and compares how they behave in practice.

1 How Data Analysis Agents Work

The data analysis agent first receives a user query and generates code to read and analyze the file data. It then runs that code with a Python REPL tool, and the execution result is returned to the agent. The agent interprets the result and answers the user's query. Because large language models (LLMs) can generate arbitrary code, take care when executing generated code in a local environment so that the whole process stays safe and reliable.
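The loop described above can be sketched with the standard library alone. This is a minimal illustration, not how any of the three frameworks implement it: `fake_llm_generate_code` is a hypothetical stand-in for a real LLM call, and running the code in a separate process with a timeout limits hangs but is not a security sandbox.

```python
import subprocess
import sys


def fake_llm_generate_code(query: str) -> str:
    """Hypothetical stand-in for an LLM call: returns Python source for the query."""
    return "values = [23, 35, 41]\nprint(sum(values) / len(values))"


def run_in_subprocess(code: str, timeout: float = 10.0) -> str:
    """Run code in a separate Python process; return its stdout or an error message."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "Execution failed. Error: timed out"
    if completed.returncode != 0:
        return f"Execution failed. Error: {completed.stderr.strip()}"
    return completed.stdout.strip()


def answer(query: str) -> str:
    code = fake_llm_generate_code(query)      # 1) the model writes code
    output = run_in_subprocess(code)          # 2) the tool executes it
    return f"Result for {query!r}: {output}"  # 3) the agent replies


print(answer("What is the average of the values?"))
```

In a real agent, step 3 would pass the output back to the LLM so that it can phrase the final answer or generate corrected code after a failure.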

2 Building the Data Analysis Agents

Prerequisites

Before building the agent, make sure you have the API keys for the large language model you plan to use, store them in a .env file, and load that file:

from dotenv import load_dotenv
load_dotenv("./.env")  # path to the .env file that holds the API keys
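After loading the file, it can help to confirm that the expected keys are actually present. The helper below is a small sketch; `missing_env_vars` is a name introduced here for illustration, and `OPENAI_API_KEY` is the variable that the OpenAI-backed examples in this article rely on.

```python
import os


def missing_env_vars(names: list[str]) -> list[str]:
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.environ.get(name)]


missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```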

Required Libraries

langchain: 0.3.7
langchain-experimental: 0.3.3
langgraph: 0.2.52
crewai: 0.80.0
crewai-tools: 0.14.0
autogen-agentchat: 0.2.38
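To compare an existing environment against the versions listed above, a short check with `importlib.metadata` is one option. This is only a sketch; the `PINNED` dictionary and `installed_versions` helper are names introduced here.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this article.
PINNED = {
    "langchain": "0.3.7",
    "langchain-experimental": "0.3.3",
    "langgraph": "0.2.52",
    "crewai": "0.80.0",
    "crewai-tools": "0.14.0",
    "autogen-agentchat": "0.2.38",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(PINNED).items():
    status = "ok" if installed == PINNED[name] else f"found {installed}"
    print(f"{name}: expected {PINNED[name]}, {status}")
```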

Once ready, start building the agent.

2.1 Steps to Build a Data Analysis Agent with LangGraph

1) Import Necessary Libraries

import pandas as pd
from IPython.display import Image, display
from typing import List, Literal, Optional, TypedDict, Annotated
from langchain_core.tools import tool
from langchain_core.messages import ToolMessage
from langchain_experimental.utilities import PythonREPL
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.checkpoint.memory import MemorySaver

2) Define State

class State(TypedDict):
    messages: Annotated[list, add_messages]

graph_builder = StateGraph(State)
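The `add_messages` annotation tells LangGraph to combine new messages with the existing list instead of overwriting it. The toy code below illustrates that idea with plain Python. It is a simplified stand-in rather than LangGraph's implementation (the real reducer also handles message IDs), and `append_messages` and `apply_update` are hypothetical names.

```python
from typing import Callable, Dict


def append_messages(existing: list, new: list) -> list:
    """Simplified reducer: return a new list with the new messages appended."""
    return existing + new


def apply_update(state: dict, update: dict, reducers: Dict[str, Callable]) -> dict:
    """Merge a node's partial update into the state.

    Keys with a reducer are combined with the old value; other keys are overwritten.
    """
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": [("user", "What is the average age?")]}
state = apply_update(
    state,
    {"messages": [("assistant", "Let me run some code.")]},
    {"messages": append_messages},
)
print(len(state["messages"]))  # 2
```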

3) Define Large Language Model and Code Execution Function, and Bind the Function to the Large Language Model

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)

@tool
def python_repl(code: Annotated[str, "The Python code to execute"]):
    """Use this function to execute Python code. If you want to see the output of a value,
    use `print(...)` to print it; the printed output is visible to the user."""
    try:
        result = PythonREPL().run(code)
        print("RESULT CODE EXECUTION:", result)
    except BaseException as e:
        return f"Execution failed. Error: {repr(e)}"
    return f"Executed:\n```python\n{code}\n```\nStandard output: {result}"

llm_with_tools = llm.bind_tools([python_repl])

4) Define the Function for Agent Response and Add it as a Node to the Graph

def chatbot(state: State):
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

graph_builder.add_node("agent", chatbot)

5) Define Tool Node and Add it to the Graph

code_execution = ToolNode(tools=[python_repl])
graph_builder.add_node("tools", code_execution)

If the large language model returns a tool call, we route it to the tool node; otherwise, the run ends. We define a routing function for this and then add the remaining edges.

def route_tools(state: State):
    """
    Used in conditional edges, if the last message contains a tool call, route to the tool node; otherwise, route to end.
    """
    if isinstance(state, list):
        ai_message = state[-1]
    elif messages := state.get("messages", []):
        ai_message = messages[-1]
    else:
        raise ValueError(f"No message found in input state for tool edge: {state}")

    if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0:
        return "tools"
    return END

graph_builder.add_conditional_edges(
    "agent",
    route_tools,
    {"tools": "tools", END: END},
)
graph_builder.add_edge("tools", "agent")
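The routing decision can be tried in isolation without calling a model. The sketch below uses a hypothetical `FakeAIMessage` class and a local `END` constant as stand-ins, so it runs without LangGraph installed. LangGraph's prebuilt `tools_condition`, imported earlier, serves a similar purpose.

```python
from dataclasses import dataclass, field

END = "__end__"  # stand-in for langgraph.graph.END in this sketch


@dataclass
class FakeAIMessage:
    """Hypothetical minimal message with the one attribute the router inspects."""
    content: str
    tool_calls: list = field(default_factory=list)


def route(messages: list) -> str:
    """Return "tools" if the last message requests a tool call, otherwise END."""
    last = messages[-1]
    return "tools" if getattr(last, "tool_calls", None) else END


with_call = FakeAIMessage("", tool_calls=[{"name": "python_repl", "args": {"code": "print(1)"}}])
plain = FakeAIMessage("Here is the final answer.")
print(route([with_call]))  # tools
print(route([plain]))      # __end__
```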

6) Add Memory for Chatting with the Agent

memory = MemorySaver()  # in-memory checkpointer, passed to compile() in the next step

7) Compile and Display the Graph

graph = graph_builder.compile(checkpointer=memory)
display(Image(graph.get_graph().draw_mermaid_png()))

8) Start Chatting

With memory added, we assign a unique thread ID for each conversation and start chatting on that thread.

config = {"configurable": {"thread_id": "1"}}

def stream_graph_updates(user_input: str):
    events = graph.stream(
        {"messages": [("user", user_input)]}, config, stream_mode="values"
    )
    for event in events:
        event["messages"][-1].pretty_print()

while True:
    user_input = input("User: ")
    if user_input.lower() in ["quit", "exit", "q"]:
        print("Goodbye!")
        break
    stream_graph_updates(user_input)
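The thread ID in `config` decides which stored conversation a new message is added to. The toy class below shows the idea of keeping one history per thread. It is not how `MemorySaver` works internally; `ToyThreadMemory` and the file name `data.csv` are made up for this illustration.

```python
class ToyThreadMemory:
    """Toy stand-in for a checkpointer: keeps one message history per thread ID."""

    def __init__(self):
        self._threads = {}

    def append(self, thread_id: str, role: str, content: str) -> None:
        self._threads.setdefault(thread_id, []).append((role, content))

    def history(self, thread_id: str) -> list:
        return list(self._threads.get(thread_id, []))


memory_demo = ToyThreadMemory()
memory_demo.append("1", "user", "The file is data.csv")
memory_demo.append("1", "user", "What is the average age?")
memory_demo.append("2", "user", "Hello")
print(len(memory_demo.history("1")))  # 2
print(len(memory_demo.history("2")))  # 1
```

Reusing thread ID "1" continues the same conversation, while a new ID starts from an empty history.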

During the loop, we first provide the file path and then ask any questions based on the data.

The output is as follows:

[Figure: screenshot of the agent's output]

With memory included, we can ask follow-up questions about the dataset in the same chat. The agent generates the required code and executes it, and the result of the code execution is returned to the large language model, as shown in the example below:

[Figure: screenshot of a follow-up exchange]

2.2 Building Data Analysis Agents with CrewAI

Now, we will use CrewAI for data analysis tasks.

1) Import Necessary Libraries

from crewai import Agent, Task, Crew
from crewai.tools import tool
from crewai_tools import DirectoryReadTool, FileReadTool
from langchain_experimental.utilities import PythonREPL

2) Build a Coding Agent and Its Task

coding_agent = Agent(
    role="Python Developer",
    goal="Write well-designed, thoughtful code to solve a given problem",
    backstory="You are a seasoned Python developer with extensive experience in software and its best practices. You excel at writing concise, efficient, and scalable code.",
    llm='gpt-4o',
    human_input=True
)

coding_task = Task(
    description="Write code to solve the given problem and assign the code output to the 'result' variable. Problem: {problem}",
    expected_output="Code that solves the problem, output should be assigned to 'result' variable",
    agent=coding_agent
)

3) Define the Code Execution Function as a CrewAI Tool

To execute code, we will use PythonREPL() and define it as a CrewAI tool.

@tool("repl")
def repl(code: str) -> str:
    """Used to execute Python code"""
    return PythonREPL().run(command=code)
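REPL-style tools like the one above generally return what the code prints, which is why generated code should call `print(...)` on anything it wants to report. The following sketch shows that behaviour with the standard library. It is an illustration under that assumption, not the implementation of `PythonREPL`, and `run_and_capture` is a name introduced here.

```python
import contextlib
import io


def run_and_capture(code: str) -> str:
    """Execute code in a fresh namespace and return whatever it printed."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__name__": "__main__"})
    except Exception as exc:
        return f"Execution failed. Error: {exc!r}"
    return buffer.getvalue()


print(run_and_capture("x = 2 + 3\nprint(x)"))  # 5
print(run_and_capture("x = 2 + 3"))            # empty: nothing was printed
```

Like the tool above, this runs code in the current process, so it offers no isolation from the local environment.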

4) Define Executing Agent and Task with Access to repl and FileReadTool()

executing_agent = Agent(
    role="Python Executor",
    goal="Run the received code to solve the given problem",
    backstory="You are a Python developer with extensive experience in software and its best practices. You can effectively execute code, debug, and optimize Python solutions.",
    llm='gpt-4o-mini',
    human_input=True,
    tools=[repl, FileReadTool()]
)

executing_task = Task(
    description="Execute code to solve the given problem and assign the code output to the 'result' variable. Problem: {problem}",
    expected_output='Result of the problem',
    agent=executing_agent
)

5) Build a Crew with Two Agents and Corresponding Tasks

analysis_crew = Crew(
    agents=[coding_agent, executing_agent],
    tasks=[coding_task, executing_task],
    verbose=True
)

6) Run the Crew with the Following Input

inputs = {'problem': "Read this file and return the column names while calculating the average age /home/santhosh/Projects/Code/LangGraph/gym_members_exercise_tracking.csv"}
result = analysis_crew.kickoff(inputs=inputs)
print(result.raw)

The output is as follows:

[Figure: screenshot of the crew's output]
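For orientation, the code a coding agent writes for this problem could resemble the sketch below. It uses only the standard library and a small in-memory sample so that it is self-contained; an actual agent would probably reach for pandas and the real file. The sample rows and the `Age` column name are assumptions, since the file's contents are not shown in this article.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data; the real file's columns and values are not shown here.
SAMPLE_CSV = """Age,Gender,Weight
30,Male,80
40,Female,60
50,Male,70
"""


def column_names_and_mean_age(csv_file, age_column="Age"):
    """Return (column names, mean of the age column) for an open CSV file."""
    reader = csv.DictReader(csv_file)
    ages = [float(row[age_column]) for row in reader]
    return reader.fieldnames, mean(ages)


columns, average_age = column_names_and_mean_age(io.StringIO(SAMPLE_CSV))
# The coding task asks for the output to be assigned to 'result'.
result = {"columns": columns, "average_age": average_age}
print(result)
```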

2.3 Building Data Analysis Agents with AutoGen

1) Import Necessary Libraries

from autogen import ConversableAgent
from autogen.coding import LocalCommandLineCodeExecutor, DockerCommandLineCodeExecutor

2) Define Code Executor and Agent Using That Executor

executor = LocalCommandLineCodeExecutor(
    timeout=10,  # Timeout for each code execution (seconds)
    work_dir='./Data'  # Directory for storing code files
)

code_executor_agent = ConversableAgent(
    "code_executor_agent",
    llm_config=False,
    code_execution_config={"executor": executor},
    human_input_mode="ALWAYS"
)

3) Define Code Writing Agent and Set Custom System Message

Copy the code writer system message from https://microsoft.github.io/autogen/0.2/docs/tutorial/code-executors/ and assign it to the variable code_writer_system_message, which the code below expects.

code_writer_agent = ConversableAgent(
    "code_writer_agent",
    system_message=code_writer_system_message,
    llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
    code_execution_config=False
)
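In this setup the code writer replies with code inside Markdown code fences, and the executor agent pulls those blocks out of the message before running them. The sketch below shows one way such extraction can be done with a regular expression. It is an illustration, not AutoGen's own extraction logic, and `extract_code_blocks` is a hypothetical helper.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed
CODE_BLOCK = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(message: str) -> list:
    """Return (language, code) pairs for every fenced block in a message."""
    return [(lang or "unknown", code) for lang, code in CODE_BLOCK.findall(message)]


reply = (
    "Here is the script:\n"
    + FENCE + "python\n"
    + "print('hello')\n"
    + FENCE + "\n"
    + "Run it and share the output."
)
print(extract_code_blocks(reply))  # [('python', "print('hello')\n")]
```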

4) Define the Problem to Solve and Start Chatting

problem = "Read the file at '/home/santhosh/Projects/Code/LangGraph/gym_members_exercise_tracking.csv' and print the average age of the members"

chat_result = code_executor_agent.initiate_chat(
    code_writer_agent,
    message=problem
)

Once the chat starts, we can also ask follow-up questions about the dataset. If the generated code has errors, we can ask for changes; if it is correct, we press Enter to let the executor run it.

5) Optionally, Use the Following Code to Print the Questions We Asked and Their Answers

for message in chat_result.chat_history:
    if message['role'] == 'assistant':
        if 'exitcode' not in message['content']:
            print(message['content'])
            print('\n')
    else:
        if 'TERMINATE' in message['content']:
            print(message['content'])
            print("----------------------------------------")

The results are as follows:

[Figure: screenshot of the printed questions and answers]
