Unleash Your AI Agent: Automate Time Tracking With LangGraph and Meta Llama 3

By Patrick Kalkman

GitHub: https://github.com/PatrickKalkman/hour_registration_agent_public

Overhead view, a hand holding a pencil hovering above a printed calendar on a wooden desk. Next to the calendar is a silver clip, a small potted plant, and a simulated clock showing around 10:09. The desk has an antique deep green surface.

Midjourney generated work hour registration image, prompt provided by the author.

As a freelancer, the accuracy of my invoices is crucial. To achieve this, I need to accurately record my working hours. I aim to log this time daily, capturing my activities while they are still fresh in my mind.

However, despite my best intentions, the whirlwind of daily tasks often leads to neglect. This can result in the need to painstakingly reconstruct my day, a process that is both time-consuming and frustrating.

My fascination with the potential of artificial intelligence drove me to create an AI solution for automating hour registration. This article describes my efforts to streamline my daily workflow and ensure that no key details are overlooked.

The goal is to free up valuable time and energy, allowing me to focus on what truly matters: my work.

You can find the source code for the final agent in this public GitHub repository.

Registration Process

Before we delve into the details of building the AI agent, let’s establish a solid understanding of how I automate the registration of working hours.

Below is a contextual diagram of the solution. It consists of four modules, each implementing part of the functionality. I will describe the role of each component and how they come together to form a system that simplifies my daily time tracking tasks.

A flowchart illustrating the four modules for automating time tracking: 1. ‘Client Identification’ (green), 2. ‘Task Retrieval’ (blue), 3. ‘Task Summary Generation’ (yellow), 4. ‘Time Registration’ (red). Each module is connected by arrows indicating the order of operations, starting from identifying the client to recording the working hours.

Author provided registration process functionality context diagram.

1. Client Identification

This module begins by identifying the client for the working hours of the day. Given that my work schedule aligns specific dates with specific clients, the agent checks the current working day and selects the client accordingly.

This mapping is hard-coded in Python to keep the initial setup simple and clear. Its output is a key detail: the client's name, which the subsequent automation stages need.

2. Task Retrieval

After identifying the client, the second module uses the obtained client name to retrieve the tasks completed today.

My daily organization tool is Todoist, where I create my to-do list. This module interacts with the Todoist API to filter and extract the tasks I have marked as completed for the identified client.

The output is a collection of descriptions for each completed task, paving the way for the subsequent registration module.

3. Task Summary Generation Using LLM

The third module synthesizes the individual task descriptions into a unified summary. Leveraging the capabilities of a large language model, specifically the 8B variant of Meta Llama 3, it generates a comprehensive narrative of the day’s achievements.

The result is a concise text string that summarizes the key points of the completed tasks and is ready for the final step: time registration.

4. Time Registration

The final module is responsible for recording the working hours. For simplicity, it standardizes the time input to 8 hours per day. Since my accounting application lacks an API, this module uses Selenium WebDriver to automate interactions with the web interface.

It inputs the consistent task summary and specified hours into the system, effectively completing the day’s time tracking.

With the working hours recorded, the process is complete. Next, let's look at how to implement each of these steps. The first one is choosing an AI agent framework.

Which AI Agent Framework?

Choosing an AI agent framework, image drawn by Midjourney, prompt provided by the author.

The market is flooded with various AI frameworks, each offering unique features, and new frameworks are constantly emerging. Initially, I tried Crew AI, which garnered quite a bit of attention on social media.

However, I encountered several challenges while using that framework, particularly when integrating standard Python functions into the agent. Although Crew AI's documentation indicated support for local LLM usage, the system always demanded an OpenAI API key, a hurdle I could not get past. I may return to Crew AI later, but at that point I went looking for another framework.

My pursuit of a more adaptable solution led me to LangGraph, a powerful library for building stateful, multi-actor applications on top of LangChain.

Initial testing results were promising; integrating standard Python functions was straightforward. My familiarity with LangChain also laid a solid foundation for effectively utilizing LangGraph.

Transforming the Process into LangGraph

LangGraph is a library for building stateful, multi-actor applications. It can be seen as a new way to run agents using LangChain.

In LangGraph, you create nodes that represent different steps in the workflow. Each node typically encapsulates a specific task or function. You then connect these nodes with edges, forming a directed graph that outlines the flow of the application.

The edges determine the execution path from one node to another, allowing for the creation of complex workflows that can handle various processes and data transformations. This structure allows for dynamic interactions between different parts of the application, as illustrated in the designed flowchart.

Therefore, to convert our functionality context diagram into LangGraph, we must transform each module into Python functions. Communication between these functions or nodes occurs through a state object.

A flowchart illustrating the process for an AI agent used in task management. The chart includes four main steps: 1. Client Identification, 2. Task Retrieval, 3. Task Summary Generation, and 4. Time Registration, each represented by boxes of different colors. Arrows indicate the flow from one step to the next, extending downward. Another box labeled ‘Agent State’ shows arrows pointing to it from each step, with side labels indicating the type of information being input into it.

Author provided LangGraph nodes and shared agent state image.

Each node receives state and adds to it when necessary. A node can store something in the state simply by returning a value from the node function.
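As a minimal illustration of this pattern (the node name and value below are made up for the example), a node is just a function that takes the state and returns the keys it wants to add:

```python
def example_node(state):
    # A node receives the current state and returns only the keys it
    # wants to add or update; LangGraph merges them into the state.
    return {"customer_name": "ACME Corp"}

# Conceptually, this is what LangGraph does with the return value:
state = {}
state.update(example_node(state))
print(state["customer_name"])  # -> ACME Corp
```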

Understanding AgentState in LangGraph

In LangGraph, the state object manages and retains context across different interactions and steps, playing a crucial role in multi-actor applications.

To illustrate this, take a look at the AgentState class in the hour registration project, which integrates with the Todoist API to manage tasks. Every piece of information flowing between modules is encapsulated within this class.

from typing import List, TypedDict

class AgentState(TypedDict):
    # The name of the client to search for in the Todoist API
    customer_name: str
    # Descriptions of completed tasks
    task_descriptions: List[str]
    # A comprehensive description generated by the LLM from the task descriptions
    registration_description: str

Later, we will see how to integrate this state object into LangGraph. First, we will look at the nodes that make up our time tracking agent.
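The first node handles client identification and is simple enough that I won't dwell on it. Based on the hard-coded weekday mapping described earlier (CUSTOMER1 for Monday through Wednesday, CUSTOMER2 for the rest of the week), a minimal sketch might look like this; the function body is my assumption, not the repository's exact code:

```python
import os
from datetime import date

def customer_name_node(state):
    # Hard-coded mapping from weekday to client: weekday() returns
    # 0-6 starting at Monday, so 0-2 covers Monday through Wednesday.
    env_var = "CUSTOMER1" if date.today().weekday() <= 2 else "CUSTOMER2"
    return {"customer_name": os.environ[env_var]}
```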

2. Task Retrieval

The second node, task_fetcher_node, connects to the Todoist API and retrieves the tasks I completed today for the identified client. It reads the client's name from the state via customer_name = state["customer_name"].

Once task_fetcher_node performs its function, it stores the list of task descriptions in the state. The function and its helpers can be found on GitHub. This node uses the Todoist PyPI package.

def task_fetcher_node(state):
    api = initialize_todoist_api()
    customer_name = state["customer_name"]
    project_id = get_project_id(api, customer_name)
    done_section_id = get_done_section(api, project_id)
    tasks = get_sections_tasks(api, done_section_id)
    task_descriptions = [task.content for task in tasks]
    return {"task_descriptions": task_descriptions}

3. Task Summary Generation

The third node uses a remote LLM to merge each task’s description into a unified description that can be used for registration.

I used Meta's Llama 3 8B model through Groq. If you are unfamiliar with Groq, you should check it out! Groq is a generative AI solutions company and the creator of the LPU™ Inference Engine, a very fast inference engine running on their own custom hardware.

The exciting news is that they offer an API (currently free) that allows you to interact with four different LLMs. You can register on their playground and get an API key. Currently, they provide the following models via their API:

  • Gemma 7B (8K context length)
  • Llama 3 8B (8K context length)
  • Llama 3 70B (8K context length)
  • Mixtral 8x7B SMoE (32K context length)

I chose the Llama 3 8B model, which is sufficient to generate the merged task description.

Below, you can see the time_registration_description_node_llm node. It interacts with the Groq API using the langchain-groq PyPI package. Note the familiar state passing and how the description generated by the LLM is returned.

You might be interested in a few things. First, you can see how to construct prompts to guide the LLM in generating task descriptions and return the result as JSON.

Second, note how the generator is composed: task_combination_generator = task_combination_prompt | GROQ_LLM | JsonOutputParser()

This pipe syntax is what LangChain calls the LangChain Expression Language (LCEL). LCEL makes it easy to build complex chains from basic components and supports out-of-the-box features such as streaming, parallel processing, and logging.

In our project, we combine prompts, models, and JSON output parsers.

from langchain.prompts import PromptTemplate
from langchain_groq import ChatGroq
from langchain_core.output_parsers import JsonOutputParser

MODEL_NAME = "llama3-8b-8192"
REGISTRATION_KEY = "registration_description"

def time_registration_description_node_llm(state):
    GROQ_LLM = ChatGroq(model=MODEL_NAME)

    task_descriptions = state.get("task_descriptions")
    if task_descriptions is None:
        raise ValueError("Missing task descriptions in the state.")

    task_combination_prompt = PromptTemplate(
        template="""
        system
        You are an expert at writing task descriptions for the registration of working hours in accounting.
        Multiple task descriptions are given to you, and you are asked to combine them into a cohesive description
        string. Return only the generated description using JSON with a single key called 'registration_description'.
        Do not return any other string.
        user
        TASK_DESCRIPTIONS: {task_descriptions}
        assistant""",
        input_variables=["task_descriptions"],
    )

    task_combination_generator = task_combination_prompt | GROQ_LLM | JsonOutputParser()

    description_data = task_combination_generator.invoke({"task_descriptions": task_descriptions})
    registration_description = description_data.get(REGISTRATION_KEY)
    if registration_description is None:
        raise ValueError("Failed to generate the registration description.")

    return {REGISTRATION_KEY: registration_description}

4. Time Registration

The last node is data_entry_node, which registers the LLM-generated merged description in my accounting web application. As mentioned earlier, my accounting service does not provide an API, so this node uses Selenium WebDriver to automate the time registration input. One complication is that the web application uses TOTP for two-factor authentication. I have successfully used the pyotp PyPI package to enter the TOTP code automatically; for this to work, you need access to the TOTP secret created when two-factor authentication was set up.

def data_entry_node(state):
    driver = setup_driver()
    try:
        customer = state["customer_name"]
        description = state["registration_description"]
        login(driver)
        enter_totp_code(driver)
        navigate_to_time_entry_page(driver)
        enter_time_details(driver, customer, description)
    except Exception as e:
        logger.exception(f"An error occurred during data entry. {e}")
    finally:
        driver.quit()
        logger.info("WebDriver has been closed.")

I only show the top-level function; if you are interested in other auxiliary functions, you can check the complete source code of the nodes on GitHub.

With all the nodes implemented, we are ready to combine them in LangGraph.

Building the LangGraph Graph

3D illustration showing a set of balls connected by sticks, located on a reflective surface. The spheres are illuminated in various colors like blue, green, and white, highlighting the complex connection patterns against a black background.

Building the LangGraph graph, image created by Midjourney, author provided inspiration.

First, we create a StateGraph instance and pass it our state type, AgentState. Then we add each node to the graph. The add_node function expects:

  • key: a string representing the node name; it must be unique.
  • action: the action to perform when this node is called; it should be a function or callable object.

workflow = StateGraph(AgentState)

workflow.add_node("customer_name_node", customer_name_node)

workflow.add_node("task_fetcher_node", task_fetcher_node)

workflow.add_node("time_registration_description_node_llm", time_registration_description_node_llm)

workflow.add_node("data_entry_node", data_entry_node)

Once all the nodes are added, we connect them with edges. First, we set the graph's entry point to customer_name_node. Then we connect the nodes in sequence and finally connect data_entry_node to the special END node, telling LangGraph that this is where the graph finishes.

workflow.set_entry_point("customer_name_node")

workflow.add_edge("customer_name_node", "task_fetcher_node")

workflow.add_edge("task_fetcher_node", "time_registration_description_node_llm")

workflow.add_edge("time_registration_description_node_llm", "data_entry_node")

workflow.add_edge("data_entry_node", END)

After creating the graph, we compile it using the compile method. During compilation, LangGraph validates the graph; for example, if you forget to connect a node or to end the graph, it will flag this at compile time.

Finally, we start the graph using the stream method, passing an empty input.

app = workflow.compile()

for s in app.stream({}):
    print(list(s.values())[0])

I use the stream method instead of invoke because it lets me print the values returned by each node, which helps with tracking. I also added extra logging in the nodes using the loguru PyPI package to improve traceability, as shown below.

Author provided screenshot of the agent's log output.

Running the Agent

The complete source code for this agent can be found on GitHub. Below, you will find step-by-step instructions on configuring and running the agent on your local machine. Please keep in mind that running the full program may not be straightforward due to specific environment setup requirements, some of which involve creating accounts to access necessary APIs.

Environment Setup

The agent relies on several environment variables that should be stored securely in a .env file. To protect sensitive information, this file should not be included in source control! The program uses the dotenv PyPI package to manage these variables, loading them into the environment from the .env file when the agent starts.

Here are the key environment variables you need to define:

  • CUSTOMER1 and CUSTOMER2: These variables represent the names of the clients. CUSTOMER1 is used for Monday, Tuesday, and Wednesday, while CUSTOMER2 is used for the rest of the week.

Installing Dependencies

The program uses various PyPi packages. You can install these packages using pip or conda, depending on your preference. Run the following command in your terminal to set up your environment:

For pip users:

pip install -r requirements.txt

For conda users:

conda env create -f environment.yml

Running the Agent

Once you have configured your environment and installed all dependencies, you can run the agent by executing the main script:

python hour_registration.py

Additional Tips

  • Check the .env example: The repository contains a .env.example file showing all required keys and settings; use it to verify your setup.

  • Verify API Access: Before running the full agent, verify that your API keys are valid and that the correct permissions have been set up on external platforms.

  • Security Practices: Keep your .env file secure and never share it publicly. Consider using tools or services to manage your keys, especially in production environments.

What’s Next?

My exploration of LangGraph and AI agents is just beginning, and I am excited about the road ahead.

Of course, what this program does could likely be accomplished with a simple Python function; there is no need for a complex AI agent. However, exploring new frameworks like this is valuable in itself: they help me streamline daily tasks and push the boundaries of my development skills.

Currently, the agent is not perfect. It is still a bit unstable and needs more robust error handling — for instance, not crashing when I haven’t logged any daily tasks. But I have always followed this creed:

First, make it work, then make it right, and finally make it fast.

I look forward to gradually addressing these challenges.

I am particularly keen on exploring the conditional edge features of LangGraph, which can dynamically navigate nodes based on specific conditions. I see a lot of potential here to make my daily workflows more efficient and intelligent.

As I continue to refine and enhance this AI-driven time tracking system, I am eager to hear from others on similar journeys or thoughts from those interested in getting started.

What are your thoughts, questions, or insights on using AI in daily workflows? Please share in the comments below — I look forward to learning from your experiences and discussing new possibilities.

Let’s explore these exciting technologies together and see where they can take us.
