By now, we have recognized that providing LLMs with additional tools can significantly enhance their capabilities.
For example, in the paid version even ChatGPT can use Bing search and the Python interpreter out of the box. OpenAI has taken the lead by providing models fine-tuned for tool usage: you pass the available tools along with the prompt to the API endpoint, and the LLM then decides whether to respond directly or whether it should first call one of the available tools.
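For context, that flow looks roughly like the following minimal sketch using the openai Python SDK (version 1.x); the get_weather tool here is a hypothetical example for illustration, not part of the movie agent built later.

# Minimal sketch of OpenAI's tool-calling flow (openai>=1.0).
# get_weather is a hypothetical tool used only for illustration.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# The model either answers directly or asks us to call a tool first.
print(response.choices[0].message.tool_calls)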
Note that these tools are not necessarily limited to retrieving additional information; they can be anything, even allowing the LLM to book a dinner. I previously implemented a project that allowed the LLM to interact with a graph database through a set of predefined tools, which I called the semantic layer.
Essentially, these tools enhance the capabilities of LLMs like GPT-4 by providing dynamic, real-time access to information, personalization through memory, and complex understanding of relationships through knowledge graphs. Together, they enable LLMs to offer more accurate suggestions, understand user preferences over time, and access a broader range of up-to-date information, leading to a more interactive and adaptive user experience. As mentioned, in addition to being able to retrieve other information during queries, they also provide LLMs with an option to influence their environment, such as booking meetings in a calendar.
While OpenAI has dazzled us with its carefully tuned models for tool usage, the reality is that most other LLMs cannot match OpenAI's level of function calling and tool usage. I have tried most of the models available in Ollama, and most cannot consistently generate the predefined structured outputs an agent needs. On the other hand, some models are fine-tuned specifically for function calling. However, these models either expect a custom prompt-engineering pattern that is not well documented, or they cannot be used for anything other than function calls.
Ultimately, I decided to follow existing LangChain implementations and build a JSON-based agent using the Mixtral 8x7b LLM. I used Mixtral 8x7b as a movie agent that interacts with the Neo4j graph database through the semantic layer. The code is available as a LangChain template and as a Jupyter notebook. Here's how we implement a JSON-based LLM agent.
Tools in the Semantic Layer
The examples in the LangChain documentation (JSON agent, HuggingFace example) use tools with a single string input. Since the tools in the semantic layer use slightly more complex inputs, I had to dig a little deeper. Here is an example input for the recommender tool.
from typing import Optional

from langchain_core.pydantic_v1 import BaseModel, Field

all_genres = [
    "Action",
    "Adventure",
    "Animation",
    "Children",
    "Comedy",
    "Crime",
    "Documentary",
    "Drama",
    "Fantasy",
    "Film-Noir",
    "Horror",
    "IMAX",
    "Musical",
    "Mystery",
    "Romance",
    "Sci-Fi",
    "Thriller",
    "War",
    "Western",
]


class RecommenderInput(BaseModel):
    movie: Optional[str] = Field(description="movie used for recommendation")
    genre: Optional[str] = Field(
        description=(
            "genre used for recommendation. Available options are:" f"{all_genres}"
        )
    )
Additionally, we embed an enumeration of the available values in the genre parameter's description. While the input is not very complex, it is still more advanced than a single string input, so the implementation has to differ slightly.
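For orientation, here is one possible shape of the tool that consumes RecommenderInput. This is an illustrative sketch only: the real Recommender tool in the semantic layer runs a Cypher query against Neo4j, which is stubbed out here with a placeholder string.

from typing import Optional, Type

from langchain.callbacks.manager import CallbackManagerForToolRun
from langchain.tools import BaseTool


class RecommenderTool(BaseTool):
    name = "Recommender"
    description = "useful for when you need to recommend a movie"
    args_schema: Type[BaseModel] = RecommenderInput

    def _run(
        self,
        movie: Optional[str] = None,
        genre: Optional[str] = None,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool."""
        # Placeholder: the real implementation queries the Neo4j graph
        # through the semantic layer and returns matching movies.
        return f"Recommendations for movie={movie}, genre={genre}"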
JSON-Based LLM Agent Prompt
In my implementation, I drew significant inspiration from the existing hwchase17/react-json prompt in the LangChain hub. The prompt uses the following system message.
Answer the following questions as best you can. You have access to the following tools:
{tools}
The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).
The only values that should be in the "action" field are: {tool_names}
The $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:
```
{{
"action": $TOOL_NAME,
"action_input": $INPUT
}}
```
ALWAYS use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action:
```
$JSON_BLOB
```
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin! Reminder to always use the exact characters `Final Answer` when responding.
The prompt first defines the available tools, which we will discuss later. The most important part of the prompt is instructing the LLM what the output should look like. When the LLM needs to call a function, it should use the following JSON structure:
{{
"action": $TOOL_NAME,
"action_input": $INPUT
}}
This is why it is called a JSON-based agent: when the LLM wants to use any available tool, we instruct it to generate JSON. However, this is only part of the output definition. The complete output should have the following structure:
Thought: you should always think about what to do
Action:
```
$JSON_BLOB
```
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
The LLM should always explain what it is doing in the Thought section of the output. When it wants to use any available tool, it should provide the action and its input as a JSON blob. The Observation section is reserved for tool output, and when the agent decides it can return an answer to the user, it should use the Final Answer prefix. Here's an example of a movie agent using this structure.
In this example, we ask the agent to recommend a good comedy. Since one of the available tools for the agent is the recommendation tool, it decides to utilize the recommendation tool, providing JSON syntax to define its input. Fortunately, LangChain has a built-in JSON agent output parser, so we don’t have to worry about implementing it. Next, the LLM retrieves the response from the tool and uses it as the observation in the prompt. Since the tool provides all the necessary information, the LLM believes it has enough information to construct the final answer, which it can return to the user.
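An illustrative version of that exchange (reconstructed here for readability, not verbatim model output) might look like this:

Question: Can you recommend a good comedy?
Thought: The user wants a movie recommendation in the comedy genre, so I should use the Recommender tool.
Action:
```
{
  "action": "Recommender",
  "action_input": {"genre": "Comedy"}
}
```
Observation: <list of comedies returned by the recommender tool>
Thought: I now know the final answer
Final Answer: Here are a few comedies you might enjoy: <titles from the observation>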
I noticed it was challenging to prompt-engineer Mixtral so that it only uses JSON syntax when it actually needs to call a tool. In my experiments, when it did not want to use any tools, it sometimes produced the following JSON action input.
{{
"action": Null,
"action_input": ""
}}
The output parsing function in LangChain does not ignore an empty or null action; instead, it returns an "undefined empty tool" error. I tried to prompt-engineer my way around this issue, but I could not do so consistently. Therefore, I decided to add a virtual smalltalk tool that the agent can call when the user wants to engage in small talk.
from typing import Type

from langchain.callbacks.manager import CallbackManagerForToolRun
from langchain.tools import BaseTool

response = (
    "Create a final answer that says if they "
    "have any questions about movies or actors"
)


class SmalltalkInput(BaseModel):
    query: Optional[str] = Field(description="user query")


class SmalltalkTool(BaseTool):
    name = "Smalltalk"
    description = "useful for when user greets you or wants to smalltalk"
    args_schema: Type[BaseModel] = SmalltalkInput

    def _run(
        self,
        query: Optional[str] = None,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool."""
        return response
In this way, the agent can fall back to the virtual Smalltalk tool when the user just wants to chat, and we no longer face issues parsing empty or missing tool names.
This workaround works well. As mentioned, most models are not trained to switch between producing a JSON action and plain text when no tool is needed, and they sometimes still fail to call any tool on the first iteration. But giving the agent an exit option like the Smalltalk tool seems to prevent these exceptions.
Defining Tool Inputs in System Prompts
We must figure out how to define slightly more complex tool inputs so that the LLM can interpret them correctly. Interestingly, after implementing a custom function, I discovered an existing LangChain function that renders custom Pydantic tool input definitions as a JSON-like description that Mixtral can recognize.
from langchain.tools.render import render_text_description_and_args

tools = [RecommenderTool(), InformationTool(), SmalltalkTool()]
tool_input = render_text_description_and_args(tools)
print(tool_input)
It produces the following string description:
"Recommender":"useful for when you need to recommend a movie",
"args":{
{
"movie":{
{
"title":"Movie",
"description":"movie used for recommendation",
"type":"string"
}
},
"genre":{
{
"title":"Genre",
"description":"genre used for recommendation. Available options are:['Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'IMAX', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']",
"type":"string"
}
}
}
},
"Information":"useful for when you need to answer questions about various actors or movies",
"args":{
{
"entity":{
{
"title":"Entity",
"description":"movie or a person mentioned in the question",
"type":"string"
}
},
"entity_type":{
{
"title":"Entity Type",
"description":"type of the entity. Available options are 'movie' or 'person'",
"type":"string"
}
}
}
},
"Smalltalk":"useful for when user greets you or wants to smalltalk",
"args":{
{
"query":{
{
"title":"Query",
"description":"user query",
"type":"string"
}
}
}
}
We can simply copy this tool description into the system prompt, and Mixtral will be able to use the defined tools.
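To tie everything together, here is a minimal sketch of how the agent might be assembled, following the LangChain ReAct-JSON pattern referenced above. It assumes a LangChain 0.1.x install and Mixtral served locally through Ollama; exact import paths can differ between versions, and InformationTool comes from the semantic layer rather than this post.

# Minimal sketch: wiring up the JSON-based agent with Mixtral via Ollama.
from langchain import hub
from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents.output_parsers import ReActJsonSingleInputOutputParser
from langchain.tools.render import render_text_description_and_args
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="mixtral")

# RecommenderTool and SmalltalkTool were defined above;
# InformationTool is part of the semantic layer project.
tools = [RecommenderTool(), InformationTool(), SmalltalkTool()]

# Pull the base ReAct-JSON prompt and fill in the tool descriptions.
prompt = hub.pull("hwchase17/react-json")
prompt = prompt.partial(
    tools=render_text_description_and_args(tools),
    tool_names=", ".join(t.name for t in tools),
)

# Stop generation before the model starts inventing an Observation.
chat_model_with_stop = llm.bind(stop=["\nObservation"])

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),
    }
    | prompt
    | chat_model_with_stop
    | ReActJsonSingleInputOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "Can you recommend a good comedy?"})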
