Hi, everyone! My name is Qiushui, and I’m currently focused on AI agents and AI workflow automation.

Friends often ask me how to build a commercial AI agent.

Should I choose Coze, Dify, or LangGraph?
What are the key considerations in the process?
Where should I store my data?
Why can’t I scrape content from certain web pages using this tool?
···

Therefore, at the beginning of 2025, I initiated a collection titled “Building an AI Agent from Scratch” based on my two years of experience in AI agent development.

The content includes the theory, practical implementation, and case studies of AI agents. I will demonstrate step by step how to build a complete commercial AI agent.

If you are a non-technical person, this series will be very helpful for you; if you are a technical person, it will help you avoid some detours and get started quickly!

The full text is 3900 words, so you can bookmark it for later.

Follow me to receive updates on subsequent content.

This article provides an overview of the seven steps to building an AI agent: requirement analysis, software selection, prompt engineering, database selection, UI construction, testing and evaluation, and deployment.

Building a Commercial AI Agent From Scratch

Requirement Analysis

Workflow Analysis

The first step is to analyze the requirements.

First, we need to clarify what problem we want this AI agent to solve.

If you are a content creator, you might want to create an AI agent to help with repetitive tasks such as finding benchmarks, identifying trends, conducting analysis, and writing drafts, allowing you to focus on content creation.
If you are the owner of a trading company, you might want to create an AI agent to gather orders from different platforms and compare product prices across platforms.

Remember to focus on repetitive, mechanical tasks that require minimal thought. The more detailed your breakdown of these tasks, the better.

You can also talk the workflow through with an AI tool to produce a first draft, and then fill in the gaps yourself. Here is a prompt you can use:

You are a workflow analysis expert. Please help me outline the repetitive tasks that a [your role] needs to perform in daily work, marking which can be assisted by AI and which I should do myself. Output it in a table format (Task/AI Assistance/Human Task). Once I consider the table complete, I will reply “Continue”, and then you can output it in a mermaid flowchart format, indicating whether each process node can be assisted by AI, with the flowchart being horizontal.
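To make the two outputs of this prompt concrete, here is a minimal Python sketch of the same idea: a task list rendered first as the Task/AI Assistance/Human Task table and then as a horizontal mermaid flowchart. The `Task` class, both function names, and the "Final review" task are my own illustrations, not something the prompt produces verbatim.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    ai_assisted: bool  # True if AI can help, False if it stays a human task


def to_table(tasks: list[Task]) -> str:
    """Render the Task / AI Assistance / Human Task table as plain text."""
    rows = ["Task | AI Assistance | Human Task"]
    for t in tasks:
        ai = "Yes" if t.ai_assisted else "No"
        human = "No" if t.ai_assisted else "Yes"
        rows.append(f"{t.name} | {ai} | {human}")
    return "\n".join(rows)


def to_mermaid(tasks: list[Task]) -> str:
    """Render the tasks as a horizontal (left-to-right) mermaid flowchart."""
    lines = ["flowchart LR"]
    for i, t in enumerate(tasks):
        label = "AI" if t.ai_assisted else "Human"
        lines.append(f'    T{i}["{t.name} ({label})"]')
    # Connect the nodes in order so the chart reads as one workflow.
    for i in range(len(tasks) - 1):
        lines.append(f"    T{i} --> T{i + 1}")
    return "\n".join(lines)


# The first three tasks come from the content-creator scenario above;
# "Final review" is a hypothetical human-only step added for illustration.
tasks = [
    Task("Find benchmarks", True),
    Task("Identify trends", True),
    Task("Write draft", True),
    Task("Final review", False),
]
print(to_table(tasks))
print(to_mermaid(tasks))
```

Pasting the second output into any mermaid viewer gives you the horizontal flowchart, with each node marked as AI or Human.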

Tools Used

After outlining the requirements, we need to list the tools required based on the workflow analysis.

For example, web scraping tools are needed for data collection; to publish articles, you need to connect to WeChat’s public platform.

Therefore, tool selection is also very important. By leveraging different tools, the AI agent can automate tasks across different systems, thereby reducing manual operations.

AI Agent Selection

The second step is to choose the AI agent development platform, select the appropriate large model, and use different tools to operate across different systems.

Which AI Agent Platform to Choose

Let’s talk about AI agent development platforms. With many no-code agent development platforms like Dify, Coze, and FastGPT, which one should we choose?

Coze can only be used in the cloud and cannot be deployed locally.
Dify is open-source and can be self-hosted, but it has weaker capabilities in knowledge Q&A.
FastGPT has certain usage restrictions but is relatively strong in knowledge answering.

More advanced development platforms, such as LangGraph and CrewAI, allow AI to self-plan and execute tasks, but they require coding.

Choosing these platforms depends on our specific needs, and of course, they can also be used in combination.

This requires us to understand the characteristics of each development platform, what they are good at, what they are not good at, and what obvious shortcomings they have. Only by mastering this information can we make appropriate selections based on our scenarios.

Which LLM to Choose

Next comes the choice of large model. There are overseas options like OpenAI's GPT models, Claude, and Gemini; domestic models like Kimi, Tongyi Qianwen, and the recently popular DeepSeek; and open-source models like LLaMA and Grok, along with smaller models like Mistral.

So, based on your AI agent scenario, which model is the most suitable to choose among so many options?

If your data is not privacy-sensitive, the best choices are OpenAI and Claude, as they are the leading large models. If you are only doing tasks like translation or summarizing articles, domestic large models are also quite effective, and DeepSeek currently offers a good cost-performance ratio.

Choosing a model should be based on your specific use case, and of course, mixed usage can also be considered. At this point, it is advisable to deeply understand the capabilities of different models.
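Mixed usage can be as simple as a routing function. The sketch below is only one way to encode the rules of thumb above. The model labels are hypothetical placeholders, and sending privacy-sensitive data to a locally deployed model is my own assumption rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRequest:
    task: str               # e.g. "translation", "summarization", "reasoning"
    has_private_data: bool  # True if the input must not leave your own servers


# Hypothetical labels; replace them with the models you actually have access to.
LOCAL_MODEL = "local-open-source-model"
BUDGET_MODEL = "budget-cloud-model"
FRONTIER_MODEL = "frontier-cloud-model"

SIMPLE_TASKS = {"translation", "summarization"}


def choose_model(req: AgentRequest) -> str:
    """Pick a model tier for one request."""
    if req.has_private_data:
        # Assumption: privacy-sensitive data stays on a locally deployed model.
        return LOCAL_MODEL
    if req.task in SIMPLE_TASKS:
        # Simple tasks do not need the most capable (and most expensive) model.
        return BUDGET_MODEL
    return FRONTIER_MODEL


print(choose_model(AgentRequest("translation", has_private_data=False)))
print(choose_model(AgentRequest("reasoning", has_private_data=False)))
```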

• What are the differences between small and large models?
• Which model has the strongest reasoning ability?
• What are the differences between different context-window versions of the same model, such as 8K and 32K?
• What configurations can run what models if deployed locally? What are their capabilities?
• If using cloud-based large models, what is the billing rate for the model?
• Can different models be used in combination?
• Can private corporate data be sent to cloud-based large models?

I will explain these questions one by one in subsequent content.
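As a small preview of the billing question, cloud models are usually billed per token, with separate prices for input and output. Here is a rough estimate in Python. The prices and the call volume are placeholders I made up for the arithmetic, not real quotes from any provider.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Return the cost of one call, given per-million-token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Placeholder prices, NOT real quotes from any provider.
per_call = estimate_cost(2_000, 500,
                         input_price_per_million=1.0,
                         output_price_per_million=4.0)
monthly = per_call * 10_000  # assuming 10,000 calls per month
print(f"per call: {per_call:.4f}, per month: {monthly:.2f}")
```

Running the numbers like this before you pick a model makes the cost-performance trade-off much easier to see.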

Which Tools to Choose

Lastly, regarding tool selection. Tools are a capability; they can generate an image, search the internet, or even interface with a system.

On their own, AI agent development platforms only give you the capabilities of the large model. Therefore, if external system interaction is required, tools must be used. Tools can generally be divided into two categories: those with API interfaces and those without.

Tools with API interfaces are very easy to integrate. Platforms like Coze and Dify have integrated many tools that can be configured and used directly.

For tools without API interfaces, RPA (Robotic Process Automation) is needed to handle them. In simple terms, RPA is an automation tool that can execute a series of operations by controlling the browser.
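From the agent's point of view, both kinds of tools can sit behind the same calling convention. Below is a minimal sketch of a tool registry in plain Python. The tool names and their bodies are stand-ins I wrote for illustration; a real version would call the service's API or launch an RPA script inside each function.

```python
from typing import Callable

# Registry mapping a tool name to a plain Python callable.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function as a tool under the given name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("search")
def search(query: str) -> str:
    # Stand-in for a tool backed by an API; a real version would call the service.
    return f"search results for: {query}"


@tool("publish")
def publish(title: str) -> str:
    # Stand-in for a tool without an API, where an RPA script would drive the browser.
    return f"published via browser automation: {title}"


def call_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call requested by the agent, failing clearly on unknown tools."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


print(call_tool("search", query="AI agent platforms"))
```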

Prompt Engineering

The third step is prompt engineering, which is the core of the AI agent. Good prompts can significantly enhance the accuracy of the outputs from large models.

A good prompt helps the AI agent accurately understand the task and improves the output quality of the large model.
A good prompt can reduce token consumption and lower costs.
A good prompt helps the AI agent understand the context and ensures the coherence of the conversation.

Therefore, we need to master how to write effective prompts.

• What is the CRISPE framework?
• What is the BROKE framework?
• What is the ICIO framework?
• What is CoT (Chain of Thought)?

We also need to understand the rules for interacting with large models, such as:

• Outputting a long text in multiple parts yields better quality than outputting it all at once.
• Using different symbols to separate different information can enhance the understanding of the large model.
• Providing examples can help the large model quickly understand your requirements.
• For complex tasks, breaking them down into several steps and guiding the large model to execute them step by step yields better results.
• Clearly defining constraints on the output, such as word count, format, style, and language difficulty, makes the results more predictable.
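Several of these rules can be combined in one small helper. The sketch below separates sections with symbols, includes examples, and lists the output constraints explicitly. The `###` headers and triple-quote delimiters are just one convention I chose; any consistent separator should work.

```python
def build_prompt(task: str, source_text: str,
                 examples: list[tuple[str, str]],
                 constraints: list[str]) -> str:
    """Assemble a prompt with delimited sections, examples, and explicit constraints."""
    parts = [f"### Task\n{task}"]
    if examples:
        shots = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
        parts.append(f"### Examples\n{shots}")
    if constraints:
        rules = "\n".join(f"- {c}" for c in constraints)
        parts.append(f"### Constraints\n{rules}")
    # Triple quotes fence off the text to process from the instructions.
    parts.append(f'### Text\n"""\n{source_text}\n"""')
    return "\n\n".join(parts)


prompt = build_prompt(
    task="Summarize the text.",
    source_text="AI agents combine a large model with tools and memory.",
    examples=[("A long product announcement", "A two-sentence summary")],
    constraints=["Under 50 words", "Plain language", "One paragraph"],
)
print(prompt)
```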

ICIO Framework:

• Instruction: Clearly state the specific task you want the AI to perform, such as “Translate a paragraph of text” or “Write a blog post about AI ethics.”
• Context: Provide background information about the task to help the AI understand the context, such as “This text is for an opening speech at an internal company meeting.”
• Input Data: Specify the exact data the AI needs to process, such as “Please translate the following sentence: ‘Artificial intelligence is changing the world.'”
• Output Indicator: Set expectations for the output format and style, such as “Please translate in a formal business English style.”

BROKE Framework:

• Background: For example, “You are writing a press release for a startup tech company about its latest product.”
• Role: Specify the AI as the “press release writer” so it can answer questions from a professional perspective.
• Objectives: Provide a task description, such as “Write an engaging press release highlighting the product’s unique selling points.”
• Key Result: Set key results for the response, such as “Use formal and professional language, including the product’s main features and market positioning.”
• Evolve: After the AI provides a response, suggest three improvement methods, such as “Adjust the language style to appeal to the target audience,” “Add product usage cases,” or “Optimize structure for better readability.”

CRISPE Framework:

1. Capacity and Role: Clearly define the role the AI should play in the interaction, such as educator, translator, or consultant.
2. Insight: Provide background information on the role-playing to help the AI understand its function in a specific context.
3. Statement: Directly state the task the AI needs to perform, ensuring it understands and executes the user’s request.
4. Personality: Set the style and format of the AI’s responses to align with user expectations and contextual needs.
5. Experiment: If needed, ask the AI to provide multiple examples for the user to choose the best response.
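All three frameworks boil down to filling a fixed set of labeled sections in a fixed order, so one helper can cover them. This is a minimal sketch with names of my own choosing (`FRAMEWORKS`, `framework_prompt`); the example values are the ICIO examples from above.

```python
FRAMEWORKS = {
    "ICIO": ["Instruction", "Context", "Input Data", "Output Indicator"],
    "BROKE": ["Background", "Role", "Objectives", "Key Result", "Evolve"],
    "CRISPE": ["Capacity and Role", "Insight", "Statement", "Personality", "Experiment"],
}


def framework_prompt(framework: str, **fields: str) -> str:
    """Build a prompt whose sections follow the framework's order.

    Field names are the section labels lowercased with spaces replaced by
    underscores, e.g. "Key Result" -> key_result.
    """
    lines = []
    for label in FRAMEWORKS[framework]:
        key = label.lower().replace(" ", "_")
        if key not in fields:
            # Refuse to build an incomplete prompt instead of silently skipping a part.
            raise ValueError(f"missing section: {label}")
        lines.append(f"{label}: {fields[key]}")
    return "\n".join(lines)


prompt = framework_prompt(
    "ICIO",
    instruction="Translate a paragraph of text.",
    context="This text is for an opening speech at an internal company meeting.",
    input_data="Artificial intelligence is changing the world.",
    output_indicator="Translate in a formal business English style.",
)
print(prompt)
```

Raising an error on a missing section is a deliberate choice here: an incomplete prompt tends to fail quietly, and it is cheaper to catch that before the model is called.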

Chain-of-Thought (CoT): A method that guides the large model to think through problem-solving step by step like a human.

It mainly comes in two forms: Few-Shot CoT and Zero-Shot CoT.

Few-Shot CoT:

Describe the thinking steps, first understanding customer needs, then considering the , and finally providing recommendations with explanations.

Also provide examples that demonstrate how the AI thinks through the chain of thought to arrive at answers.

Zero-Shot CoT:

Simply add a prompt:

Let’s think step by step.
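Both variants are easy to express as prompt builders. The sketch below is only an illustration: the `Q:`/`A:` layout is a common convention rather than a requirement, and the pen-price example is one I made up to show what a worked reasoning example looks like.

```python
ZERO_SHOT_TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str) -> str:
    """Zero-Shot CoT: append the trigger phrase, with no examples."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_cot(question: str, worked_examples: list[tuple[str, str, str]]) -> str:
    """Few-Shot CoT: prepend examples that show the reasoning before the answer.

    Each worked example is a (question, reasoning, answer) tuple.
    """
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in worked_examples
    ]
    # The real question goes last, with the answer left open for the model.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical worked example, for illustration only.
examples = [(
    "A shop sells pens at 3 each. How much do 4 pens cost?",
    "Each pen costs 3, so 4 pens cost 4 x 3 = 12.",
    "12",
)]
print(zero_shot_cot("How much do 7 pens cost at 3 each?"))
print(few_shot_cot("How much do 7 pens cost at 3 each?", examples))
```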

Database Selection

The fourth step is determining where to store chat logs, collected data, and other content generated during the AI agent’s operation. This is when a database is needed.

For non-technical users, I recommend Feishu’s multi-dimensional table (Bitable) because it is highly visual, easy to use, and simple to integrate.

The downside is that as the data volume increases, reading speed may slow down, and it cannot handle complex business logic.

For technical users, a relational database like MySQL or a NoSQL database can be used.
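For technical readers, here is a minimal sketch of a chat-log table. I am using SQLite instead of MySQL only because it ships with Python and needs no server; the table and column names are my own, and the same schema idea carries over to MySQL with small syntax changes.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; use a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE chat_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id TEXT NOT NULL,
        role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
        content TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def log_message(session_id: str, role: str, content: str) -> None:
    """Append one chat message; parameters are bound to avoid SQL injection."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO chat_log (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def history(session_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for a session in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM chat_log WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()


log_message("s1", "user", "Find trending topics on AI agents.")
log_message("s1", "assistant", "Here are three topics to consider.")
print(history("s1"))
```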

Building the UI

The fifth step is to build your own UI. On Coze, you can DIY your own interface, while Dify offers a ready-made interface that cannot be modified.

Both platforms can also publish as service APIs, meaning you can develop an independent interface that connects with them instead of using their provided interface.

If you want to develop your own interface, you can use an AI programming platform like Cursor to customize it.

Another reason to develop your own interface is that on Coze and Dify, you can define multiple AI agents, and you can use your defined interface to call them, allowing you to operate within a single interface.
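Behind such a single interface, the code mostly picks an agent and builds the matching HTTP request. Here is a minimal sketch with Python's standard library. The URLs, keys, and payload field names are all hypothetical placeholders; take the real ones from the API page of the platform where your agent is published.

```python
import json
import urllib.request

# Hypothetical endpoints and keys; take the real values from your platform's API page.
AGENTS = {
    "writer": {"url": "https://example.com/api/writer/chat", "api_key": "WRITER_KEY"},
    "analyst": {"url": "https://example.com/api/analyst/chat", "api_key": "ANALYST_KEY"},
}


def build_request(agent: str, message: str, user_id: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP request for the chosen agent."""
    cfg = AGENTS[agent]
    # The payload field names are assumptions; match them to the platform's docs.
    body = json.dumps({"query": message, "user": user_id}).encode("utf-8")
    return urllib.request.Request(
        cfg["url"],
        data=body,
        headers={
            "Authorization": f"Bearer {cfg['api_key']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("writer", "Draft an outline about AI agents.", "user-1")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

Keeping the agent list in one dictionary means your interface can offer a simple dropdown, and adding a new agent is a one-line change.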

Testing and Evaluation

The sixth step is testing and evaluation. Testing ensures that your AI agent does not encounter errors, such as program crashes or the large model failing to process user requests.

Evaluation ensures that the responses output by the AI agent are correct. During evaluation, we need to continuously optimize the AI agent to ensure it can provide correct answers and minimize token consumption.
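A tiny evaluation loop already covers both ideas: crashes are counted separately from wrong answers, and token usage is tracked alongside accuracy. The sketch below uses a fake agent and a simple "answer must contain this substring" check, both of which are my own simplifications; real evaluations often need stricter scoring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    question: str
    must_contain: str  # substring a correct answer is expected to include


def evaluate(agent: Callable[[str], tuple[str, int]], cases: list[Case]) -> dict:
    """Run every case and report accuracy, crashed cases, and total tokens.

    `agent` returns (answer, tokens_used). Exceptions count as errors, not as
    wrong answers, so crashes and quality problems can be told apart.
    """
    if not cases:
        raise ValueError("at least one test case is required")
    correct = errors = tokens = 0
    for case in cases:
        try:
            answer, used = agent(case.question)
        except Exception:
            errors += 1
            continue
        tokens += used
        if case.must_contain.lower() in answer.lower():
            correct += 1
    return {"accuracy": correct / len(cases), "errors": errors, "tokens": tokens}


def fake_agent(question: str) -> tuple[str, int]:
    # Stand-in for a real agent call; it only knows one answer.
    if not question:
        raise ValueError("empty question")
    if "capital of France" in question:
        return "The capital of France is Paris.", 12
    return "I am not sure.", 5


cases = [
    Case("What is the capital of France?", "Paris"),
    Case("What is 2 + 2?", "4"),
    Case("", "anything"),
]
report = evaluate(fake_agent, cases)
print(report)
```

Rerunning the same cases after every prompt change tells you whether accuracy went up and whether token consumption went down.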

We can use LangSmith to monitor the project’s operation.

LangSmith helps you better utilize large models:

• Debugging and Testing: It traces each step of a run so you can pinpoint where problems occur and fix them, ensuring the AI agent can correctly respond to questions or complete tasks.
• Evaluation: By creating various test cases, you can assess the AI agent’s performance, such as its accuracy and reliability in answering questions.
• Monitoring: It can observe the AI agent’s operational status in real time, such as request processing speed and costs incurred.
• Logging: It can record all detailed information about the AI agent’s work process, including received questions, provided answers, and parameters used, facilitating analysis and improvement.
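To show what "logging every call" means in practice, here is a standard-library stand-in. This is not LangSmith's API, only a sketch of the kind of record such a tool collects for you: inputs, output, errors, and latency. The names `traced` and `RUN_LOG` are mine.

```python
import functools
import time

RUN_LOG: list[dict] = []  # in-memory log; a real setup would ship this to a service


def traced(fn):
    """Record inputs, output or error, and latency of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"name": fn.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            record["output"] = fn(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call leaves a record.
            record["seconds"] = time.perf_counter() - start
            RUN_LOG.append(record)
    return wrapper


@traced
def answer(question: str) -> str:
    # Stand-in for the real agent call.
    return f"echo: {question}"


answer("hello")
print(RUN_LOG[-1]["name"], RUN_LOG[-1]["output"])
```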

Deployment

The seventh step is deployment. Different AI agent development platforms have different deployment methods. Coze can be directly published to Doubao, mini-programs, etc., while Dify can be published as a web application or embedded into your system.

If you have developed your AI agent independently, you can purchase a server for independent deployment.
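For independent deployment, the agent usually sits behind a small web service on that server. Below is a minimal sketch using only Python's standard library. The JSON shape (`message` in, `reply` out) and the `agent_reply` stand-in are my assumptions, and the built-in server is meant for trying things out, not for production traffic.

```python
import json
from wsgiref.simple_server import make_server


def agent_reply(message: str) -> str:
    # Stand-in for the real agent call.
    return f"You said: {message}"


def app(environ, start_response):
    """Minimal WSGI app: POST JSON {"message": ...} and get {"reply": ...} back."""
    json_header = [("Content-Type", "application/json")]
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", json_header)
        return [b'{"error": "use POST"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        message = payload["message"]
    except (ValueError, KeyError, TypeError):
        start_response("400 Bad Request", json_header)
        return [b'{"error": "expected JSON with a message field"}']
    body = json.dumps({"reply": agent_reply(message)}).encode("utf-8")
    start_response("200 OK", json_header + [("Content-Length", str(len(body)))])
    return [body]


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the app with the stdlib development server (not for production traffic)."""
    with make_server(host, port, app) as server:
        server.serve_forever()


# Call serve() on your server to start listening.
```

Because `app` is a standard WSGI callable, you can later put it behind a production WSGI server without changing the agent code.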

Conclusion

This is the series of content I will introduce in the collection “Building an AI Agent from Scratch.” I am very happy to delve into these topics with you and help you create your perfect AI agent.

I believe you still have many questions, and we will explore and discuss them together in future content.

If you liked this article, don’t forget to like, bookmark, and follow me so you won’t miss out on more detailed tutorials in the future!

If you have any questions or topics you want to discuss, feel free to leave a message.
