Following last week's thoughts on Type 1 and Type 2 tasks, I reflected on my Type 1 tasks (tasks that I can still do better than AI). As a data engineer, what can be done at this stage is to apply AI to the various aspects of data development: data integration, SQL generation, data quality checks, Gen BI (conversational report generation), data documentation generation, data inquiries, and so on. The goal is to agentify data engineering, building a multi-agent system that can handle the basic tasks.
From a business implementation perspective, Gen BI is a good entry point. First, demand is strong: business teams want to generate data dashboards on the spot when a question comes up, rather than waiting for data analysts to fit the request into their schedule. Second, from a technical standpoint, supply is also plentiful: BI tools like Power BI and QuickBI already offer AI-generated analysis and reporting features, and in the open-source world there are plenty of text2sql models and tools.
The primary challenge at this stage is accuracy. Model hallucination never fully goes away, so how to provide context information to the model and how to construct a human-reviewed data flow are the main concerns. Experienced business and data analysis colleagues still need to sit in the loop to review AI-generated SQL; otherwise, hallucinated code that runs successfully but feeds incorrect data into decision-making could be fatal.
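As a rough illustration of what such a human-checked flow could look like, here is a minimal Python sketch. The schema snippet, the `generate_sql` helper, and the approval prompt are all hypothetical placeholders rather than any particular product's API; the point is simply that the model only sees curated schema context, and nothing executes without explicit human sign-off.

```python
# Sketch of a human-in-the-loop text2sql flow (all names are illustrative).
from typing import Callable, Iterable

# Curated schema context handed to the model to ground SQL generation.
SCHEMA_CONTEXT = """
orders(order_id BIGINT, user_id BIGINT, amount DECIMAL(10,2), created_at DATE)
users(user_id BIGINT, region VARCHAR)
"""

def generate_sql(question: str, llm_call: Callable[[str], str]) -> str:
    """Build a grounded prompt and ask whatever LLM client is in use for SQL."""
    prompt = (
        "Using only the tables below, write a single SQL query.\n"
        f"{SCHEMA_CONTEXT}\nQuestion: {question}\nSQL:"
    )
    return llm_call(prompt)

def run_with_review(question: str, llm_call: Callable[[str], str],
                    execute: Callable[[str], Iterable]) -> None:
    """Generate SQL, show it to a human reviewer, and only run it after approval."""
    sql = generate_sql(question, llm_call)
    print("Proposed SQL:\n", sql)
    if input("Approve and run? [y/N] ").strip().lower() == "y":
        for row in execute(sql):
            print(row)
    else:
        print("Rejected; nothing was executed.")
```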
A possible first step is to provide data colleagues with a copilot for developing data reports, improving their work efficiency while ensuring data quality.
This example reveals the current relationship between humans and AI: AI enhances efficiency; humans provide requirements and validation. The same division of labor is worth thinking about in other industries as well.
As AI rapidly evolves, the data technology stack itself is also advancing quickly. This week, I reviewed some trend articles and summarized them here: 【Data Observation】The Evolution and Future of Business Intelligence (BI). Notably, DuckDB, an embedded OLAP database, can turn any device into a powerful data analysis terminal. It has already become an essential tool in many agent systems and is likely to become the de facto standard for data analysis agents.
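For anyone who has not tried it, this is roughly what "any device becomes an analysis terminal" means in practice; a minimal sketch using DuckDB's Python API, where `orders.csv` is just a made-up local file:

```python
# Minimal DuckDB example: an in-process, in-memory OLAP database
# querying a local file directly, with no server to set up.
import duckdb

con = duckdb.connect()  # in-memory database
result = con.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM read_csv_auto('orders.csv')   -- query the file in place
    GROUP BY region
    ORDER BY total_amount DESC
""")
result.show()     # pretty-print in the terminal
df = result.df()  # or hand the result to pandas for further analysis
```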
This week, I compared two agent frameworks: Pydantic AI and Phidata. Pydantic AI emphasizes controlling agent runs through type validation, with prompts, toolsets, and result verification wired together via dependency injection, which makes it quite flexible. Phidata focuses on abstracting and encapsulating the components of an agent system: model, memory, knowledge, tools, reasoning, team, and workflow. My intuitive impression is that Pydantic AI is lower-level and more flexible but harder to grasp, while Phidata is more user-friendly and its updates are easier to follow. Next week, I will continue exploring other agent frameworks, such as CrewAI and DSPy, and then select two for comparative testing on data engineering tasks.
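To make the comparison concrete, here is a minimal Pydantic AI sketch of the two points above: a typed result model for output validation, and dependency-injected context exposed to the model through a tool. It follows the interface documented in early Pydantic AI releases (names such as `result_type` and `result.data` may differ in newer versions), and the model name and schema string are placeholders.

```python
from dataclasses import dataclass
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext

@dataclass
class Deps:
    schema: str  # context injected per run, e.g. warehouse table definitions

class SqlResult(BaseModel):
    sql: str
    explanation: str

agent = Agent(
    "openai:gpt-4o",
    deps_type=Deps,
    result_type=SqlResult,  # the model's output is validated against this type
    system_prompt="Write SQL for the schema returned by the get_schema tool.",
)

@agent.tool
def get_schema(ctx: RunContext[Deps]) -> str:
    """Expose the injected schema to the model."""
    return ctx.deps.schema

if __name__ == "__main__":
    result = agent.run_sync(
        "Total order amount per region over the last 7 days",
        deps=Deps(schema="orders(order_id, user_id, amount, created_at); "
                         "users(user_id, region)"),
    )
    print(result.data.sql)
```

The equivalent in Phidata would lean on its prebuilt agent, memory, and knowledge components rather than hand-wired dependencies, which is exactly the higher-level, more packaged feel described above.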