In recent years, the rise of large AI (artificial intelligence) models has sparked a new wave of AI research, and AI agents have now become the industry’s latest hotspot. From voice assistants in smart cars to digital human hosts in live streams, AI agents are profoundly reshaping the application ecosystem with their distinctive autonomy and interactivity, steadily building a new vision of smart living.
Redefining Human-Machine Interaction
As the name suggests, an AI agent is an intelligent entity with AI capabilities, and it can take the form of either a hardware device or a software system. Drawing on those capabilities, it can perceive its environment, make decisions, and take actions to achieve specific goals.
“In simple terms, an AI agent is like a ‘little assistant’ with both IQ and EQ: it understands you, and it can help you.” Chen Hao, Deputy Director of the Advanced Technology Center at the Beijing General Artificial Intelligence Research Institute, said that this “little assistant” not only understands human language but also steadily improves its skills in specific domains through learning and data analysis.
Why has the AI agent become a focus of industry attention? What is its relationship with large model technology?
A person in charge of ByteDance’s Doubao large model said in an interview that AI agents are built on large model technology: the large model serves as the “brain,” while the agent “has hands and feet” and can work and carry out tasks on its own.
Compared with large models, however, AI agents are a more “three-dimensional” intelligent system. Beyond the language-based conversation that large models are widely used for, AI agents can reason and analyze emotions based on context, and can imitate human behavior to take corresponding actions.
For example, given the command “Help me cook a dish,” a “large model chef” can only output a recipe and list the necessary ingredients. An “AI agent chef,” on the other hand, can not only provide the recipe but also select the most suitable ingredients and order them automatically based on the user’s taste preferences and nutritional needs. It can even monitor the cooking process to ensure the quality and taste of the food.
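To make the “brain” versus “hands and feet” distinction concrete, here is a minimal Python sketch of the chef example. It is a toy built on stated assumptions: the function model_plan merely stands in for a large model (a real agent would prompt an actual model at that point), and the Kitchen class, the RECIPES table, and the tool names are hypothetical stubs, not any company’s API. What it shows is that the agent turns the model’s plan into actions rather than stopping at text.

```python
from dataclasses import dataclass, field


@dataclass
class Kitchen:
    """Hypothetical environment the agent acts on (a stub, not a real API)."""
    orders: list = field(default_factory=list)
    log: list = field(default_factory=list)


# Hypothetical recipe data; dishes and ingredients are illustrative only.
RECIPES = {
    "mapo tofu": ["tofu", "minced pork", "chili bean paste"],
    "steamed egg": ["eggs", "scallions"],
}


def model_plan(task: str, prefs: dict) -> list:
    """Stand-in for the large model (the "brain").

    A real agent would send `task` and `prefs` to a large model; this
    rule-based stub ignores `task` and only illustrates that the brain
    returns a plan of tool calls, not just a recipe as text.
    """
    dish = "mapo tofu" if prefs.get("likes_spicy") else "steamed egg"
    steps = [("show_recipe", dish)]
    steps += [("order", item) for item in RECIPES[dish]]
    steps.append(("monitor", dish))
    return steps


def run_agent(task: str, prefs: dict, kitchen: Kitchen) -> Kitchen:
    """Agent loop: get a plan from the brain, then execute each step
    with the matching tool (the agent's "hands and feet")."""
    tools = {
        "show_recipe": lambda arg: kitchen.log.append(f"recipe: {arg}"),
        "order": lambda arg: kitchen.orders.append(arg),
        "monitor": lambda arg: kitchen.log.append(f"monitoring: {arg}"),
    }
    for tool_name, arg in model_plan(task, prefs):
        tools[tool_name](arg)
    return kitchen


if __name__ == "__main__":
    k = run_agent("Help me cook a dish", {"likes_spicy": True}, Kitchen())
    print(k.orders)  # ['tofu', 'minced pork', 'chili bean paste']
    print(k.log)     # ['recipe: mapo tofu', 'monitoring: mapo tofu']
```

A “large model chef” would correspond to calling model_plan alone and reading its output; the agent adds the loop that dispatches each step to a tool. In a real product, the stubbed tools would be replaced by actual ordering or sensing services.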
“Traditional human-computer dialogue is often limited by fixed patterns and preset rules, making truly natural communication difficult.” Liang Zhixiang, Senior Vice President of Baidu Group, pointed out that by drawing on four capabilities of large models (understanding, generation, logic, and memory), AI agents can emulate a conversational style much closer to real human conversation, making “human-computer interaction” as smooth and natural as “person-to-person dialogue.”
Indeed, the generality and scalability of large models have significantly lowered the barrier to using AI agents. Large enterprises, small and medium-sized businesses, and even individual developers can quickly build their own AI agent applications without new hardware or large amounts of additional training data.
Recently, Baidu’s large model app “Wen Xiaoyan” launched a feature that lets users create an AI agent with a single sentence. Users can tailor their agent to their own needs, choosing its personality, voice, and identity. Creators can hold video conversations with their “exclusive agent,” practice spoken English, and even run mock job interviews. According to available statistics, Baidu’s Wenxin AI agent platform has attracted 100,000 enterprises and 600,000 developers, covering hundreds of application scenarios.
“In the future, once users can use and create their own AI agents more conveniently, the value of AI agents will truly be unleashed,” Liang Zhixiang said. “Next, we will distribute AI agents to more users accurately and efficiently, so that everyone can become an AI agent ‘developer.’”
Continuously Expanding Application Scenarios
AI agent technologies are currently flourishing, and their application scenarios continue to expand.
“A year and a half ago, BAIC ARCFOX began researching AI agents, applying them mainly to improving R&D efficiency, standardized language compilation, and user services, among other areas.” Feng Shuo, Director of the Intelligent Connected Center at BAIC Research Institute, explained that AI-enabled cockpits have moved beyond the old, mechanical “human-machine Q&A” model built on fixed commands and now offer flexible, customized “intelligent interaction.” For example, AI agents can arrange schedules around passengers’ work habits and pick up on their preferences and moods to recommend music, films, and more.
When it detects that a passenger is overly fatigued, the AI agent can quickly generate a service plan that includes reserving a parking space, adjusting the in-car environment, and setting rest periods, giving users a safer smart driving experience. “In the future, AI agents are expected to offer functions such as ‘one-sentence food ordering,’ making them even more convenient for passengers,” Feng Shuo said.
Meanwhile, AI agent technology is advancing rapidly and is gradually being deployed in a variety of small devices.
“Doubao Doubao, who is this Arhat in the temple?” “This is Mahakasyapa, one of the ten disciples of Shakyamuni…” Independent content creator Xiao Fan recalled that while visiting Guoqing Temple in Taizhou, Zhejiang, during the National Day holiday, he often had Q&A exchanges like this with his Ola Friend headset.
Ola Friend is reportedly the first AI agent headset launched with ByteDance’s Doubao large model. In addition to ordinary audio playback, it can provide instant help with information queries and in travel scenarios.
According to a person in charge of the Doubao large model, Ola Friend can act as the user’s “personal tour guide” at any moment, and users can ask follow-up questions about points of interest. When visiting an art exhibition, for example, users can ask Ola Friend to introduce a particular exhibit and then ask further about the artist’s style and other representative works, learning more through conversation.
Since the beginning of this year, more and more mobile phone manufacturers have been staking out positions in AI agents. Vivo recently launched a smartphone AI agent named PhoneGPT, which can accurately operate mobile apps according to user intent to complete tasks such as making calls, sending messages, or booking restaurants, greatly enhancing the user experience. Huawei upgraded its smart assistant Xiaoyi to a system-level AI agent, improving not only its Q&A capabilities but also its reasoning. OPPO launched a “1+N” AI agent ecosystem strategy, built around AI super agents and the AI Pro development platform, aiming to provide more personalized services that match user preferences.
In commercial services, AI agents are engaging deeply with consumers.
“Hui Bo Xing,” Baidu’s digital human live-streaming platform for e-commerce, can create a sales AI agent in just five minutes. The agent can stay online 24 hours a day, and the entire live-stream room runs fully automatically with no staff present. Digital human hosts and digital human assistants each play their part, answering consumers’ questions promptly and demonstrating and explaining products smoothly and naturally; when a question cannot be answered verbally in time, an AI assistant replies in text.
“Digital human live-streaming AI agent technology has effectively eased problems in the e-commerce live-streaming industry such as high costs, time constraints, and inconsistent quality,” said Liang Zhixiang, adding that “Hui Bo Xing” has so far helped tens of thousands of merchants grow their revenue, raising gross merchandise volume by 62% on average.
Currently, AI agents are also being applied in various other scenarios, such as programming, content creation, and industrial manufacturing, demonstrating strong application potential and market value.
Bringing More Possibilities to Future Life
Many industry insiders believe that AI agents represent the trend of the future.
Tencent’s “2024 Digital Technology Frontier Application Trend Report” states that large models will become increasingly multimodal and that AI agents are expected to become the next-generation platform. In its “Technology Vision 2024” report, the international management consulting firm Accenture stated that 96% of business executives believe AI agents will bring significant opportunities to their companies within the next three years.
Industry insiders say that in the foreseeable future, AI agents will help many industries establish a new normal of intelligent operations centered on “human + AI digital employees.” In the medical field, for instance, AI agents can assist doctors with diagnosis, treatment, and health management; in transportation, they can provide scientific support for traffic management and planning by analyzing data and real-time road conditions; and in education, they can offer intelligent tutoring and adaptive learning systems that help students master knowledge more effectively.
Experts point out that as machine learning and deep learning technologies continue to advance, the capabilities and learning abilities of AI agents will grow even stronger, allowing them to adapt better to the complex and ever-changing real world and opening up more possibilities for social development.
Although AI agent technology has opened up more possibilities for future life, it is still in its infancy: existing AI agents can perform only relatively simple, fixed tasks, and their functions are highly homogeneous.
Some observers argue that one bottleneck in AI agent development is that current large models lack sufficient reasoning ability, leaving them unable to solve complex problems without human intervention. Moreover, large model technology itself has inherent flaws stemming from its algorithms and other factors, which can expose AI agents to a range of security risks.
Beyond technical risks, AI agents also raise ethical and privacy concerns. Industry insiders note that AI agents collect large amounts of data while providing services, which could lead to leaks of personal information. For example, an AI agent may infer certain personal preferences from a user’s shopping habits. This kind of “spying” is undoubtedly an invasion of user privacy.
Experts believe that AI agents should promptly be classified and managed according to their functions and duration of use, with continuous oversight of the development, production, and deployment of high-risk AI agents in particular. They also call for relevant laws and regulations to be formulated in a timely manner and for existing internet standards to be improved, so as to better guard against the various risks AI agents pose.
