OpenAI’s latest AI agent “Operator” tested: It can accomplish complex tasks, but it’s still far from “replacing humans”!
“AI agents are about to change the world!” This is a headline I see daily on LinkedIn. As an AI enthusiast, I’ve used almost all mainstream large language models (LLMs), and I even spend $200 a month subscribing to ChatGPT Pro. However, I have always been skeptical about the hype surrounding AI agents.
Today, OpenAI announced the launch of a brand-new AI agent, “Operator”, available to ChatGPT Pro users. As an AI fanatic, I couldn’t wait to test it. What were the results? Let me tell you.
What is Operator?
Operator is OpenAI’s new AI agent. Unlike most agents, which rely on external APIs, Operator operates autonomously and performs tasks through the browser, clicking, typing, and scrolling much as a person would. It is based on a new model called Computer-Using Agent (CUA), which combines GPT-4o’s vision capabilities with reasoning trained through reinforcement learning so that it can interact with graphical user interfaces (GUIs).
In simple terms, you just need to give it a goal, and Operator will automatically open the browser, search the web, and accomplish the task. Sounds cool, right?
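To make the idea concrete, here is a rough sketch of the kind of look-decide-act loop a browser agent runs. This is my own illustration, not OpenAI’s code: `Action`, `run_agent`, and the three callbacks are hypothetical names, and the “model” in the demo is just a scripted list of actions.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str         # "click", "type", "scroll", or "done" (assumed vocabulary)
    detail: str = ""  # e.g. what to click or which text to type


def run_agent(goal, take_screenshot, choose_action, perform, max_steps=20):
    """Look at the screen, pick one action, perform it, and repeat."""
    history = []
    for _ in range(max_steps):
        screen = take_screenshot()                       # perceive
        action = choose_action(goal, screen, history)    # decide
        history.append(action)
        if action.kind == "done":
            break
        perform(action)                                  # act
    return history


def demo():
    # A scripted stand-in for the model so the loop can run without a browser.
    scripted = iter([
        Action("type", "financial influencers"),
        Action("click", "first result"),
        Action("done"),
    ])
    performed = []
    history = run_agent(
        goal="find financial influencers",
        take_screenshot=lambda: "fake-screenshot",
        choose_action=lambda goal, screen, hist: next(scripted),
        perform=performed.append,
    )
    return history, performed
```

Every pass through the loop costs one screenshot and one model call, which goes a long way toward explaining why an agent like this feels slow.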
Testing: How Does Operator Perform?
To test Operator’s capabilities, I assigned it a simple task: “Collect information on 50 popular financial influencers from YouTube, obtain their LinkedIn information and emails, and organize it into a table.”
1. Initial Performance: Impressive
Operator opened the browser, used Bing to search for financial influencers, and began gathering information. In the first 5 minutes, its performance amazed me: it was genuinely completing tasks autonomously!
2. Issues Arise: Hallucinations and Inefficiency
However, after 10 minutes, problems began to surface:
- Hallucination Issue: Operator started to “fabricate” information. The LinkedIn links and emails it provided were mostly fictitious, with no verification of authenticity.
- Inefficiency: Each click and scroll took 1-2 seconds, and the overall speed was as slow as “swimming in syrup”.
- Lack of Flexibility: When faced with platforms requiring login (like Google Sheets), Operator did not proactively request help but wasted time searching for alternatives.
Ultimately, after 20 minutes of effort, Operator only managed to gather information on 18 influencers, and most of the data was incorrect.
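Since the agent did not verify anything itself, a cheap first line of defense is to sanity-check its table before trusting it. Below is a minimal sketch, assuming each row is a dict with `name`, `linkedin`, and `email` keys (my own layout, not Operator’s output format). It only catches malformed values; a well-formed but invented link would still pass, so real verification means opening the profile.

```python
import re
from urllib.parse import urlparse

# Deliberately loose pattern: something@something.tld with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_linkedin_profile(url):
    """True if the URL has the shape of a LinkedIn profile link."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return (
        parsed.scheme in ("http", "https")
        and (host == "linkedin.com" or host.endswith(".linkedin.com"))
        and parsed.path.startswith("/in/")
        and len(parsed.path) > len("/in/")
    )


def looks_like_email(address):
    """True if the string has the basic shape of an email address."""
    return bool(EMAIL_RE.match(address))


def flag_suspect_rows(rows):
    """Return (name, [bad fields]) for every row that fails a shape check."""
    suspects = []
    for row in rows:
        problems = []
        if not looks_like_linkedin_profile(row.get("linkedin", "")):
            problems.append("linkedin")
        if not looks_like_email(row.get("email", "")):
            problems.append("email")
        if problems:
            suspects.append((row.get("name", "?"), problems))
    return suspects


# Made-up placeholder rows, not data from my test run.
sample_rows = [
    {"name": "Creator A", "linkedin": "https://www.linkedin.com/in/creator-a",
     "email": "a@example.com"},
    {"name": "Creator B", "linkedin": "https://example.com/creator-b",
     "email": "b@example.com"},
    {"name": "Creator C", "linkedin": "", "email": "not-an-email"},
]
```

Calling `flag_suspect_rows(sample_rows)` flags Creator B for the LinkedIn field and Creator C for both fields, while Creator A passes.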
The Potential and Limitations of Operator
Potential
- Autonomy: Operator can operate the browser completely autonomously, demonstrating the future potential of AI agents.
- Multitasking: It can string together multiple subtasks in a single run, such as searching, organizing data, and generating tables.
Limitations
- Hallucination Issue: Operator’s “fabrication” behavior makes it impossible to fully trust its outputs.
- Slow Speed: The operational delays are severe, and efficiency is far below that of humans.
Conclusion: AI Agents Are Still Far from “Replacing Humans”
Although Operator demonstrates the potential of AI agents, it is still far from being able to “replace humans”. Its slow speed, high error rate, and lack of flexibility make it less than ideal. For AI enthusiasts like me, Operator is an interesting toy, but it is not yet a productivity tool.
Future AI agents may get smarter about asking the user for input when a complex task stalls.
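As a thought experiment, here is what a very simple “ask for help” rule could look like. This is a hypothetical heuristic of my own, not how Operator or any real agent decides: the function name, the marker phrases, and the retry limit are all assumptions.

```python
def next_step(page_text, attempts_on_page, max_attempts=2):
    """Return "ask_user" when the agent should hand control back, else "continue"."""
    # Assumed phrases that hint at a login wall.
    login_markers = ("sign in", "log in", "password")
    text = page_text.lower()
    if any(marker in text for marker in login_markers):
        return "ask_user"   # the agent cannot log in on its own
    if attempts_on_page >= max_attempts:
        return "ask_user"   # stop retrying the same page and ask instead
    return "continue"
```

With a rule like this, a page that says “Sign in to continue to Google Sheets” would trigger a request for help right away instead of a long hunt for alternatives.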