AI Tool Experience: Zhipu AutoGLM Impresses with Web Version

After Zhi Xiaobao and Honor, has Zhipu's AutoGLM truly taken the first real step toward "ACT"?

Phone makers, app developers, and large-model companies are all racing to free up users' hands.

On September 5, Zhi Xiaobao made its debut, claiming to let users book tickets, order food, and hail rides through conversation. The next day, at IFA 2024, the consumer electronics show in Berlin, Honor CEO Zhao Ming announced what he called the industry's first open-ecosystem AI agent, which has since been built into the Magic 7 for consumers to try.

In extended hands-on use, however, Zhi Xiaobao's incomplete scenario coverage, weak functionality, and immature technology proved discouraging for ordinary users. Honor's AI agent, for its part, was not compelling enough to make users switch phone brands.

While Zhi Xiaobao and Honor failed to deliver useful features and broad coverage, the AI company Zhipu officially launched an AI tool at the end of October that can operate phones and computers in place of a human. At the end of November, it released an update to its AI Agent series, including the mobile-focused AutoGLM and the PC-oriented autonomous agent, GLM-PC.

After a week of hands-on testing, I can say with confidence that AutoGLM offers real convenience in specific scenarios, while GLM-PC can genuinely serve as a work assistant to a certain extent.

1

AutoGLM Is Not Yet Mature, but It Has Taken a Crucial Step Forward

AutoGLM’s slogan is “AI, More Than Just Conversation,” and it indeed transforms tasks that require human action into tasks that only need verbal commands.

Remember the coffee-ordering demo at the Zhi Xiaobao launch event, which was impressive on stage yet disappointing in practice? At the time, only a few cities and brands were supported. AutoGLM imposes no such restrictions on categories, brands, or regions.

As a loyal user, I can finally order milk tea with a single sentence.

As the demo video shows, I only need to tell AutoGLM which brand and type of milk tea I want, along with the ice and sweetness levels, and then wait for the payment screen to appear. If you are unsure what to drink, you can have it open Meituan's page and then make a selection by voice. If you find the back-and-forth cumbersome and have no specific requirements, the "random mode" can surprise you. And if you already have a drink in mind, just give the command, pay, and wait for the delivery person to bring it to your door.

Besides ordering milk tea, buying items on Taobao is also a breeze. The 19L mineral water I wanted happened to be out of stock, but AutoGLM did not get stuck; it told me by voice that the item was sold out.

AutoGLM also performs well in more complex scenarios such as travel booking. After receiving a command, it confirms the departure point, destination, travel time, and mode of transport in a very human-like way to make sure everything is accurate.

In every scenario that involves payment or privacy, AutoGLM shows a prompt: "This step involves important operations, do you wish to continue?" The final payment password or fingerprint verification must still be entered by the user in person, which keeps personal information safe.
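The confirmation behavior described above can be pictured as a simple gate in front of sensitive steps. The following is a minimal sketch under stated assumptions, not Zhipu's implementation: the `Step` class, its `sensitive` flag, and `run_steps` are hypothetical names invented for illustration, and only the prompt text comes from the product.

```python
from dataclasses import dataclass
from typing import Callable, List

# Prompt text as quoted in the review; everything else here is hypothetical.
CONFIRM_PROMPT = "This step involves important operations, do you wish to continue?"


@dataclass
class Step:
    description: str
    sensitive: bool = False  # assumed flag: step touches payment or private data


def run_steps(steps: List[Step], confirm: Callable[[str], bool]) -> List[str]:
    """Run steps in order, pausing for user confirmation before sensitive ones."""
    completed = []
    for step in steps:
        # Stop the whole task if the user declines a sensitive step.
        if step.sensitive and not confirm(CONFIRM_PROMPT):
            break
        completed.append(step.description)
    return completed


steps = [
    Step("open the ordering app"),
    Step("add milk tea to cart"),
    Step("submit payment", sensitive=True),
]
declined = run_steps(steps, confirm=lambda prompt: False)
approved = run_steps(steps, confirm=lambda prompt: True)
```

In this sketch, declining the prompt leaves the task stopped just before payment, which matches the behavior observed in testing: the agent does the legwork, and the user keeps the final say.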

The operations above each involve a single app and are not especially difficult. What truly opened my eyes was how well AutoGLM carried out commands that span multiple apps.

For example, I issued the command "Search for the recipe of braised pork on Xiaohongshu and forward it to my WeChat file transfer assistant," and AutoGLM fulfilled the request very well.

Beyond these operations, AutoGLM can also send WeChat messages, summarize public account articles, like and comment on Moments, find stores, write reviews, follow users or like specific articles on certain platforms, and look up what is nearby. The supported applications currently include WeChat, Xiaohongshu, Douyin, Weibo, Amap, Taobao, 12306, Ctrip, and other commonly used apps.

Having covered AutoGLM's advantages over earlier products, we also need to address the pain points I ran into during actual use.

The most obvious issue is response speed. Many demonstration videos are sped up to save time, but in real use the interval between steps exceeds two seconds, so completing a long sequence of actions takes a considerable amount of time. Pop-ups and unexplained errors also interrupt its actions. For cross-application commands, AutoGLM defaults to the highest-ranked product or answer rather than the most suitable one, and correcting it through further dialogue becomes a significant chore. Lastly, AutoGLM's voice recognition is unreliable: if you try to search for even a well-known public account like 36Kr, expect it to fail.

For now, then, AutoGLM resembles a prototype of a future Jarvis, and the actions it completes most reliably have undoubtedly been specifically trained. Covering more scenarios and delivering a better experience will depend on how Zhipu plans ahead. If it relies on training alone, it is questionable whether the Zhipu team has the capacity, energy, and budget to train for so many scenarios.

Moreover, the technical implementation reportedly relies on a multimodal model to recognize the phone screen and a second model to operate the phone, which limits AutoGLM to Android devices. How to bring a better AutoGLM to Apple users in the future remains a challenge.
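The two-model arrangement described above can be pictured as an observe-then-act loop. The sketch below is purely illustrative and assumes nothing about Zhipu's actual code: `run_agent`, `describe`, `decide`, and `perform` are hypothetical names, and plain strings stand in for screenshots and taps.

```python
from typing import Callable, List, Optional


def run_agent(
    goal: str,
    read_screen: Callable[[], str],
    describe: Callable[[str], str],                       # stand-in for the screen-recognition model
    decide: Callable[[str, str, List[str]], Optional[str]],  # stand-in for the operating model
    perform: Callable[[str], None],
    max_steps: int = 10,
) -> List[str]:
    """Alternate between observing the screen and acting until no action remains."""
    history: List[str] = []
    for _ in range(max_steps):
        observation = describe(read_screen())
        action = decide(goal, observation, history)
        if action is None:  # the operating model sees nothing left to do
            break
        perform(action)
        history.append(action)
    return history


# Toy stand-ins: strings play the role of screenshots, a dict plays the role of the policy.
screens = iter(["home screen", "search page", "results page"])
plan = {"home screen": "open search", "search page": "type query"}
history = run_agent(
    goal="find a recipe",
    read_screen=lambda: next(screens),
    describe=lambda pixels: pixels,
    decide=lambda goal, observation, past: plan.get(observation),
    perform=lambda action: None,
)
```

Because every step waits on a fresh observation and a fresh decision, a loop of this shape would also be consistent with the multi-second pauses between steps noted earlier, though that link is an inference rather than something Zhipu has confirmed.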

In summary, AutoGLM has surprised us in many ways, but it still has a long way to go before it matches human-level capability. From my perspective, if AutoGLM can polish several mainstream application scenarios through specialized training and deliver an excellent user experience in them, that would already be a significant success.

2

How Much Can AutoGLM Web Enhance Productivity?

AutoGLM may be Zhipu's most eye-catching product, but it does not yet boost practical efficiency by much. AutoGLM Web, released on the same day, is what truly impressed me.

AutoGLM Web is essentially a Chrome browser plugin from Qingyan. Once installed, it lives in a sidebar on the right that can be opened or closed as needed.

In terms of functionality, AutoGLM Web is divided into general mode and advanced mode.

General mode is quite simple, mainly offering page summarization and webpage translation. Webpage translation is self-explanatory; in my experience it is slightly better than the browser's built-in translation, but it does not stand out. The biggest convenience of page summarization is that you no longer have to copy an article into a large model to get a summary. That removes a lot of unnecessary work and lets people who are focused on tasks, learning, or creation stay free of distractions.

Another strong point is that when you ask questions with a page open, AutoGLM Web answers based solely on that article (page). Many other large-model products tend to stray off-topic and pad their answers with irrelevant information; here that worry goes away.

But what if the information does not come from a single article and the output has to draw on multiple documents? In advanced mode, AutoGLM Web can summarize multiple links without your needing to open each one; just select them to get a summary and continue with a related conversation.

Take an investor working through a stack of research reports. On Dongfang Caifu, AutoGLM Web's multi-link summarization lets me select the target reports without opening them and request a summary. The summary cites its sources, and follow-up discussion stays within the same material, so the information I receive comes only from the pool I selected. This "unpolluted" feeling is truly fantastic.

If you do not have a specific article or source in mind but do care which platform the information comes from (after all, platforms like Xiaohongshu, Zhihu, and CNKI feel increasingly walled off from one another), you can use AutoGLM Web's built-in advanced search, which supports Xiaohongshu, CNKI, Weipu, Baidu Scholar, Zhihu, arXiv, Baidu Search, and more.

Honestly, after watching AutoGLM Web's entire workflow, I was genuinely surprised. Its "human-like" behavior is truly impressive. Because it defaults to popular posts, which are generally useful, the summaries may be somewhat incomplete, but they are not wrong.

And that result came from Xiaohongshu, where the sources are videos and image posts. In my own tests, summaries from text-based academic sites like CNKI and Weipu, as well as travel destination recommendations from Xiaohongshu, were all quite good.

Advanced mode also includes AutoGLM's task-execution functionality, which can likewise complete tasks from verbal commands, and it handles multi-step operations on commonly used websites fairly well. Users can also control their PC remotely from a phone to complete tasks automatically, or schedule tasks to run at a future time while the PC is on. This is a novel feature, but for office workers who are already sitting at a PC, it may feel somewhat redundant.

Overall, AutoGLM Web is quite powerful and, more importantly, it addresses problems that other tools cannot solve. It has already become a great helper in my work, and it is now fully open: anyone can install the plugin and use it directly.
