What Basic Skills Are Needed for a Good AI Phone?

On October 4, 2011, Apple reached an important moment at its headquarters in Cupertino: it was hosting the first major launch event after new CEO Tim Cook took office. Although the much-anticipated iPhone 5 was not released, the iPhone 4S still amazed many media outlets at the time—because of the potential showcased by Siri, which made many believe that this was the ultimate form of a smartphone.

Users were also captivated by the novelty of Siri. Within three days of the iPhone 4S’s release, sales reached 4 million units, the fastest any iPhone had reached that figure. At the time, someone asked the newly appointed Cook what the “S” in iPhone 4S stood for, and Cook’s answer was: “The S stands for Siri.”

Looking back now, Siri is no longer the flagship of mobile AI; its “artificially dumb” responses have been a source of humor for netizens around the world over the past decade. Still, it was undoubtedly how countless people first encountered the concept of “artificial intelligence,” and it fulfilled its mission of popularizing AI: long-pressing a button to summon a voice assistant has become ingrained in people’s minds as a generational memory.

Fast forward twelve years: although many envisioned usage scenarios have yet to materialize, in 2024, after ChatGPT disrupted the world, almost all forms of AI on mobile phones still rely heavily on voice as an interaction medium. Even Apple, which initially led this trend, has now chosen to abandon its “Project Titan” car-making effort and reinvest fully in self-developed large models.

However, unlike other smartphone manufacturers, Apple has chosen to bet on edge-side large models this time. Google also highlighted Gemini’s integration with the Android system at its release last year: developers can more easily call on Gemini’s edge-side AI capabilities, enabling more third-party apps to adapt.

In this regard, the latest strategies of OPPO, Apple, and Google have converged to some extent: at last year’s launch of its Andes large model, OPPO emphasized AndesGPT’s edge-side inference optimization, highlighting the role of on-device computing power in scenarios such as call summaries and text-to-image generation.

For ordinary smartphone users, paying for a better experience rather than a gimmick has always been the core value of consumer electronics. Even if the market is generally optimistic, the riddle of making AI phones truly practical still needs to be answered by smartphone manufacturers.

A Better Smartphone and System

In less than a year after the release of ChatGPT, mobile AI became the “Holy Grail” pursued by smartphone manufacturers. Almost all mainstream phone manufacturers rushed in, but the eager ones quickly discovered that an AI phone needs to do far more than simply add an interface to a large model.

According to the vision Steve Jobs held when Apple acquired Siri, voice assistants would one day help iPhone users complete various daily tasks, from setting alarms to placing takeout orders, hailing rides, and planning schedules. These would all be standard features of the next generation of smartphones.

These visions are still relevant today: users are excited about the concept of AI phones for much the same reason Jobs was, namely that “smartphones finally have the opportunity to become smarter.” For photography enthusiasts, AI phones would be a new camera that is more user-friendly than current models, easily satisfying various stylistic and content generation needs; for office workers, AI phones mean that more tasks that previously required a “professional team” can now be handled by just one phone.

These may sound a bit sci-fi, but the energy they contain is already enough to spark the next revolution in smartphones. Recall the first iPhone launch event: only after Jobs repeated it three times did the audience realize he was about to unveil a single magical device that could meet the needs of an iPod, a phone, and a browser all at once.

In 2024, OPPO’s founder Chen Mingyong also stated that “the era of AI phones will become the third phase of the mobile industry following feature phones and smartphones.” This judgment is based on the potential AI has already demonstrated in breaking down app boundaries. Some scenarios that previously required opening office software, WeChat, Word documents, and browsers simultaneously can now be accomplished with just a few sentences of dialogue.

However, these envisioned scenarios present a significant challenge for generative AI services like ChatGPT, which exist only as apps on the phone and cannot reach the underlying capabilities of the mobile system.

Functions like photo editing, document generation, real-time translation of call content, and summary generation all require deep support from the mobile system.

For example, the hottest “out-of-the-box” case in the AIGC field over the past year is undoubtedly the AI photo editing that became popular on Xiaohongshu, such as the AI-generated portrait photos integrated into ColorOS 14 or the quick removal of pedestrians from photo backgrounds. These very practical functions have given many users their first experience of “AI shock,” with a capability far beyond what they were used to, and have quickly moved from novelty to regular use. According to OPPO’s data, the updated AI removal function in OPPO Photos has already reached an average usage frequency of 15 times per user per day.

Although these application scenarios have already changed lives, these operations are still limited to the integration of AI functions within applications. However, our vision for AI phones clearly goes beyond this. What if AI phones could accurately find the photo you want from a massive collection with just one sentence? Or could automatically process work documents and other more complex tasks?

These questions bring the challenge back to mobile operating systems and hardware design: it seems that only manufacturers who accumulated enough experience in hardware manufacturing and mobile operating system design during the smartphone era can discern what users actually need. Behind operations that are sufficiently simple and intelligent, the integration of complex AI capabilities and cloud computing power is indispensable.

Referring to the history of how smartphones gradually replaced feature phones, the simplest answer for AI phones to truly replace smartphones is: start from existing functionalities.

According to OPPO’s definition of AI phones, for AI phones to provide a better user experience than current phones, they should possess the following four characteristics:

Efficiently utilize computational resources to meet the needs of generative AI;
Be able to sensitively perceive the real world and understand complex information about users and their environment;
Have strong self-learning capabilities;
Possess abundant creative capabilities to provide users with inspiration and knowledge support.

Within this framework, merely adding AI functions to a traditional app architecture is not enough: to improve the usability of AI phones, the phone’s interaction architecture must be further streamlined so that users can reach AI functionality more conveniently. Only then can the AI within a smartphone become smarter, better understand your daily life, and make flexible arrangements based on it.

Even with existing functions, to truly achieve usability and “plug-and-play” capability, deeper integration with the phone’s AI capabilities is needed, as this is the core competitive advantage of AI phones in the market.

The Basics of AI Phones

Currently, the biggest bottleneck in the development of AI phones is still the high cost of cloud computing. ChatGPT is a generative chatbot that runs entirely in the cloud: OpenAI’s daily server and bandwidth costs for running it exceed $100,000, and the monthly operational costs are enough to train GPT-3 two to three times.
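
The scale of these figures can be checked with simple arithmetic. The following is a minimal sketch that takes the $100,000 daily figure as a lower bound and assumes a 30-day month; the month length and all names in the code are assumptions made for illustration, not part of the original reporting.

```python
# Back-of-the-envelope check of the cloud cost figures quoted above.
DAILY_COST_USD = 100_000   # lower bound quoted in the text
DAYS_PER_MONTH = 30        # assumption for illustration

monthly_cost = DAILY_COST_USD * DAYS_PER_MONTH


def implied_training_cost(monthly: int, runs: int) -> float:
    """Cost of one training run if `monthly` dollars pay for `runs` runs."""
    return monthly / runs


# "Two to three times" gives a range for the implied cost of one run.
low = implied_training_cost(monthly_cost, 3)
high = implied_training_cost(monthly_cost, 2)

print(f"Monthly cost: at least ${monthly_cost:,}")
print(f"Implied cost per GPT-3 training run: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, the monthly bill is at least $3 million, which implies a GPT-3 training run on the order of $1 million to $1.5 million.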

Yet despite high cloud costs, many trivial everyday scenarios on a smartphone still call for AI to improve efficiency. Faced with this tension, both Apple and OPPO have chosen edge-side large models, which seem to have become the “optimal solution” for AI phones.

However, to achieve better AI capabilities on the edge, smartphone manufacturers still need to work closely with chip manufacturers, and even need some chip design capability of their own, so that chips can be customized to the way large models run, meeting computational demands as far as possible while improving the models’ operating efficiency. This is the strategy OPPO has adopted for its Find X7 series in preparation for AI. The OPPO Find X7 is already the world’s first smartphone equipped with an edge-side language model with 7 billion parameters.
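
To see why a 7-billion-parameter model is demanding for a phone, it helps to estimate the memory that its weights alone occupy. The sketch below is only an illustration: the bit widths are common quantization choices, not OPPO’s disclosed configuration, and the function name is hypothetical. It ignores activations and runtime caches, so real usage would be higher.

```python
PARAMS = 7_000_000_000  # 7 billion parameters, as stated in the text


def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


# Illustrative precisions (an assumption, not a vendor specification).
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

At 16-bit precision the weights come to roughly 13 GiB, comparable to or more than the total RAM of most phones; at 4 bits they shrink to about 3.3 GiB. This is one reason compression and chip-level optimization matter so much for edge-side models.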

Some may question: “Hasn’t the theory of smartphone performance surplus been raised for many years? Why does the current smartphone still lack sufficient computational power to run edge-side AI large models?”

This question misreads how hardware and user needs have evolved together: today’s smartphones are several times more powerful than computers from a decade ago, but user expectations for smartphones have also risen significantly. Even for small tasks like image cropping, once cloud processing is involved, the bandwidth and server costs make it hard to deploy at scale.

Currently, through optimizations from hardware to software, OPPO has achieved millisecond-level response times in these application scenarios, while also being able to refine the extracted image region based on user operations. The AI call summary feature newly added in ColorOS 14 relies on an edge-side large model to accurately identify call content and generate summaries without affecting normal calls. With its 100+ AI capabilities, the Xiaobu assistant is just a step away from the “complete Siri” envisioned by the outside world.

However, the application of edge-side AI is also full of challenges: adapting to SoCs of different architectures and performance levels tests the AI optimization capabilities of smartphone brands, and this is the most important basic skill of AI phones.

Currently, OPPO has already completed adaptation for both Qualcomm and MediaTek platforms, and is consolidating its research and development resources by establishing AI centers to strengthen its AI capabilities. Gathering the entire company’s strength, OPPO views artificial intelligence as its most important strategy for the next era of smartphones and is willing to invest unlimited resources in this area, aiming to create a more suitable technological environment for the growth of AI phones.

At present, the AI functions that OPPO has introduced have successfully sparked users’ interest in trying them out. OPPO’s long-term accumulation in the field of AI has finally allowed it to take the lead over competitors in the next revolution of smartphones, using various AI tools that are easy for users to adopt and hard to part with to establish stronger user loyalty.

We may still be far from the “next generation smartphone” envisioned by Jobs, but with the continuous emergence of features like edge-cloud combined large models and OPPO AI removal, AI phones will gradually integrate with stronger hardware, providing more precise user services and penetrating deeper into users’ daily lives.

It can be anticipated that in the next 1-2 years, AI phones will penetrate more rapidly into areas including photography, document/image/audio multimodal content processing, and voice assistance, continuously enhancing these capabilities.

In this process, smartphones will always be the most important carrier, and the next generation of AI phones will be able to generate various images/videos or complex tables/PPT content using more powerful models and computational power, transforming smartphones from mere tools into Iron Man’s “J.A.R.V.I.S.”

This revolution, which we have awaited for 12 years, is finally unfolding.

-Produced by Guoke Business Technology Communication Department-