The Path to AGI: From Alchemy to the Energy Equation

“Many years later, as he faced the firing squad, Colonel Aureliano Buendía was to remember that distant afternoon when his father took him to discover ice.” “One Hundred Years of Solitude” opens in a “future past tense” that sets the stage for the century-long fate of the Buendía family. José Arcadio Buendía leads his kin away from their hometown to found the utopia of Macondo, exploring the new world under the inspiration of Melquíades, diving into alchemy in solitude, seeking to unravel the mysteries of nature but stepping instead into a magical reality.

Alchemy is “dangerous”; historically it brought many disasters. Intertwined with the fantasy of “immortality,” the mad dream of kings and sorcerers, it marked the beginning of humanity’s quest to unveil the laws of nature. Classical physics was founded by Newton, and we began to understand natural laws scientifically. The great Newton led humanity from crawling to standing eye to eye with nature; it was the moment humanity truly stood up. Yet even in his later years Newton was obsessed with alchemy, a romanticism inherent in human nature that always yearns for magic. Humanity’s true “alchemy” moment was not the glimmer of gold but the explosion of particles: Einstein’s energy equation, E = mc², unveiled the mysteries of the universe, let us release the greatest energy at the smallest scales, and shook the entire globe. Then, just over a decade ago, Bitcoin was born. Satoshi Nakamoto invented the “new gold,” where technology and fantasy coexist, and the alchemy of mining finally fulfilled a “thousand-year wish” for some, revealing that the magic is not Bitcoin but supercomputing power. Pandora’s box has been open ever since.

Many years later, facing interstellar AI robots, I will remember that distant afternoon when my companions took me to Silicon Valley to witness the birth of large models.

At the beginning of 2023, AI chatbots arrived unexpectedly, and artificial intelligence entered the world in a way that surprised everyone; this mysterious process is referred to as emergence. Everyone typed cautiously on their keyboards, deeply captivated by ChatGPT’s responses, and turned text into images in Midjourney, becoming artists for ten minutes. People spread the news, feeling for a moment that they had become alchemists who had discovered humanity’s new “alchemy.” From that moment, the world’s narrative revolved around AI, with unprecedented influxes of capital, talent, data, and computing power. Across the globe a massive “gold rush” began, as people emerged from the pandemic to welcome some “bright” moments.

The large language models of 2023 were the modern “alchemy recipe,” with OpenAI leading the way as the crown and the filter of the new world. Through this filter we scrutinized every tech giant: will Google be overthrown? Can Apple keep up? Is Meta’s strategy correct? At that time multimodal technology was still immature, data reserves were abundant, scaling laws were in their prime, NVIDIA GPUs were hard to come by, Moore’s Law still held, and GPT-4 was on the horizon… the AI technology flywheel seemed inexhaustible, and the world was caught in a frenzy of “alchemy.” Cheers!

Two years later, AI remains on a pedestal, but OpenAI has stepped down from its throne. The large-model companies are in healthy competition, their capabilities converging, revealing that data and computing power are the true bottlenecks. This is the fastest consensus humanity has ever reached and the most heavily funded technological wave. We ponder whether AI’s significance is a pivotal moment like the ancient apes’ discovery of fire, an industrial-revolution-level productivity cycle, or merely an internet-level commercialization cycle. Optimists await the arrival of AGI, while pessimists mock that GPT-5 may never come.

Returning to Silicon Valley at the beginning of the year, I felt deeply that AI has entered a “major infrastructure cycle,” and the scale of this infrastructure is “world productivity.” Einstein’s energy equation never freed humanity to harness the energy of the stars, but AGI may completely free humanity from productivity constraints. I wrote an article titled “AI is Labor, We’re Consumers,” sharing my thoughts on this wave of productivity. Here I will detail the technical characteristics and trends of this cycle:

1. Large models provide the underlying energy compiler, an entropy-reduction force at the level of Earth’s civilization. The capillary pathways to AGI productivity are Agents + robotics.

Two years ago we thought architectures would emerge to surpass the Transformer, but today mainstream large models are still trained within this architecture. Friends at OpenAI shared their ranking of what drives model growth: data > computing power > algorithms; data and computing power are the more certain paths to model growth. My understanding is that data represents comprehension of the order of the civilized world, while computing power reflects the efficiency of converting cosmic energy. Large models give humanity an entropy-reduction force that surpasses biological intelligence, simulating and generating the order humanity has formed, allowing us to decouple human production and replace it with more efficient AGI.

We shape our tools, and thereafter our tools shape us. In digital scenarios we created “human + SaaS” to process bits; in physical scenarios we invented “human + tools” to move atoms. Decoupling the relationship between humans and tools is complex, so we let AI use the productivity tools already in place instead of redesigning everything from scratch. This is the core significance of Agent intelligence: we want AI to enter existing workspaces rather than open new ones outright. In the physical world, beyond Agents, we also need embodied robots to enter productivity scenarios built around the human hand.

In the past, our unit of productivity was headcount (Heads). Hiring heads and subscribing to SaaS rested on an economic model in which contracts could be signed or terminated but had to be prepaid; managing heads and using SaaS produced different cost structures and organizational models. The large models’ continuation of the SaaS subscription business model has been less successful; OpenAI reportedly loses money even on its $200-per-month Pro subscriptions. Better suited to the intelligent era are billing methods based on tasks and results, such as Cost per Action and Cost per Agent, which make productivity costs granular to the level of actions and accountable for outcomes, minimizing transaction friction and greatly enhancing societal efficiency. Recently the training cost of the DeepSeek R1 model dropped dramatically, shocking the world. We are about to enter an era in which tokens are as cheap as utilities, which has profound implications for the economic value of Agents, allowing them to enter far more industry applications.
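To make the contrast concrete, here is a minimal sketch in Python; the seat price and per-action price are invented illustrations, not real quotes from any vendor:

```python
# Hypothetical comparison of two billing models for AI productivity.
# All numbers are illustrative assumptions, not real pricing.

SEAT_PRICE = 200.0        # flat monthly subscription per "head"
PRICE_PER_ACTION = 0.05   # billed only when the agent completes an action

def subscription_cost(seats: int) -> float:
    """Prepaid and coarse-grained: you pay for capacity, used or not."""
    return seats * SEAT_PRICE

def per_action_cost(actions_completed: int) -> float:
    """Granular and outcome-aligned: cost tracks the work actually done."""
    return actions_completed * PRICE_PER_ACTION

# A team that only needed 1,500 completed actions this month:
print(subscription_cost(seats=10))              # 2000.0, paid regardless of usage
print(per_action_cost(actions_completed=1500))  # 75.0, granular to actions
```

The point of the sketch is only the shape of the curve: subscription cost scales with heads, while Cost per Action scales with completed work, which is why cheap tokens change the economics so sharply.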

Agent intelligence operates on productivity goals, using large models and human toolboxes for task scheduling to carry out productivity replacement industry by industry. This is how the capillaries form, extending the infrastructure cycle from the mainline (LLM) into the branches. 2025 will be a year of rapid innovation and releases for Agents: the Operator agent recently released by OpenAI can complete tasks through the web without relying on APIs, meaning it can use SaaS to do the daily work of traditional white-collar jobs. This is exciting; the US white-collar labor market is worth trillions of dollars, and we are waiting for the first road to “open for business!”
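As a rough sketch of what “task scheduling over large models and human toolboxes” can look like, here is a minimal agent loop; `call_llm`, the `TOOLS` registry, and all behavior here are hypothetical stand-ins, not OpenAI’s Operator or any real API:

```python
# Minimal sketch of an agent loop: the LLM plans, existing tools execute.

def call_llm(prompt: str) -> dict:
    """Hypothetical model call; a real implementation would parse the
    model's reply into an action like {'tool': 'browser', 'args': {...}}."""
    return {"tool": "done"}  # stub so the sketch runs end to end

TOOLS = {
    "browser": lambda args: f"visited {args['url']}",       # reuse existing web UIs
    "spreadsheet": lambda args: f"updated {args['cell']}",  # reuse existing SaaS
}

def run_agent(goal: str, max_steps: int = 10) -> list:
    """Schedule tool calls against a goal until the model says it is done."""
    history = []
    for _ in range(max_steps):
        step = call_llm(f"Goal: {goal}\nHistory: {history}\nNext action?")
        if step["tool"] == "done":                  # model judges the goal met
            break
        result = TOOLS[step["tool"]](step["args"])  # act through human toolboxes
        history.append((step, result))
    return history

print(run_agent("file this week's expense report"))  # -> [] with the stub model
```

The design choice the text describes is visible in the loop: the Agent does not replace SaaS or the web, it drives them, which is why it can slot into existing workspaces instead of opening new ones.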

2. Entering the physical world requires more new data; embodied robots are the super AI hardware, and other AI hardware is merely their data sensors.

Large language models are trained on Internet data, the accumulation of decades of human Internet use and content creation. Because these datasets were created and structured by humans, large language models are highly versatile, with words as their basic units. But the diversity and dimensionality of the physical world far exceed this, and large language models are not adept at understanding it. “Embodiment” is key for physical-world models: their fault tolerance is far lower than that of language models, and the cost of mistakes in the real world can be enormous.

The physical world is built around the “human hand,” and embodied robots are the ultimate form of AI’s entry into the physical world (homes, communities, factories); they are the super AI hardware. Dexterous hands, built around productivity, are the critical components, a physical form of the Agent, which is why the upper body of an embodied robot matters more than the lower body. In future, fully robotic productivity environments, dexterous hands may no longer be key; it will depend on the robot’s optimal form for productivity. Generalizing across the physical world is hard because of cultural, ethnic, and environmental differences, so embodied robots will not evolve toward full generality but will be built around “tasks + scenarios.”

Building entirely new datasets around the physical world is an ambitious mission; we call it the Large World Model. On one hand, we need to accelerate the total volume of world data structured around the “human-centered” approach by creating always-on devices; on the other, we need to broaden the dimensionality of physical-world data, such as biological touch, environmental, and mechanical data.

Fei-Fei Li’s robotics lab at Stanford released the BEHAVIOR-1K benchmark, breaking household task environments down into training datasets. Tesla’s FSD is built on driving-task datasets, collecting data continuously over time and receiving ongoing positive and negative feedback. But these datasets are still far from sufficient, and the need for a “Scale AI” of the physical world, labeling physical-world data, is imminent. I believe a feasible path is to accelerate the emergence of AI hardware and wearable products, multiplying AI hardware’s data sensors so that each device acts as a labeling node for physical-world data, collecting and structuring it into scenario task datasets for training the Large World Model. Connecting hardware and wearables with large-model capabilities will create numerous entrepreneurial opportunities; the US infrastructure cycle focuses solely on the super and ultimate AI hardware, embodied robots, which presents a unique AI infrastructure opportunity for Chinese industry, with a number of startups already emerging around Shenzhen in the Greater Bay Area.
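To make “labeling node” concrete, here is a minimal sketch of the kind of task-scoped, multi-dimensional record such a device might emit; the `WorldSample` schema and its field names are illustrative assumptions, not any real dataset’s format:

```python
# Hypothetical schema for one physical-world data sample emitted by an
# always-on AI wearable acting as a "labeling node". Field names are
# illustrative assumptions, not BEHAVIOR-1K's or any real format.
from dataclasses import dataclass, field
import time

@dataclass
class WorldSample:
    task: str                  # scenario-level label, e.g. "pick-and-place"
    timestamp: float = field(default_factory=time.time)
    rgb_frame: bytes = b""     # camera data (placeholder)
    audio_chunk: bytes = b""   # microphone data (placeholder)
    touch: list = field(default_factory=list)        # tactile sensor readings
    environment: dict = field(default_factory=dict)  # temperature, humidity, ...
    mechanics: dict = field(default_factory=dict)    # joint angles, forces, ...

# Each wearable streams task-scoped samples into a scenario dataset:
sample = WorldSample(task="warehouse-sorting",
                     touch=[0.12, 0.80],
                     environment={"temp_c": 21.5},
                     mechanics={"grip_force_n": 3.4})
print(sample.task, sample.environment)
```

The extra fields beyond vision and audio are the point: they carry the touch, environmental, and mechanical dimensions that Internet text never recorded.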

The primary productivity of the physical world comes from blue-collar workers. Building AI wearables around blue-collar industries enhances productivity while collecting target data; by training vertical models on specific task scenarios, we can help embodied robots take over these roles through teleoperation. The proposed return of industry to the US aims to solve productivity and production costs through “embodied robots + blue-collar teleoperation.” Tesla plans to produce 5,000 Optimus robots this year, with the first scenario being to assist SpaceX in launching rockets, overcoming human environmental limits and increasing launch frequency. Unitree has suddenly become immensely popular in the US, with universities and startups using its embodied robot platforms to explore physical-world task scenarios. Like mushrooms after rain, embodied robots are entering the physical world, unlocking industry after industry.

One hundred years ago, we sent off the last emperor; today, we are about to witness the end of blue-collar labor.

3. We become AI super consumers: humans are restructured by data, establishing a new consumerism and new lifestyles.

90% of the world’s data was generated in the past decade. In the TMT era, as we consume apps on our phones, we are also being digitized by each “app square”: cameras structure our dining, entertainment, and leisure; TikTok and Xiaohongshu structure our creations; Notion structures our knowledge data… Yet however addicted to our phones we are, average daily usage is only 6-8 hours, with the phone locked most of the time. That leaves 16 hours of data every day unrecorded and uncollected, data that vanishes in an instant.

It has become a new consensus to permeate the second half of human time: enabling AI to access and collect the remaining 16 hours of data (Datalize Human), with always-on devices highly anticipated. AI glasses became the hottest category of the past year, bridging the phone’s locked-screen time, integrating vision and hearing, and allowing interaction with AI at any moment. More importantly, they share the smartphone’s versatility and could become everyone’s universal AI entry point; major companies and startups are diving in as the race begins and the battle of a thousand glasses commences. AI toys were the highlight of this year’s CES, opening data scenarios for “marginal” groups, including growing children, lonely women, and isolated elderly people, where companionship and emotional value can be monetized. AI rings and bracelets have suddenly broken through as AI amplifies the value of wearable health, with Whoop and Oura Ring surpassing a million units shipped annually and Chinese alternatives (RingConn) also emerging. For carbon-based life forms, health and longevity are the right pursuits.

AI + hardware can be categorized into four quadrants:

1. General + Consumer: “smartphone + AI glasses.” If AI glasses validate PMF at scale, they will cover all-weather AI scenarios in synergy with smartphones.

2. Vertical + Consumer: AI toys, AI rings, and bracelets designed for specific demographics, enhancing consumer value through Agent intelligence.

3. Vertical + Productivity: AI edge devices, embodied robots, and blue-collar wearables. For example, AI security cameras that watch children, pets, and the elderly for anomalies and proactively alert and report, or microwaves that recognize food and set the temperature automatically. Future AI hardware for content creators, such as AI creation glasses, will raise the efficiency and quality of their output.

4. General + Productivity: here “general” means relatively general, not universally general. For instance, autonomous driving, where the self-driving car is a form of AI robot that replaces driving productivity.
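For readers who think in code, the taxonomy can be pinned down as two axes; the `Scope` and `Market` names and the product assignments below simply mirror the examples above, nothing more:

```python
# The four AI-hardware quadrants described above, encoded as two axes.
# Names and assignments are illustrative restatements of the text.
from enum import Enum

class Scope(Enum):
    GENERAL = "general"    # relatively general, not universally general
    VERTICAL = "vertical"  # built for one demographic or task

class Market(Enum):
    CONSUMER = "consumer"
    PRODUCTIVITY = "productivity"

QUADRANTS = {
    "AI glasses":            (Scope.GENERAL,  Market.CONSUMER),
    "AI toys / rings":       (Scope.VERTICAL, Market.CONSUMER),
    "blue-collar wearables": (Scope.VERTICAL, Market.PRODUCTIVITY),
    "autonomous driving":    (Scope.GENERAL,  Market.PRODUCTIVITY),
}

for product, (scope, market) in QUADRANTS.items():
    print(f"{product}: {scope.value} + {market.value}")
```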

At CES 2025, AI hardware made its first mass appearance, though most of it was patchwork and shell products, with PMF still awaiting validation. I believe a plethora of innovative AI hardware will emerge around these four quadrants, spinning the flywheel of human-product-data-AI experience, unleashing a new wave of the experiential economy and consumption, and immersing us in a new world.

Starting tomorrow, be a happy person; travel the world, no longer chopping wood or feeding horses; face the sea, free and easy.

In the next five years, crossing the “AI infrastructure cycle” will usher in the “new continent cycle” of AI. When the first village in history was electrified, cities were born and humanity stepped out of the geographical wastelands; when GPT-5 deciphers the insomnia at the end of Melquíades’ parchments, the Aurelianos will put down the tweezers and the little gold fishes: the century-long fate of productivity is being forged into a chain of empathy among the stars.
