Eight Observations on Large Model Technology Development

Following the emergence of ChatGPT, the introduction of the Sora model has once again ignited enthusiasm for AI across various industries. In the face of rapidly evolving terminology, the plethora of personal and enterprise applications, and the continuous restructuring of business models, large models can exhibit astonishing capabilities and quickly impact society, which has deeper reasons behind it. At the recent “Yabuli China Entrepreneurs Forum Annual Conference,” Zhang Hongjiang, director of the Academic Advisory Committee of the Zhiyuan Research Institute, delivered a closing speech sharing his eight observations on the development of large model technology.

First, the Core of Breakthroughs in Large Models is Scaling Law

The Transformer architecture, which emerged in 2017, is the technical foundation for breakthroughs in large models. A series of technological innovations represented by this architecture has laid the development path of combining “computing power + data + algorithms” for artificial intelligence, leading AI from the 1.0 era into the 2.0 era. In the future, we will witness AI continuing along this path towards the grand goal of general artificial intelligence. The success of large models is not only a victory of algorithm innovation but also a significant advancement in systematic research.

In the history of large model development, Scaling Law has played a core role, serving as the main driving force behind continuous improvements in model performance. Scaling Law reveals a phenomenon: smaller language models can solve only part of the problems in natural language processing (NLP), but as model scale increases — with the number of parameters growing to billions or even hundreds of billions — previously challenging NLP problems can often be solved effectively. This law has been confirmed empirically and is expected to continue driving technological breakthroughs.

Figure 1 Scaling Law — The Magic of Scale, Taking PaLM as an Example, When the Scale Increases to 540 Billion Parameters, Model Performance Soars

When the model scale is small, performance improves only gradually as parameters increase; once the scale passes a certain inflection point, performance rises sharply with further parameter growth. This is the emergence of capabilities brought by scale, and it is a crucial reason why researchers have relentlessly pursued larger models in recent years. Data scale and quality are also critical: at a comparable data scale, higher-quality data trains better models.
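The shape of this curve can be sketched with a simple power-law model. The functional form below follows published scaling-law studies (e.g. the Chinchilla fits), but the constants are illustrative assumptions, not authoritative values:

```python
def scaling_loss(n_params: float, n_tokens: float,
                 e: float = 1.69, a: float = 406.4, b: float = 410.7,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted training loss as a power law in model and data scale.

    Loss decomposes into an irreducible term `e` plus two terms that
    shrink as parameters (n_params) and training tokens (n_tokens) grow.
    Constants are loosely based on reported Chinchilla fits and are
    used here only for illustration.
    """
    return e + a / n_params**alpha + b / n_tokens**beta

# Scaling up both parameters and data monotonically lowers the predicted
# loss, while the irreducible term `e` bounds it from below.
small = scaling_loss(1e9, 2e10)      # ~1B parameters, ~20B tokens
large = scaling_loss(5.4e11, 1e13)   # ~540B parameters, ~10T tokens
```

Note that a smooth loss curve and sharply "emergent" downstream abilities can coexist: the loss falls gradually, while task accuracy can jump once the loss crosses a task-specific threshold.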

Figure 2 a. As the Model Parameter Scale Increases, Expected Accuracy Improves. b. Parameter Scale Growth Rate: Doubling Every 4.2 Months Since 2018

Second, Large Models are New Operating Systems and Computing Architectures

In traditional computer usage, humans input data, and computers perform calculations and output results, which constitutes the essence of human-computer interaction. Today’s large models achieve the above functions in a simpler and more convenient way — people no longer need to program or type; they can simply communicate with GPT, and it can execute commands according to their thoughts, indicating that large models have taken on the role of an “operating system.”

As an operating system, the model simplifies how people call applications. Just as WeChat mini-programs remove the step of downloading apps from an app store, large models eliminate even the need to search for a mini-program. For example, a person only needs to tell the large model to draw a beautiful scene of Yabuli, and it can produce scenic views of Yabuli from various angles, even generating video. As large models become more widespread, the app form may therefore disappear, and even mini-programs may be eliminated.
From another perspective, large models as a new operating system also bring a tremendous shift in computing architecture, from a CPU-centric model to one centered on GPU parallel processing. This architectural shift will bring new core technologies and new players. After Nvidia's recent earnings report, its stock rose 15%, a sign that the Wintel system of the CPU era is giving way to the current large model and GPU stack.

Figure 3 Large Models are New Operating Systems, Transitioning the Computing Architecture from CPU-Centric to GPU-Centric

Third, Large Models as New Platforms Will Bring New Ecosystems

Large models are reshaping the computing ecosystem. In the traditional computing ecosystem, the infrastructure layer is chips; today's models are built on top of cloud platforms. Many players in the AI field now call large model APIs through cloud services, which shows that large models function as both a platform and a service.
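In practice, "model as a service" means an application sends a plain HTTP request to a hosted model and gets text back. The endpoint, model name, and payload shape below are assumptions modeled on common chat-completion APIs; real providers differ in details:

```python
def build_chat_request(prompt: str, model: str = "example-large-model") -> dict:
    """Assemble a chat-completion request body in the shape many hosted
    large-model APIs accept (field names vary by provider)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

# An application would POST this JSON to the provider's endpoint, e.g.
#   requests.post("https://api.example.com/v1/chat/completions",
#                 headers={"Authorization": "Bearer <key>"},
#                 json=build_chat_request("Summarize this contract"))

request_body = build_chat_request("Summarize this contract")
```

Because the entire integration is a single HTTP call, switching models or providers is cheap, which is part of what makes the model layer behave like a platform.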

This is why every time OpenAI releases a new feature or launches a new agent, manufacturers of large model applications worry that their business will be “absorbed” by it, just as Microsoft Windows once posed threats and challenges to software vendors’ businesses. This concern confirms that models are platforms, models are applications, and models are products.

Figure 4 Large Models: New Platform, New Ecosystem

In the future, large models as a platform will undoubtedly empower all applications, forcing people to rewrite past software. This is why internet companies today are investing heavily in developing their own large models: an internet company without a large model will no longer be seen as a platform company. This also explains why Amazon shows signs of falling behind, with its market value surpassed by emerging competitors.

Fourth, Large Model Performance is Key to Large-Scale Commercial Implementation

As the popularity of large models continues to rise, many attempt to build a vertical small model from an application-first perspective, but this approach is not feasible. If a model's capabilities are insufficient to support large-scale application scenarios, its popularity may be fleeting, followed by a sharp decline in users. Thirty years ago, Apple launched a handheld device called the Newton that was hailed as revolutionary, but users soon found that its handwriting and voice recognition could not support the new device, and it was quickly withdrawn.

Figure 5 Large Model Performance is Key to Large-Scale Commercial Implementation

Today’s large models show extensive application potential in intelligent customer service, text production, office Copilot, and other scenarios. However, if the large model has a high error rate during task execution or continues to struggle with hallucination issues, it will quickly face a trough brought about by a massive loss of users, and the application layer based on large models will also become unsatisfactory, making the derived business models difficult to sustain. Therefore, continuously improving large model performance is key to achieving large-scale commercial implementation.

Fifth, Large Models Will Simultaneously Drive Existing and New Applications

With the rapid development and continuous breakthroughs in large model capabilities, many tasks that previously required only marginal assistance from computers can now be competently handled by large models. Since the release of ChatGPT over a year ago, large models have significantly improved production efficiency in various application scenarios such as office work, video generation, and healthcare, with progress far exceeding that of the AI 1.0 era. In the future, we will witness large models driving a series of native applications to land quickly and create value.

When these native applications will emerge and whether they can seize investment opportunities is a source of anxiety for investors and practitioners. Looking back at history, whether during the PC era or the mobile internet era, extremely promising products have emerged during periods of rapid technological iteration and steady development. The situation in the large model era will likely be similar, so there is no need to be overly anxious. Market participants should rapidly digest the various impacts brought by large models while also focusing on their existing businesses to see how significantly these businesses will change when large model performance matures.

Figure 6 Large Models Will Simultaneously Drive Existing and New Applications

Large models will not only give rise to a series of native applications and AI startups but will also drive the appreciation and development of the existing market, just as the transition from PC internet to mobile internet saw the market value growth of existing giants far exceed the total market value of new companies.
Looking to the future, we have reason to believe that the efficiency improvements brought by large models will empower new application scenarios while promoting the growth of the existing market and revenue generation from native applications, leading to an overall increase in market value. Moreover, the core feature of native applications in the AI era is the natural interaction between humans and machines, making large models themselves the largest native application of the AI era. We can look forward to the emergence of this super application.

Sixth, Multimodal Large Models are the Ultimate Models for AGI

With the emergence of the Sora model, multimodal models have begun to ignite various industries. Unlike traditional language models that only treat corpus as tokens, multimodal models train using multimodal data such as videos, audio, and images as tokens, creating large models capable of understanding the physical world.
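The idea of treating non-text data "as tokens" can be illustrated with image patches: cut an image into fixed-size tiles and project each one into the same embedding space the model uses for text tokens. This is a minimal NumPy sketch in the spirit of ViT-style encoders, not any specific model's code:

```python
import numpy as np

def image_to_tokens(image: np.ndarray, patch: int = 16, d_model: int = 64) -> np.ndarray:
    """Split an (H, W, C) image into patch*patch tiles and linearly
    project each flattened tile to a d_model-dimensional token."""
    h, w, c = image.shape
    tiles = [
        image[i:i + patch, j:j + patch].reshape(-1)
        for i in range(0, h - h % patch, patch)
        for j in range(0, w - w % patch, patch)
    ]
    patches = np.stack(tiles)                       # (n_patches, patch*patch*c)
    rng = np.random.default_rng(0)                  # stand-in for a learned projection
    proj = rng.standard_normal((patches.shape[1], d_model)) * 0.02
    return patches @ proj                           # (n_patches, d_model)

tokens = image_to_tokens(np.zeros((224, 224, 3)))   # 14 x 14 = 196 tokens
```

Once images, video frames, and audio are all mapped into one shared token sequence, the same Transformer machinery that models text can model them jointly.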

Figure 7 Sora Model Demonstration

Ask Sora to generate, from a text description, a video of an SUV traveling on a mountain road, and the results are stunning. The car's journey fully complies with traffic rules, and it turns effortlessly, simulating the scene entirely through "self-learning," without a path map or 3D modeling. This requires the model to understand the physical world; the emergence of Sora thus signals a breakthrough toward a "world model" capable of understanding, describing, and simulating the real world, giving us more confidence in achieving AGI.

Figure 8 World Models Refer to Models that Can Understand, Describe, and Simulate the Real World

Text-to-video generation is the most astonishing and highly anticipated feature of Sora. However, the capabilities of multimodal models extend beyond text-to-image and text-to-video; they also include understanding and interpreting images or videos — multimodal models should possess both forward generation and backward understanding capabilities. As American physicist Richard Feynman once said, “What I cannot create, I do not understand.” Only when large models truly understand the physical world can they better create and simulate.
How are such models trained? By analogy, a pilot learning to fly must first train for a long time in a simulated cockpit, built from spatial models and aircraft-dynamics simulation. The pilot learns every operation from simulated data before applying it to real flight. Similarly, we can generate new training data by observing and describing the physical world, and use it to train large models that understand and simulate that world.
Following this logic, regarding future applications, multimodal large models are likely to rewrite autonomous driving systems. In the past, every time autonomous driving companies entered a new city, they had to spend 3 to 6 months re-scanning all the streets of that city; in the future, this work can be done by multimodal large models.

Seventh, Multimodal Large Models Drive General Machines — From Simple Instructions to Self-Planning

In addition to language models and multimodal large models, embodied models centered around robotics are also a key research focus today. By combining multimodal models and robots, we can enhance the robots’ ability to understand and describe the surrounding world, enabling them to take autonomous actions — this is the future of general robotics.

For example, in training a robotic hand to open a microwave, traditional robotic hands might struggle with operating the microwave door lock. However, after introducing a multimodal large model, the robotic hand can learn the operation manual through interaction with the model, mastering the steps needed to press the unlock button before opening the microwave. From this perspective, future robots will not only be able to perform tasks they have been trained on but can also, with the support of large models, accomplish tasks they have not been trained for but can understand through self-learning. Driven by large models, robots will achieve a leap from simple instructions to self-planning capabilities, a process that gives researchers hope for realizing general robotics.

Figure 9 Multimodal Large Models Drive General Machines: From Simple Instructions to Self-Planning

Eighth, the “Singularity” is Coming; the Future Will Be a World of Autonomous Intelligence

As language models, visual models, and robotic embodied models mature, the expected timeframe for achieving general robotics has been reduced from over ten years to less than five years. We will soon witness the birth of the next generation of autonomous action systems.

Nearly thirty years ago, when "Deep Blue" defeated world chess champion Garry Kasparov, Kasparov realized that computers could help humans improve their chess skills — this was the Copilot stage of artificial intelligence. By the time technology evolved to AlphaZero, computers no longer relied on human game records; they played according to their own algorithms, rules, and objective functions, winning with moves that human masters could not comprehend.
In the face of AI, two thousand years of accumulated human chess wisdom seems trivial. Humans can scarcely win against AlphaZero, which suggests that the Copilot stage will be superseded as AI technology evolves.

Figure 10 Historically, the Copilot Will Be Rapidly Replaced

AlphaZero’s emergence also indicates that in certain application scenarios, machines already possess a god’s-eye view — while humans see only a small hill or valley, the “god” sees the entire terrain.
In Ray Kurzweil's book "The Singularity Is Near," the "Singularity" refers to the point at which the pace of technological development exceeds the average learning ability of humans, so that machines perform new tasks better than people, and any task requiring only average intelligence or skill can be taken over by machines. Today, the powerful learning and reasoning capabilities of large models signal that the "Singularity" is imminent.

Figure 11 Is the Singularity Approaching?

Looking ahead, enterprises may no longer rely on hiring more or stronger employees to improve productivity and drive growth, but rather on purchasing more computing power and deploying more powerful autonomous intelligent agents. This may be the future we are about to embrace, a future we must imagine. As Yuval Noah Harari, the author of "Sapiens," has suggested, the future world may hold only 3% "superhumans," with the remaining 97% left idle. Today, large models have already shown us the outlines of that future.

Source: Zhiyuan Community
