The competition among large models is intensifying, yet more and more practitioners are voicing doubts about them.
In the field of foundational large models, internet giants such as Tencent, Alibaba, and Baidu have entered the fray, and Alibaba, Baidu, and iFlytek have each released new versions of their large model products in the past month, significantly upgrading their technical capabilities. Startups, meanwhile, are iterating new versions while raising money at a “crazy” pace: Zhizhu AI recently announced that it has raised 2.5 billion yuan this year, and Baichuan Intelligence, founded less than half a year ago, has secured 350 million dollars in funding, with many of its investors being internet giants such as Tencent, Alibaba, and Meituan.
Does China really need so many foundational large models? Each foundational large model company is frantically “competing” on technical parameters, but what kind of large models does the market actually need? Amidst the lively yet chaotic battleground, these questions are being raised by more and more people.
To answer this question, one must first understand how foundational large models make money in the Chinese market. Most people encounter large models as chatbots, and many users have begun relying on these products for information search and document organization, yet the companies behind them find it hard to make money from such C-end products; in fact, the larger the user base, the greater the losses. At present, the most pragmatic direction for commercializing foundational large models is still the B-end: serving enterprises in retail, finance, manufacturing, and other fields to cut costs and improve efficiency, which can yield stable commercial returns.
Enterprises’ demand for foundational large models falls into three types: directly calling large model APIs to obtain model capabilities; carrying out secondary development on top of a large model to fit actual business needs; and building AI applications based on large models. These demands test not only the technical capabilities of foundational large model platforms but, even more, their ability to serve enterprises.
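The first type of demand, calling a model API directly, is the simplest to illustrate. The sketch below is a minimal, hypothetical example: the endpoint URL, model name, authentication scheme, and response fields are assumptions for illustration only, not the actual interface of any specific platform.

```python
import os
import requests

# Hypothetical chat-completion endpoint and model name, for illustration only;
# a real platform's URL, authentication scheme, and payload fields will differ.
API_URL = "https://api.example-llm-platform.com/v1/chat/completions"
API_KEY = os.environ.get("LLM_API_KEY", "")

def ask(question: str) -> str:
    """Send a single user question to the (assumed) chat API and return the reply text."""
    payload = {
        "model": "example-chat-model",  # placeholder model identifier
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.7,
    }
    headers = {"Authorization": f"Bearer {API_KEY}"}
    resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    # Assumed OpenAI-style response layout; adjust to the platform's actual schema.
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarize this quarter's sales report in three sentences."))
```

In practice, the other two types of demand (secondary development and full AI applications) build on the same kind of call, adding the enterprise's own data, business logic, and interfaces on top.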
From the perspective of service capability, both large model startups and internet giants must start from scratch, as no one has an inherent advantage. Platforms that can quickly discern customer needs and provide stable and reliable services will stand out.
Large models are not “hyped” into existence.
Taking the rise of ChatGPT as a dividing line, the development of domestic large models looks starkly different before and after.
Before ChatGPT became popular, there were only a handful of companies engaged in foundational large model research and development in China, as the technology and service capabilities of large models had not yet been widely accepted by the market. These companies mainly focused on technological R&D and the accumulation of service capabilities. After the popularity of ChatGPT, a large number of investors and practitioners flooded in, making large models the new trend.
A booming market easily breeds speculators who do not dig into the technology but instead tell stories and hype concepts: regardless of their actual technical and service capabilities, they first spin grand tales to win recognition from the capital market and customers. One practitioner joked to Huxiu that many domestic large model companies claim to be only a few months behind GPT-4 simply because they have trained on the long-open-sourced GPT-2 and then gone out to tell stories.
In reality, the technical capability of a large model cannot be built by casually training for a few months. A large model is a complex system in which scale is decisive: without reaching a certain scale, it cannot evolve toward greater intelligence. Scaling up training, in turn, takes a great deal of time for repeated tuning. Engineers who have tuned large model training parameters know the difficulty: no one tells you what to do, you can only explore on your own, and the unexpected problems that keep arising all take time to resolve.
In China, the foundational large model companies with real confidence in their technical capabilities are those that began training before ChatGPT took off. At that time large models were little known, and many people neither understood nor believed in them, but the companies that kept investing were very certain about the new technology.
For example, the Zhiyuan Research Institute launched the first ultra-large-scale pre-training model research project, “Wudao”, in 2020; its 2.0 version was once the world’s largest model, with parameters at the trillion scale. After this year’s upgrade, “Wudao” covers foundational large models for language, vision, and multi-modality and has entered a phase of full open-sourcing.
Zhizhu AI likewise began developing its GLM pre-training architecture in 2020 and trained GLM-10B, a model with 10 billion parameters. On October 27, Zhizhu AI released its self-developed third-generation dialogue large model ChatGLM3, which shows significant improvements in performance, reasoning ability, and context capacity over the previous generation. In tests across 44 Chinese and English public datasets, ChatGLM3 ranked first among domestic models of the same size; compared with ChatGLM2, its scores improved by 36% on MMLU, 33% on CEval, 179% on GSM8K, and 126% on BBH.
Additionally, in terms of functionality, the series of self-developed large models released by Zhizhu AI (ChatGLM, CodeGeeX, WebGLM, CogVLM, etc.) is currently the most complete domestic counterpart to OpenAI’s family of large models, and these capabilities are available through the generative AI assistant “Zhizhu Qingyan”.
The companies that started working on large models earliest are fundamentally different from those speculating on concepts and chasing trends: they immersed themselves in the field before the technology exploded and the market grew this crowded, because they understood the value of foundational large model technology and its business logic. The difference is just as visible after large models took off. Many companies have rushed into C-end products for traffic and buzz, while early players like Zhizhu AI have concentrated on enterprise services, building all of their R&D and service capabilities around that approach and steadily accumulating toward creating value for customers.
The complexity of large models determines that companies with longer accumulation times in technical and service capabilities possess stronger advantages. As more people in the market become aware of the complexity of large models and the time required for their evolution, those large model companies that rely on storytelling and concept hype will find their space for survival increasingly limited, while those that earnestly accumulate technical and service capabilities will withstand the first wave of competition.
Without a prosperous ecosystem for large models, there is no future.
In the commercialization of large models, those who can land applications in scenarios tied to real social needs will be the first to form a self-sustaining virtuous cycle.
General large models have a broader application range but are not professional enough in solving specific problems in vertical fields. Vertical large models are stronger in solving specific domain problems, but their service range is very limited, which makes it difficult for many vertical large models to achieve a balance between cost and commercial returns, limiting their development space.
The ultimate goal of large model applications is to serve daily life and production: solving practical problems in work and life and raising efficiency and productivity. Given the current strengths and weaknesses of general and vertical models, a more suitable path for commercialization is to use a general large model as the base, open up its technical and service capabilities to retail, finance, manufacturing, and other sectors, and build application scenarios jointly with the enterprises in those sectors.
Constrained by data, computing power, and scenarios, not many foundational large models will truly make it. At the same time, as a foundational technology base, the role of large models closely resembles that of PC and mobile operating systems, which points to a landscape in which only a few “big trees” grow tall: one or two technical bases will dominate the industry, and application developers will build on top of them. A foundational large model that cannot form a prosperous ecosystem will lack the capacity for sustainable development.
The history of PC and mobile operating systems shows how important first-mover advantage is. Once Windows came to dominate the PC market and iOS and Android split the mobile field between them, it became very difficult for any other operating system to turn the tide.
The same trend is evident in the field of large models. Large models will open up a prosperous AI application ecosystem in which personal and enterprise data, capabilities, or applications can quickly be turned into AI plugins that enhance the large model and make it more practical and easier to use.
Giants like Baidu and iFlytek are already committed to ecosystem construction: Baidu’s Smart Cloud Qianfan large model platform 2.0 has nearly ten thousand active enterprise users and covers more than 400 scenarios across industries such as finance, education, manufacturing, energy, government, and transportation, while iFlytek’s Spark large model platform has more than 700,000 developers.
Some startups that have accumulated experience in the large model field have also been among the first to reap the benefits. Zhizhu AI currently has over 1,000 clients and more than 100 partners co-building ecosystems, covering multiple scenarios such as media, SaaS, education, and office work. For instance, the capabilities behind WPS’s intelligent document generation for presentation content and news writing are supported by Zhizhu AI’s technological capabilities.
In the competition among large model platforms’ ecosystems, what tests a platform most is the value it brings to its partners and its ability to grow alongside them. In office scenarios, for example, generating presentation content, writing articles, and rewriting in a given style demand very high precision and reasoning ability from the platform; only models that reach a certain technical level can support these applications, and the platform must also keep iterating and correcting based on user feedback from real use.
Whether for giant companies or startups, no matter how strong their financial and resource capabilities, they must start from scratch, accumulating and iterating step by step. Therefore, the time advantage is very important in the process of building ecosystems for large model platforms. This is why startups with first-mover advantages can compete on equal footing with internet giants with stronger financial resources.
In the battle of hundreds of models, which is more suitable for the Chinese market?
Although the battleground of hundreds of models is lively and chaotic, the competitive direction behind it is already very clear. The technical and service capabilities, as well as the ability to build ecosystems for large model platforms, directly determine the direction of competition.
Building these capabilities requires time accumulation and cannot be achieved overnight, but simply having time accumulation is not enough. First-mover advantage, besides the time difference brought by early action, also includes the ability to accurately discern market needs, which means acting swiftly and decisively along the right strategic path; strategic wavering and taking detours can easily deplete the time advantage gained from early action.
As more and more large model platforms shift their focus to ecosystem construction, the strategic determination and execution ability of the platforms will become increasingly important in the competition of ecosystems. Whoever can efficiently complete the accumulation from 0 to 1 across various fields and scenarios will have a more pronounced advantage. Once a few platforms complete the qualitative transformation into super platforms, the competitive landscape will be basically determined.
In the large and complex domestic market, B-end service enterprises are prone to strategic wavering and taking detours. On the one hand, the regional differences and operational scales of domestic market enterprises lead to significant differences in their understanding of the intelligent value that large models bring, and the resources they are willing to invest and the costs they are willing to bear also vary, making it difficult to find a standardized solution; on the other hand, different enterprises in different fields have different demands for large model capabilities, and even different enterprises within the same field have varying needs for large models. Under the pull of these different demands, large model companies can easily transform from foundational technology bases into project outsourcing companies, making it difficult to become true super platforms.
In such an environment, compared with OpenAI’s commercialization plan, domestic large model platforms need to pay far more attention to detail in how they land commercialization. This trend is already visible in the commercialization approaches of some platforms.
For example, in addition to the common open-platform API service, Zhizhu AI also offers two deployment options: cloud privatization and local (on-premises) privatization. Cloud privatization helps enterprises build exclusive large models on private data with stronger security, while local privatization, a solution tailored to the Chinese market, provides not only more powerful models but also a complete model matrix to cover different scenarios and needs.
For different customer needs such as text generation, intelligent customer service, and data annotation, as well as the demand scale of small, medium, and large enterprises, Zhizhu AI provides different solutions, allowing clients to freely combine based on their needs. This more detailed and flexible service model is also derived from long-term precise insights into the Chinese market.
Facing external uncertainties, Zhizhu AI has also launched a domestic chip adaptation plan, working with domestic hardware and chip manufacturers to provide different levels of certification and testing for various types of users and chips, making large model services more secure and reliable. The ChatGLM series currently supports more than ten domestic hardware ecosystems, including Ascend, Shenwei Supercomputing, Haiguang DCU, Haifeike, Muxixi Cloud, SuanNeng Technology, Tianshu Zhixin, Cambricon, Moore Threads, Baidu Kunlun Chip, Lingxi Technology, and Great Wall Super Cloud, and the simultaneously released on-device models ChatGLM3-1.5B and 3B can be deployed on mobile devices such as Xiaomi, vivo, and Samsung phones as well as on in-vehicle platforms.
As the battle of hundreds of models intensifies, these seemingly minor details matter more and more: they determine how much recognition a platform wins from external partners and how fast large models land in different scenarios. Simply releasing a large model is not as high a barrier as the market imagines; what enables continuous iteration and forms a real competitive barrier is access to high-quality data scenarios, and the key to those scenarios lies with external partners. Platforms that partners are more willing to choose will find it easier to keep this commercial cycle running.
In this competition, many practitioners assume the winners will inevitably be the tech giants with the strongest resources and funding, but that is not necessarily true. Startups and giants alike have to get hands-on and dig into the details; there are no shortcuts. As for funding, it is not the decisive factor: startups with core competitiveness will not lack for money, and Zhizhu AI has already secured the largest funding among large model startups, with more new investors wanting in.
Looked at from another angle, the capital market is already voting with its wallet on which foundational large model is better suited to Chinese enterprises.