Author: Liu Yiqin
Source: Caijing Eleven (ID:caijingEleven)
The hottest topic in technology in 2023 was AI large models, a wave led by the American startup OpenAI. After the release of ChatGPT, Chinese companies rushed out their own large models: more than 130 were released in China over the course of 2023.
The ingredients behind OpenAI’s technological breakthrough resemble those of many successful innovators: excellent talent, massive funding, years of sustained investment, and a firm commitment to its goals. For a long time before ChatGPT’s release, most of the industry and the investment community were skeptical of OpenAI, but that skepticism never shook the company’s direction.
In 2023, almost everyone recognized the direction of large models, believing that OpenAI has already presented the results, and other companies just need to follow up quickly, continuously optimize, and ensure participation in the future.
Some attribute the lack of large-scale investment in large models in the past to uncertain outcomes. Now it is clear that computing power, data, and talent can be invested more heavily. Chinese companies excel in engineering optimization, and it is only a matter of time before they produce practical large model products.
But is this really the case?
For OpenAI, large models were always a settled direction. Most of OpenAI’s funding has gone to computing power, and at the time, NVIDIA’s A100 (a data-center GPU built for AI workloads) cost far less than it does today. The third-party research firm SemiAnalysis estimates that OpenAI used about 3,617 HGX A100 servers, containing nearly 30,000 NVIDIA GPUs.
Having GPUs alone is not enough. Investor Microsoft helped OpenAI build a customized computing power cluster for large models, which can further enhance the efficiency of these GPUs. In terms of data, OpenAI has made continuous investments in every aspect of data collection, annotation, cleaning, organization, and optimization. Most of the team members at OpenAI come from top research institutions or technology giants.
In other words, with such strength and investment intensity, OpenAI still took more than eight years to create the breakthrough product GPT-4, which still has “hallucinations” (i.e., answering incorrectly, making nonsensical statements, etc.).
Why can Chinese companies produce large models that claim to rival GPT-4 in just a few months? Whose hallucination is this?

In the second half of 2023, some large models were exposed as “shell models” that directly wrapped foreign open-source large models. They ranked high on some leaderboards that test large model capabilities, with many indicators close to GPT-4. Several industry insiders told us that the better the leaderboard performance, the higher the proportion of shell models, and even slight adjustments lead to much worse results.
“Shell models” are just the tip of the iceberg of China’s large model industry, reflecting five interrelated problems in the industry’s development that cannot be solved in isolation. As of today, public enthusiasm for large models has clearly cooled, and in 2024 the problems of China’s large model industry will be exposed further. Still, beneath the excitement and the problems, large models have already begun to create value in industry.
In November 2023, Jia Yangqing, an AI scientist and Alibaba’s former technical vice president, stated that a large model developed by a domestic company used Meta’s open-source model LLaMA with only a few variable names changed. Jia noted that the renaming forced his team to do a great deal of adaptation work just to use the model.
Earlier, foreign developers had found that models from Zero One Everything (01.AI), the company founded by Kai-Fu Lee, used LLaMA with only two tensors renamed, prompting industry suspicion that Zero One Everything was a “shell.” Kai-Fu Lee and Zero One Everything both responded that they had used the open-source architecture during training in order to test the model thoroughly, run comparative experiments, and get started quickly, but that the released Yi-34B and Yi-6B models were trained from scratch and incorporated substantial original optimizations and breakthroughs.
In December 2023, media reported that a secret large model project at ByteDance had called OpenAI’s API (application programming interface) and used data generated by ChatGPT for model training, practices explicitly prohibited by OpenAI’s terms of use. OpenAI subsequently suspended ByteDance’s account, saying it would investigate further and, if the allegations proved true, require changes or terminate the account.
ByteDance responded that in early 2023, during its initial exploration of large models, some engineers had applied GPT’s API service to experimental projects involving smaller models. The model in question was only for testing, was never intended for launch, and was never used externally. After the company introduced API usage rules and inspections in April 2023, the practice stopped. ByteDance’s large model team has also set explicit internal requirements that no data generated by GPT models be added to the training data of ByteDance’s large models, and has trained its engineers to comply with the terms of service when using GPT.
Currently, domestic large models are mainly divided into three categories: original large models; shell models based on foreign open-source large models; and assembled models, which combine past smaller models to create a large model that appears to have a large parameter count.
Among them, the number of original large models is the smallest, as creating original large models requires strong technical accumulation and sustained high investment, which carries high risks. If a model does not have sufficient competitiveness, the large-scale investment will be wasted.
The value of large models needs to be proven through commercialization. When there are already sufficiently good foundational large models on the market, other companies should explore new value points, such as applications of large models in different fields or intermediate layers, such as assisting in model training, data processing, and computing power services.
However, the current situation is that most participants are crowding into the race for so-called “original large models” while remaining wary of its high risks, which produces a large number of shell and assembled models. Whether a company directly uses open-source models or assembles existing ones, as long as it complies with the relevant regulations, there is nothing wrong with that.
At the commercialization stage, customers rarely care whether a model is original; if it works, that is enough. In fact, many customers prefer non-original technology because it costs less. The problem is that even vendors of assembled and shell models keep emphasizing “originality.” Proving “originality” requires adjustments and modifications that can hurt a model’s ability to iterate, producing nothing but internal friction.
One foundation of large models is massive, advanced computing power, which is why large models are sometimes described as an exercise in brute-force aesthetics. NVIDIA’s A100 was previously considered the GPU best suited to training large models; NVIDIA has since launched a more advanced chip, the H100, which is not yet available in the Chinese market.
A long-term partner of NVIDIA told us that the price of the A100 in 2023 has roughly doubled. According to him, the Chinese companies that intensively purchased A100s in 2023 were mainly large companies with business needs, including Alibaba, Tencent, ByteDance, Baidu, etc., while startups purchased very few. Some well-known large model startups actively sought to establish strategic partnerships with him to publicly demonstrate their investment in computing power, “without paying.”
Despite the U.S. government’s export control regulations, it is not impossible for Chinese companies to obtain NVIDIA’s computing power; there are many channels. Besides direct purchases, they can buy through NVIDIA’s partners in China. But GPUs are expensive to begin with, and the costs of deployment, operation, debugging, and usage all add up. A saying circulating in the industry is that many Chinese research institutions cannot even afford the electricity bill for the A100.
A DGX server composed of eight A100s has a maximum power draw of 6.5 kW, meaning it consumes 6.5 kWh of electricity per hour, plus roughly the same again for cooling equipment. At an average industrial electricity price of 0.63 yuan per kWh, running one server around the clock costs about 200 yuan per day; for 1,000 servers, the daily electricity bill is about 200,000 yuan.
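The arithmetic above can be checked with a short script (a sketch that assumes, as the text does, that cooling draws roughly the same power as the server itself):

```python
# Estimate daily electricity cost for DGX A100 servers,
# using the figures given in the text.
SERVER_POWER_KW = 6.5      # max power draw of one 8x A100 DGX server
COOLING_FACTOR = 2.0       # cooling assumed to draw roughly the same power again
PRICE_YUAN_PER_KWH = 0.63  # average industrial electricity price
HOURS_PER_DAY = 24

def daily_cost_yuan(num_servers: int) -> float:
    """Daily electricity cost in yuan for a fleet of servers, cooling included."""
    kwh_per_day = SERVER_POWER_KW * COOLING_FACTOR * HOURS_PER_DAY * num_servers
    return kwh_per_day * PRICE_YUAN_PER_KWH

print(round(daily_cost_yuan(1)))     # 197 yuan, which the text rounds to ~200
print(round(daily_cost_yuan(1000)))  # 196560 yuan, roughly 200,000 per day
```

This matches the article’s figures: about 200 yuan per server per day, and about 200,000 yuan per day for a 1,000-server fleet.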
Therefore, apart from large companies, startups find it difficult to purchase and deploy GPUs at scale. GPU resources can also be rented: on cloud platforms such as Alibaba Cloud, Tencent Cloud, or Amazon AWS, A100 computing power can be rented directly, though rental prices have also risen sharply over the past year. Even so, the reality is that many large model companies are reluctant to invest heavily in computing power.
Several investors focused on AI told us that once a startup begins to deploy computing power, two “problems” arise: first, this investment has no upper limit or endpoint; no one knows how much to burn, and OpenAI still experiences downtime due to insufficient computing power; second, the company becomes a heavy asset company, which adversely affects its future valuation and directly impacts investor returns.
In 2023, many investors directly advised large model entrepreneurs to hire some people with prestigious backgrounds, quickly hold press conferences to launch large model products, and then seek the next round of financing without investing in computing power.
During the boom period, startups secured massive funding, hired at high salaries, and launched products publicly, driving up valuations. Once the boom passes, continued financing or going public will require revenue. At that point, they may use previously raised funds to bid on projects at low prices or even at a loss, or directly invest externally to consolidate revenue.
This could lead to a vicious cycle: reluctance to bear the high investment risk in computing power makes it difficult to achieve breakthrough developments in large models, thus making it hard to compete with the giants that are genuinely investing heavily in this direction.
Data and computing power are both foundational elements of large models, and in terms of data, the Chinese large model industry faces similar issues as computing power: Is it worth investing heavily?

In China, the threshold for data acquisition is generally low. In the past, data was mainly collected using web scraping tools, and now open-source datasets can be used directly. Chinese large models primarily rely on Chinese data, and it is widely believed that the quality of Chinese internet data is low.
One AI company founder described how, when he needs to find professional information online, he searches Google or YouTube. Domestic websites and apps do not lack professional content, but so much advertising is mixed in that finding it takes far longer.
OpenAI also used Chinese internet platforms for training its large models, but it did a lot of additional work to enhance data quality, which is not something ordinary data annotation work can achieve; it requires a professional team to clean and organize the data.
Previously, AI entrepreneurs have stated that it is difficult to find relatively standardized data service providers in China, as most offer customized services, and customized services are expensive.
This is somewhat similar to the logic of whether to invest heavily in computing power; for many companies, especially startups, this investment does not seem worthwhile. If a large-scale investment is made, and the final model performance is not satisfactory, it is also a “waste of effort”; it is better to train with open-source data and hold a press conference directly.
Moreover, the Chinese market lacks effective data protection measures. A person in charge of AI at a large company said, “In China, the data you can obtain, others can obtain too… If you spend a lot of money to create high-quality data, others can acquire it at a much lower cost, and vice versa.”
The intermediate links of large models, including data processing, will become a relatively clear new direction in 2024. Whatever the model, applying it to a specific scenario requires professional data optimization and debugging, which raises the bar for data processing; model debugging, engineering optimization, and other intermediate links must also be involved. Whether these links become a “new boom” in investors’ eyes, however, is another story.
The three issues above point in a common direction: short-sighted capital. Although OpenAI has paved a clear path, for the vast majority of companies, building a mature large model from scratch will not cost significantly less time or money than it cost OpenAI.
For most investors, the purpose of each investment is clear: exit and make money. OpenAI has become wildly popular, its valuation soaring and expected to keep growing. In April 2023 it was valued at approximately $28 billion; by December 2023, U.S. media reported that OpenAI’s latest round might value it at over $100 billion. To investors this is a very certain signal: invest in Chinese large model startups at the right price, and valuations can grow rapidly in a short period.
Chinese investors’ patience lasts only three to five years, a constraint dictated by how the capital model operates. After raising money from LPs, investors need to exit with substantial returns within a set timeframe. Their exit channels include mergers and acquisitions, IPOs, or selling their shares to new investors in subsequent financing rounds.
Early financing can rely on trends and storytelling, but in the later stages or even IPOs, a certain scale of commercialization capability is required. Investors find that the longer it drags on, the harder it becomes for a project to go public or be acquired, as the primary business model in the AI field is to do customized projects for B-end clients, which makes it difficult for startups to generate high-growth revenue. Investors can only push companies to complete multiple rounds of financing quickly while the trend is still hot, raising valuations, and even if they have to sell their shares at a discount later, it is still profitable.
This is why 2023 saw a flood of large model press conferences and a profusion of large model leaderboards, each with different rankings; these are all “stories” that help with financing. The AI industry walked a similar path a few years ago, represented by the “Four Little Dragons” of AI. The large model startups of 2023 are simply compressing that three-year journey into a single year.
But short-sightedness is not solely an investor problem. In today’s business environment, most people pursue short-term, certain results; the future, even five or ten years out, seems difficult to grasp.
In 2023, China’s large model industry rapidly transitioned from competing over large model parameters to competing over commercialization. At the CES (Consumer Electronics Show) in January 2024, two renowned AI scientists, Fei-Fei Li and Andrew Ng, both stated that AI commercialization would see significant development, penetrating more industries.
Currently, it seems that there are two main application directions for large models.
One is to provide new tools for C-end (consumer) users through large model technology, such as the paid version of GPT-4, Baidu Wenku rebuilt on Baidu’s Wenxin large model, and new AI video editing and text-to-image tools. However, large-scale growth in C-end payments is unlikely in the short term, since the group of people with a strong need for large model tools is relatively small.
The second, more promising direction for commercialization is B-end (enterprise) services. In the Chinese market, selling B-end software services has always been a “hard business.” Many investors and industry insiders point out that the largest B-end customers in China are government bodies and state-owned enterprises. Large models, as tools of advanced productivity, directly reduce headcount, but in government and state-owned enterprises, cutting headcount often meets resistance.
Turning to smaller B-end customers may not be much easier in 2024. One AI large model entrepreneur said that he had recently canvassed many enterprise clients and received responses like, “What can large models actually do? Can they help me cut jobs or make money?”
To this day, even the most advanced large models still have “hallucination” issues. This can be tolerated in C-end applications, but in some professional B-end scenarios, hallucinations mean difficulty in truly landing applications.
With the previous generation of comparison-based AI, such as facial recognition, the cost of having humans step in to correct errors was low. Large models, however, are prone to “talking nonsense with a straight face,” which can be seriously misleading.
Nonetheless, large models are already being applied concretely. Many industry insiders note that problems previously considered unsolvable now have new approaches, and efficiency has improved markedly. The assembled large models mentioned earlier, for instance, were rarely attempted before; now many AI companies are combining multiple smaller models from different scenarios so that most similar problems can be solved by direct invocation, without retraining a model.
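As a rough illustration of this “assembled” approach, the sketch below dispatches each request to a scenario-specific smaller model. Every class and function name here is hypothetical, invented for illustration, and does not represent any company’s actual system:

```python
# Hypothetical sketch: routing requests to specialized smaller models
# rather than retraining one large model (all names are illustrative).
from typing import Callable, Dict

class ModelRouter:
    """Dispatch a request to a scenario-specific model by scenario name."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[str], str]] = {}

    def register(self, scenario: str, model: Callable[[str], str]) -> None:
        # Each "model" is just a callable here; in practice it could be
        # an inference endpoint for an existing smaller model.
        self._models[scenario] = model

    def invoke(self, scenario: str, query: str) -> str:
        if scenario not in self._models:
            raise KeyError(f"no model registered for scenario {scenario!r}")
        return self._models[scenario](query)

router = ModelRouter()
router.register("ocr", lambda q: f"[ocr model] {q}")
router.register("sentiment", lambda q: f"[sentiment model] {q}")
print(router.invoke("ocr", "read this receipt"))  # prints "[ocr model] read this receipt"
```

The appeal the article describes is visible even in this toy: adding a new scenario means registering another existing model, not training a new one.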
Additionally, in companies with massive operations, large models are already in practical use. Much as the previous AI wave, represented by computer vision, produced algorithms that quickly took on important roles in content recommendation, e-commerce, ride-hailing, food delivery, and other fields, large models have now been adopted by Tencent’s gaming business, Alibaba’s e-commerce business, and ByteDance’s content business.

In 2024, the development of AI large models will likely exhibit several relatively certain trends.
First, financing enthusiasm will decline, and cases of a single company closing multiple financing rounds worth hundreds of millions of dollars will become much rarer. Large model startups will need to find new paths. For now, large companies appear better positioned to handle foundational large model infrastructure, while startups can consider shifting direction to fill the gap between foundation models and applications.
Second, the application of large models will continue to deepen, but this will mainly focus on fields with high digitalization and large business volumes. In the C-end market, large models will also become more prevalent, but for Chinese companies, relying solely on C-end user payments is insufficient; other monetization models, primarily advertising, will be incorporated into C-end application scenarios.
Third, domestic computing power will receive further attention. However, this does not mean there will be significant progress in the short term; it is a long process. As domestic computing power capabilities improve, there will also be more phenomena of opportunistic hype, creating a bubble.
The trend will stimulate rapid industry expansion, and bubbles will arise; the greater the opportunity, the larger the bubble. Only by stripping away the bubbles can we see the new opportunities for industrial development.
