Five Real Issues in China’s Large Model Industry

The wave of large model startups is surging, and only by looking beyond the hustle and bustle can we see the new opportunities brought by large models.

Image/Visual China

Written by | Liu Yiqin

Edited by | Xie Lirong

In 2023, the hottest topic in technology was AI large models. The wave was set off by the American startup OpenAI: within months of the release of ChatGPT, Chinese companies began intensively launching their own large models. Over the course of 2023, Chinese companies released more than 130 large models.

OpenAI's technological breakthrough shares traits with many successful bets in technological innovation: enough talented people, massive financial backing, years of sustained investment, and a firm commitment to a single goal. For a long time before ChatGPT's release, neither the industry nor the investment community had much confidence in OpenAI, but this never shook the company's direction. In 2023, almost everyone came around to large models, reasoning that OpenAI had already demonstrated the result; all anyone else had to do was follow quickly, keep optimizing, and make sure they had a seat at the table in the future.

Some attribute the past lack of large-scale investment in large models to uncertainty about the outcome. Now that the outcome is certain, the thinking goes, computing power, data, and talent can be poured in; Chinese companies excel at engineering optimization, so practically applicable large model products are just around the corner.

But is that really the case? For OpenAI, large models were always a certain direction. Most of OpenAI's funding went to computing power, and at the time, the price of NVIDIA's A100 (an AI-specific chip) was far lower than today. According to estimates from the third-party research firm SemiAnalysis, OpenAI used about 3,617 HGX A100 servers, containing nearly 30,000 NVIDIA GPUs. Having GPUs alone is not enough: investors such as Microsoft helped OpenAI build a computing cluster customized for large models, which further improves the efficiency of those GPUs.
In terms of data, OpenAI has invested continuously in every stage: collection, labeling, cleaning, organizing, and optimizing. Most of OpenAI's team comes from top research institutions or tech giants.

In other words, even with that strength and intensity of investment, it still took OpenAI more than eight years to produce the groundbreaking GPT-4, and GPT-4 still suffers from "hallucinations" (that is, giving irrelevant or fabricated answers).

So how can Chinese companies produce large models that claim to rival GPT-4 in just a few months? Whose hallucination is that?

In the second half of 2023, some large models were called out as "shells" that directly wrap foreign open-source large models. They rank high on some leaderboards that assess large model capabilities, with many metrics close to GPT-4. Several industry insiders told "Caijing" reporters that the better a model performs on the leaderboards, the more likely it is a shell; even slight adjustments can cause its performance to slump.

"Shells" are just the tip of the iceberg of China's large model industry. They reflect five problems in the industry's development, which are interrelated; none can be solved in isolation.

Today, public enthusiasm for large models has cooled markedly, and in 2024 the problems of China's large model industry will be further exposed. Beneath the hustle and bustle, however, large models have already begun to play a role in industry.

Models: Original, Assembled, or Shells?

In November 2023, Jia Yangqing, an AI scientist and former Vice President of Technology at Alibaba, stated that a large model made by a major domestic company had used Meta's open-source model LLaMA, changing only a few variable names. Jia Yangqing noted that the renaming itself had required a lot of adaptation work.

Earlier, foreign developers claimed that Zero One Everything, the company founded by Kai-Fu Lee, had used LLaMA and merely renamed two tensors, prompting industry doubts that Zero One Everything was just a "shell." Both Kai-Fu Lee and Zero One Everything responded that they had used the open-source architecture during training in order to thoroughly test the model and run comparative experiments, which allows for a quick start, but that the Yi-34B and Yi-6B models they released were trained from scratch with many original optimizations and breakthroughs.

In December 2023, media reports indicated that a large model project ByteDance was developing in secret had called OpenAI's API (application programming interface) and used data output by ChatGPT for model training, a practice explicitly prohibited by OpenAI's usage agreement.

OpenAI subsequently suspended ByteDance's account, saying it would investigate further and, if the reports were confirmed, require changes or terminate the account. ByteDance responded that in early 2023, during its initial exploration of large models, some engineers had used the GPT API in a small experimental model project; the model was only for testing, was never planned to go live, and was never used externally. After the company introduced norms for GPT API calls in April 2023, the practice stopped.
Moreover, the ByteDance large model team has set clear internal requirements that data generated by GPT models must not be added to the training data of ByteDance's large models, and has trained its engineering team to comply with the terms of service when using GPT.

Domestic large models currently fall into three categories: first, original large models; second, shells of foreign open-source large models; third, assembled large models, which stitch together earlier small models into something that appears to have a large parameter count.

Original large models are the fewest. Creating one requires a strong technical foundation and sustained heavy investment, and it is very risky: if the model turns out not to be competitive enough, that large-scale investment is wasted. The value of large models must ultimately be proven through commercialization. Once sufficiently good foundational large models exist in the market, other companies should be exploring new sources of value, such as applications of large models in specific fields, or the intermediate layer: helping large models with training, data processing, computing power services, and so on.

The reality, however, is that most participants are competing to claim "original large models" while fearing that the risks are too high, resulting in a large number of shells and assembled models.

There is nothing wrong with using open-source models directly or assembling models, as long as relevant regulations are followed. At the commercialization stage, clients do not care much whether the technology is original; as long as it is useful, many clients may even prefer non-original technology for its lower cost.

The problem is that even those with assembled and shelled models must keep emphasizing "originality." Proving "originality" requires adjustments and modifications, which hamper the models' ability to iterate and create internal friction.

Computing Power: Bottleneck or Reluctance to Buy?

One foundation of large models is massive, advanced computing power, which is why large models are sometimes described as an exercise in brute-force aesthetics. NVIDIA's A100 was long considered the chip best suited to training large models; NVIDIA has since launched a more advanced chip, the H100, but it is not yet sold in the Chinese market.

A long-time NVIDIA partner told "Caijing" reporters that in 2023 the price of the A100 rose by about 100%. To his knowledge, the Chinese companies buying A100s in volume in 2023 were mainly large companies with real business needs, including Alibaba, Tencent, ByteDance, and Baidu; very few were startups. Some well-known large model startups actively asked to establish strategic partnerships with him to demonstrate their investment in computing power, "the kind that doesn't require payment."


In 2023, the Chinese companies that intensively purchased A100 were mainly large companies with business needs, and very few startups. Image/IC

Despite the U.S. government's export control rules, it is not impossible for Chinese companies to obtain NVIDIA computing power; there are still many channels. In addition to direct purchases, companies can buy through NVIDIA's partners in China. But GPUs are expensive, and after purchase, deployment, operation, debugging, and usage are all further costs. A saying that circulated in the industry is that many research institutions in China cannot even afford the electricity bill for the A100.

A DGX server composed of eight A100s has a maximum power draw of 6.5 kW, meaning it consumes 6.5 kWh of electricity per hour of operation, and it must be paired with roughly the same power in cooling equipment. At an average industrial electricity price of 0.63 yuan per kWh, running one server for a day (24 hours) costs about 200 yuan in electricity. For 1,000 servers, the daily electricity bill would be roughly 200,000 yuan.

So apart from large companies, it is very difficult for startups to purchase and deploy GPUs at scale. GPU resources can also be rented: on cloud platforms such as Alibaba Cloud, Tencent Cloud, or Amazon AWS, A100 computing power can be rented directly, though rental prices have also risen sharply over the past year.

The reality, however, is that many large model companies do not want to invest heavily in computing power at all. Several AI-focused investors told "Caijing" reporters that once a startup begins deploying computing power, two "problems" arise. First, this investment has no upper limit and no endpoint; no one knows how much must be burned, and even OpenAI still suffers downtime from insufficient computing power.
Second, the company becomes an asset-heavy company, which hurts its future valuation and directly affects investors' returns.

In 2023, many investors directly advised large model entrepreneurs not to buy computing power: first recruit some people from prestigious schools, quickly hold a press conference, launch a large model product, and then pursue the next round of financing.

During the boom, startups raise large rounds, hire at high salaries, and launch products with fanfare, driving up valuations. Once the boom passes, continued financing or going public requires revenue, at which point they use the previously raised funds to bid for projects at low prices or even at a loss, or simply acquire stakes in other companies to consolidate their revenues.
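The server electricity arithmetic cited above is easy to verify. A few lines of code, using only the article's own figures (6.5 kW for an eight-A100 DGX server, a cooling load assumed roughly equal to the server's own draw, and 0.63 yuan per kWh), reproduce the roughly 200-yuan-per-server-day estimate:

```python
# Daily electricity cost estimate for A100 DGX servers, per the article's figures.
SERVER_POWER_KW = 6.5      # maximum draw of one 8x A100 DGX server
COOLING_FACTOR = 2.0       # cooling assumed to draw about the same power again
PRICE_YUAN_PER_KWH = 0.63  # average industrial electricity price cited
HOURS_PER_DAY = 24

def daily_cost_yuan(num_servers: int) -> float:
    """Electricity cost in yuan for running a fleet of DGX servers for one day."""
    kwh_per_day = SERVER_POWER_KW * COOLING_FACTOR * HOURS_PER_DAY * num_servers
    return kwh_per_day * PRICE_YUAN_PER_KWH

print(round(daily_cost_yuan(1)))     # about 197 yuan, i.e. roughly 200 yuan/day
print(round(daily_cost_yuan(1000)))  # about 196,560 yuan, roughly 200,000 yuan/day
```

Note that this covers electricity only; purchase price, deployment, operation, and debugging remain on top of it, which is the article's point.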

This could lead to a vicious cycle: reluctance to bear the high investment risks of computing power makes it difficult to achieve breakthrough developments in large models, thus making it challenging to compete with the giants that have truly invested heavily in this direction.

Data: How to Resolve Low-Quality Data?

Data, like computing power, is a foundation of large models, and on data the Chinese large model industry faces a similar question: is it worth investing heavily?

In China, the threshold for data acquisition is generally low. In the past, web scrapers were the main way to collect data; now open-source datasets can be used directly. Chinese large models are trained mainly on Chinese data, and it is widely believed that the quality of Chinese internet data is low.

One AI company founder described how, when he needs professional information, he searches Google or YouTube. Domestic websites and apps do not lack professional content, but there is so much advertising that finding it takes far longer.

OpenAI's training data also draws on Chinese internet platforms, but OpenAI has done a great deal of extra work to improve data quality, work that goes beyond ordinary data labeling and requires specialized teams for data cleaning and organization.

AI entrepreneurs have said that it is hard to find reasonably standardized data service providers in China; most offer customized services, and customized services are expensive.

The logic resembles the question of whether to invest heavily in computing power: for many companies, especially startups, the investment does not seem cost-effective.
If they invest heavily and the final model still performs poorly, that investment is also "wasted"; they might as well train on open-source data and go straight to the press conference.

In addition, the Chinese market lacks effective data protection. One AI executive at a large company put it this way: "In China, the data you can access, others can access too. If you spend a lot of money to create high-quality data, others can obtain it at very low cost, and vice versa."

The intermediate links of large models, including data processing, will be a relatively clear new direction in 2024. Whatever the type of model, deploying it in a specific application scenario requires optimization and debugging with professional data, which raises the bar for data processing; model debugging and engineering optimization are also required.
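To make concrete what even the most basic layer of this cleaning work looks like, here is a toy sketch of a filtering pass over scraped documents. The thresholds and advertising keywords are invented for illustration; real pipelines of the kind the article describes are far more elaborate:

```python
import hashlib

# Toy data-cleaning pass: exact deduplication plus crude quality filters.
# The keyword list and length threshold below are invented for this example.
AD_KEYWORDS = ("click to buy", "limited-time offer", "add to cart")
MIN_LENGTH = 30  # drop very short fragments unlikely to be useful training text

def clean_corpus(docs):
    """Return documents that survive dedup and simple quality filters."""
    seen = set()
    kept = []
    for doc in docs:
        text = doc.strip()
        if len(text) < MIN_LENGTH:
            continue  # too short to carry real content
        if any(kw in text.lower() for kw in AD_KEYWORDS):
            continue  # likely advertising content
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a document already kept
        seen.add(digest)
        kept.append(text)
    return kept
```

Production pipelines layer near-duplicate detection, language identification, model-based quality scoring, and human review on top of crude rules like these, which is precisely the specialized, expensive work the article says most startups are unwilling to fund.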

But if any of these links becomes the next hot trend in the eyes of investors, that will be another story.

Capital: Is It Only Short-Sighted Capital?

The three problems above point in a common direction: the short-sightedness of capital.

Even though OpenAI has charted a clear path, for the vast majority of companies, the cost and time required to go from scratch to a mature large model will not shrink dramatically. And for most investors, the purpose of each investment is clear: exit and make money.

OpenAI's rise has sent its valuation soaring, with more growth expected. In April 2023 the company was valued at about $28 billion; by December 2023, U.S. media reported that OpenAI's latest valuation might exceed $100 billion. To investors, this is a very clear signal: invest in Chinese large model startups at the right price, and valuations can grow exponentially in a very short time.

Chinese investors' patience runs only three to five years, a limit set by the capital operation model. Investors must exit and deliver substantial returns within a certain period after raising from their LPs. Exit channels include mergers and acquisitions, IPOs, or selling shares to new investors in later financing rounds.

Early rounds can be raised on hype and storytelling, but mid-to-late rounds, let alone an IPO, require commercialization at some scale. Investors have found that the longer a project drags on, the harder it becomes to take public or sell, because the dominant business model in AI is customized B-end projects, which makes high revenue growth difficult for startups.
Investors can therefore only ride the hype, pushing a company through multiple rounds of financing in quick succession to drive up its valuation; after that, even selling their shares at a discount is still profitable.

This is why 2023 saw a flood of large model press conferences and a profusion of large model leaderboards, each with different rankings: all of it is storytelling in service of financing. The AI industry traveled a similar path a few years ago, exemplified by the "four AI dragons." The large model startups of 2023 are simply compressing into one year a path that previously took three.

But short-sightedness is not only the investors' problem. In today's business environment, most people pursue short-term, certain results; a future five or ten years out seems impossible to grasp.

Commercialization: Who Are the Suitable Payers?

In 2023, China's large model industry rapidly shifted from competing on model parameters to competing on commercialization. At CES (the Consumer Electronics Show) in January 2024, two renowned AI scientists, Fei-Fei Li and Andrew Ng, both said that AI commercialization will develop significantly and penetrate more industries.

At present there appear to be two main application directions for large models. The first is using large model technology to provide new tools for C-end users: the paid version of GPT-4, Baidu Wenku rebuilt on Baidu's Wenxin large model, new AI video editing tools, text-to-image tools, and so on. But C-end paid users are unlikely to grow at scale in the short term, since relatively few people have a strong need for large model tools.

The more promising direction for commercialization is B-end services. In the Chinese market, selling B-end software services has always been a hard business.
Many investors and industry insiders note that the largest B-end clients in the Chinese market are government bodies and state-owned enterprises. Large models, as advanced productivity tools, directly reduce headcount, but in government and state-owned enterprises, reducing headcount is often itself a source of resistance.

Stepping back to small and medium B-end clients will likely also be difficult in 2024. One AI large model entrepreneur said that he recently canvassed several enterprise clients, and the responses amounted to: "What can large models actually do? Can they help me cut jobs or make me money?"

Even today, the most advanced large models still "hallucinate." This can be tolerated in C-end applications, but in professional B-end scenarios, hallucinations make real deployment difficult. With the previous generation of AI, such as facial recognition, a wrong recognition is cheap to catch and correct manually. Large models, by contrast, are good at "talking nonsense with authority," which can be genuinely misleading.

Nevertheless, large models are already being applied in practice. Many industry insiders say that thanks to large models, problems that previously had no solution now have new approaches, and efficiency has improved markedly. The assembled large models mentioned earlier are one example: rarely attempted in the past, the approach is now being adopted by many AI companies, which combine multiple small models from different scenarios so that most similar problems can be solved by direct retrieval and reuse, without training a new model each time.

In addition, large models have already been deployed inside companies with large-scale businesses.
Just as the previous wave of AI vision technology drove the development of AI algorithms that quickly took on important roles in content recommendation, e-commerce, ride-hailing, and food delivery, large models have now been adopted in Tencent's gaming business, Alibaba's e-commerce business, ByteDance's content business, and elsewhere.

In 2024, several trends in the development of AI large models look relatively certain.

First, financing fever will cool; the 2023 pattern of a single company closing multiple rounds of hundreds of millions of dollars will become much rarer. Large model startups need to find new paths. For now, large companies appear better positioned to build large model infrastructure, while startups can consider adjusting course to fill the gap between foundational large models and applications.

Second, the application of large models will continue to deepen, concentrated mainly in areas with high digitalization and very large business volumes. On the C-end, large models will spread further, but Chinese companies cannot rely on C-end user payments alone; other monetization models, chiefly advertising, will be woven into C-end application scenarios.

Third, domestic computing power will receive further attention. Attention does not mean significant short-term progress; this is a long process. As domestic computing power improves, there will also be more opportunities for speculation, hype, and profiteering.

The boom will stimulate rapid industry expansion, and bubbles will emerge as a result. The greater the opportunity, the larger the bubble. Only by looking beyond the bubble can we see the new opportunities for industrial development.

