The Future of AI Chips in the Era of Generative AI

At an analyst briefing during this year's Nvidia GTC, Nvidia CEO Jensen Huang raised an interesting point: if everything can be digitized, then everything can be tokenized. Tokens are a common term in generative AI, referring to the unit of information an AI model reads and generates, such as a word.
In addition to text and images, "we have actually digitized many things, including proteins, genes, brainwaves, and so on. As long as we understand their structure, or can abstract specific patterns from them and understand what they mean, we can digitize them," Huang said. "Then perhaps we can generate them. The generated tokens could be in chemistry, healthcare, animation, robotics, or 3D graphics. If we can generate the next token of text, we can generate the next token of images, videos, or robotic arms."
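To make the token concept concrete, here is a minimal sketch using the open-source tiktoken library (chosen here only for illustration; any BPE tokenizer would serve): text is encoded into integer token IDs, and a generative model's whole job is to predict the next ID in the sequence.

```python
# A minimal illustration of "tokens": text becomes a sequence of integer IDs,
# and a generative model predicts the next ID. Requires `pip install tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a BPE vocabulary used by several OpenAI models

text = "Proteins, genes and brainwaves can all be digitized"
token_ids = enc.encode(text)                 # one integer ID per token
tokens = [enc.decode([t]) for t in token_ids]

print(f"{len(token_ids)} tokens:", tokens)
# A language model consumes token_ids[:-1] and is trained to predict token_ids[1:],
# i.e. "generate the next token" -- the same mechanism Huang extends to other modalities.
```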
The point of this statement is clear: generative AI has a broad future and will penetrate industry after industry. Nvidia's climb into the trillion-dollar club came in two waves, the first driven by AI and the second by generative AI, as the two watershed moments in its stock price show. The timing of Huang personally hand-delivering the DGX-1 to OpenAI is quite significant…
The so-called first wave dates to 2012, when AlexNet first ran on Nvidia's gaming graphics cards and people discovered the potential of GPUs for AI. The second wave, generative AI ignited by ChatGPT, is notable not only because OpenAI, the company behind GPT, runs on Nvidia GPUs, but also because Nvidia had repeatedly stressed the importance of Transformers before ChatGPT was released, going so far as to add a dedicated Transformer Engine to the Hopper architecture, almost precisely anticipating the market explosion six months later (Figure 1).
This has once again cemented Nvidia's position in HPC and AI, and a trillion-dollar valuation is the natural result, with quarterly revenue growing at triple-digit rates. While preparing this article, we interviewed Liu Jianwei, co-founder and vice president of AiChip, who put it this way: "Generative AI is a cost-driven productivity revolution that will reshape thousands of industries."
Figure 1: Generative AI has become a new hotspot pursued by AI chip companies.
However, for generative AI, almost all market participants, including Nvidia itself, believe that this technology is still in its early development stage—especially in the enterprise AI market. Recently, at the Intel Vision event, Intel provided data showing that only 10% of enterprises had deployed and applied generative AI technology in production as of 2023. IDC predicts that spending on generative AI by global enterprises will grow approximately 3.8 times in the next three years. This indicates that there is still considerable room for market development in generative AI.
In other words, this market is far from mature, competition is still in its early stages, and it is uncertain who will ultimately prevail. In fact, long before ChatGPT swept the globe, AI chip startups had sprung up around the world like bamboo shoots after rain. So in an era where Nvidia continues to hold the high ground, where is the AI chip market that powers generative AI headed?


Nvidia’s Strengths and Weaknesses
Since 2019, Electronic Engineering Magazine and International Electronic Commerce have noted many times, across many articles, how strong Nvidia is in the AI era: some cited its stock price, some its revenue, some its market share in data center AI and HPC. None of these trends has changed much since.
The real question is, where exactly is Nvidia “strong”? How were these figures achieved? We have summarized several key points.
First, of course, is the underlying hardware, and it is not just a matter of GPUs. Chips built on leading architectures, leading process nodes, and leading packaging technologies are the price of admission to this battlefield, but they are not everything in the era of generative AI. More than a decade ago, Nvidia's chief scientist Bill Dally discussed HPC-oriented networking technology with Huang, who asked him: "Why should we do networking? Aren't we a GPU development company?"
Although Huang later threw his full support behind the corresponding technology, his question at the time was reasonable. It extends naturally to why Nvidia acquired Mellanox, why it developed DPUs, why it builds switch chips and switches, and why it researches optical interconnect between packages… Isn't Nvidia "a GPU development company"?
At this year's GTC, Nvidia released the GB200 NVL72, which is quite representative. Nvidia claims Blackwell delivers a 30-fold improvement in inference over the previous-generation Hopper. That 30-fold gain is, of course, not achieved at the chip level; neither Moore's Law nor any "more than Moore" scheme yields a 30x jump between generations. Enhanced support for lower-precision data formats and, more importantly, the upgraded NVLink interconnect and the introduction of NVSwitch chips are what allow the full system to reach a 30-fold improvement in multimodal model inference.
All of this is still "underlying hardware," but it has expanded to the system level, where interconnect, storage, and cooling are critical. Solving the cross-node communication bottleneck is clearly one of the most critical tasks of the generative AI era. These elements are part of Nvidia's "ecosystem," and they are barriers that competitors who can beat Nvidia on slides still find hard to cross in practice.
Second, there is software, or rather a CUDA-based ecosystem that holds a commanding lead over competitors. A few years ago, before generative AI was this popular, Nvidia promoted three platforms: NVIDIA AI, NVIDIA HPC, and NVIDIA Omniverse (Figure 2). In the past two years, promotion of the latter two has noticeably receded.
Figure 2: Nvidia’s three major platforms.
In fact, even setting aside middleware and libraries such as CUDA and TensorRT, which are far ahead of competing offerings in efficiency and completeness, Nvidia has also written, with its own hands, a pile of commonly used application frameworks at the application layer: pre-trained models, synthetic data generation, and AI deployment services. This "end-to-end" construction of the stack from bottom to top is something current competitors find difficult to match.
This year at GTC, Nvidia released a microservice offering called NIM (NVIDIA Inference Microservice). Put simply, NIM packages generative AI as a service for enterprises, letting them choose models, fine-tune them, and put usable generative AI into practice. In other words, NIM is Nvidia's tool for simplifying the use of generative AI inside enterprises and products.
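NIM is documented as exposing OpenAI-compatible HTTP endpoints, so existing client code can be pointed at a deployed container. Below is a minimal sketch of what querying a locally deployed NIM might look like; the host, port, and model name are illustrative assumptions rather than a definitive setup.

```python
# Hypothetical sketch: querying a locally deployed NIM container through its
# OpenAI-compatible endpoint. Host, port, and model name are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed address of the NIM container
    api_key="not-needed-for-local-nim",   # placeholder; local deployments may not check it
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",      # illustrative model id, not a prescribed one
    messages=[{"role": "user", "content": "Summarize our Q3 maintenance reports."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```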
Nvidia executives said that RAG (retrieval-augmented generation) has so far been an experimental technology for enterprises, a proof of concept for applying AI, and that the time has come to put enterprise AI into production in earnest; NIM is prepared for exactly that. Whether or not this judgment is correct, it shows Nvidia is several steps ahead in the enterprise AI market; its competitor Intel has only just begun to stress the importance of RAG.
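For readers unfamiliar with RAG, the pattern itself is simple: retrieve the documents most relevant to a query from an enterprise corpus, then let a model answer with those documents in its context. The toy sketch below shows only the skeleton; keyword overlap stands in for a real embedding-based retriever, and the final LLM call is left as a placeholder.

```python
# Minimal retrieval-augmented generation (RAG) skeleton. Keyword overlap stands in
# for a real embedding model; the final LLM call is left as a placeholder.
documents = [
    "Warranty policy: industrial units are covered for 24 months.",
    "The AX-series controller supports Modbus and OPC UA.",
    "Office hours are 9am to 6pm, Monday through Friday.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance score: number of shared lowercase words (real systems use embeddings)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents with the highest relevance score."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]

query = "How long is the warranty on industrial units?"
context = "\n".join(retrieve(query))
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
# A real deployment would now send `prompt` to an LLM, e.g. the NIM endpoint sketched above.
print(prompt)
```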
So is Nvidia unbeatable in today's enterprise AI market? Its fatal weakness may lie in the closed nature of its ecosystem. Calling Nvidia's AI ecosystem closed might be a bit extreme, since Nvidia also contributes to open-source software. Viewed at a higher level of abstraction, however, the ecosystem Nvidia has constructed is relatively closed, or at the very least proprietary and exclusive.


Open vs Closed
Nvidia's strength has indeed weighed on many competitors in recent years. Graphcore, for example, once highly sought after, has been looking for a buyer since February of this year. Before Graphcore withdrew from the Chinese market, we learned in an interview that the AI training chip company was targeting a handful of key application markets rather than competing head-on with Nvidia as its development strategy.
Some high-performance AI chip companies have, from the outset, emphasized staying out of the AI training fight and focusing on the AI inference track instead. In recent years, geopolitics has made the competitive landscape even more complex.
Still, one typical representative of Nvidia's challengers is Intel. Although Intel's share of the data center CPU market has declined significantly over the past two years, it remains in first place. Watching accelerators eat into the value of general-purpose processors, Intel had little choice but to adopt its XPU strategy: hence its GPUs and AI accelerator cards, along with hardware such as IPUs.
Intel's external messaging now emphasizes, on one hand, that CPU acceleration based on extended instruction sets can cover many AI applications, letting enterprises adopt AI at a controllable cost on the infrastructure they already have rather than making a greenfield investment in new hardware; on the other hand, Intel also has GPUs and dedicated AI accelerators.
In our view, Intel's dilemma is that it entered the market late and, like every other challenger, cannot escape Nvidia's combined hardware and software ecosystem advantage. Intel now has chips, boards, and cooperating OEMs, but must it build a complete proprietary ecosystem from the ground up the way Nvidia did? The answer is clearly no.
Figure 3: Intel’s enterprise AI ecosystem stack.
Over the past two years, Intel has formally promoted to the media how much it has invested in software: oneAPI and OpenVINO, two packages that sit at roughly the same layers and play roughly the same roles as Nvidia's CUDA and TensorRT, have been widely covered.
As essential software foundations for developing AI applications, these two Intel-led packages are indispensable. But setting aside questions of completeness and maturity, their biggest difference from Nvidia's stack is the emphasis on "openness." The most telling example: oneAPI can even target Nvidia GPUs, and OpenVINO, as an inference engine, can support Arm platforms through plugins.
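As a rough illustration of what that openness looks like in practice, the sketch below loads a model with the OpenVINO runtime and compiles it for whichever device the runtime reports; the model path and input shape are placeholders, and the available device strings depend on the plugins installed.

```python
# Sketch of OpenVINO inference, assuming `pip install openvino` and an existing
# IR model (model.xml / model.bin); the path and input shape are placeholders.
import numpy as np
import openvino as ov

core = ov.Core()
print("Available devices:", core.available_devices)       # e.g. ['CPU'], plus others via plugins

model = core.read_model("model.xml")                        # placeholder path to an IR model
compiled = core.compile_model(model, device_name="CPU")     # pick any device the runtime reports

dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)   # assumed NCHW input shape
result = compiled(dummy_input)[compiled.output(0)]          # run inference, take the first output
print("Output shape:", result.shape)
```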
Moreover, the software ecosystem is only one layer. In the full-stack structure Intel draws for enterprise AI, the underlying chips need little elaboration; at the levels of infrastructure (the OEM, CSP, and OSV roles), software, and the application ecosystem, Intel emphasizes "openness" throughout (Figure 3).
On cross-node interconnect, for example, Intel argues for continuing to build and strengthen Ethernet standards and ecosystems rather than relying on proprietary InfiniBand; interconnects between packages and between dies should likewise follow open industry standards; the IPUs it brings to market will support third-party solutions, not just Intel processors; and even the reference platforms and packaged solutions Intel offers enterprises will include open components…
On reflection, in the full-stack structure Intel describes, Nvidia, though a chip maker, is involved at almost every level, playing the roles of OEM, CSP, and more, and has done a great deal at the model and application layers. Intel's approach is the exact opposite.
At this year's Intel Vision event, Intel stressed "openness," "choice," and "not being locked into a single vendor" from start to finish. Many of the initial members of Intel's open platform for enterprise AI are said to be long-time competitors of Intel in their respective sectors. Clearly, in the era of generative AI, no one wants the market dominated by Nvidia alone.
We believe the open enterprise AI strategy promoted by competitors such as Intel is a necessity, and a well-trodden path in the history of competition across many fields. For players like Intel in particular, failing to embrace openness and build standards-based ecosystems with partners would all but guarantee defeat.
From the perspective of enterprise customers, given practical concerns such as cost, existing infrastructure, business needs, and not wanting to be locked into a closed ecosystem, an "open ecosystem stack" is undoubtedly an important option. Even if the analogy is imperfect, the iOS-versus-Android pattern in the smartphone market is instructive: a high-level abstraction of a possibility that history has validated again and again.


Generative AI Will Eventually Reach the Edge
As generative AI spreads across industries, the distribution of compute and energy devoted to it will follow two trends. The first is a continued shift toward inference. "In the era of generative AI, the primary task of AI in data centers will shift from training to inference," Liu Jianwei said.
A report Schneider Electric released last year indicated that, measured by AI load (power), training accounted for about 20% of the total in 2023 and inference for about 80%; by 2028 the training share is expected to fall further, to 15%. "Demand for training is still growing, but generative AI inference applications are growing even faster."
Second, AI and generative AI will also move comprehensively to the edge and to end devices, realizing the now-popular phrase "AI everywhere." The same Schneider Electric report put the central (data center) share of AI load at 95% and the edge at 5% in 2023, and expects the split to reach 50:50 by 2028.
Whether or not that forecast proves accurate, against a backdrop of continuously growing total market value, edge AI will grow significantly faster than data center AI. Liu Jianwei also noted: "In the era of generative AI, AI capability at the edge and on end devices will carry more and more weight in the overall system." The reasons have been discussed at length: data security and privacy, model customizability and flexibility, and the differing needs of specific industries, companies, and individuals, none of which cloud-only generative AI can satisfy.
"For example, an on-device model of around 2B parameters could change how we interact with all kinds of machines, shifting from traditional single-mode command lines or graphical interfaces to multimodal interaction combining natural language, graphical interfaces, and gestures. Edge models of tens of billions of parameters can be used to build personal or corporate knowledge bases, smart assistants, and so on." This reflects the value of personal and enterprise generative AI applications at the current stage.
Intel has repeatedly stressed the importance of edge AI in its AI narrative over the past year, ultimately because certain edge industries and the PC are its home markets. Nvidia, too, talks about "the edge," particularly in robotics and heavy industry, as well as AI PCs with high compute demands.
In the broader AIoT field, however, the competitive space for generative AI is far larger, and certainly not something Nvidia and Intel can lock up. The shift in where value sits, the continuing development of these industries, and a vast market not yet dominated by the giants naturally make everyone hope for a share of it.
At this year's IIC Shanghai event, we interviewed Dai Weijin, executive vice president and general manager of the IP division at Chipone, who said that Chipone, "besides improving general large-model technology," is also very focused on moving large-model inference from data centers to edge devices. Chipone is currently experimenting with three types of edge device: phones, cars, and PCs, and in the future will "go deeper, possibly down to even smaller devices." Doing so requires innovation at the level of chip architecture, data, models, and more.
To this end, Nvidia has its own Efficient AI research, in which model pruning and sparsity are routine; its latest result, AWQ (Activation-aware Weight Quantization), is said to reduce some network weights to as little as 2 bits. Intel, for its part, has spent more than a year moving generative AI models onto lightweight AI PCs, for example with BigDL-LLM, which it mentions repeatedly in its external communications.
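To illustrate why low-bit weights matter for efficiency (this is a naive sketch, not the AWQ algorithm itself, which additionally rescales weights using activation statistics), the snippet below applies symmetric per-channel quantization to a toy weight matrix at several bit widths.

```python
# Naive symmetric per-channel weight quantization -- an illustration of low-bit
# weights, not AWQ itself (AWQ additionally rescales weights using activation stats).
import numpy as np

def quantize_per_channel(w: np.ndarray, bits: int = 4):
    """Quantize each output channel (row) of `w` to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1                              # e.g. 7 for 4-bit, 1 for 2-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax     # one scale per channel
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 64)).astype(np.float32)             # a toy weight matrix

for bits in (8, 4, 2):
    q, scale = quantize_per_channel(w, bits)
    err = np.abs(w - dequantize(q, scale)).mean()
    print(f"{bits}-bit: mean abs error {err:.4f}, storage ~{bits / 32:.0%} of FP32")
```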
Figure 4: AiChip AI chip.
"Using generative AI at the edge requires minimizing model size and lowering the demand on the chip's memory bandwidth; for the chip itself, it requires minimizing power consumption." Liu Jianwei and Dai Weijin make similar points, and AiChip has been actively exploring more cost-effective edge generative AI solutions together with industry-chain partners.
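A back-of-envelope calculation shows why memory bandwidth, more than raw compute, constrains on-device generation: during decoding, each generated token requires streaming roughly all of the model's weights from memory. The numbers below are illustrative assumptions, not measurements of any particular chip.

```python
# Back-of-envelope estimate: memory bandwidth needed to decode tokens on-device.
# Assumption: each generated token reads roughly every weight once; KV cache ignored.
def required_bandwidth_gbps(params_billion: float, bytes_per_weight: float, tokens_per_s: float) -> float:
    weight_bytes = params_billion * 1e9 * bytes_per_weight
    return weight_bytes * tokens_per_s / 1e9                # GB/s

for name, params, bpw in [("7B @ FP16", 7, 2.0), ("7B @ INT4", 7, 0.5), ("2B @ INT4", 2, 0.5)]:
    print(f"{name}: ~{required_bandwidth_gbps(params, bpw, tokens_per_s=10):.0f} GB/s for 10 tokens/s")
# 7B @ FP16 needs ~140 GB/s just for weights, while 2B @ INT4 needs ~10 GB/s --
# which is why smaller, lower-precision models are the practical path for edge chips.
```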
"Edge generative AI is bound to come," Liu Jianwei said. "Generative AI today is still exploring its upper boundary by scaling models ever larger, but efforts to rein in model size while improving capability have already emerged." We believe AiChip's approach and shipping products in edge generative AI are representative and worth studying.
Generative AI and traditional AI do not differ much at the operator and instruction level; "the main innovation comes from model structure, that is, from different ways of combining instructions, plus the growth in compute demand," Liu Jianwei said. "AiChip's AI processor is designed with operators as instructions, which is why the AX650N, in mass production since 2022, is still the best processor on the market for running Transformer networks, achieving 199 FPS/W on SwinT."
"AiChip has already published numerous Transformer-based application demos on GitHub," and "customers have already built Transformer-based applications on the AX650N, such as searching images by text and open-set detection."
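As a generic illustration of the "search images by text" idea (not AiChip's actual AX650N demo), the sketch below scores a set of images against a text query with a CLIP-style model from the Hugging Face transformers library; the checkpoint name and image files are assumptions.

```python
# Generic text-to-image retrieval with a CLIP-style model, assuming
# `pip install transformers pillow torch`. Not tied to any specific NPU demo.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "openai/clip-vit-base-patch32"                 # assumed publicly available checkpoint
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

image_paths = ["cat.jpg", "factory_floor.jpg", "circuit_board.jpg"]  # placeholder files
images = [Image.open(p) for p in image_paths]

inputs = processor(text=["a photo of a circuit board"], images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

scores = outputs.logits_per_text[0]                         # similarity of the query to each image
best = scores.argmax().item()
print("Best match:", image_paths[best])
```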
Alongside this, "designing AI processors requires close cooperation among the AI application, software toolchain, and hardware teams, with a sensible division of labor and a subtractive approach to design. The technical architecture of an AI processor needs a matching organizational structure for the processor to truly land," Liu Jianwei said of bringing generative AI to the edge. "Today the interface between AI processors and AI applications, the AI instruction set (operator set), is gradually converging but has not fully converged, so you need one team with the ability to evolve efficiently to match it."
This, he says, is the key to "AiChip's AI processor delivering high efficiency, high energy efficiency, a mature and easy-to-use software toolchain, rapid architecture iteration, and fast customer support," and it should be a lesson for other edge AI chip companies as well. Notably, Liu Jianwei added that AiChip's AI processor is now "licensed externally," helping the industry tackle NPU fragmentation, accelerate AI adoption, and promote inclusive AI. Edge-focused AI chip companies, it seems, are also trying to grow ecosystems of their own.
As AiChip’s founder and chairman, Dr. Qiu Xiaoxin, said, “Using AI to disrupt traditional technology and redefine products, artificial intelligence is already present in various industries and continuously influencing the way society operates; ultimately, it is about using technology to make artificial intelligence benefit every corner of production and life, creating a better life with inclusive AI.”
Figure 5: Some say that in the era of generative AI, most information will be generated by AI, and humans will find it hard to tell whether computer-generated information is true.
If ChatGPT set an example for the development of generative AI, then in the coming years generative AI will undoubtedly penetrate thousands of industries and every aspect of people's lives. At the Intel Foundry event at the beginning of the year, Intel CEO Pat Gelsinger told media that analysts generally expect the semiconductor industry to reach a market size of one trillion dollars by 2030, a forecast made before the generative AI explosion; the arrival of generative AI clearly makes that trillion-dollar estimate look very conservative. Even from the semiconductor field alone, it is clear how untapped this market still is.
IDC data shows that global enterprises are expected to spend about 40 billion dollars on generative AI in 2024, growing to 151 billion dollars by 2027. By 2026, 80% of enterprises are expected to be using generative AI technology, and at least 50% of edge computing deployments will incorporate machine learning. By 2030, AI will be the largest workload type by market value among edge applications.
This article has illustrated the current state of generative AI development mainly through the examples of three companies: Nvidia, Intel, and AiChip. They represent, respectively, the incumbent leader of cloud AI under the generative AI spotlight, the traditional data center challenger, and a leading participant as AI moves comprehensively toward the edge.
Their strategies and thinking reflect where generative AI stands today in the chip field. Beyond the fact that generative AI is still in its early stages in both the data center and at the edge, one can see the eagerness of the various market participants and their distinct strengths. With a cake this large, who wouldn't want a slice? The more interesting battles are likely to intensify over the next five years.

