World’s Largest AI Chip – Cerebras WSE-3
WSE-3: Integrating 4 Trillion Transistors
Cerebras has launched its latest chip, the WSE-3. It integrates 4 trillion transistors and is built on TSMC’s 5-nanometer process. Chips on this scale challenge the conventional trajectory of Moore’s Law in the semiconductor industry.
Cerebras Challenges Moore’s Law
Since the first microprocessors appeared in the early 1970s, the semiconductor industry has roughly followed Moore’s Law, the observation that the number of transistors on a chip doubles about every two years. Cerebras appears to have sidestepped this law, not by packing transistors more densely, but by making the chip itself dramatically larger.
About 57 Times Larger than Nvidia’s H100
One reason for Cerebras’s success is its unconventional approach. A silicon wafer normally yields many chips: AMD, Nvidia, and Intel typically cut a 12-inch wafer into more than 60 GPU dies, while Cerebras uses the entire wafer to make one massive chip. A side-by-side comparison makes the scale obvious: the WSE-3’s roughly 46,225 mm² of silicon is about 57 times the area of the Nvidia H100’s 814 mm² die.
AI Development Brings Opportunities for Large Chips
Developing larger chips is a good fit for today’s AI workloads. Training and inference now run at such scale that many GPUs must cooperate on a single task, and interconnecting them and distributing the work across them is complex and costly. A single giant chip keeps much of that communication on-die, significantly reducing both cost and complexity.
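To make that cost concrete, here is a minimal back-of-envelope sketch of the time a standard ring all-reduce spends synchronizing gradients across N GPUs, traffic that a wafer-scale chip keeps on-die instead. The model size, link bandwidth, and precision are illustrative assumptions, not published figures.

```python
# Back-of-envelope: gradient-synchronization time for data-parallel training.
# A ring all-reduce moves about 2 * (N - 1) / N * S bytes per GPU, where S is
# the total gradient size. All numbers below are illustrative assumptions.

def allreduce_seconds(num_gpus: int, grad_bytes: float, link_gb_per_s: float) -> float:
    """Time for one ring all-reduce over per-GPU links of link_gb_per_s GB/s."""
    traffic = 2 * (num_gpus - 1) / num_gpus * grad_bytes  # bytes sent per GPU
    return traffic / (link_gb_per_s * 1e9)

grad_bytes = 70e9 * 2  # assume a 70B-parameter model with fp16 gradients
for n in (8, 64, 512):
    t = allreduce_seconds(n, grad_bytes, link_gb_per_s=100)  # assumed links
    print(f"{n:4d} GPUs: {t:.2f} s per synchronization step")
```

The point is not the exact numbers but the structure: the per-GPU traffic approaches 2S no matter how many GPUs you add, so every training step pays a fixed communication tax that on-die wiring largely avoids.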
Contains 900,000 AI Cores
The latest Cerebras chip features 900,000 AI cores and 44 GB of memory. Crucially, this memory is on-chip SRAM interwoven with the compute cores, which reduces the power spent moving data. This is another architectural difference from Nvidia and AMD GPUs, which keep their bulk memory (HBM) off the die.
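As a rough illustration of why keeping memory next to the cores saves power, the sketch below compares the energy of streaming 44 GB of data from on-chip SRAM versus off-chip DRAM. The per-bit access energies are textbook-style ballpark figures, assumed here for illustration rather than taken from any vendor’s datasheet.

```python
# Illustrative energy cost of data movement: on-chip SRAM vs. off-chip DRAM.
# Per-bit access energies are rough ballpark assumptions, not vendor data.

SRAM_PJ_PER_BIT = 0.1   # assumed on-chip SRAM access energy (picojoules)
DRAM_PJ_PER_BIT = 10.0  # assumed off-chip DRAM access energy (picojoules)

def traffic_joules(bytes_moved: float, pj_per_bit: float) -> float:
    """Energy to move the given number of bytes at the given per-bit cost."""
    return bytes_moved * 8 * pj_per_bit * 1e-12

data = 44e9  # one full pass over 44 GB, matching the WSE-3's on-chip SRAM
print(f"on-chip SRAM : {traffic_joules(data, SRAM_PJ_PER_BIT):6.2f} J")
print(f"off-chip DRAM: {traffic_joules(data, DRAM_PJ_PER_BIT):6.2f} J")
```

Under these assumed figures, the same traffic costs roughly two orders of magnitude more energy off-chip, which is why co-locating memory and compute matters at this scale.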
Supports Next-Generation Large Language Models
This new AI chip is designed to train the next generation of giant large language models, with parameter counts of up to 24 trillion, roughly ten times larger than OpenAI’s GPT-4 and Google’s Gemini.
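A quick sizing calculation shows why a 24-trillion-parameter model is about more than raw compute: merely storing it is enormous. The sketch below uses standard assumptions (2 bytes per parameter for fp16 weights, and roughly 12 more bytes per parameter for Adam-style optimizer state); the exact byte counts depend on the training setup.

```python
# Rough storage footprint of a 24-trillion-parameter model.
# Assumptions: fp16 weights at 2 bytes/param; Adam-style training adds
# roughly 12 bytes/param (fp32 master weights plus two moment tensors).

PARAMS = 24e12

weights_tb = PARAMS * 2 / 1e12
optimizer_tb = PARAMS * 12 / 1e12

print(f"weights alone : {weights_tb:5.0f} TB")
print(f"with optimizer: {weights_tb + optimizer_tb:5.0f} TB")
```

At these assumptions the weights alone run to tens of terabytes, far beyond the 44 GB of on-chip SRAM, which is why Cerebras pairs the wafer with external weight-storage appliances (its MemoryX systems) and streams weights onto the chip layer by layer.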
Next Step: Building a Super AI Supercomputer
The next step is to connect 2,048 of these chips into an AI supercomputer capable of roughly a quarter of a zettaFLOP (0.25 × 10^21 floating-point operations per second). Such a machine could train a 70-billion-parameter large language model from scratch in a single day, a job that previously took at least a month.
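The one-day figure is easy to sanity-check with the standard transformer training-cost rule of thumb of about 6 FLOPs per parameter per training token. The token budget and sustained-utilization fraction below are assumptions chosen for illustration.

```python
# Sanity check: time to train a 70B-parameter model on a quarter-zettaFLOP
# cluster, using the common estimate of ~6 FLOPs per parameter per token.

params = 70e9
tokens = 1.4e12           # assumed training-set size (Chinchilla-scale for 70B)
cluster_flops = 0.25e21   # quarter of a zettaFLOP per second, peak
utilization = 0.4         # assumed fraction of peak actually sustained

total_flops = 6 * params * tokens
seconds = total_flops / (cluster_flops * utilization)
print(f"~{seconds / 3600:.1f} hours")  # well under a day at these assumptions
```

Even with generous slack in the utilization assumption, the arithmetic lands comfortably inside a day, so the claim is at least self-consistent.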
The Biggest Challenge: Yield Rate
Larger chips face greater yield challenges, especially on processes below 10 nanometers. Transistors at this scale are so small and fragile that a single dust particle landing on the wafer, or a single defect within it, can render a die useless. For a chip the size of an entire wafer, a flawless result is practically impossible, and without countermeasures nearly every wafer would be scrapped.
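The classic Poisson yield model makes this quantitative: the probability that a die contains zero defects falls exponentially with its area. The defect density below is an assumed illustrative value, not a TSMC figure.

```python
import math

# Poisson yield model: P(zero defects on a die) = exp(-area * defect_density).
# The defect density is an assumed illustrative value, not foundry data.

D0 = 0.001  # assumed defects per mm^2

for name, area_mm2 in [("H100-sized die (~814 mm^2)   ", 814),
                       ("wafer-scale die (~46,225 mm^2)", 46225)]:
    yield_prob = math.exp(-area_mm2 * D0)
    print(f"{name}: {yield_prob:.1%} chance of zero defects")
```

Even at this optimistic defect density, a flawless wafer-scale die is essentially a zero-probability event, so Cerebras has to design for defects rather than hope to avoid them.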
Redundant Design + Software That Bypasses Manufacturing Defects
Cerebras has found a solution that combines hardware redundancy with software. The wafer includes spare AI cores, and whenever a core turns out to be defective, specialized software routes around it and swaps in a spare. The chip can therefore always present a full configuration of 900,000 working cores, and a handful of defects no longer wastes the entire wafer.
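A minimal sketch of why a small spare pool is enough: if each core fails independently with some small probability, the chance that the number of defective cores exceeds the spare budget is a binomial tail. The per-core failure rate and spare percentages below are assumptions for illustration; Cerebras has not published exact figures.

```python
import math

# Probability that defective cores exceed the spare budget, assuming each
# core fails independently. A normal approximation to the binomial tail is
# used, since n is far too large for exact summation to be practical.

def p_chip_fails(n_cores: int, p_fail: float, spares: int) -> float:
    """P(number of failed cores > spares) under a binomial(n, p) model."""
    mean = n_cores * p_fail
    std = math.sqrt(n_cores * p_fail * (1 - p_fail))
    z = (spares - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2))

n = 900_000  # cores that must remain usable
p = 0.001    # assumed per-core defect probability
for spare_pct in (0.05, 0.10, 0.20):
    spares = int(n * spare_pct / 100)
    print(f"{spare_pct:.2f}% spares ({spares:5d} cores): "
          f"P(chip unusable) = {p_chip_fails(n, p, spares):.2e}")
```

With these assumed numbers, budgeting even a fraction of a percent of extra cores drives the failure probability to effectively zero, which is the whole trick behind wafer-scale yield.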