Exploring Throughput, Latency, and Cost Space of LLM Inference

Selecting the right LLM inference stack means choosing the right model for your task and running appropriate inference code on suitable hardware. This article introduces popular LLM inference stacks and setups, detailing their cost composition for inference; it also discusses current open-source models and how to make the most of them, while addressing features that …

Qwen1.5-MoE Open Source! Best Practices for Inference Training

01 Introduction. The Tongyi Qianwen team has launched the first MoE model in the Qwen series, Qwen1.5-MoE-A2.7B. It has only 2.7 billion activated parameters, but its performance can rival that of current state-of-the-art models with 7 billion parameters, such as Mistral 7B and Qwen1.5-7B. Compared to Qwen1.5-7B, which contains 6.5 billion non-embedding parameters, Qwen1.5-MoE-A2.7B has …

Interpretation of Technical Specifications for Large Model Inference Platforms

With the rapid development of large model technology, its applications have spread widely across enterprise R&D, production, and management. Because large models have many parameters and are deployed in complex and diverse scenarios and forms, higher requirements have been put forward for the deployment, inference, and service aspects …

Cerebras Unveiled: The Giant AI Chip Challenging GPUs

This article is based on an interview with Joel Hestness by Dr. Waku on his YouTube channel, published on December 25, 2024. Original content reference: https://www.youtube.com/watch?v=qC_lCFTOJU0 Summary: Joel Hestness on How Cerebras’ Giant Chip Challenges NVIDIA’s GPU Dominance in AI. This article focuses on Cerebras …

Components of Expert Systems

The structure of an expert system varies slightly depending on the application field and problem type, but generally, the typical structure of an expert system is shown in the figure below. An expert system typically consists of the following components: Knowledge Base: The knowledge base …

Understanding Expert Systems: Concepts and Structures

Quoted from: “Industrial Artificial Intelligence” (Authors: Cai Hongxia, Zhou Chuanhong). The book has been published; for details, see the end of the article. 1. Concept of Expert Systems 1) Definition of Expert Systems: An expert system is a knowledge-based system used to apply the years of accumulated experience and expertise of domain experts …

Generative AI Inference Technology, Market, and Future

The successive releases of OpenAI o1, QwQ-32B-Preview, and DeepSeek R1-Lite-Preview signal that generative AI research is shifting from pre-training to inference in order to enhance AI logical reasoning capabilities. This transition will greatly promote the development of upper-layer applications. Sequoia Capital recently pointed out that, in the foreseeable future, logical reasoning and computation during inference will be an important theme, ushering in …