Comparison Between MiniMax-01 and DeepSeek-V3

Comparison table:

| Aspect | MiniMax-01 | DeepSeek-V3 |
| --- | --- | --- |
| Model Architecture | Based on a linear attention mechanism, using a hybrid architecture (Hybrid-Lightning) and integrating an MoE architecture. | Based on the Transformer architecture, using MLA and DeepSeekMoE, and introducing an auxiliary-loss-free load balancing strategy. |
| Parameter Scale | 456 billion total parameters, 45.9 billion activated parameters. | 671 billion total parameters, 37 billion activated parameters. |

… Read more
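As a quick illustration of the parameter-scale row, the minimal Python sketch below derives the fraction of parameters activated per token for both models. The figures come directly from the table above; the "activation ratio" itself is only a derived illustration, not a metric reported by either team.

```python
# Minimal sketch: derive the per-token activation ratio from the figures in
# the comparison table above. The ratio is an illustration, not an official metric.

MODELS = {
    # name: (total parameters, activated parameters per token)
    "MiniMax-01":  (456e9, 45.9e9),
    "DeepSeek-V3": (671e9, 37e9),
}

def activation_ratio(total: float, active: float) -> float:
    """Fraction of the parameter budget touched when processing one token."""
    return active / total

for name, (total, active) in MODELS.items():
    ratio = activation_ratio(total, active)
    print(f"{name}: {total / 1e9:.0f}B total, {active / 1e9:.1f}B activated "
          f"({ratio:.1%} of parameters per token)")
```

Run as-is, it shows MiniMax-01 activating roughly 10% of its parameters per token and DeepSeek-V3 roughly 5.5%, which is the practical meaning of the total-versus-activated distinction in the table.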

Comparison of MiniMax-01 and DeepSeek-V3

Author: Jacob, Code Intelligent Copilot & High-Performance Distributed Machine Learning System. Original: https://zhuanlan.zhihu.com/p/18653363414. Recommended reading: Interpretation of the MiniMax-01 Technical Report; Interpretation of the DeepSeek-V3 Technical Report. Comparison of MiniMax-01 and DeepSeek-V3, aspect by aspect: Model Architecture — MiniMax-01 is based on a linear attention mechanism, using a hybrid … Read more
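The excerpt trails off at the hybrid-architecture row, so here is a small, self-contained sketch of what a hybrid attention stack of this kind can look like: mostly linear-attention ("lightning"-style) blocks with a full softmax-attention block inserted periodically. The class names and the one-in-eight interleaving period are illustrative assumptions, not figures taken from the MiniMax-01 report.

```python
# Illustrative sketch of a hybrid attention stack: mostly linear-attention
# ("lightning"-style) blocks with a full softmax-attention block inserted
# periodically. The 1-in-8 interleaving period and the names used here are
# assumptions for illustration; see the MiniMax-01 report for the real layout.
from dataclasses import dataclass

@dataclass
class Block:
    index: int
    kind: str  # "linear_attention" or "softmax_attention"

def build_hybrid_stack(num_layers: int, softmax_every: int = 8) -> list[Block]:
    """Every `softmax_every`-th block uses softmax attention; the rest use linear attention."""
    return [
        Block(index=i,
              kind="softmax_attention" if (i + 1) % softmax_every == 0 else "linear_attention")
        for i in range(num_layers)
    ]

if __name__ == "__main__":
    for block in build_hybrid_stack(num_layers=16):
        print(f"layer {block.index:2d}: {block.kind}")
```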

DeepSeek-V2: A Powerful MoE Language Model

Abstract: We propose DeepSeek-V2, a powerful Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It has 236 billion total parameters, of which 21 billion are activated per token, and it supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures such as Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA ensures … Read more
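The abstract's claim that MLA enables efficient inference rests on compressing keys and values into a small latent vector, so the KV cache only has to store that latent. The PyTorch sketch below shows just this low-rank down-projection/up-projection idea; the dimensions, module names, and the omission of RoPE handling are simplifications for illustration, not the exact formulation in the DeepSeek-V2 paper.

```python
# Simplified sketch of the low-rank KV compression idea behind MLA:
# project the hidden state down to a small latent c_kv (this is what gets
# cached), then project back up to per-head keys and values when needed.
# Dimensions and the absence of RoPE/decoupled-key handling are
# simplifications for illustration only.
import torch
import torch.nn as nn

class LowRankKV(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, d_head=64, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.down = nn.Linear(d_model, d_latent, bias=False)           # down-projection
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # key up-projection
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # value up-projection

    def forward(self, h: torch.Tensor):
        # h: (batch, seq, d_model)
        c_kv = self.down(h)          # only this latent needs to be cached
        k = self.up_k(c_kv)          # keys reconstructed on the fly
        v = self.up_v(c_kv)          # values reconstructed on the fly
        b, s, _ = h.shape
        k = k.view(b, s, self.n_heads, self.d_head)
        v = v.view(b, s, self.n_heads, self.d_head)
        return c_kv, k, v

if __name__ == "__main__":
    layer = LowRankKV()
    h = torch.randn(2, 16, 1024)
    c_kv, k, v = layer(h)
    print(c_kv.shape, k.shape, v.shape)  # cache grows with d_latent, not n_heads * d_head
```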

DeepSeek-V2 Technical Interpretation

DeepSeek has introduced a new MoE model, DeepSeek-V2, with a total parameter count of 236 billion and 21 billion active parameters. Although it still falls a bit short of GPT-4, it can be considered the strongest open-source MoE model available. In keeping with DeepSeek's open-source spirit, the accompanying technical report is also packed with … Read more

DeepSeek-V2 Technical Report Analysis

DeepSeek has recently released the V2 version of its model, continuing the technical route of the DeepSeek-MoE (Mixture of Experts) model released in January. It employs a large number of experts with small parameter counts and incorporates further optimizations in training and inference. True to its tradition, DeepSeek has fully open-sourced the model (base and … Read more
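The "large number of experts with small parameter counts" route can be pictured with a toy MoE layer: a couple of always-on shared experts plus a router that activates only a few of many small routed experts for each token. The expert sizes, counts, and gating details below are illustrative assumptions rather than the actual DeepSeek-MoE configuration, and the per-token loop favors clarity over speed.

```python
# Toy sketch of a fine-grained MoE layer in the spirit described above:
# a couple of always-on shared experts plus a router that activates only
# top_k of many small routed experts per token. Sizes, counts, and gating
# details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallExpert(nn.Module):
    """A deliberately small feed-forward expert."""
    def __init__(self, d_model: int = 256, d_hidden: int = 128):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.SiLU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ff(x)

class FineGrainedMoE(nn.Module):
    def __init__(self, d_model: int = 256, n_shared: int = 2,
                 n_routed: int = 32, top_k: int = 4):
        super().__init__()
        self.shared = nn.ModuleList(SmallExpert(d_model) for _ in range(n_shared))
        self.routed = nn.ModuleList(SmallExpert(d_model) for _ in range(n_routed))
        self.gate = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        out = sum(expert(x) for expert in self.shared)      # shared experts: always active
        scores = F.softmax(self.gate(x), dim=-1)            # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)      # top-k routed experts per token
        routed_out = torch.zeros_like(x)
        for t in range(x.size(0)):                          # per-token loop for clarity
            for w, e in zip(weights[t], idx[t]):
                routed_out[t] += w * self.routed[int(e)](x[t])
        return out + routed_out

if __name__ == "__main__":
    moe = FineGrainedMoE()
    tokens = torch.randn(10, 256)
    print(moe(tokens).shape)  # torch.Size([10, 256])
```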

Reflections on DeepSeek-V3: Beyond Hardware, Optimize Models!

The financial backer of DeepSeek-V3 is the quant giant High-Flyer (Huanfang Quant). High-Flyer has strong capabilities in quantitative investment, having at one point managed assets in the hundreds of billions. Since its establishment, DeepSeek has developed rapidly, open-sourcing China's first MoE large model (DeepSeek-MoE) in January 2024 and launching the second-generation open-source … Read more