DeepSeek-V2 Technical Interpretation

DeepSeek has introduced a new MoE model, DeepSeek-V2, with a total parameter count of 236 billion and 21 billion active parameters. Although it is still a bit short of GPT-4 levels, it can be considered the strongest open-source MoE model available. Staying true to its open-source spirit, the accompanying technical report is also packed with …

DeepSeek-V2 Technical Report Analysis

DeepSeek has recently released the v2 version of its model, continuing the technical route of the DeepSeek-MoE (Mixture of Experts) model released in January. It employs a large number of experts with small parameter counts and incorporates further optimizations for training and inference. True to its tradition, DeepSeek has fully open-sourced the model (base and …
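As background for the "active parameters" figures quoted above (236 billion total, 21 billion activated), here is a minimal sketch of a top-k routed MoE layer built from many small experts. The expert count, hidden sizes, and top-k value are illustrative placeholders, not DeepSeek-V2's actual configuration, and the routing shown is a generic softmax top-k scheme rather than the exact mechanism described in the technical report.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallExpert(nn.Module):
    """One small feed-forward expert; sizes here are illustrative placeholders."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        return self.down(F.silu(self.up(x)))

class TopKMoE(nn.Module):
    """Generic top-k routed MoE layer: every expert contributes to the *total*
    parameter count, but each token activates only k of them, so the *active*
    parameter count per token is a small fraction of the total."""
    def __init__(self, d_model=512, d_hidden=256, n_experts=64, k=6):
        super().__init__()
        self.experts = nn.ModuleList(SmallExpert(d_model, d_hidden) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                           # x: (num_tokens, d_model)
        scores = self.router(x)                     # (num_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # each token picks its k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                  # dispatch tokens to their chosen experts
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = TopKMoE()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)                          # torch.Size([8, 512])
```

Keeping each expert small is what widens the gap between total and active parameters: adding more experts grows total capacity, while per-token compute stays tied to the k experts actually activated.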