Getting Started with Mistral: An Introduction
The open-source Mixtral 8x7B model released by Mistral AI adopts a "Mixture of Experts" (MoE) architecture. Unlike a standard Transformer, each MoE layer contains multiple expert feed-forward networks (eight in this model), and at inference time a gating network selects two of them to process each token. This setup lets the model keep a large total parameter count while activating only a fraction of it per token, so inference cost stays close to that of a much smaller dense model.
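To make the routing idea concrete, here is a minimal sketch of a top-2 MoE layer in PyTorch. It is an illustration under simplifying assumptions, not Mistral's actual implementation: the class name `SimpleMoE`, the dimensions, and the per-expert loop are all chosen for readability.

```python
# Minimal sketch of top-2 expert routing (illustrative, not Mixtral's real code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One feed-forward "expert" per slot (8 in Mixtral 8x7B).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Gating network: scores every expert for each token.
        self.gate = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x):                      # x: (batch, seq, d_model)
        logits = self.gate(x)                  # (batch, seq, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the two chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)      # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route a batch of token embeddings through the MoE layer.
moe = SimpleMoE()
tokens = torch.randn(2, 16, 512)
print(moe(tokens).shape)  # torch.Size([2, 16, 512])
```

The key design point is that only the two selected experts run for each token, so compute per token is roughly that of two feed-forward networks even though eight sets of expert weights exist.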