What Is Mixture of Experts (MoE)?
Mixture of Experts (MoE) is a neural network architecture that divides a model into multiple specialized sub-networks (experts). A gating mechanism (the router) activates only the most relevant experts for each input, giving the model massive capacity at a modest computational cost.
How Mixture of Experts (MoE) Works
In a standard dense model, every parameter is used for every input. An MoE model instead contains many expert sub-networks, and only a subset is activated per input token — typically 2 out of 8 or 16 experts. A learned router scores the experts for each token and picks the top few. As a result, a model can have hundreds of billions or even trillions of total parameters while using only a fraction of them on any single forward pass, dramatically improving efficiency. GPT-4 is widely reported to use an MoE architecture, and Mistral's Mixtral models openly use one. MoE makes it possible to build larger, more capable models that run at speeds comparable to much smaller dense models.
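The routing step described above can be sketched in a few lines. This is a minimal, illustrative NumPy example, not any model's actual implementation: the dimensions, weight shapes, and the use of ReLU feed-forward experts are all assumptions chosen for clarity, and real MoE layers add details like load-balancing losses and batched expert dispatch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: model dim, expert hidden dim, number of experts, top-k.
D, H, E, K = 16, 32, 8, 2

# Router: a learned linear layer producing one score per expert.
W_router = rng.standard_normal((D, E)) * 0.02
# Experts: each is a tiny two-layer feed-forward network (illustrative).
W_in = rng.standard_normal((E, D, H)) * 0.02
W_out = rng.standard_normal((E, H, D)) * 0.02

def moe_layer(x):
    """Route one token vector x (shape [D]) through its top-K experts."""
    logits = x @ W_router                 # one score per expert
    top = np.argsort(logits)[-K:]         # indices of the K highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the selected experts only
    out = np.zeros(D)
    for w, e in zip(weights, top):
        h = np.maximum(x @ W_in[e], 0.0)  # run expert e's feed-forward pass (ReLU)
        out += w * (h @ W_out[e])         # combine expert outputs, weighted by the router
    return out

token = rng.standard_normal(D)
y = moe_layer(token)
print(y.shape)  # (16,)
```

Note that only K of the E experts do any work per token; the other experts' weights sit idle, which is where the compute savings come from.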
Real-World Examples
Mixtral 8x7B using 8 expert networks but only activating 2 per token, achieving GPT-3.5-level performance at faster speeds
GPT-4 reportedly using a MoE architecture with multiple specialized expert networks for different types of knowledge
A multilingual model using different experts for different language families, activating Spanish experts for Spanish text
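The Mixtral example above can be made concrete with some rough arithmetic. This is a naive back-of-the-envelope sketch based only on the "8 experts of ~7B, top-2" framing; the published model shares attention weights across experts, so its real parameter totals differ from this simple count.

```python
# Rough, illustrative arithmetic: why top-2-of-8 routing is cheap.
num_experts = 8
top_k = 2
expert_params = 7e9  # nominal ~7B parameters per expert, taken from the "8x7B" name

total_expert_params = num_experts * expert_params
active_expert_params = top_k * expert_params

print(f"total expert params: {total_expert_params / 1e9:.0f}B")   # 56B
print(f"active per token:    {active_expert_params / 1e9:.0f}B")  # 14B
print(f"fraction active:     {active_expert_params / total_expert_params:.0%}")  # 25%
```

The takeaway: per-token compute scales with the 2 active experts, not the 8 stored ones, which is how an MoE model keeps inference speed close to a much smaller dense model.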
Mixture of Experts (MoE) on Vincony
Vincony provides access to MoE-based models like Mixtral alongside dense models, letting users compare their speed and quality for different tasks.
Try Vincony free →