Level 4 · Expert · 9 min
Mixture of Experts
Architecture that routes each token to a sparse subset of specialist subnetworks (experts). Total parameter count is larger, but only the selected experts run, so active compute per token is lower.
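The routing idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the names `moe_forward`, `param_counts`, `router_w`, and `expert_ws` are hypothetical, each expert is assumed to be a single `tanh` layer, and the gate weights are assumed to be a softmax over the top-k router logits only. Real systems typically add load balancing and capacity limits, which are omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(tokens, router_w, expert_ws, top_k=2):
    """Sparse mixture-of-experts layer (illustrative sketch).

    tokens:    (n, d) token representations
    router_w:  (d, E) router weights, one column per expert
    expert_ws: (E, d, d) one weight matrix per expert (assumed single layer)
    """
    logits = tokens @ router_w                           # (n, E) router scores
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]    # (n, k) chosen experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = softmax(top_logits, axis=-1)                 # (n, k) mixing weights

    out = np.zeros_like(tokens)
    for e in range(expert_ws.shape[0]):
        # Rows (tokens) and slots (which of the k picks) that selected expert e.
        token_ids, slot = np.nonzero(top_idx == e)
        if token_ids.size == 0:
            continue  # no token chose this expert, so it costs nothing
        expert_out = np.tanh(tokens[token_ids] @ expert_ws[e])
        out[token_ids] += gates[token_ids, slot][:, None] * expert_out
    return out, top_idx

def param_counts(d, num_experts, top_k):
    # Total parameters stored vs. parameters touched for a single token.
    per_expert = d * d
    router = d * num_experts
    total = router + num_experts * per_expert
    active = router + top_k * per_expert
    return total, active
```

With `d = 8`, four experts, and `top_k = 2`, `param_counts` reports 288 stored parameters but only 160 used per token. The gap widens as experts are added, because total parameters grow with the expert count while active parameters depend only on `top_k`.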