Level 4 · Expert · 9 min

Mixture of Experts

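An architecture that routes each token to a sparse subset of specialist subnetworks, called experts. The total parameter count is larger, but only the selected experts run for a given token, so the active compute cost per token is lower than in a dense model of the same total size.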
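
The routing idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes top-k routing with softmax gates renormalized over the chosen experts (one common choice among several), and every name in it (`moe_forward`, `make_expert`, `router_weights`, `top_k`) is hypothetical. The experts are toy scaling functions standing in for real feed-forward subnetworks.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical stand-in for an expert subnetwork: multiplies the input by `scale`.
    return lambda x: [scale * v for v in x]


def moe_forward(token, router_weights, experts, top_k=2):
    # Router: one score per expert (dot product of the token with that expert's row).
    scores = [sum(w * v for w, v in zip(row, token)) for row in router_weights]
    # Sparse selection: keep only the top_k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Assumption: gate weights are a softmax over the chosen experts' scores only.
    gates = softmax([scores[i] for i in chosen])
    # Only the chosen experts are evaluated; the rest cost nothing for this token.
    output = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        expert_out = experts[idx](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(i + 1) for i in range(num_experts)]
router_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [0.5, -1.0, 0.25, 2.0]

output, chosen = moe_forward(token, router_weights, experts, top_k=2)
print(f"experts run: {len(chosen)} of {num_experts}")
```

All eight experts count toward the total parameters, but only two run for this token, which is where the lower active cost comes from.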