Abstract: With the widespread popularity of Large Language Models (LLMs), Mixture of Experts (MoE) has not only emerged as a key enabler for scaling up model capacity by significantly reducing ...