ReMoE: Saving Memory in Mixture-of-Experts Models

By Signe Eriksen, May 27, 2026

ReMoE enhances Mixture-of-Experts model efficiency by improving token-wise expert reuse, reducing I/O overhead, and increasing throughput.

Fine-grained Mixture-of-Experts (MoE) models activate only a fraction of their experts per token, which keeps computation low while preserving model capacity. That advantage erodes when memory is tight: experts that do not fit in the fast cache must be fetched from slower storage, which causes frequent cache evictions and high I/O costs.
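To make the eviction problem concrete, here is a minimal sketch that counts how many expert fetches a routing trace triggers. It assumes a least-recently-used (LRU) cache policy and uses made-up routing traces; the function name `count_expert_loads` and the traces are illustrative and do not come from the ReMoE project.

```python
from collections import OrderedDict


def count_expert_loads(routing, cache_size):
    """Count how many experts must be fetched from slow memory.

    routing: list of per-token expert-ID lists (the router's top-k picks).
    cache_size: number of experts that fit in fast memory.
    Assumption: the cache evicts the least recently used expert.
    """
    cache = OrderedDict()  # expert_id -> None, ordered oldest to newest
    loads = 0
    for experts in routing:
        for expert in experts:
            if expert in cache:
                cache.move_to_end(expert)  # hit: mark as most recently used
            else:
                loads += 1  # miss: expert must be fetched from slow memory
                cache[expert] = None
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return loads


# Hypothetical traces: 4 tokens, top-2 routing, cache holds 2 experts.
scattered = [[0, 1], [2, 3], [0, 1], [2, 3]]  # routing flips every token
stable = [[0, 1], [0, 1], [0, 1], [2, 3]]     # routing mostly repeats

print(count_expert_loads(scattered, cache_size=2))  # 8
print(count_expert_loads(stable, cache_size=2))     # 4
```

In this toy case both traces select the same number of experts, but the stable one needs half as many fetches, which is the effect that more consistent routing is meant to produce.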

Introducing ReMoE

ReMoE is a router fine-tuning framework that addresses these bottlenecks by increasing token-wise expert reuse. It nudges the router to favor recently used experts, which produces more stable routing that fits cache constraints better. In short, ReMoE raises short-term expert reuse and so reduces how often experts must be fetched from storage, without adding inference-time computation.
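One simple way to quantify short-term reuse is to measure how often a token selects an expert that the previous token also selected. The sketch below assumes that definition. The article does not spell out the exact metric ReMoE reports, so treat `expert_reuse_rate` and the sample trace as hypothetical.

```python
def expert_reuse_rate(routing):
    """Fraction of expert selections that repeat an expert used by the previous token.

    routing: list of per-token expert-ID lists (the router's top-k picks).
    Assumption: reuse is measured against the immediately preceding token only.
    """
    reused = 0
    total = 0
    for prev, curr in zip(routing, routing[1:]):
        prev_set = set(prev)
        reused += sum(1 for expert in curr if expert in prev_set)
        total += len(curr)
    return reused / total if total else 0.0


# Hypothetical trace: 4 tokens with top-2 routing.
trace = [[0, 1], [0, 2], [2, 3], [2, 3]]
print(f"{expert_reuse_rate(trace):.3f}")  # 0.667
```

A router tuned for reuse would push this number up. Whether task quality holds at the higher rate is a separate question that only evaluation can answer.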

Performance Gains

Evaluations on DeepSeek and Qwen models show that the framework increases expert reuse by 26% while maintaining performance on downstream tasks. In real-system tests, it delivers an 8.4% increase in output throughput under vLLM GPU-CPU expert offloading. On a Jetson Orin NX running llama.cpp, it cuts time per output token (TPOT) by 43.6-49.8%, which corresponds to a 1.77x to 1.99x decode speedup across diverse workloads.
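The TPOT and speedup figures are two views of the same measurement. If TPOT falls by a fraction r, each token takes (1 - r) of the original time, so decode speed rises by a factor of 1 / (1 - r). A quick check of that arithmetic, with a helper name chosen here for illustration:

```python
def speedup_from_tpot_reduction(reduction):
    """Convert a fractional TPOT reduction (e.g. 0.436) into a decode speedup factor."""
    return 1.0 / (1.0 - reduction)


for r in (0.436, 0.498):
    print(f"{r:.1%} lower TPOT -> {speedup_from_tpot_reduction(r):.2f}x decode speedup")
# 43.6% lower TPOT -> 1.77x decode speedup
# 49.8% lower TPOT -> 1.99x decode speedup
```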

Why It Matters

For researchers and developers working under memory constraints, ReMoE offers a way to improve performance without additional inference-time computation. If you're looking to get more efficiency out of MoE models on constrained hardware, ReMoE might be the solution.

Given its reductions in I/O overhead and its throughput gains, the real question is why more teams aren't adopting ReMoE. Checkpoints and usage guidelines are available in the project's GitHub repository, an easy entry point for teams that want to try it.
