Revolutionizing Cache Compaction: Meet Still, the Long-Horizon Game Changer
The Still model introduces a groundbreaking approach to cache compaction, offering impressive performance in long-horizon language models. By balancing speed and quality, it challenges existing compaction methods.
Long-horizon language models face a persistent hurdle: the KV cache memory bottleneck. Existing methods either fall short in speed or compromise on expressiveness. Enter Still, a model that promises to change the game entirely.
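To make the bottleneck concrete, here is a back-of-the-envelope sketch of how KV cache size scales with context length. The model dimensions (32 layers, 8 KV heads, head dimension 128, 16-bit values) are illustrative assumptions, not the configuration of any model named in this article; only the 128k context and the 8x and 200x ratios come from the figures reported for Still.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV cache size for one sequence.

    The leading factor of 2 counts one key and one value vector
    per token, per KV head, per layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape (assumption, for illustration only).
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=128 * 1024)

for ratio in (1, 8, 200):
    print(f"{ratio:>3}x compaction: {full / ratio / 2**30:.2f} GiB")
```

Under these assumptions a single 128k-token sequence needs 16 GiB of cache uncompressed, 2 GiB at 8x, and roughly 0.08 GiB at 200x.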
The Still Advantage
Still functions as a compact per-layer Perceiver. It’s trained once against a frozen base model and produces concise keys and values in a single forward pass. On models like Qwen and Gemma, it shines on the speed-quality frontier, supporting compression ratios from 8x to 200x and context lengths from 8k to 128k.
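As a rough illustration of what a per-layer, Perceiver-style compactor can look like, the sketch below lets a small set of latent queries cross-attend over one layer’s cached keys and values and emit a shorter cache. This is a minimal sketch under stated assumptions, not Still’s actual architecture: the function name `compact_layer_cache`, the projection `w_in`, and the random (untrained) weights are all hypothetical. In a real system the latents and projection would be learned against the frozen base model.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compact_layer_cache(keys, values, latents, w_in):
    """Hypothetical Perceiver-style compaction of one layer's KV cache.

    keys, values: (n, d) cached entries for n tokens.
    latents:      (m, d_attn) latent queries, with m much smaller than n.
    w_in:         (2d, d_attn) projection of each [key, value] pair.
    Returns compact keys and values, each of shape (m, d).
    """
    d = keys.shape[1]
    tokens = np.concatenate([keys, values], axis=1)             # (n, 2d)
    projected = tokens @ w_in                                   # (n, d_attn)
    scores = latents @ projected.T / np.sqrt(latents.shape[1])  # (m, n)
    attn = softmax(scores, axis=-1)                             # each row sums to 1
    summary = attn @ tokens                                     # (m, 2d)
    return summary[:, :d], summary[:, d:]


rng = np.random.default_rng(0)
n, d, d_attn, ratio = 1024, 64, 32, 8   # 8x compression of a 1024-token cache
m = n // ratio

keys = rng.standard_normal((n, d))
values = rng.standard_normal((n, d))
latents = rng.standard_normal((m, d_attn))                    # untrained stand-ins
w_in = rng.standard_normal((2 * d, d_attn)) / np.sqrt(2 * d)  # untrained stand-in

compact_keys, compact_values = compact_layer_cache(keys, values, latents, w_in)
print(compact_keys.shape, compact_values.shape)
```

The point of the design is that compaction costs one cross-attention pass per layer, with no per-context optimization loop. The compression ratio is set simply by how many latent queries are used.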
What’s notable here? Still isn’t just a stopgap. It exceeds the strongest baseline by 8 to 22 points on the RULER grid. That’s not just incremental improvement, it’s a leap forward.
Why This Matters
In language models, maintaining context is important. The longer the context a model can use, the more coherent and relevant its output tends to be. Still manages to preserve most of the full-context gain on tasks like HELMET, even besting KV-Distill in a pairwise LongBench summarization comparison.
Here’s the kicker: Still’s compaction is a single forward pass, so it can be applied iteratively as the context grows. That enables long-horizon performance that per-context methods simply can’t reach. In essence, Still transforms what seemed like a distant dream into a tangible reality.
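The practical consequence of a cheap, repeatable compaction step can be sketched as a rolling loop: append new cache entries, and whenever the cache exceeds a budget, compact it and carry on. The compactor below is a simple mean-pooling stand-in, not Still’s learned module, and the names `mean_pool_compactor`, `stream_with_compaction`, `budget`, and `target_len` are hypothetical.

```python
import numpy as np


def mean_pool_compactor(cache, target_len):
    """Stand-in compactor (NOT Still): average contiguous groups of rows."""
    groups = np.array_split(cache, target_len, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])


def stream_with_compaction(chunks, compactor, budget, target_len):
    """Append chunks to a cache, compacting whenever it exceeds the budget.

    Returns the final cache and the largest size it held after any step.
    """
    cache = np.empty((0, chunks[0].shape[1]))
    peak = 0
    for chunk in chunks:
        cache = np.concatenate([cache, chunk], axis=0)
        if cache.shape[0] > budget:
            cache = compactor(cache, target_len)  # one cheap compaction call
        peak = max(peak, cache.shape[0])
    return cache, peak


rng = np.random.default_rng(0)
chunks = [rng.standard_normal((256, 16)) for _ in range(20)]  # 5,120 entries in total

cache, peak = stream_with_compaction(
    chunks, mean_pool_compactor, budget=512, target_len=64
)
print(f"entries seen: {20 * 256}, final cache: {cache.shape[0]}, peak: {peak}")
```

The retained cache stays within the budget no matter how long the stream runs. A loop like this is only affordable when each compaction call is cheap.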
The Broader Implications
Strip away the marketing and you get an approach that tackles a longstanding problem. The architecture matters more than the parameter count, and Still’s design proves it. By making long-context cache compaction practical, it opens doors for more efficient and effective AI applications.
Should existing compaction methods feel threatened? Frankly, yes. The numbers tell a different story about what’s possible, and Still is paving the way for future advancements. As models grow, the need for innovative solutions like Still will only intensify.
Will this be the end of KV cache woes? If Still’s trajectory continues, it very well might be. But as always, the true test lies in its real-world deployment. Time will tell if it can match its promise in diverse environments.