Revolutionizing Cache Compaction: Meet Still, the Long-Horizon Big Deal

Long-horizon language models face a persistent hurdle: the KV cache memory bottleneck. Existing methods either fall short in speed or compromise on expressiveness. Enter Still, a model that promises to change the game entirely.

The Still Advantage

Still functions as a compact per-layer Perceiver. It’s trained once against a frozen base model and produces compressed keys and values in a single forward pass. On models like Qwen and Gemma, it shines on the speed-quality frontier, supporting compression ratios from 8x to 200x and context lengths from 8k to 128k.
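To make the "per-layer Perceiver" idea concrete, here is a minimal NumPy sketch of Perceiver-style compaction for a single layer: a small set of latent queries cross-attends over the cached tokens and pools them into fewer key/value slots. This is an illustration under stated assumptions, not Still's actual architecture. The function name compact_layer_cache, the projections w_q and w_k, the random weights, and the choice to pool keys and values with the same attention weights are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compact_layer_cache(keys, values, latents, w_q, w_k):
    """Perceiver-style compaction of one layer's KV cache (illustrative only).

    keys, values: (n, d) cached keys and values from the frozen base model.
    latents:      (m, d) latent queries, with m much smaller than n.
    w_q, w_k:     (d, d) projection weights (hypothetical parameters).
    Returns (compact_keys, compact_values), each of shape (m, d).
    """
    q = latents @ w_q                          # (m, d)
    k = keys @ w_k                             # (n, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (m, n)
    attn = softmax(scores, axis=-1)            # each latent's weights over tokens
    # Assumption: the same attention weights pool both keys and values.
    return attn @ keys, attn @ values

rng = np.random.default_rng(0)
n, d, ratio = 1024, 64, 8                      # 8x is the low end of the cited range
m = n // ratio                                 # number of compact slots
keys = rng.standard_normal((n, d))
values = rng.standard_normal((n, d))
latents = rng.standard_normal((m, d))          # random here; learned in a real model
w_q = rng.standard_normal((d, d)) / np.sqrt(d)
w_k = rng.standard_normal((d, d)) / np.sqrt(d)

compact_keys, compact_values = compact_layer_cache(keys, values, latents, w_q, w_k)
print(compact_keys.shape, compact_values.shape)
```

In a trained system the latents and projections would be learned against the frozen base model, and each layer would presumably have its own set. Here they are random, so the sketch only shows the shapes and the single forward pass.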

What’s notable here? Still isn’t just a stopgap. It exceeds the strongest baseline by 8 to 22 points on the RULER grid. That’s not an incremental improvement; it’s a leap forward.

Why This Matters

For language models, maintaining context is important. The more relevant context a model can keep in view, the more coherent and relevant its output tends to be. Still manages to preserve most of the full-context gain on tasks like HELMET, even besting KV-Distill in a pairwise LongBench summarization comparison.

Here’s the kicker: Still’s compaction is achieved through a forward pass, so it can be applied iteratively as the context grows. That enables long-horizon performance that per-context methods simply can’t reach. In essence, Still transforms what seemed like a distant dream into a tangible reality.
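The iterative pattern can be sketched as a loop that folds each new chunk of keys and values into a fixed-size compact cache. The sketch below is a rough illustration, not Still's procedure. The names iterative_compaction and mean_pool_compactor, the chunk size, and the budget are hypothetical, and the mean-pooling compactor is only a stand-in for a learned model.

```python
import numpy as np

def mean_pool_compactor(keys, values, budget):
    """Stand-in compactor (NOT Still): average contiguous groups of slots."""
    if keys.shape[0] <= budget:
        return keys, values
    groups = np.array_split(np.arange(keys.shape[0]), budget)
    compact_k = np.stack([keys[g].mean(axis=0) for g in groups])
    compact_v = np.stack([values[g].mean(axis=0) for g in groups])
    return compact_k, compact_v

def iterative_compaction(chunks, budget, compactor):
    """Fold (keys, values) chunks into a cache of at most `budget` slots.

    Returns the final compact keys and values plus the peak number of
    slots held at any point before a compaction step.
    """
    cache_k = cache_v = None
    peak = 0
    for k, v in chunks:
        if cache_k is None:
            cache_k, cache_v = k, v
        else:
            # Append the new chunk to the already-compacted cache.
            cache_k = np.concatenate([cache_k, k])
            cache_v = np.concatenate([cache_v, v])
        peak = max(peak, cache_k.shape[0])
        if cache_k.shape[0] > budget:
            cache_k, cache_v = compactor(cache_k, cache_v, budget)
    return cache_k, cache_v, peak

rng = np.random.default_rng(0)
d, chunk_len, budget = 16, 512, 256
chunks = [(rng.standard_normal((chunk_len, d)),
           rng.standard_normal((chunk_len, d))) for _ in range(8)]

final_k, final_v, peak = iterative_compaction(chunks, budget, mean_pool_compactor)
print(final_k.shape, peak)
```

The point of the loop is that memory stays bounded by the budget plus one chunk, however many chunks arrive. That only works in practice when each compaction step is cheap, which is where a single forward pass matters.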

The Broader Implications

Strip away the marketing and you get an approach that tackles a longstanding problem. The architecture matters more than the parameter count, and Still’s design proves it. By making long-context cache compaction practical, it opens doors for more efficient and effective AI applications.

Should existing compaction methods feel threatened? Frankly, yes. The numbers show what’s possible, and Still is paving the way for future advancements. As models grow, the need for solutions like Still will only intensify.

Will this be the end of KV cache woes? If Still’s trajectory continues, it very well might be. But as always, the true test lies in its real-world deployment. Time will tell if it can match its promise in diverse environments.
