Stop the AI Madness: Precompute Once, Save Millions

By Erik Lundqvist, June 12, 2026

AI agents are wasting resources by recomputing the same data repeatedly. A simple solution could save millions by precomputing once and reusing it.

AI agents around the globe are caught in a ridiculous cycle. They're wasting precious resources by recalculating the same data over and over. Each one goes through the compute-heavy process of prefill, only to duplicate a key-value (KV) cache identical to the one just created by another. It's like a bad rerun that never ends.

The Simple Solution

Here's a refreshingly simple fix: compute it once. Let publishers precompute a document's KV cache and sell access to it. Every AI agent would just load it, skip the prefill, and get on with its job. And guess what? It actually works. It's token-exact, with no loss in accuracy, and it saves a ton of compute.
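The compute-once idea fits in a few lines. What follows is a minimal sketch, not a real provider API: `KVCacheStore`, `fake_prefill`, and the model id are hypothetical names, and the "KV cache" is a stand-in list rather than real tensors. It only shows the control flow, where the first request for a (model, document) pair pays for prefill and every later request loads the stored result.

```python
import hashlib
from typing import Callable, Dict, List


class KVCacheStore:
    """Hypothetical store holding one precomputed KV cache per (model, document)."""

    def __init__(self, prefill: Callable[[str], List[int]]) -> None:
        self._prefill = prefill
        self._cache: Dict[str, List[int]] = {}
        self.prefill_calls = 0  # counts how often the expensive step actually runs

    @staticmethod
    def _key(model_id: str, document: str) -> str:
        # A KV cache is only reusable for the exact same model and exact same text.
        return hashlib.sha256(f"{model_id}::{document}".encode("utf-8")).hexdigest()

    def load(self, model_id: str, document: str) -> List[int]:
        key = self._key(model_id, document)
        if key not in self._cache:
            self.prefill_calls += 1
            self._cache[key] = self._prefill(document)
        return self._cache[key]


def fake_prefill(document: str) -> List[int]:
    # Stand-in for the real, compute-heavy prefill pass (assumption for illustration).
    return [len(word) for word in document.split()]


store = KVCacheStore(fake_prefill)
doc = "the same popular document every agent wants to read"
results = [store.load("example-model", doc) for _ in range(1000)]
print(store.prefill_calls)  # 1
```

A thousand loads, one prefill. Keying on both model and document matters, because a cache built by one model is useless to another.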

Take the Qwen3-4B model, for example. Here, reuse is anywhere from 9 to 50 times cheaper than prefill. The longer the document, the bigger the savings, since the cost of prefill's attention scales quadratically with length.
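To see why length matters, here is a rough back-of-the-envelope FLOP estimate using the standard approximations for a transformer. The parameter count, layer count, and attention width are illustrative round numbers for a model of roughly this size, not confirmed Qwen3-4B specifications, and the sketch does not reproduce the 9x to 50x figure, which also depends on what loading a cache costs.

```python
def linear_flops(n_tokens: int, n_params: int) -> int:
    # Weight matrix multiplies: about 2 FLOPs per parameter per token (linear in length).
    return 2 * n_params * n_tokens


def attention_flops(n_tokens: int, n_layers: int, d_attn: int) -> int:
    # Causal attention: about n^2 / 2 (query, key) pairs per layer, each costing
    # ~4 * d_attn FLOPs (QK^T plus the weighted sum over V). Quadratic in length.
    return 2 * n_layers * d_attn * n_tokens * n_tokens


# Illustrative values only (assumptions, not taken from the article).
N_PARAMS = 4_000_000_000
N_LAYERS = 36
D_ATTN = 4096


def attention_share(n_tokens: int) -> float:
    attn = attention_flops(n_tokens, N_LAYERS, D_ATTN)
    return attn / (attn + linear_flops(n_tokens, N_PARAMS))


for n in (1_000, 4_000, 16_000, 64_000):
    print(f"{n:>6} tokens: attention is {attention_share(n):.0%} of prefill FLOPs")
```

Double the document and the linear part doubles, but the attention part quadruples. That growing quadratic term is what reuse lets every later agent skip.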

The Real Challenge

The real challenge? Where to store this KV cache. Shipping it to every agent fails because the KV cache is almost impossible to compress. The per-load egress cost ends up exceeding any savings from skipping prefill. But if you host it provider-side, just like production prompt caching, you cut out egress costs entirely. It’s a no-brainer.
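A quick size estimate shows why shipping hurts. This is a sketch under stated assumptions: the layer count, KV-head count, head dimension, 16-bit precision, and the egress price are all assumed for illustration, not taken from the article. Only the 3774-token document and the $1.5 million for 80 million loads come from the article's own figures.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    # One key vector and one value vector (factor 2) per token, per layer, per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


# Assumed model shape and price (hypothetical, for illustration only).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 36, 8, 128
EGRESS_USD_PER_GB = 0.09

# Figures quoted in the article.
DOC_TOKENS = 3774
PREFILL_TOTAL_USD = 1_500_000
AGENTS = 80_000_000

size_bytes = kv_cache_bytes(DOC_TOKENS, N_LAYERS, N_KV_HEADS, HEAD_DIM)
egress_per_load = size_bytes / 1e9 * EGRESS_USD_PER_GB
prefill_per_load = PREFILL_TOTAL_USD / AGENTS

print(f"KV cache size:    {size_bytes / 1e6:.0f} MB")
print(f"Egress per load:  ${egress_per_load:.4f}")
print(f"Prefill per load: ${prefill_per_load:.4f}")
```

Under these assumptions a few thousand tokens of text balloon into hundreds of megabytes of cache, and moving that once costs more than simply redoing the prefill. Different prices or precisions shift the numbers, but the shape of the problem stays the same.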

Need numbers? Here's one to chew on: serving a single 3774-token document to 80 million agents would cost around $1.5 million to re-prefill. With reuse? Just $0.03 million, or 49.7 times less. Current API cache-read tariffs give users a 10x discount, only a fraction of the roughly 50x potential compute saving. Providers pocket the rest.
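The arithmetic is easy to check. The sketch below uses only the figures quoted above, plus one loudly labeled assumption: that the uncached price users pay equals the provider's prefill compute cost. Real pricing includes margins, so treat the "provider keeps" line as an illustration of the gap rather than an actual profit figure.

```python
# Figures quoted in the article.
AGENTS = 80_000_000
PREFILL_TOTAL_USD = 1_500_000
COMPUTE_SAVING_FACTOR = 49.7   # reuse vs. re-prefill
USER_DISCOUNT_FACTOR = 10.0    # cache-read tariff vs. the uncached price

# Provider's compute cost when every agent reuses the precomputed cache.
reuse_total_usd = PREFILL_TOTAL_USD / COMPUTE_SAVING_FACTOR

# ASSUMPTION: the uncached price equals the provider's prefill compute cost.
user_pays_usd = PREFILL_TOTAL_USD / USER_DISCOUNT_FACTOR
provider_keeps_usd = user_pays_usd - reuse_total_usd

print(f"Re-prefill, per load:  ${PREFILL_TOTAL_USD / AGENTS:.5f}")
print(f"Reuse, total:          ${reuse_total_usd / 1e6:.2f} million")
print(f"Users pay at 10x off:  ${user_pays_usd / 1e6:.2f} million")
print(f"Provider keeps:        ${provider_keeps_usd / 1e6:.2f} million")
```

On these numbers, users pay about five times what the cached read costs to serve.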

Why It Matters

So why should you care? Because this isn't just a technical fix; it's a potential goldmine. Millions in savings per popular document aren't just theoretical. They're up for grabs. And yet, until this model becomes mainstream, AI agents will keep wasting resources as if money grew on trees.

Why are we still stuck on repeat when the solution is staring us in the face? Show me the product, show me the savings, and show me why we're not already doing this. Because, frankly, the status quo is absurd.
