Why Shrinking Prompts Won't Shrink Emissions
Compressing prompts to cut emissions isn't the silver bullet for AI's carbon footprint. Some models actually burn more energy when squeezed.
It's one of the ironies of our time. Large Language Models (LLMs) promised to help solve climate challenges, but they're also ballooning global carbon emissions. A recent study tested whether compressing prompts could make LLMs more energy-efficient. The results? A mixed bag.
Compression Can't Cut It
The study ran 28,421 API trials across three models: OpenAI's GPT-4o-mini, Anthropic's Claude-3.5-Sonnet, and DeepSeek-Chat, measuring how compression ratios of 1.0, 0.7, 0.5, and 0.3 affected energy use. Spoiler: it wasn't pretty. The pass rate plummeted from 26% at baseline to just 1.5% at a 0.7 ratio.
DeepSeek-Chat was the wild card. Under compression, its outputs ballooned from 21 tokens to a whopping 798 at a 0.3 ratio, driving a 2,140% energy spike. GPT-4o-mini showed mixed results, even reducing energy use at a 0.5 ratio. So, if you're thinking of saving the planet by shrinking prompts, think again.
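Why does output expansion swamp input savings? A toy back-of-envelope calculation makes it concrete. This is a minimal sketch assuming a linear per-token energy model and a hypothetical 500-token prompt; the function and the prompt length are illustrative assumptions, not figures from the study:

```python
# Toy model: assume energy scales linearly with total tokens processed.
# The linearity and the 500-token prompt are assumptions for illustration;
# the study's actual energy measurements are more involved.

def relative_energy(input_tokens: int, output_tokens: int,
                    ratio: float, compressed_output_tokens: int) -> float:
    """Energy of a compressed call relative to the uncompressed baseline."""
    baseline = input_tokens + output_tokens
    compressed = input_tokens * ratio + compressed_output_tokens
    return compressed / baseline

# DeepSeek-Chat-style output expansion (21 -> 798 tokens at a 0.3 ratio),
# with a hypothetical 500-token prompt:
print(relative_energy(500, 21, 0.3, 798))  # > 1: the compressed call costs MORE
```

Even with 70% of the input tokens stripped away, the 38x output growth dominates the total, which is the mechanism behind DeepSeek-Chat's energy spike.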
Model Choice Over Compression
What's the takeaway? If you're in the business of LLMs, you'd better focus on picking the right model and controlling output length. These strategies showed more reliable tradeoffs between energy use and quality. Why wrestle with input-token reduction when it barely makes a dent?
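In practice, "controlling output length" is often a single request parameter. Here is a minimal sketch assuming an OpenAI-style chat-completions payload with a `max_tokens` cap; the prompt text and cap value are hypothetical:

```python
# Capping output tokens bounds the number of decode steps, which is the
# cost that blows up for models prone to output expansion.
# Hypothetical request payload in OpenAI chat-completions style.
request = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Summarize this incident report in 3 bullets."}
    ],
    "max_tokens": 150,  # hard ceiling on output length
}
print(request["max_tokens"])
```

A cap is a blunt instrument, but unlike prompt compression it directly bounds the failure mode the study flagged: runaway output growth.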
And here's the uncomfortable question: did we really expect squeezing prompts to solve AI's carbon conundrum? The numbers say no. Quality collapses, outputs balloon, and the supposed savings evaporate.
It's time to shift focus. Rather than trying to compress prompts down to nothing, the industry should refine model efficiency and keep output in check. Until then, the dream of an eco-friendly AI remains just that, a dream.