Why Watermarking AI Text Is Failing, and What Comes Next

This week in 60 seconds: Watermarks for AI-generated text embed detectable signatures so that output can be tracked and attributed, but they face a fundamental snag. When users draw on several models at once, those watermarks can become practically useless.

The Watermarking Struggle

Here’s the deal. Watermarks intentionally alter the output of AI models to create detectable patterns. But when users blend outputs from several models, the separate watermark patterns largely cancel each other out. It's a bit like trying to pick out a single voice in a chorus: hard to do when everyone’s singing.

The researchers prove that averaging the output probability distributions of several watermarked models recovers the original, unwatermarked distribution. Even with just three to five models, the watermarks vanish, leaving the text untraceable. LLM watermarking schemes just got a major wake-up call.
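To make the averaging idea concrete, here is a toy sketch in Python. It is not the researchers' method. It assumes a hypothetical "green-list" style watermark, where each provider uses a secret key to boost a random half of the vocabulary. It also assumes all models share one vocabulary and agree on the clean next-token distribution. The function names and the `delta` and `gamma` parameters are illustrative. With a handful of models, the sketch shows the blend drifting back toward the clean distribution rather than recovering it exactly.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def watermarked_dist(logits, key, delta=2.0, gamma=0.5):
    """Hypothetical green-list watermark: boost a keyed random subset of tokens."""
    vocab = len(logits)
    rng = random.Random(key)  # each provider's secret key picks its own subset
    green = set(rng.sample(range(vocab), int(gamma * vocab)))
    return softmax([x + delta if i in green else x for i, x in enumerate(logits)])


def average_dists(dists):
    """Token-by-token mean of several distributions over the same vocabulary."""
    return [sum(column) / len(dists) for column in zip(*dists)]


def total_variation(p, q):
    """Distance between two distributions: 0 means identical, 1 means disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def demo(num_models=5, vocab=1000, seed=0):
    """Compare one watermarked model and a blend of several against the clean model."""
    rng = random.Random(seed)
    # Toy assumption: every model starts from the same clean scores.
    logits = [rng.gauss(0.0, 1.0) for _ in range(vocab)]
    clean = softmax(logits)
    marked = [watermarked_dist(logits, key) for key in range(1, num_models + 1)]
    single_gap = total_variation(marked[0], clean)
    blended_gap = total_variation(average_dists(marked), clean)
    return single_gap, blended_gap


if __name__ == "__main__":
    single_gap, blended_gap = demo()
    print(f"one watermarked model vs clean: {single_gap:.3f}")
    print(f"five-model average vs clean:    {blended_gap:.3f}")
```

The second number should come out smaller than the first: each key pushes a different random set of tokens, so the pushes partly offset one another once the distributions are averaged.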

Meet WASH: A New Approach

Enter WASH, short for Watermark Attenuation via Statistical Hybridisation. It tackles the messy business of combining outputs from diverse models with different vocabularies and tokenizations. The results? In experiments, combining three models drops watermark detection scores from between 5 and 300 to below 2, which is under the threshold for detection.

Not only that, but it also significantly cuts the true positive rate of detection, improves text quality by 27.5%, and runs six times faster than current methods. In a competitive space, WASH might just steal the spotlight.
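For a feel of what a detection score "below 2" means, here is a minimal sketch of one common way such scores are computed: a z-score over how many tokens land on a watermark's "green list". This week's summary does not say which detector the experiments used, so treat the formula, the function names, and the `gamma` parameter as assumptions. Only the threshold of 2 comes from the reported results.

```python
import math


def green_z_score(green_hits, num_tokens, gamma=0.5):
    """Z-score for seeing `green_hits` green tokens out of `num_tokens`.

    Assumes unwatermarked text lands on the green list with probability
    `gamma` per token (hypothetical detector, not necessarily the one used).
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    expected = gamma * num_tokens
    std_dev = math.sqrt(num_tokens * gamma * (1.0 - gamma))
    return (green_hits - expected) / std_dev


def is_flagged(z_score, threshold=2.0):
    """Flag text as watermarked when the score clears the threshold."""
    return z_score > threshold


if __name__ == "__main__":
    # Illustrative counts, not measurements from the experiments.
    for hits in (150, 108):
        z = green_z_score(hits, 200)
        print(f"{hits}/200 green tokens -> z = {z:.2f}, flagged: {is_flagged(z)}")
```

Under these assumptions, text with a strong surplus of green tokens scores well above the threshold, while text whose green-token count sits near the expected half slips under it, which is the effect blending is reported to have.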

What Now for Watermarking?

So, should we throw watermarks out the window? Not just yet. But this vulnerability means that keeping detection reliable would take unprecedented coordination among AI model providers. Can the industry manage such a coordinated effort? Given the competitive landscape, that seems like a tall order.

The takeaway here: If watermarking's future requires Herculean cooperation, maybe it’s time to rethink the strategy. The one thing to remember from this week? Watermarking isn’t the silver bullet we hoped it would be.

That's the week. See you Monday.