AI Models: When the Scaffold Matters More Than the Model

In AI, we often focus on the thrilling prospect of ever-improving models. But what if the tools we use to support these models play an even greater role in their performance? Recent findings suggest that the scaffolds, or frameworks, around AI models can dramatically affect their effectiveness. In fact, these supporting structures can shift a model's measured accuracy by as much as 28 percentage points. That's not a typo.

The Impact of Scaffolds

A recent study set out to explore just how much these scaffolds impact AI performance. It compared three frameworks (ReAct, a Planner-Actor-Rater multi-agent design, and a planner-then-executor approach) across five models from three providers, and the results challenge some long-held assumptions. We're talking about big names here: Claude Opus 4.7, Sonnet 4.6, Haiku 4.5, Gemini 3.1 Pro Preview, and GPT-5.5.

What's fascinating? The study found that scaffold variations can lead to gaps of at least 10 percentage points in accuracy. Even more intriguing, the hypothesis that more advanced models are less affected by their scaffolds was turned on its head. More capable models, like the Anthropic range, actually gained the most from structured scaffolds when faced with tougher tasks.
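To make "a gap in percentage points" concrete, here is a minimal Python sketch of how such a figure could be computed: for each model, take the best and worst accuracy across scaffolds and subtract. The model names, scaffold keys, and accuracy numbers below are hypothetical placeholders invented for illustration; they are not results from the study, and the study may well have measured its gaps differently.

```python
from typing import Dict

# Hypothetical accuracies in percent (share of tasks solved).
# These numbers are made up for illustration and are NOT from the study.
ACCURACY: Dict[str, Dict[str, int]] = {
    "model_a": {"react": 52, "planner_actor_rater": 71, "planner_executor": 60},
    "model_b": {"react": 48, "planner_actor_rater": 50, "planner_executor": 45},
}


def scaffold_gap(scores: Dict[str, int]) -> int:
    """Best-minus-worst accuracy across scaffolds, in percentage points."""
    return max(scores.values()) - min(scores.values())


def best_scaffold(scores: Dict[str, int]) -> str:
    """Name of the scaffold with the highest accuracy for one model."""
    return max(scores, key=lambda name: scores[name])


def gaps_by_model(table: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Scaffold gap for every model in the table."""
    return {model: scaffold_gap(scores) for model, scores in table.items()}


if __name__ == "__main__":
    for model, scores in ACCURACY.items():
        print(f"{model}: gap {scaffold_gap(scores)} pp, best with {best_scaffold(scores)}")
```

Under these made-up numbers, one model swings far more than the other depending on the scaffold, which is the kind of pattern a single-scaffold benchmark would hide.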

Model Family Over Capability Tier

The study also found that the multi-agent advantage was specific to models in the Anthropic family and did not carry over to the models from other providers. In other words, the conditioning variable isn't the model's capability tier but the model family itself. This throws a wrench into the idea that higher-tier models automatically come out ahead when faced with complex tasks.

The expected edge of the planner-executor setup on file-reading tasks also fell flat. Instead, it was the structured scaffolds that made fewer mistakes and recovered better from mid-trajectory errors, especially at the more challenging levels.

Rethinking AI Progress

So, what's the takeaway here? Single-scaffold capability scores are conditional estimates, dependent heavily on the framework used. As models advance, there's no guarantee that the gap between what they can do and what they actually achieve will close.

It's a clear reminder that while AI models are advancing, the frameworks that support them are just as important in determining their real-world effectiveness. Should companies investing in AI focus more on scaffolding than ever before? Probably.

The real story here is that AI's progress isn't just about building smarter models. It's about ensuring those models have the right framework to truly shine. The gap between the keynote and the cubicle is enormous. Perhaps now, more than ever, it's time to bridge it.