Why AI Agents Struggle With Following Instructions
AI agents often deviate from task-specific plans, falling back on incomplete workflows internalized during training, and a poorly matched plan can hurt their performance more than having no plan at all.
AI agents are designed with the promise of reducing the need to craft tailored instructions for each task. Yet in practice, these agents often require guidance through specific plans, such as resolving software bugs in structured phases. But do they actually follow the plans they're given? This question remains largely unanswered, and answering it is essential to evaluating how these agents solve problems.
Unpacking Agent Behavior
An extensive analysis of programming agents sheds light on this issue. The study examined 16,991 task trajectories from a programming agent called SWE-agent, tested across four different large language models (LLMs) using the SWE-bench Verified and SWE-bench Pro benchmarks. The agents operated under eight different plan variations. What emerged was a nuanced picture of how AI behaves when tasked with problem-solving.
Without explicit plans, these agents tend to fall back on workflows internalized during training. These workflows are often incomplete or inconsistently applied, a sign of overfitting to familiar routines, and they frequently leave problems unresolved. Providing a standard plan improves outcomes, suggesting that a clear sequence of steps can steer agents toward better performance. But the nuances don't stop there.
The Plan Paradox
What stands out is that a poor-quality plan can actually impair performance more than having no plan at all. Plans that misalign with the agent's inherent problem-solving strategies can derail its effectiveness, especially when augmented with extra phases that don't fit its logic. This finding turns conventional wisdom on its head: sometimes, less is more.
Why should this matter to you? AI is becoming integral to how businesses and industries operate. Understanding the limitations of these technologies helps set realistic expectations and drive improvements. Are we teaching agents to think adaptively, or are we merely encoding rigid instructions? There's a research gap here: teaching models to reason and adjust rather than memorize and regurgitate.
Looking Forward
The investment implication is the real headline here. Resources should focus on developing AI systems that adapt to changing instructions rather than relying on memorized workflows. As we move forward, fine-tuning models to follow guided plans while retaining flexibility could yield significant dividends.
The strategic bet is clear: designing AI agents that can pivot and adapt might just be the key to unlocking their full potential.
Key Terms Explained
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
Overfitting: When a model memorizes the training data so well that it performs poorly on new, unseen data.
Training: The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.