Why LLM Agents Are Only As Good As Their Feeds
Evaluations of LLM agents often ignore the impact of upstream ranking systems on decision-making. New research highlights critical feed curation effects.
With large language models (LLMs), what an agent sees before making a decision can be just as important as the decision itself. A new study examines this often-overlooked element, showing how feed curation can dramatically influence LLM outcomes.
The Hidden Power of Feed Curation
LLM agents are increasingly acting based on ranked input streams like social feeds and search results. However, safety evaluations traditionally focus on the model or user prompt alone, neglecting the critical role of the upstream ranker. A recent protocol tested this by keeping the model and decision prompt constant, while varying only the order and content of information presented in a ten-turn scrolling phase.
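To make the setup concrete, here is a minimal sketch of that kind of protocol in Python. It is an illustration under stated assumptions, not the study's code: the names `Item`, `POOL`, `build_feed`, `build_transcript`, and `DECISION_PROMPT`, the stance labels, and the `bias` knob are all hypothetical. The only things taken from the description above are the ten-turn scrolling phase and the idea that the decision prompt stays fixed while the feed changes.

```python
import random
from dataclasses import dataclass

# Hypothetical, fixed decision prompt: identical in every condition.
DECISION_PROMPT = "Approve this deployment? Answer APPROVE or REJECT."


@dataclass(frozen=True)
class Item:
    text: str
    stance: int  # +1 favors approval, -1 opposes it, 0 is neutral (assumed labels)


# Hypothetical pool of feed items the ranker can draw from.
POOL = [
    Item("Rollout went smoothly on the staging cluster.", +1),
    Item("Colleague says the change looks safe to ship.", +1),
    Item("Open incident: intermittent auth failures.", -1),
    Item("Reminder: team lunch on Friday.", 0),
]


def build_feed(pool, bias, turns=10, seed=0):
    """Return `turns` items; `bias` in [0, 1] is the share of pro-approval slots."""
    rng = random.Random(seed)
    pro = [item for item in pool if item.stance > 0]
    other = [item for item in pool if item.stance <= 0]
    n_pro = round(bias * turns)
    # Sample with replacement so small pools still fill all ten turns.
    feed = rng.choices(pro, k=n_pro) + rng.choices(other, k=turns - n_pro)
    rng.shuffle(feed)  # vary order as well as content
    return feed


def build_transcript(feed, decision_prompt=DECISION_PROMPT):
    """Scrolling phase first, then the unchanged decision prompt as the last turn."""
    turns = [f"[scroll {k + 1}] {item.text}" for k, item in enumerate(feed)]
    return turns + [decision_prompt]
```

Calling `build_transcript(build_feed(POOL, bias=1.0))` and `build_transcript(build_feed(POOL, bias=0.0))` gives two transcripts that end in the same decision prompt and differ only in what was scrolled past first, which is the comparison the protocol relies on.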
After analyzing 2,785 decision rollouts across four modern LLMs from three independent labs, three distinct response patterns emerged: adversarial capitulation, default saturation, and a default-direction asymmetry. In the clearest cases, a biased feed could sway an uncertain decision from 5% to 100% certainty, but couldn't shift decisions the model was already confident about.
Why This Matters
The findings have practical implications. The effect of feed curation follows a dose-response curve, so the size of the shift tracks how heavily the feed is skewed, and it holds up across decision domains, including security-critical choices like deployment approvals and access control relaxation.
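One way to picture a dose-response curve is to group rollouts by how biased the feed was and compute how often the agent picked the targeted option at each level. The sketch below does that in Python. The function name `dose_response`, the record format, and the toy rollouts are all assumptions made for illustration; the numbers are placeholders, not results from the study.

```python
from collections import defaultdict


def dose_response(rollouts):
    """Map each bias level to the fraction of rollouts that chose the target.

    `rollouts` is an iterable of (bias_level, chose_target) pairs, a
    hypothetical record format chosen for this example.
    """
    counts = defaultdict(lambda: [0, 0])  # bias -> [hits, total]
    for bias, chose_target in rollouts:
        counts[bias][0] += int(chose_target)
        counts[bias][1] += 1
    return {bias: hits / total for bias, (hits, total) in sorted(counts.items())}


# Placeholder data only: two rollouts per bias level.
toy_rollouts = [
    (0.0, False), (0.0, False),
    (0.5, True), (0.5, False),
    (1.0, True), (1.0, True),
]

curve = dose_response(toy_rollouts)
print(curve)  # {0.0: 0.0, 0.5: 0.5, 1.0: 1.0}
```

A curve that rises with the bias level, as in this toy output, is what a dose-response pattern would look like; a flat curve would indicate the model is holding to its default regardless of the feed.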
Here's where it gets practical. Two simple feed-level defenses can mitigate some of these effects, but a frontier model tends to stick to its default. So, should we be directing more resources to audit the feed layer instead of focusing solely on the final prompt? Absolutely. The deployment story gets messier the more we understand these nuances.
The Broader Picture
As we continue to integrate LLMs into our tools and everyday systems, understanding these interactions becomes vital. It's not just about building smarter models, but also about ensuring the information they consume doesn't lead them astray. In practice, models and their input streams need to be evaluated as a whole. Ignoring the power of feed curation means missing a key piece of the puzzle.
So, the next time we consider the efficacy of an LLM, let's ask ourselves: Are we evaluating the complete system or just the final output? In production, this oversight could mean the difference between a model that serves us well and one that's easily misled.