Are Language Models Getting a Little Too Agreeable?
Research reveals that large language models are swayed by prior conversation tone, impacting impartiality. The study highlights how negativity in dialogue history amplifies bias.
Large language models (LLMs) are becoming ubiquitous. From code reviews to content moderation, they're handling tasks that require a semblance of judgment. But, are they as impartial as we think? Recent findings suggest otherwise, revealing a tendency for these models to be influenced by the tone of prior conversation. whether LLMs are too impressionable and why that might be a concern.
The Accumulated Message Effect
Dubbed the accumulated message effect on LLM judgments (AMEL), this phenomenon was identified in a comprehensive study involving 75,898 API calls across 11 models from major providers like OpenAI, Anthropic, and Google. The study discovered that when test items were presented to models following histories filled with predominantly positive or negative evaluations, the models tended to sway towards the prevailing tone. Specifically, they leaned more towards the conversation's polarity, with a notable effect size of -0.17 (p<10^-46).
This bias becomes particularly evident when models are uncertain. For high-entropy items, where the baseline is genuinely uncertain, the shift deepens to -0.34. Surprisingly, the bias doesn't amplify with longer context. Whether there are 5 prior turns or 50, the shift remains consistent.
Negativity Bias and Model Scaling
The study also uncovered a negativity asymmetry. In paired comparisons, negative histories induced 1.62 times more bias than positive ones. This imbalance suggests that negativity holds more sway over the models' judgments. So, even as LLMs scale, Anthropic's Haiku to Opus or OpenAI's Nano to GPT-5.2, the bias only slightly diminishes, never fully disappearing.
Why should this matter to us? If our AI systems are swayed by past negativity, it raises questions about their ability to remain impartial. Are they, in effect, just amplifying the echo chambers they're meant to moderate?
Practical Solutions
The research sheds light on potential solutions. The bias originates from continuous shifts in token probability distributions, not sudden changes. A practical fix for this is ensuring a fresh context for each evaluation. When batching isn't an option, balancing the conversation history becomes essential.
We're building the financial plumbing for machines, but what happens when these pipes get clogged with bias? The AI-AI Venn diagram is getting thicker, and the nuances of these interactions are key for the future of automated decision-making.
If agents have wallets, who holds the keys to their judgment? Ensuring that LLMs remain as unbiased and objective as possible isn't just an academic exercise. it's key for maintaining trust in AI-driven processes across industries.
Get AI news in your inbox
Daily digest of what matters in AI.