Language Models Show Surprising Categorical Perception During Number Processing
Research reveals that large language models exhibit categorical perception when processing Arabic numerals, reshaping hidden-state representations at structural boundaries.
Categorical perception (CP), a phenomenon in perceptual psychology where discrimination sharpens at category boundaries, appears to have an unexpected counterpart in large language models (LLMs). Recent research shows that when these models process Arabic numerals, their hidden-state representations undergo geometric warping akin to categorical perception.
Geometric Warping in LLMs
In a study spanning six models from five architecture families, researchers used representational similarity analysis to probe this phenomenon. They found that a CP-additive model, which combines a log-distance term with a boundary boost, fits the representational geometry more accurately than a purely continuous model across all layers tested. Notably, the warping is specific to structurally defined boundaries such as the digit-count transitions at 10 and 100. The effects do not appear at non-boundary control positions, nor for linguistic categories like 'hot' and 'cold', where there is no tokenization discontinuity.
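To make the model comparison concrete, here is a minimal sketch (not the authors' code) of how a continuous log-distance model and a CP-additive model could be fit against pairwise hidden-state dissimilarities. The random placeholder activations, the 64-dimensional vectors, and the digit-count boundary indicator are all illustrative assumptions; in a real analysis the `hidden_states` would come from an LLM's layer activations for numeral inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder activations: in practice these would be an LLM layer's
# hidden states for each numeral. Random vectors are used here only
# to keep the sketch self-contained and runnable.
numbers = np.arange(1, 200)
rng = np.random.default_rng(0)
hidden_states = {n: rng.normal(size=64) for n in numbers}

def dissimilarity(a, b):
    # 1 - cosine similarity as a simple representational distance
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

pairs = [(i, j) for i in numbers for j in numbers if i < j]
y = np.array([dissimilarity(hidden_states[i], hidden_states[j]) for i, j in pairs])

# Continuous predictor: distance on a log number line
log_dist = np.array([abs(np.log(i) - np.log(j)) for i, j in pairs])
# Boundary boost: 1 if the pair straddles a digit-count boundary (10 or 100)
crosses = np.array([float(len(str(i)) != len(str(j))) for i, j in pairs])

continuous = LinearRegression().fit(log_dist[:, None], y)
cp_additive = LinearRegression().fit(np.column_stack([log_dist, crosses]), y)

# On real activations, a higher R^2 for the CP-additive fit would indicate
# an extra jump in representational distance at the boundary.
print("continuous R^2: ", continuous.score(log_dist[:, None], y))
print("CP-additive R^2:", cp_additive.score(np.column_stack([log_dist, crosses]), y))
```

The design choice to keep the boundary term additive matters: it tests whether crossing a digit-count boundary adds a fixed jump in representational distance on top of the smooth log-distance trend, which is the signature of categorical perception.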
Two Distinct Patterns Emerge
The study identified two distinct patterns of categorical perception in LLMs: 'classic CP' and 'structural CP'. Models like Gemma and Qwen display classic CP, where both explicit categorization and geometric warping are evident. Llama, Mistral, and Phi instead exhibit structural CP: the geometry shifts at the boundary even though the models do not explicitly report category distinctions. This dissociation is consistent across boundaries and appears to be a property of the architecture rather than of the stimulus.
Why It Matters
This insight into categorical perception in LLMs isn't just academic. It raises important questions about how these models process structured data: are we inadvertently hardcoding biases and perceptual shortcuts into our AI systems? Understanding these nuances is essential as we increasingly rely on AI for decision-making.
Structural input-format discontinuities, it seems, suffice to produce categorical-perception geometry within these models, independent of explicit semantic category knowledge. This isn't just a quirk of machine perception; it's a glimpse into how 'thinking machines' discern structure.