Can Transformers Really Infer Rules? The Debate Heats Up
A new study challenges the notion that transformers can only interpolate based on observed data. Surprising results suggest these models might grasp unseen rules.
The debate over large language models (LLMs) and their capabilities rages on. Can transformers genuinely infer rules absent from their training data? Or are they merely interpolating based on what's been observed? A recent study takes a hard look at this.
Experimenting with XOR Logic
Let's break this down. In the first experiment, researchers used a cellular automaton with a pure XOR transition rule. XOR is linearly inseparable, which makes it a natural stress test. Specific local input patterns were removed from training. Why does this matter? Because each held-out pattern's nearest neighbors carry the opposite label, so similarity-based predictions must fail in exactly those regions.
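The failure of similarity-based prediction follows directly from parity: flipping any single bit of a pattern flips its XOR label. Here is a minimal sketch of that argument; the window width and bit encoding are assumptions for illustration, not the paper's exact setup.

```python
from itertools import product

def xor_rule(pattern):
    """XOR (parity) transition rule over a local window of bits."""
    return sum(pattern) % 2

def neighbors_all_disagree(width=3):
    """True if every pattern's Hamming-distance-1 neighbors all carry
    the opposite label -- the condition that breaks 1-NN prediction."""
    for p in product([0, 1], repeat=width):
        label = xor_rule(p)
        for i in range(width):
            q = list(p)
            q[i] ^= 1  # flip one bit -> parity flips
            if xor_rule(tuple(q)) == label:
                return False
    return True

# Every nearest neighbor of a held-out pattern has the opposite label,
# so any similarity-based predictor is wrong on exactly those inputs.
print(neighbors_all_disagree())  # -> True
```

The same check passes for any window width, which is why holding out patterns under an XOR rule isolates rule learning from interpolation.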
But here's the twist. A simple two-layer transformer recovered the XOR rule with remarkable consistency: in 47 of 60 runs, the model converged to a solution scoring a perfect 100% on the held-out patterns. Even more intriguingly, multi-step constraint propagation made the difference. Without unrolling, accuracy sat at 63.1%; soft unrolling pushed it to 96.7%.
Beyond Final Answers: Emitting Intermediate Steps
The second experiment took a different angle. Researchers tested symbolic operator chains over integers, removing one operator pair to see if the model could still emit intermediate steps and the final answer in a proof-like format. Across all 49 holdout pairs, the transformer outperformed interpolation baselines, with a mean accuracy of 41.8%, peaking at 78.6%. By contrast, kernel ridge regression averaged just 4.3%, and other models like KNN and MLP scored zero across the board.
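To make the setup concrete, here is a hypothetical sketch of generating operator-chain data with proof-like intermediate steps. The operator names, integer ranges, and serialization format below are illustrative assumptions; the study does not specify them here.

```python
import random

# Illustrative operator set -- NOT the paper's exact operators.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def make_chain(length, holdout=None, rng=random):
    """Build an operator chain, emitting each intermediate step in a
    proof-like line format, skipping any held-out operator pair."""
    x = rng.randint(0, 9)
    steps = []
    prev_op = None
    for _ in range(length):
        op = rng.choice(list(OPS))
        if holdout and (prev_op, op) == holdout:
            continue  # this consecutive operator pair is removed from training
        y = rng.randint(0, 9)
        nxt = OPS[op](x, y)
        steps.append(f"{x} {op} {y} = {nxt}")  # intermediate step supervision
        x, prev_op = nxt, op
    return steps, x
```

Training with `holdout=("mul", "add")`, say, would never show the model a `mul` step followed by an `add` step; the test then asks whether it can still emit every intermediate line and the final answer for exactly that pair.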
Here's what the benchmarks actually show: the transformer's ability to emit intermediate steps is key. Remove that supervision, and performance degrades.
Transformers: Rule Learners or Interpolators?
The numbers tell a different story from what many expect. By showing that a standard transformer block can implement exact local Boolean rules, this research provides evidence that transformers can learn rule structures not directly observed in training. But the reality is this: it doesn't fully close the case on when and how such behavior emerges in large-scale language training.
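The claim that a standard transformer block can implement exact local Boolean rules has a concrete backing: the feed-forward sublayer of a transformer block is a two-layer MLP, and a two-layer ReLU MLP can compute XOR exactly. The weights below are a hand-built textbook construction, not the paper's learned solution.

```python
import numpy as np

# Hand-constructed two-layer ReLU network computing XOR exactly,
# despite XOR being linearly inseparable.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])     # both hidden units sum the two input bits
b1 = np.array([0.0, -1.0])     # second unit fires only when both bits are 1
w2 = np.array([1.0, -2.0])     # subtract the "both on" case twice

def xor_mlp(x):
    h = np.maximum(W1 @ x + b1, 0.0)  # ReLU hidden layer
    return float(w2 @ h)              # exact XOR output

for a in (0, 1):
    for b in (0, 1):
        assert xor_mlp(np.array([a, b], dtype=float)) == a ^ b
```

Representability is only half the story, of course; the study's contribution is evidence that gradient training actually finds such solutions on held-out regions.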
Strip away the marketing and you get a clearer view. Transformers aren't just interpolators: they have the potential to discover and communicate previously unseen rules. But will this change how we train and trust these models? That's the million-dollar question.