Unlocking Hidden Language Patterns with Trajectory Extrapolation
A new metric, trajectory extrapolation error, sheds light on language processing. It complements surprisal by considering the path of word interpretation.
Human language isn't just about words; it's about how those words unfold in sequence. Traditional models focus on surprisal, the negative log probability of a word given its preceding context. But there's more to the story. A new metric, trajectory extrapolation error, captures the dynamic nature of language comprehension. This approach doesn't just check where the interpretation is. It examines where it's heading.
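For reference, the surprisal of a single word is just a negative logarithm. The sketch below is a minimal illustration. It assumes base-2 logarithms, so surprisal is measured in bits, and it uses made-up next-word probabilities rather than output from a real language model. The function name `surprisal` is our own.

```python
import math

def surprisal(prob: float, base: float = 2.0) -> float:
    """Surprisal -log P(word | context); measured in bits when base is 2."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(prob) / math.log(base)

# Hypothetical next-word probabilities after "The cat sat on the" (illustrative only).
next_word_probs = {"mat": 0.5, "floor": 0.25, "piano": 0.01}

for word, p in next_word_probs.items():
    print(f"{word}: {surprisal(p):.2f} bits")
```

A word the context makes likely (probability 0.5) costs 1 bit, while an unexpected one (probability 0.01) costs about 6.6 bits. This is the sense in which higher surprisal is taken to predict more processing effort.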
Beyond Surprisal: Trajectory Extrapolation
Surprisal has long reigned as the dominant predictor of processing cost. The logic is simple: higher surprisal means more cognitive load. However, reducing each word to a single scalar discards information about how the interpretation is evolving. This is where trajectory extrapolation error steps in. It fits a linear trajectory to a transformer model's hidden states for the preceding words, extrapolates that line one step ahead, and measures how far the actual hidden state deviates from the extrapolated path. In other words, it tracks the direction of comprehension, not just its position.
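The procedure described above can be sketched in a few lines of NumPy. This is not the authors' implementation. It assumes that hidden states for one layer arrive as a `(T, d)` array, fits an ordinary least-squares line over a hypothetical window of preceding states (independently in each dimension, against the token index), extrapolates one step ahead, and uses Euclidean distance as the error. The window size, distance measure and function name `trajectory_extrapolation_error` are all illustrative choices.

```python
import numpy as np

def trajectory_extrapolation_error(states: np.ndarray, t: int, window: int = 4) -> float:
    """Distance between the actual state at step t and a linear extrapolation
    fitted to the `window` states immediately before it.

    states: array of shape (T, d), one hidden-state vector per token (one layer).
    Assumption: a least-squares line is fitted per dimension against the token
    index, and the error is the Euclidean distance to the extrapolated point.
    """
    if window < 2:
        raise ValueError("need at least two preceding states to fit a line")
    if t < window:
        raise ValueError("not enough preceding context for this window")
    past = states[t - window:t]                        # (window, d)
    times = np.arange(window, dtype=float)             # 0, 1, ..., window - 1
    X = np.column_stack([np.ones(window), times])      # shared design matrix [1, time]
    coef, *_ = np.linalg.lstsq(X, past, rcond=None)    # (2, d): intercepts, slopes
    predicted = coef[0] + coef[1] * window             # one step past the window
    return float(np.linalg.norm(states[t] - predicted))

# Usage: points on a straight line vs. the same path with a sudden jump at the end.
line = np.outer(np.arange(6), [1.0, 2.0])              # states (0,0), (1,2), ..., (5,10)
print(trajectory_extrapolation_error(line, t=5))       # close to 0
bent = line.copy()
bent[5] = [0.0, 0.0]                                   # jump back to the origin
print(trajectory_extrapolation_error(bent, t=5))       # ≈ 11.18, i.e. sqrt(125)
```

A trajectory that keeps moving in a straight line yields an error near zero however large its steps are; a sudden change of direction yields a large one.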
On the Natural Stories corpus, trajectory extrapolation error is nearly orthogonal to surprisal (r = .044), which suggests it captures information that surprisal does not. It also independently predicts self-paced reading times, with larger effects in complex garden-path sentences, where readers must revise their initial parse. The implication: language processing isn't just about guessing the next word. It's about tracking how the entire sentence unfolds in real time.
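The two statistical claims, near-orthogonality and independent prediction, correspond to a correlation and a joint regression. The sketch below assumes per-word arrays of surprisal, trajectory extrapolation error (TEE) and reading times, and fits a plain ordinary least-squares model with both predictors entered together. Published reading-time analyses often use mixed-effects models with additional controls such as word length and frequency, so treat this only as an illustration of the logic. The function names and the toy inputs are hypothetical.

```python
import numpy as np

def zscore(v: np.ndarray) -> np.ndarray:
    """Center and scale a predictor so that coefficients are comparable."""
    return (v - v.mean()) / v.std()

def predictor_overlap_and_fit(surprisal, tee, reading_times):
    """Return (r, coef) for two per-word predictors of reading time.

    r    : Pearson correlation between surprisal and TEE.
    coef : OLS weights [intercept, surprisal, TEE] with both predictors entered
           together, so each weight reflects variance the other does not explain.
    """
    s = np.asarray(surprisal, dtype=float)
    e = np.asarray(tee, dtype=float)
    y = np.asarray(reading_times, dtype=float)
    r = float(np.corrcoef(s, e)[0, 1])
    X = np.column_stack([np.ones_like(s), zscore(s), zscore(e)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return r, coef

# Toy inputs constructed so that the predictors are uncorrelated (not real data).
s = np.array([1.0, 2.0, 3.0, 4.0])
e = np.array([1.0, -1.0, -1.0, 1.0])
y = 300 + 20 * zscore(s) + 5 * zscore(e)   # synthetic reading times in ms
r, coef = predictor_overlap_and_fit(s, e, y)
print(round(r, 3), np.round(coef, 2))
```

A correlation like the reported .044 means the two measures share almost no linear variance (r² below 0.2%), so a reliable TEE weight in a joint fit cannot simply be surprisal in disguise.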
Model Scale and Architecture Matter
Interestingly, the effect of trajectory extrapolation error strengthens with model scale: from GPT-2 Small to GPT-2 Large, its predictive power increases, suggesting that larger models capture more of this dynamic unfolding. The effect also replicates across architectures that use different positional encoding schemes, namely GPT-2 (learned absolute position embeddings) and Pythia (rotary position embeddings, RoPE).
Nor is the effect simply a matter of how far the representation moves. A displacement control shows that it isn't explained by the magnitude of representational change: displacement and extrapolation error predict reading times in opposite directions. Together with the near-zero correlation with surprisal, this suggests that trajectory extrapolation error and surprisal tap into two distinct components of processing cost.
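Displacement and extrapolation error can come apart even on toy data. In the sketch below, displacement is assumed to be the Euclidean distance between consecutive hidden states, and extrapolation error reuses the linear-fit idea over a hypothetical three-state window. Both definitions are illustrative assumptions, not the study's exact formulas, and the toy trajectories do not reproduce the reading-time result. They only show why the two measures are not interchangeable: a state moving quickly along a straight line has large displacement but near-zero extrapolation error, while a small turn has the opposite profile.

```python
import numpy as np

def displacement(states: np.ndarray, t: int) -> float:
    """Size of the step from the previous hidden state to the current one."""
    return float(np.linalg.norm(states[t] - states[t - 1]))

def extrapolation_error(states: np.ndarray, t: int, window: int = 3) -> float:
    """Distance from the state at t to a line fitted through the `window` states before it."""
    times = np.arange(window)
    # With 2-D y, polyfit fits each dimension separately; rows are (slope, intercept).
    slope, intercept = np.polyfit(times, states[t - window:t], deg=1)
    predicted = intercept + slope * window
    return float(np.linalg.norm(states[t] - predicted))

# Toy 2-D "hidden states" (hypothetical, for illustration only).
fast_straight = np.array([[0.0, 0.0], [3.0, 0.0], [6.0, 0.0], [9.0, 0.0], [12.0, 0.0]])
slow_turn = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.5, 0.0], [1.5, 0.5]])

for name, traj in [("fast, straight", fast_straight), ("slow, turning", slow_turn)]:
    print(f"{name}: displacement={displacement(traj, 4):.2f}, "
          f"extrapolation error={extrapolation_error(traj, 4):.2f}")
```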
The Future of Language Models
Why does this matter? A finer-grained understanding of human language processing could lead to more efficient AI models. If AI can better predict how humans process language, its interactions with us could become more nuanced and effective. But here's the kicker: should we push AI to process language the way humans do, or accept its own distinct processing style?
Ultimately, trajectory extrapolation error offers a fresh perspective on language comprehension. It challenges existing models to think beyond next-word prediction and consider the larger path that a sentence's interpretation takes.