Why Child-Directed Speech Leaves Language Models Lost in Translation
AI models trained on child-directed speech fail to mimic mutual exclusivity, revealing gaps in language understanding. Their repetition bias exposes a lack of true referential grounding.
In the fascinating world of language models, a peculiar experiment reveals that these AI creations struggle with a basic human word-learning bias: mutual exclusivity (ME), the tendency to map new words onto new objects rather than onto objects that already have names. Text-only language models trained on child-directed speech, it turns out, miss the mark entirely.
The Experiment
Researchers set out to test this idea directly. They evaluated how models responded when familiar objects were given novel labels in a two-referent context. Spoiler alert: the models didn't show ME. Instead, they exhibited repetition priming, gravitating back to the familiar, recently seen label rather than embracing the novel one.
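To make the setup concrete, here is a minimal sketch of how one such trial can be scored with an autoregressive language model, by comparing the log-probabilities of the two candidate answers. The sentences, the off-the-shelf gpt2 checkpoint, and the scoring rule are illustrative assumptions, not the authors' exact materials or models.

```python
# Sketch: score an ME-style trial by comparing candidate continuations.
# Sentences, checkpoint, and scoring rule are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `context`."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    # Score only the continuation tokens; each is predicted from its preceding prefix.
    for pos in range(ctx_len, full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

# Two-referent context: a familiar object ("ball") alongside a novel object ("dax").
context = "Look, there is a ball and a dax. Give me the dax. Here is the"
me_answer = continuation_logprob(context, " dax.")       # ME-consistent choice
primed_answer = continuation_logprob(context, " ball.")  # repetition-primed choice

print("Model prefers the novel referent (ME):", me_answer > primed_answer)
```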
A closer look at one particular model, BabyBERTa, revealed that it was essentially insensitive to multi-sentence context. Autoregressive models, for their part, showed a strong pull toward repetition, the opposite of the novelty preference ME requires. The analysis also showed that what looked like ME behavior was driven by embedding similarities, not genuine referential disambiguation.
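The embedding-similarity point can be illustrated with a rough sketch: if the "choice" between referents can be predicted from cosine similarity between word embeddings alone, no referential reasoning is needed to explain it. The words and the gpt2 checkpoint below are stand-ins chosen for illustration, not the study's materials.

```python
# Sketch: similarity between static input embeddings can masquerade as ME.
# Words and checkpoint are illustrative stand-ins.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
embeddings = model.get_input_embeddings().weight.detach()  # (vocab_size, hidden_dim)

def word_vector(word: str) -> torch.Tensor:
    """Average the input-embedding rows of a word's subword tokens."""
    ids = tokenizer(" " + word, add_special_tokens=False).input_ids
    return embeddings[ids].mean(dim=0)

novel_label = word_vector("dax")
for candidate in ["ball", "cup", "blicket"]:
    sim = torch.cosine_similarity(novel_label, word_vector(candidate), dim=0).item()
    print(f"cosine(dax, {candidate}) = {sim:.3f}")
```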
Deep Dive Into Data
In a large-scale confirmatory experiment, the researchers trained 45 GPT-2-like models ranging from 2.9 million to 33.5 million parameters and evaluated them on a pre-registered ME battery. The results weren't flattering: repetition priming was prevalent across all scenarios, affecting 85-100% of items tested, with statistically significant results throughout (all p < 2.4 × 10^-13).
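For a sense of how item-level statistics like these work, the sketch below runs the kind of sign test that produces such small p-values: under the null hypothesis of no bias, each test item is a 50/50 coin flip. The item counts here are invented for demonstration and are not the paper's data.

```python
# Sketch: a binomial (sign) test over ME items; counts are hypothetical.
from scipy.stats import binomtest

n_items = 200               # hypothetical number of items in one condition
n_repetition_primed = 176   # hypothetical items where the repeated, familiar label won

# Under the null hypothesis of no bias, each item is a 50/50 coin flip.
result = binomtest(n_repetition_primed, n_items, p=0.5, alternative="greater")
print(f"{n_repetition_primed / n_items:.0%} of items repetition-primed, p = {result.pvalue:.1e}")
```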
Interestingly, as language-modeling quality improved, the repetition bias weakened slightly but never disappeared: it stayed above zero across the full perplexity range the models covered. The takeaway is clear: distributional learning from child-directed input pushes models toward repetition-based reference tracking rather than toward lexical exclusivity.
What Does This Mean?
This isn't just a technical hiccup. It points to a fundamental gap in how language models process and understand context, with broader implications for AI development. If these models can't pick up a basic word-learning bias, what else are they missing? Referential grounding appears to be a necessary ingredient for genuine mutual exclusivity, which suggests the input these models learn from may need more than text alone.
But who benefits from this insight? It's a call to action for researchers and developers to rethink how models learn language: factoring in grounded cognition and perhaps incorporating more diverse data sources. Benchmarks built on text prediction alone, the study implies, don't capture the referential abilities that matter most.
Ultimately, we're left asking: can AI ever truly replicate the nuanced ways humans process language? While the technology continues to advance, this study shows there's still a long road ahead.