Decoding the Curriculum: How Large Language Models Learn
New research unveils the structured way large language models acquire skills, challenging our understanding of AI learning processes.
Large language models (LLMs) are like black boxes, performing complex tasks with ease while leaving researchers puzzled about their learning processes. A recent study offers a fresh perspective, suggesting that these models follow a compositional and predictable curriculum during pretraining.
The Implicit Curriculum Hypothesis
The paper introduces the Implicit Curriculum Hypothesis, proposing that LLMs acquire skills in a structured order. Instead of relying solely on scaling laws to gauge improvement, the hypothesis suggests that skills emerge predictably, often following a compositional path. This is a bold claim, challenging the notion that LLM learning is an inscrutable process.
Tracking the Learning Path
To test this hypothesis, researchers designed a suite of tasks covering retrieval, morphological transformations, coreference, logical reasoning, and mathematics. By tracking when models of varying sizes (410M-13B parameters) reached fixed accuracy thresholds, they discovered a striking consistency in the order in which skills emerge, with a correlation coefficient of 0.81 across 45 model pairs. This consistency suggests that LLMs don't just get smarter with more compute; they get smarter in a particular sequence.
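To make the consistency measure concrete, here is a minimal sketch of comparing two models' skill-emergence orders with a Spearman rank correlation. The task names, emergence points, and model labels are all hypothetical placeholders, not data from the paper; the paper's exact correlation statistic may differ.

```python
# Hypothetical training step at which each model first crosses a fixed
# accuracy threshold on each task (illustrative numbers only).
emergence = {
    "model_410M": {"retrieval": 1e9, "morphology": 2e9, "coreference": 4e9,
                   "logic": 8e9, "math": 1.2e10},
    "model_1B":   {"retrieval": 6e8, "morphology": 1.5e9, "coreference": 3e9,
                   "logic": 5e9, "math": 9e9},
}

def ranks(values):
    """Map each value to its rank (0 = earliest emergence)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman rank correlation, assuming no tied values."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

tasks = sorted(emergence["model_410M"])
a = [emergence["model_410M"][t] for t in tasks]
b = [emergence["model_1B"][t] for t in tasks]
print(spearman(a, b))  # 1.0 here: both toy models acquire skills in the same order
```

Averaging this statistic over every pair of models in a sweep is one natural way to arrive at a single cross-model consistency score like the 0.81 reported.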
Composite Tasks and Model Representations
The study found that composite tasks, like logical reasoning, typically emerge after their simpler components. This suggests LLMs build complex skills on top of simpler ones. Interestingly, tasks with similar function vector representations also showed similar learning trajectories. It's as if these models have an internal map guiding their learning journeys. The ablation study reveals that these internal representations can predict the training paths of new compositional tasks with an R-squared value between 0.68 and 0.84.
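One way to picture trajectory prediction from representations is the sketch below: a composite task's learning curve is estimated as a similarity-weighted blend of its components' curves, then scored with R-squared. This is an illustrative guess at the flavor of the analysis, not the paper's actual method; every number, weight, and task name is made up.

```python
# Illustrative sketch (not the paper's procedure): predict a composite task's
# accuracy curve from its components, weighted by hypothetical function-vector
# similarities, and score the fit with R-squared.

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Accuracy over five checkpoints for two component skills (made-up data).
retrieval   = [0.10, 0.40, 0.70, 0.85, 0.90]
logic_atoms = [0.05, 0.20, 0.50, 0.75, 0.85]

# Hypothetical cosine similarities between the composite task's function
# vector and each component's, normalized to sum to 1.
weights = {"retrieval": 0.6, "logic_atoms": 0.4}

predicted = [weights["retrieval"] * r + weights["logic_atoms"] * l
             for r, l in zip(retrieval, logic_atoms)]

# Observed curve for the composite task (also made up).
observed = [0.06, 0.30, 0.60, 0.80, 0.88]
print(round(r_squared(observed, predicted), 2))
```

An R-squared in the paper's reported 0.68-0.84 range would mean the weighted components explain most, but not all, of the variance in how the composite skill develops.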
Why It Matters
So, why should we care? Understanding the learning trajectory of LLMs could change how we design and train these models. If skills emerge in a predictable order, we can optimize training regimens, saving time and resources. This builds on prior work on neural representations and pushes us closer to transparent AI.
But here's the million-dollar question: Are we ready to embrace the idea that AI learning isn't a chaotic dance but a structured symphony? By demystifying the learning process, we can harness the true potential of LLMs.
Key Terms Explained
Compute: The processing power needed to train and run AI models.
LLM: Large Language Model.
Reasoning: The ability of AI models to draw conclusions, solve problems logically, and work through multi-step challenges.
Scaling laws: Mathematical relationships showing how AI model performance improves predictably with more data, compute, and parameters.