Mathematical relationships showing how AI model performance improves predictably with more data, compute, and parameters. Popularized by researchers at OpenAI, these laws help plan training runs by predicting final performance from smaller experiments. They are the intellectual foundation for the 'bigger is better' approach.
Scaling laws are mathematical relationships that predict how model performance improves as you increase compute, data, and parameters. The key finding: test loss falls smoothly and predictably as a power law in each of these factors. Double the compute, and loss drops by a predictable fraction. This lets researchers plan training runs costing millions of dollars with reasonable confidence.
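A minimal sketch of how a power law can be fit to small runs and then extrapolated, assuming the pure form L(C) = (C_c / C)^α with no irreducible-loss term. The functions `power_law_loss` and `fit_power_law`, the compute units, and the constants are hypothetical and chosen for illustration; the "small runs" are synthetic, generated from a known law so the fit can be checked.

```python
import math


def power_law_loss(compute: float, c_c: float, alpha: float) -> float:
    """Loss predicted by the pure power law L(C) = (C_c / C) ** alpha."""
    return (c_c / compute) ** alpha


def fit_power_law(computes: list[float], losses: list[float]) -> tuple[float, float]:
    """Fit (C_c, alpha) by least squares on log L = alpha*log(C_c) - alpha*log(C)."""
    xs = [math.log(c) for c in computes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var                    # slope of the log-log line is -alpha
    intercept = mean_y - slope * mean_x  # intercept is alpha * log(C_c)
    alpha = -slope
    return math.exp(intercept / alpha), alpha


# Synthetic "small runs" generated from a known law (illustrative units and constants).
TRUE_C_C, TRUE_ALPHA = 1e8, 0.05
small_runs = [1e0, 1e1, 1e2, 1e3]
losses = [power_law_loss(c, TRUE_C_C, TRUE_ALPHA) for c in small_runs]

c_c, alpha = fit_power_law(small_runs, losses)
predicted = power_law_loss(1e6, c_c, alpha)  # extrapolate 1000x past the largest run

# Doubling compute multiplies loss by 2 ** -alpha at any scale.
ratio = power_law_loss(2e6, c_c, alpha) / power_law_loss(1e6, c_c, alpha)
print(f"alpha={alpha:.3f}  predicted loss={predicted:.3f}  doubling ratio={ratio:.4f}")
```

Because a power law is a straight line in log-log space, ordinary least squares on the logs recovers the exponent. The same form explains why doubling compute yields the same relative loss reduction, a factor of 2^-α, no matter where you start. Real loss curves usually need an extra irreducible-loss term, which this sketch leaves out.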
The landmark papers came from OpenAI (2020) and DeepMind's Chinchilla (2022). OpenAI showed that loss decreases as a power law with model size, dataset size, and compute budget. Chinchilla refined this by showing that many models were too large for the amount of data they were trained on: at a similar compute budget, the 70B-parameter Chinchilla, trained on about 1.4 trillion tokens, outperformed the 280B-parameter Gopher, trained on about 300 billion tokens. This reshaped the industry, and much of the field shifted toward training smaller models on more data.
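A rough sketch of the Chinchilla comparison, under three stated assumptions: the paper's parametric loss fit L(N, D) = E + A/N^α + B/D^β with the constants as commonly quoted (E = 1.69, A = 406.4, B = 410.7, α = 0.34, β = 0.28), the standard estimate of about 6ND training FLOPs for N parameters and D tokens, and the widely quoted rule of thumb of about 20 tokens per parameter. The helper names are hypothetical, and the fit only approximates real training runs.

```python
import math

# Parametric loss fit from the Chinchilla paper (constants as commonly quoted):
# L(N, D) = E + A / N**ALPHA + B / D**BETA, with N = parameters, D = training tokens.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss under the parametric fit."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard approximation: about 6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens


def compute_optimal_split(flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a FLOP budget so that D = tokens_per_param * N and C = 6 * N * D."""
    n_params = math.sqrt(flops / (6 * tokens_per_param))
    return n_params, tokens_per_param * n_params


gopher = (280e9, 300e9)       # 280B parameters, ~300B tokens
chinchilla = (70e9, 1.4e12)   # 70B parameters, ~1.4T tokens

for name, (n, d) in [("Gopher", gopher), ("Chinchilla", chinchilla)]:
    print(f"{name}: {training_flops(n, d):.2e} FLOPs, predicted loss {chinchilla_loss(n, d):.3f}")

n_opt, d_opt = compute_optimal_split(training_flops(*gopher))
print(f"Compute-optimal for Gopher's budget: {n_opt / 1e9:.0f}B params, {d_opt / 1e12:.2f}T tokens")
```

Under these assumptions the fit predicts lower loss for Chinchilla than for Gopher at similar compute. Splitting Gopher's budget of roughly 5 × 10^23 FLOPs at 20 tokens per parameter suggests a model of about 65B parameters trained on about 1.3T tokens, close to Chinchilla's actual configuration.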
Scaling laws have been both a blessing and a constraint. They gave the field a roadmap: if you want X% better performance, you need Y times more compute, which costs Z dollars. This predictability attracted enormous investment. But they also suggest that achieving major capability jumps requires exponentially more resources. The question everyone's asking: do scaling laws continue to hold, or will we hit diminishing returns? And can algorithmic improvements change the curve itself?
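To make the "X% better needs Y times more compute" roadmap concrete: if loss scales as L ∝ C^(-α), cutting loss by a fraction r requires multiplying compute by (1 - r)^(-1/α). The sketch below assumes an illustrative α = 0.05; the function name is hypothetical, and real exponents depend on the model family, data, and how loss is measured.

```python
def compute_multiplier(loss_reduction: float, alpha: float) -> float:
    """Factor by which compute must grow to cut loss by `loss_reduction`
    (0.10 means 10%), assuming loss scales as compute ** -alpha."""
    if not 0 < loss_reduction < 1:
        raise ValueError("loss_reduction must be between 0 and 1 (exclusive)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return (1 - loss_reduction) ** (-1 / alpha)


ALPHA = 0.05  # illustrative exponent, not a measured value
for r in (0.05, 0.10, 0.20):
    print(f"{r:.0%} lower loss -> {compute_multiplier(r, ALPHA):.1f}x compute")
```

With this exponent, a 5% loss reduction needs under 3x the compute, while a 20% reduction needs nearly 90x. Multiplying the result by a cost per unit of compute gives the "Z dollars" side of the roadmap.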
"Based on scaling laws, we estimated that a 3x compute increase would reduce our model's error rate by about 15%, and the actual improvement was within 2% of our prediction."
Pre-training: The initial, expensive phase of training where a model learns general patterns from a massive dataset.
Compute: The processing power needed to train and run AI models.
Chinchilla: A research paper from DeepMind showing that many large language models of the time were over-sized and under-trained.
Activation function: A mathematical function applied to a neuron's output that introduces non-linearity into the network.
Adam: An optimization algorithm that combines the advantages of two other methods, AdaGrad and RMSProp.
AGI: Artificial General Intelligence.