A value the model learns during training — specifically, the weights and biases in neural network layers. When we say GPT-4 reportedly has more than a trillion parameters, we mean that many individual numbers were learned. More parameters generally means more capacity to learn complex patterns, but also greater compute and memory costs.
Parameters are the internal numerical values that a neural network adjusts during training to learn patterns. Think of them as the model's "knowledge" encoded as numbers. When people say GPT-4 has trillions of parameters or LLaMA has 70 billion, they're talking about how many adjustable values the model contains.
Each parameter is a single number — a weight on a connection between neurons or a bias term. During training, the model processes examples and adjusts these numbers to minimize errors. After training, the parameters are fixed and determine how the model responds to any input. More parameters generally means more capacity to learn complex patterns, which is why the AI industry has been racing to build larger models.
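Because each connection carries one weight and each neuron one bias, a network's parameter count follows directly from its layer sizes. A minimal sketch for a fully connected network (the layer sizes here are illustrative, not taken from any real model):

```python
# Count learnable parameters (weights + biases) in a fully connected network.
# layer_sizes lists the number of neurons in each layer, input to output.
def count_params(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        total += n_out         # one bias per neuron in the receiving layer
    return total

print(count_params([784, 128, 10]))  # → 101770
```

Even this toy three-layer network has over 100,000 parameters; billion-parameter language models apply the same counting to vastly larger layers.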
But parameter count alone doesn't determine model quality. Training data quality, architecture choices, and training techniques all matter enormously. A well-trained 7B parameter model (like Mistral 7B) can outperform a poorly trained 13B model. And some researchers argue we've been training models that are too large for their data — the Chinchilla scaling laws suggest that many models should have been smaller but trained on more data. Parameter count is one dimension of capability, not the whole story.
"LLaMA 3 comes in 8B and 70B parameter versions — the larger one is smarter but needs significantly more GPU memory to run."
Weight: A numerical value in a neural network that determines the strength of the connection between neurons.
Training: The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.
Hyperparameter: A setting you choose before training begins, as opposed to parameters the model learns during training.
Activation function: A mathematical function applied to a neuron's output that introduces non-linearity into the network.
Adam: An optimization algorithm that combines the strengths of two other methods, AdaGrad and RMSProp.
AGI: Artificial General Intelligence.