In-context learning is a model's ability to learn new tasks from examples provided directly in the prompt, without any weight updates or retraining. You show the model a few examples of inputs and desired outputs, and it figures out the pattern and applies it to new inputs. This was one of the most surprising discoveries about large language models.
What makes this remarkable is that the model isn't being trained when this happens. Its parameters don't change. It's using the examples in the context window as a kind of temporary "program" that guides its behavior for that specific conversation. This is fundamentally different from fine-tuning, where the model's weights actually get modified. In-context learning is transient — start a new conversation without the examples and the model reverts to its default behavior.
Researchers still debate exactly how in-context learning works internally. One theory is that transformer layers implement something like gradient descent steps implicitly, effectively "learning" within a forward pass. Whatever the mechanism, the practical result is that large language models are incredibly flexible — you can steer them toward almost any task format just by showing examples. This is why prompt engineering is so powerful.
"We use in-context learning by including five example email classifications in every prompt, so the model follows our specific labeling scheme without any fine-tuning."
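The pattern described in the quote above can be sketched as a small prompt builder. This is a minimal illustration, not any particular system's implementation: the categories, example emails, and prompt layout here are all hypothetical.

```python
# Sketch of in-context learning via a few-shot prompt, mirroring the
# email-classification example quoted above. Labels and emails are
# made up for illustration.

EXAMPLES = [
    ("Your invoice #4821 is attached.", "billing"),
    ("Can we reschedule tomorrow's demo?", "scheduling"),
    ("The app crashes when I click export.", "bug-report"),
    ("Congratulations, you won a free cruise!", "spam"),
    ("How do I reset my password?", "support"),
]

def build_few_shot_prompt(new_email: str) -> str:
    """Pack labeled examples into the prompt so the model can infer
    the labeling scheme in-context, with no weight updates."""
    lines = ["Classify each email into one category.\n"]
    for text, label in EXAMPLES:
        lines.append(f"Email: {text}\nCategory: {label}\n")
    # The unlabeled query goes last; the model completes the pattern.
    lines.append(f"Email: {new_email}\nCategory:")
    return "\n".join(lines)

prompt = build_few_shot_prompt("I was charged twice this month.")
print(prompt)
```

The resulting string would be sent to a model as-is; because the five labeled examples travel with every request, the labeling scheme works immediately and disappears just as quickly when the examples are omitted.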
The ability of a model to learn a new task from just a handful of examples, often provided in the prompt itself.
The text input you give to an AI model to direct its behavior.
Capabilities that appear in AI models at scale without being explicitly trained for.
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
An optimization algorithm that combines the strengths of two other methods, AdaGrad and RMSProp.
Artificial General Intelligence.