The ability of a model to learn a new task from just a handful of examples, often provided in the prompt itself. Large language models excel at this: show them two or three examples of a task and they can generalize. This contrasts with traditional machine learning, which typically needs thousands of labeled examples.
Few-shot learning is a model's ability to perform a task from just a handful of examples, rather than needing thousands or millions. In practice, this usually means including 2-5 examples in your prompt to show the model what you want: "Here's how I want you to format this data: [example 1], [example 2]. Now do the same for this input."
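The pattern above can be sketched as a small prompt-builder. This is a minimal illustration, not any particular library's API; the function name, instruction text, and example pairs are all invented for the sketch. The key idea is that the examples themselves are the task specification, and no model weights change.

```python
# Minimal sketch of assembling a few-shot prompt. The formatting
# task ("Last, First") and all names here are illustrative
# placeholders, not from any real dataset.

def build_few_shot_prompt(examples, new_input):
    """Build a prompt from (input, output) example pairs plus a new input."""
    parts = ["Convert each name to 'Last, First' format."]
    for src, dst in examples:
        parts.append(f"Input: {src}\nOutput: {dst}")
    # End with the unanswered case; the model's job is to continue it.
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(parts)

examples = [
    ("Ada Lovelace", "Lovelace, Ada"),
    ("Alan Turing", "Turing, Alan"),
]
print(build_few_shot_prompt(examples, "Grace Hopper"))
```

The resulting string is what you would send to a model; two worked examples are often enough for it to infer the format.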
This capability is one of the most practically useful features of large language models. Before LLMs, building a classifier or text converter meant collecting a labeled dataset, training a model, tuning hyperparameters, and deploying it. Now you can get 80% of the way there by showing a few examples in a prompt. It's not as robust as a dedicated model, but the speed of iteration is transformative.
Few-shot learning sits on a spectrum with zero-shot (no examples, just instructions) and many-shot (lots of examples). The sweet spot depends on the task complexity. For simple formatting tasks, zero-shot often works fine. For nuanced judgment calls — like classifying customer complaints into specific categories your business uses — a few well-chosen examples make a huge difference. The key is picking representative examples that cover the edge cases.
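To make the "representative examples that cover the edge cases" point concrete, here is a hedged sketch of a few-shot classification prompt. The category scheme, complaints, and labels are invented placeholders; the design choice to highlight is the third example, which deliberately covers an ambiguous case.

```python
# Sketch of a few-shot classification prompt. Categories and
# complaints are made up for illustration; in practice you would
# pick real examples from your own data, including ambiguous ones.

def classification_prompt(labeled_examples, new_complaint):
    """Build a prompt that teaches a category scheme by example."""
    parts = ["Classify each complaint as billing, shipping, or other."]
    for complaint, label in labeled_examples:
        parts.append(f"Complaint: {complaint}\nCategory: {label}")
    parts.append(f"Complaint: {new_complaint}\nCategory:")
    return "\n\n".join(parts)

examples = [
    ("I was charged twice this month.", "billing"),
    ("My package arrived two weeks late.", "shipping"),
    # Edge case: mentions money but is really a shipping issue.
    ("I paid for express delivery and it still took a week.", "shipping"),
]
print(classification_prompt(examples, "The invoice total looks wrong."))
```

Without the edge-case example, a model might route anything mentioning payment to "billing"; one well-chosen example teaches the distinction that instructions alone often fail to convey.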
"Instead of training a custom model, we used few-shot learning — gave Claude three examples of our desired output format and it handled the rest."
A model's ability to learn new tasks simply from examples provided in the prompt, without any weight updates.
A model's ability to perform a task it was never explicitly trained on, with no examples provided.
The text input you give to an AI model to direct its behavior.
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
An optimization algorithm that combines the best parts of two other methods — AdaGrad and RMSProp.
Artificial General Intelligence.