A model's ability to perform a task it was never explicitly trained on, with no examples provided. You describe the task and the model handles it. Large language models are strong zero-shot learners: ask Claude to translate a sentence into Finnish without showing it any Finnish examples, and it typically produces a usable translation.
Zero-shot learning is a model's ability to perform a task it has never been explicitly trained on, without being shown any examples. You describe what you want, and the model works out how to do it. For instance, you can prompt "Classify this email as urgent or not urgent" without showing a single example of an urgent email. The model generalizes from its broad training to handle the new task.
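To make the idea concrete, here is a minimal sketch of what a zero-shot prompt can look like when assembled in code. The helper name `build_zero_shot_prompt` and the "Input:/Answer:" layout are illustrative assumptions, not a required format, and no real model API is called; the resulting string would be sent to whichever model you use.

```python
def build_zero_shot_prompt(task: str, text: str) -> str:
    """Return a prompt that describes the task and contains no worked examples.

    Hypothetical helper: the layout below is one reasonable convention,
    not a format any particular model requires.
    """
    return f"{task}\n\nInput:\n{text}\n\nAnswer:"


# The task description is the only guidance the model receives.
prompt = build_zero_shot_prompt(
    "Classify this email as urgent or not urgent. Reply with one label only.",
    "The production server is down and customers cannot log in.",
)
print(prompt)
```

The point of the sketch is what is absent: there are no labeled emails in the prompt, only the instruction and the input to classify.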
This capability is one of the most notable properties of large language models. Earlier, smaller models generally needed task-specific training data for each task. Large models, through the breadth of their training, develop the ability to follow a wide range of instructions. GPT-3's zero-shot abilities surprised even its creators: the model could translate, summarize, and answer questions from written instructions alone.
Zero-shot performance typically improves with model scale. A 7B-parameter model might struggle with a zero-shot classification task that a 70B model handles easily. When zero-shot doesn't work well enough, you move to few-shot prompting (adding examples to the prompt) or fine-tuning (updating the model's weights with further training). The main advantage of zero-shot is speed: you can prototype and test ideas in seconds. For many business applications, zero-shot with a capable model gets you roughly 80% of the way there, and you invest in fine-tuning only if you need the last 20%.
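The escalation path described here can be sketched as a small check: measure zero-shot accuracy on a handful of labeled items, and only add examples to the prompt if the score falls short. Everything named below is an assumption for illustration: `fake_zero_shot_classifier` is a keyword rule standing in for a real model call, the labeled emails are made up, and the 0.8 target is an arbitrary threshold rather than a recommended value.

```python
from typing import Callable, Sequence


def accuracy(classify: Callable[[str], str],
             labeled: Sequence[tuple[str, str]]) -> float:
    """Fraction of (text, label) pairs the classifier labels correctly."""
    if not labeled:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, label in labeled if classify(text) == label)
    return correct / len(labeled)


def build_few_shot_prompt(task: str,
                          examples: Sequence[tuple[str, str]],
                          text: str) -> str:
    """Prepend worked examples to the task description (few-shot prompting)."""
    lines = [task, ""]
    for ex_text, ex_label in examples:
        lines += [f"Input: {ex_text}", f"Answer: {ex_label}", ""]
    lines += [f"Input: {text}", "Answer:"]
    return "\n".join(lines)


def fake_zero_shot_classifier(text: str) -> str:
    """Hypothetical stand-in for a zero-shot model call (simple keyword rule)."""
    return "urgent" if "down" in text.lower() else "not urgent"


# Made-up labeled sample used only to illustrate the check.
labeled = [
    ("The server is down.", "urgent"),
    ("Lunch menu for Friday.", "not urgent"),
    ("Payment system outage reported.", "urgent"),
    ("Newsletter: new office plants.", "not urgent"),
]

TARGET = 0.8  # assumed threshold; choose one that fits your application
score = accuracy(fake_zero_shot_classifier, labeled)

if score >= TARGET:
    next_step = "keep zero-shot"
else:
    next_step = "try few-shot"
    # Reuse two labeled items as in-prompt examples for the next attempt.
    prompt = build_few_shot_prompt(
        "Classify this email as urgent or not urgent.",
        labeled[:2],
        "Database backup failed overnight.",
    )

print(f"zero-shot accuracy: {score:.2f}, next step: {next_step}")
```

In practice the labeled sample would be larger and drawn from real traffic; fine-tuning would be the next step only if few-shot prompting still misses the target.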
"We tested zero-shot classification first — just asked the model to categorize support tickets with no examples. It worked well enough for 80% of cases, so we skipped fine-tuning."