A foundation model is a large AI model trained on broad data that can be adapted to a wide range of tasks. GPT-4, Claude, LLaMA, and Gemini are all foundation models. The term was coined by Stanford researchers to capture the idea that these models serve as a "foundation" on which many different applications are built.
The key property is generality. A foundation model isn't trained for one specific purpose — it learns general patterns from massive datasets (internet text, code, books, images) and then gets adapted through fine-tuning, prompting, or other techniques. This is a fundamental shift from the old paradigm where you trained a separate model for every task. Now one model can summarize text, write code, translate languages, and answer questions.
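The adaptation-through-prompting idea can be sketched in a few lines of Python. The template names and tasks below are illustrative, not any particular provider's API: the point is that the same (hypothetical) model endpoint handles different tasks purely because the instructions in the prompt change, with no retraining involved.

```python
# One foundation model, many tasks: only the prompt changes.
# TASK_TEMPLATES and build_prompt are illustrative names, not a real API.

TASK_TEMPLATES = {
    "summarize": "Summarize the following text in one sentence:\n\n{text}",
    "translate": "Translate the following text into French:\n\n{text}",
    "code": "Write a Python function that does the following:\n\n{text}",
}

def build_prompt(task: str, text: str) -> str:
    """Wrap user input in a task-specific instruction template."""
    return TASK_TEMPLATES[task].format(text=text)

prompt = build_prompt("summarize", "Foundation models are trained on broad data.")
print(prompt.splitlines()[0])  # "Summarize the following text in one sentence:"
```

In the old paradigm, each of these three tasks would have required its own purpose-built model; here they differ only in a string.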
The "foundation" metaphor cuts both ways. These models enable incredible capabilities, but they also concentrate risk. If a foundation model has biases or vulnerabilities, every application built on it inherits them. The training data, design choices, and safety measures of a handful of models now affect millions of downstream applications. This concentration is why the governance and safety of foundation models gets so much attention.
"Rather than training our own model from scratch, we're building on Claude as our foundation model and customizing it for our specific use case through prompting and RAG."
Large Language Model (LLM): An AI model with billions of parameters trained on massive text datasets.
Pre-training: The initial, expensive phase of training where a model learns general patterns from a massive dataset.
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
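A toy numerical illustration of the idea, assuming a "model" with a single weight: rather than training from scratch, we start from a pretrained weight that is already close and continue gradient descent on a small task-specific dataset.

```python
# Toy fine-tuning sketch: continue training y = w * x from a pretrained
# weight instead of a random one. Purely illustrative, not a real model.

def fine_tune(w_pretrained, data, lr=0.01, steps=100):
    """Minimize mean squared error of y = w * x, starting from w_pretrained."""
    w = w_pretrained
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# The new task follows y = 3x; the pretrained weight 2.5 is close but not adapted.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
print(round(fine_tune(2.5, data), 3))  # converges toward 3.0
```

Real fine-tuning updates billions of weights with more sophisticated optimizers, but the principle is the same: a short, cheap continuation of training from a good starting point.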
Activation Function: A mathematical function applied to a neuron's output that introduces non-linearity into the network.
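Two of the most common activation functions take only a line each to write. Without a non-linearity like these, stacked linear layers would collapse into a single linear map:

```python
import math

def relu(x: float) -> float:
    """Rectified linear unit: passes positives through, zeroes out negatives."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(round(sigmoid(0.0), 2))  # 0.5
```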
Adam: An optimization algorithm that combines the strengths of two earlier methods, AdaGrad and RMSProp.
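The update rule itself is compact enough to sketch for a single parameter. This follows the standard formulation with the usual default hyperparameters (beta1 = 0.9, beta2 = 0.999); the function name and driver loop are illustrative:

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single parameter w at step t (t starts at 1)."""
    m = b1 * m + (1 - b1) * grad       # first moment: momentum-like average
    v = b2 * v + (1 - b2) * grad ** 2  # second moment: RMSProp-like average
    m_hat = m / (1 - b1 ** t)          # bias correction for the zero init
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w**2 (gradient 2w) starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 1501):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(round(w, 2))  # settles near the minimum at 0
```

The per-parameter normalization by the second moment is what lets Adam use one learning rate across weights whose gradients differ wildly in scale.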
AGI: Artificial General Intelligence.