An AI model with billions of parameters trained on massive text datasets. LLMs like GPT-4, Claude, Gemini, and LLaMA can generate text, answer questions, write code, translate languages, and reason about complex problems. They are the technology behind the current AI boom.
A large language model (LLM) is a neural network trained on massive amounts of text that can understand and generate human language. "Large" refers to both the model size (billions of parameters) and the training data (trillions of tokens from books, websites, code, and more). GPT-4, Claude, Gemini, and LLaMA are all LLMs.
LLMs work by predicting the next token in a sequence, where a token is a word or a piece of a word. During training, they see trillions of tokens of text and adjust their parameters to get better at that prediction. This simple objective, guessing what comes next, turns out to produce models that can write essays, solve math problems, translate languages, and write code. The breadth of capability from such a simple training signal remains one of the most surprising findings in AI.
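To make the next-token objective concrete, here is a minimal sketch in Python. It is not how an LLM is built: it swaps the neural network for a simple bigram count table, and the corpus and function names (`train_bigram`, `next_token_probs`, `predict_next`) are made up for illustration. The task is the same, though: given what came before, estimate what comes next.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the raw counts for `prev` into a probability distribution."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()}

def predict_next(counts, prev):
    """Return the most likely next token, or None if `prev` was never seen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus (hypothetical); whitespace splitting stands in for a real tokenizer.
corpus = "the cat sat on the mat . the cat ate .".split()
model = train_bigram(corpus)

print(next_token_probs(model, "the"))  # "cat" follows "the" twice, "mat" once
print(predict_next(model, "the"))
```

A real LLM replaces the count table with a neural network that conditions on thousands of previous tokens rather than one, and learns its parameters by gradient descent rather than counting.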
The "large" part matters because of scaling laws: performance improves predictably as model size, training data, and compute grow together. But there are practical limits. The largest models require enormous GPU clusters to train and run, which means they're controlled by a handful of well-funded organizations. The open-source movement (LLaMA, Mistral, Qwen) is working to make capable models available to everyone, and techniques like quantization, which stores model weights at lower numeric precision, make it possible to run respectable LLMs on consumer hardware.
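A rough back-of-the-envelope calculation shows why quantization matters for consumer hardware. The sketch below assumes a hypothetical 7-billion-parameter model and counts only the memory needed to hold the weights; it ignores activations, the attention cache, and the small overhead that real quantization formats add, so treat the numbers as lower bounds.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 7_000_000_000  # hypothetical 7B-parameter model

for bits in (16, 8, 4):
    gb = weight_memory_gb(N_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:.1f} GB")
```

At 16 bits per weight this model needs about 14 GB for weights, more than many consumer GPUs offer; at 4 bits it needs about 3.5 GB, which fits on a typical laptop.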
"We evaluated three LLMs for our customer support bot — Claude for quality, Mistral for cost, and a fine-tuned LLaMA for data privacy since it runs on our servers."
Transformer: The neural network architecture behind virtually all modern AI language models.
GPT: Generative Pre-trained Transformer.
Foundation model: A large AI model trained on broad data that can be adapted for many different tasks.