Your guide to understanding AI and machine learning terminology. From transformers and attention to RLHF and fine-tuning — every term explained in plain language.
Agent-to-Agent (A2A): A protocol developed by Google that allows AI agents from different vendors to communicate and collaborate with each other.
Activation Function: A mathematical function applied to a neuron's output that introduces non-linearity into the network.
Adam: An optimization algorithm that combines the best parts of two other methods, AdaGrad and RMSProp.
Agentic AI: AI systems that can autonomously plan, execute multi-step tasks, use tools, and make decisions with minimal human oversight.
AGI: Artificial General Intelligence; a hypothetical AI with human-level ability across virtually all cognitive tasks.
AI Agent: An autonomous AI system that can perceive its environment, make decisions, and take actions to achieve goals.
AI Alignment: The research field focused on making sure AI systems do what humans actually want them to do.
AI Safety: The broad field studying how to build AI systems that are safe, reliable, and beneficial.
Anthropic: An AI safety company founded in 2021 by former OpenAI researchers, including Dario and Daniela Amodei.
Artificial Intelligence (AI): The science of creating machines that can perform tasks requiring human-like intelligence: reasoning, learning, perception, language understanding, and decision-making.
ASI: Artificial Superintelligence; a hypothetical AI that exceeds human ability across all domains.
Attention: A mechanism that lets neural networks focus on the most relevant parts of their input when producing output.
Attention Mechanism: The technique implementing attention: the network learns weights over its input and focuses on the most relevant parts when producing output.
Autoencoder: A neural network trained to compress input data into a smaller representation and then reconstruct it.
Autonomous AI: AI systems capable of operating independently for extended periods without human intervention.
Autoregressive Model: A model that generates output one piece at a time, with each new piece depending on all the previous ones.
Backpropagation: The algorithm that makes neural network training possible: it propagates the loss gradient backward through the layers via the chain rule.
Batch Normalization: A technique that normalizes the inputs to each layer in a neural network, making training faster and more stable.
Batch Size: The number of training examples processed together before the model updates its weights.
Beam Search: A decoding strategy that keeps track of multiple candidate sequences at each step instead of just picking the single best option.
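To make the idea concrete, here is a minimal sketch of beam search in Python over a toy three-token vocabulary. The `PROBS` table is invented for illustration; a real decoder would query a language model for context-dependent probabilities.

```python
import math

# Toy "language model": next-token probabilities for a 3-token vocabulary.
# These numbers are made up for illustration only.
PROBS = {"a": 0.5, "b": 0.3, "c": 0.2}

def beam_search(steps, beam_width=2):
    """Keep the beam_width highest-scoring sequences at every step."""
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in PROBS.items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Prune to the best beam_width candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

best = beam_search(steps=3, beam_width=2)
```

With a beam width of 1 this degenerates to greedy decoding; wider beams explore more alternatives at higher cost.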
Benchmark: A standardized test used to measure and compare AI model performance.
BERT: Bidirectional Encoder Representations from Transformers; a Google encoder-only transformer model pre-trained with masked language modeling.
Bias: In AI, bias has two meanings: a learnable offset term added to a neuron's weighted input, and systematic unfairness in a model's outputs, often inherited from its training data.
BPE: Byte Pair Encoding; a subword tokenization algorithm that builds a vocabulary by repeatedly merging the most frequent pair of adjacent symbols.
Catastrophic Forgetting: When a neural network trained on new data suddenly loses its ability to perform well on previously learned tasks.
Chain-of-Thought Prompting: A prompting technique where you ask an AI model to show its reasoning step by step before giving a final answer.
Chatbot: An AI system designed to have conversations with humans through text or voice.
Chinchilla: A research paper from DeepMind showing that most large language models were over-sized and under-trained for their compute budgets.
Classification: A machine learning task where the model assigns input data to predefined categories.
Claude: Anthropic's family of AI assistants, including Claude Haiku, Sonnet, and Opus.
CLIP: Contrastive Language-Image Pre-training; an OpenAI model that learns a shared embedding space for images and text.
CNN: Convolutional Neural Network; an architecture that learns convolutional filters, long the workhorse of computer vision.
Compute: The processing power needed to train and run AI models.
Computer Vision: The field of AI focused on enabling machines to interpret and understand visual information from images and video.
Constitutional AI: An approach developed by Anthropic where an AI system is trained to follow a set of principles (a 'constitution') rather than relying solely on human feedback for every decision.
Context Window: The maximum amount of text a language model can process at once, measured in tokens.
Contrastive Learning: A self-supervised learning approach where the model learns by comparing similar and dissimilar pairs of examples.
Conversational AI: AI systems designed for natural, multi-turn dialogue with humans.
Cross-Attention: An attention mechanism where one sequence attends to a different sequence, such as a decoder attending to an encoder's output.
CUDA: NVIDIA's parallel computing platform that lets developers use GPUs for general-purpose computing.
DALL-E: OpenAI's text-to-image generation model.
Data Augmentation: Techniques for artificially expanding training datasets by creating modified versions of existing data.
Data Poisoning: Deliberately corrupting training data to manipulate a model's behavior.
Decoder: The part of a neural network that generates output from an internal representation.
Deep Learning: A subset of machine learning that uses neural networks with many layers (hence 'deep') to learn complex patterns from large amounts of data.
Deepfake: AI-generated media that realistically depicts a person saying or doing something they never actually did.
DeepMind: A leading AI research lab, now part of Google.
Diffusion Model: A generative AI model that creates data by learning to reverse a gradual noising process.
Distillation: A technique where a smaller 'student' model learns to mimic a larger 'teacher' model.
DPO: Direct Preference Optimization; a method for aligning language models with human preferences directly, without training a separate reward model.
Dropout: A regularization technique that randomly deactivates a percentage of neurons during training.
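As a small sketch of the idea, here is inverted dropout in NumPy (the shapes and the drop probability are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Zero each activation with probability p; scale survivors by 1/(1-p)
    (inverted dropout) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

acts = np.ones(10_000)
dropped = dropout(acts, p=0.5)
```

At inference time (`training=False`) the function is a no-op, which is why the survivors are rescaled during training.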
Edge AI: Running AI models directly on local devices (phones, laptops, IoT devices) instead of in the cloud.
Embedding: A dense numerical representation of data (words, images, etc.) as a vector, arranged so that similar items end up close together.
Emergent Abilities: Capabilities that appear suddenly as language models reach certain sizes.
Emergent Capabilities: Capabilities that appear in AI models at scale without being explicitly trained for.
Encoder: The part of a neural network that processes input data into an internal representation.
Encoder-Decoder: A neural network architecture with two parts: an encoder that processes the input into a representation, and a decoder that generates the output from that representation.
Epoch: One complete pass through the entire training dataset.
Ethical AI: The practice of developing AI systems that are fair, transparent, accountable, and respect human rights.
Evaluation: The process of measuring how well an AI model performs on its intended task.
Explainability: The ability to understand and explain why an AI model made a particular decision.
Feature Extraction: The process of identifying and pulling out the most important characteristics from raw data.
Federated Learning: A training approach where the model learns from data spread across many devices without that data ever leaving those devices.
Few-Shot Learning: The ability of a model to learn a new task from just a handful of examples, often provided in the prompt itself.
Fine-Tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
FlashAttention: An optimized attention algorithm that's mathematically equivalent to standard attention but runs much faster and uses less GPU memory.
Foundation Model: A large AI model trained on broad data that can be adapted for many different tasks.
Function Calling: A capability that lets language models interact with external tools and APIs by generating structured function calls.
GAN: Generative Adversarial Network; two networks, a generator and a discriminator, trained against each other until the generator produces realistic data.
GELU: Gaussian Error Linear Unit; a smooth activation function widely used in transformers.
Gemini: Google's flagship multimodal AI model family, developed by Google DeepMind.
Generative AI: AI systems that create new content (text, images, audio, video, or code) rather than just analyzing or classifying existing data.
GPT: Generative Pre-trained Transformer; OpenAI's family of decoder-only language models.
GPU: Graphics Processing Unit; the massively parallel processor that powers most AI training and inference.
Gradient Accumulation: A technique that simulates larger batch sizes by accumulating gradients over multiple forward passes before updating weights.
Gradient Descent: The fundamental optimization algorithm used to train neural networks: repeatedly nudge the parameters in the direction that most reduces the loss.
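A minimal worked example: gradient descent on the one-parameter loss L(w) = (w - 3)^2, whose minimum is at w = 3. The starting point and learning rate are arbitrary illustration values.

```python
# dL/dw = 2 * (w - 3); step repeatedly against this gradient.

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0    # initial guess
lr = 0.1   # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)   # move downhill on the loss surface
```

Training a neural network is the same loop in millions of dimensions, with backpropagation supplying the gradient.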
Grounding: Connecting an AI model's outputs to verified, factual information sources.
Guardrails: Safety measures built into AI systems to prevent harmful, inappropriate, or off-topic outputs.
Hallucination: When an AI model generates confident-sounding but factually incorrect or completely fabricated information.
Hallucination Detection: Methods for identifying when an AI model generates false or unsupported claims.
Hugging Face: The leading platform for sharing and collaborating on AI models, datasets, and applications.
Hyperparameter: A setting you choose before training begins, as opposed to parameters the model learns during training.
Image Classification: The task of assigning a label to an image from a set of predefined categories.
ImageNet: A massive image dataset containing over 14 million labeled images across 20,000+ categories.
In-Context Learning: A model's ability to learn new tasks simply from examples provided in the prompt, without any weight updates.
Inference: Running a trained model to make predictions on new data.
Instruction Tuning: Fine-tuning a language model on datasets of instructions paired with appropriate responses.
Language Model: An AI model that understands and generates human language.
Large Language Model (LLM): An AI model with billions of parameters trained on massive text datasets.
Latent Space: The compressed, internal representation space where a model encodes data.
Layer Normalization: A technique that normalizes activations across the features of each training example, rather than across the batch.
Learning Rate: A hyperparameter that controls how much the model's weights change in response to each update.
Llama: Meta's family of open-weight large language models.
LLM: Short for Large Language Model.
LoRA: Low-Rank Adaptation; a parameter-efficient fine-tuning method that trains small low-rank update matrices instead of modifying all of a model's weights.
Loss Function: A mathematical function that measures how far the model's predictions are from the correct answers.
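Two common loss functions, sketched in plain Python with made-up numbers: mean squared error for regression and cross-entropy for classification.

```python
import math

def mse(preds, targets):
    """Mean squared error: average squared gap between prediction and truth."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def cross_entropy(probs, target_index):
    """Negative log-probability the model assigned to the correct class."""
    return -math.log(probs[target_index])

mse_val = mse([2.5, 0.0], [3.0, -0.5])      # (0.25 + 0.25) / 2 = 0.25
ce_val = cross_entropy([0.7, 0.2, 0.1], 0)  # penalty shrinks as p(correct) -> 1
```

Both return 0 for a perfect prediction and grow as predictions drift from the truth, which is exactly what gradient descent needs to minimize.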
LSTM: Long Short-Term Memory; a recurrent architecture whose gating mechanism lets it retain information over long sequences.
Machine Learning: A branch of AI where systems learn patterns from data instead of following explicitly programmed rules.
Masked Language Modeling: A pre-training technique where random words in text are hidden (masked) and the model learns to predict them from context.
MCP (Model Context Protocol): An open standard created by Anthropic that lets AI models connect to external tools, data sources, and APIs through a unified interface.
Meta-Learning: Training models that learn how to learn: after training on many tasks, they can quickly adapt to new tasks with very little data.
Midjourney: A popular AI image generation service known for its distinctive artistic style.
Mistral AI: A French AI company that builds efficient, high-performance language models.
Mixture of Experts (MoE): An architecture where multiple specialized sub-networks (experts) share a model, but only a few activate for each input.
MMLU: Massive Multitask Language Understanding; a benchmark spanning 57 subjects used to compare language models.
Model Collapse: A degradation that happens when AI models are trained on data generated by other AI models.
Multi-Head Attention: An extension of the attention mechanism that runs multiple attention operations in parallel, each with different learned projections.
Multimodal AI: AI models that can understand and generate multiple types of data: text, images, audio, video.
Narrow AI: AI systems designed for a specific task, as opposed to general intelligence.
Natural Language Processing (NLP): The field of AI focused on enabling computers to understand, interpret, and generate human language.
Neural Network: A computing system loosely inspired by biological brains, consisting of interconnected nodes (neurons) organized in layers.
Next-Token Prediction: The fundamental task that language models are trained on: given a sequence of tokens, predict what comes next.
NLP: Short for Natural Language Processing.
NVIDIA: The dominant provider of AI hardware.
Object Detection: A computer vision task that identifies and locates objects within an image, drawing bounding boxes around each one.
Open-Source AI: AI models whose weights, code, and sometimes training data are publicly released for anyone to use, modify, and build upon.
OpenAI: The AI company behind ChatGPT, GPT-4, DALL-E, and Whisper.
Optimization: The process of finding the best set of model parameters by minimizing a loss function.
Overfitting: When a model memorizes the training data so well that it performs poorly on new, unseen data.
Parameter: A value the model learns during training; specifically, the weights and biases in neural network layers.
Perplexity: A measurement of how well a language model predicts text: the exponential of the average negative log-likelihood per token, so lower is better.
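The computation is short enough to show directly. The per-token probabilities below are invented for illustration; in practice they come from a model evaluated on held-out text.

```python
import math

# Probabilities a model assigned to the actual next tokens (made-up numbers).
token_probs = [0.25, 0.5, 0.125]

# Perplexity = exp(mean negative log-likelihood per token).
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(nll)
```

A perplexity of 4 means the model was, on average, as uncertain as if it were choosing uniformly among 4 tokens at each step.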
Positional Encoding: Information added to token embeddings to tell a transformer the order of elements in a sequence.
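One classic scheme is the sinusoidal encoding from the original transformer paper, sketched here in NumPy (the sequence length and model width are arbitrary illustration values):

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Even dimensions get sine, odd dimensions cosine, at geometrically
    spaced wavelengths, giving every position a unique pattern."""
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positions(seq_len=16, d_model=8)
```

These vectors are simply added to the token embeddings before the first attention layer; many modern models instead learn positions or use rotary embeddings (RoPE).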
Pre-training: The initial, expensive phase of training where a model learns general patterns from a massive dataset.
Prompt Engineering: The art and science of crafting inputs to AI models to get the best possible outputs.
Prompt: The text input you give to an AI model to direct its behavior.
PyTorch: The most popular deep learning framework, developed by Meta.
RAG: Retrieval-Augmented Generation; retrieving relevant documents and adding them to the prompt so the model can ground its answer in them.
Reasoning: The ability of AI models to draw conclusions, solve problems logically, and work through multi-step challenges.
Reasoning Models: AI systems specifically designed to "think" through problems step-by-step before giving an answer.
Recurrent Neural Network (RNN): A neural network architecture where connections form loops, letting the network maintain a form of memory across sequences.
Red Teaming: Systematically testing an AI system by trying to make it produce harmful, biased, or incorrect outputs.
Regression: A machine learning task where the model predicts a continuous numerical value.
Regularization: Techniques that prevent a model from overfitting by adding constraints during training.
Reinforcement Learning: A learning approach where an agent learns by interacting with an environment and receiving rewards or penalties.
ReLU: Rectified Linear Unit; an activation function that outputs its input when positive and zero otherwise.
Representation Learning: The idea that useful AI comes from learning good internal representations of data.
Responsible AI: The practice of developing and deploying AI systems with careful attention to fairness, transparency, safety, privacy, and social impact.
Reward Model: A model trained to predict how helpful, harmless, and honest a response is, based on human preferences.
RLHF: Reinforcement Learning from Human Feedback; fine-tuning a model with reinforcement learning against a reward model trained on human preference data.
RNN: Short for Recurrent Neural Network.
RoPE: Rotary Position Embedding; a technique that encodes token positions by rotating query and key vectors in attention.
Sampling: The process of selecting the next token from the model's predicted probability distribution during text generation.
Scaling Laws: Mathematical relationships showing how AI model performance improves predictably with more data, compute, and parameters.
Self-Attention: An attention mechanism where a sequence attends to itself: each element looks at all other elements to understand relationships.
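A compact sketch of single-head scaled dot-product self-attention in NumPy. The token count, embedding width, and random projection matrices are stand-ins; a trained model learns `wq`, `wk`, `wv`.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Every position queries every other; outputs are weighted sums of values."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise similarity, scaled
    weights = softmax(scores)                # each row is a distribution
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # 4 tokens, 8-dim embeddings
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(x, wq, wk, wv)
```

Multi-head attention runs several copies of this in parallel with different projections and concatenates the results.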
Self-Supervised Learning: A training approach where the model creates its own labels from the data itself.
Semantic Search: Search that understands meaning and intent rather than just matching keywords.
Sentiment Analysis: Automatically determining whether a piece of text expresses positive, negative, or neutral sentiment.
Softmax: A function that converts a vector of numbers into a probability distribution: all values between 0 and 1 that sum to 1.
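The function is a one-liner in practice; this sketch uses the standard max-subtraction trick for numerical stability (the input numbers are arbitrary):

```python
import math

def softmax(xs):
    """Exponentiate and normalize; subtracting max(xs) first avoids overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

Language models apply exactly this to their output logits to get next-token probabilities.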
Speech-to-Text: Converting spoken audio into written text.
Stable Diffusion: An open-source image generation model released by Stability AI.
Structured Output: Getting a language model to generate output in a specific format like JSON, XML, or a database schema.
Supervised Learning: The most common machine learning approach: training a model on labeled data where each example comes with the correct answer.
Synthetic Data: Artificially generated data used for training AI models.
System Prompt: Instructions given to an AI model that define its role, personality, constraints, and behavior rules.
Temperature: A parameter that controls the randomness of a language model's output.
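Mechanically, temperature just divides the logits before softmax. A small sketch with invented logits shows the effect:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def apply_temperature(logits, temperature):
    """T < 1 sharpens the distribution toward the top token;
    T > 1 flattens it toward uniform."""
    return softmax([x / temperature for x in logits])

logits = [2.0, 1.0, 0.0]
cold = apply_temperature(logits, 0.5)  # more deterministic
hot = apply_temperature(logits, 2.0)   # more diverse
```

As T approaches 0, sampling becomes greedy decoding; very high T makes every token nearly equally likely.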
TensorFlow: Google's open-source deep learning framework.
Text-to-Image: AI models that generate images from text descriptions.
Text-to-Speech: AI systems that convert written text into natural-sounding spoken audio.
Token: The basic unit of text that language models work with.
Tokenizer: The component that converts raw text into tokens that a language model can process.
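A toy word-level tokenizer illustrates the interface (text in, integer IDs out, and back). Real systems use subword schemes like BPE, but the `ToyTokenizer` class here, its vocabulary, and the sample sentence are all invented for illustration.

```python
class ToyTokenizer:
    def __init__(self, corpus):
        # Assign one ID per unique word, in sorted order for determinism.
        words = sorted(set(corpus.split()))
        self.vocab = {w: i for i, w in enumerate(words)}
        self.inverse = {i: w for w, i in self.vocab.items()}

    def encode(self, text):
        return [self.vocab[w] for w in text.split()]

    def decode(self, ids):
        return " ".join(self.inverse[i] for i in ids)

tok = ToyTokenizer("the cat sat on the mat")
ids = tok.encode("the cat sat")
roundtrip = tok.decode(ids)
```

Subword tokenizers exist precisely because a word-level scheme like this one cannot handle words outside its vocabulary.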
Tool Use: The ability of AI models to interact with external tools and systems: browsing the web, running code, querying APIs, reading files.
Top-P Sampling: A text generation method (also called nucleus sampling) that samples from only the smallest set of most-probable tokens whose cumulative probability reaches a threshold P.
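A minimal sketch of nucleus filtering, using an invented four-token probability table:

```python
import random

def top_p_filter(probs, p):
    """probs: {token: probability}. Keep the highest-probability tokens until
    their cumulative mass reaches p, then renormalize that nucleus."""
    nucleus, cum = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        nucleus[tok] = prob
        cum += prob
        if cum >= p:
            break
    total = sum(nucleus.values())
    return {tok: prob / total for tok, prob in nucleus.items()}

probs = {"the": 0.5, "a": 0.3, "dog": 0.15, "zebra": 0.05}
nucleus = top_p_filter(probs, p=0.9)
token = random.choices(list(nucleus), weights=list(nucleus.values()))[0]
```

Here "zebra" falls outside the 0.9 nucleus and can never be sampled, which is how top-p trims the unreliable low-probability tail while keeping diversity among plausible tokens.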
TPU: Tensor Processing Unit; Google's custom accelerator chip for machine learning workloads.
Training: The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.
Transfer Learning: Using knowledge learned from one task to improve performance on a different but related task.
Transformer: The neural network architecture behind virtually all modern AI language models.
Turing Test: A test proposed by Alan Turing in 1950: if a human can't reliably tell whether they're talking to a machine or another human, the machine passes.
VAE: Variational Autoencoder; an autoencoder that learns a probabilistic latent space, so new data can be generated by sampling from it.
Vector Database: A database optimized for storing and searching high-dimensional vectors (embeddings).
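At its core, a vector search is a nearest-neighbor query by similarity. This brute-force NumPy sketch uses tiny 3-dimensional made-up "embeddings"; production systems use hundreds of dimensions and approximate indexes, but the idea is the same.

```python
import numpy as np

def cosine_search(query, vectors, top_k=2):
    """Return the indices and cosine similarities of the top_k nearest vectors."""
    q = query / np.linalg.norm(query)
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = m @ q                       # cosine similarity to every stored vector
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return order, sims[order]

db = np.array([[1.0, 0.0, 0.0],
               [0.9, 0.1, 0.0],
               [0.0, 1.0, 0.0]])
idx, scores = cosine_search(np.array([1.0, 0.05, 0.0]), db)
```

This is the retrieval step behind semantic search and RAG: embed the query, find the closest stored embeddings, return their source documents.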
Vision Transformer (ViT): A transformer architecture adapted for image processing.
Voice Cloning: Using AI to create a synthetic copy of someone's voice from a small sample of their speech.
Weight: A numerical value in a neural network that determines the strength of the connection between neurons.
Whisper: OpenAI's open-source speech recognition model.
Word2Vec: One of the earliest successful word embedding models, from Google in 2013.
World Model: An AI system's internal representation of how the world works: understanding physics, cause and effect, and spatial relationships.