Running a trained model to make predictions on new data. Distinct from training, which is where the model learns. Inference needs to be fast and cheap for real-world deployment. Companies spend enormous sums on inference infrastructure — it's often more expensive than training over a model's lifetime.
Inference is when a trained AI model actually processes input and produces output — the moment it does useful work. Training is the expensive, time-consuming phase where the model learns. Inference is the fast, repeated phase where it applies what it learned. Every time you send a message to ChatGPT or Claude and get a response, that's inference.
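The training/inference split can be made concrete with a toy example. This is a minimal sketch, not any real model: a tiny linear regression where "training" is an iterative loop that adjusts parameters, and "inference" is a single cheap forward pass with those parameters frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 1.0  # ground-truth relationship the model must learn

# --- Training: expensive, iterative, adjusts parameters ---
w, b = 0.0, 0.0
lr = 0.1
for _ in range(200):
    pred = w * X[:, 0] + b
    err = pred - y
    w -= lr * (err * X[:, 0]).mean()  # gradient step on the weight
    b -= lr * err.mean()              # gradient step on the bias

# --- Inference: cheap, single pass, parameters frozen ---
def infer(x):
    return w * x + b

print(infer(2.0))  # ≈ 7.0, since the model learned y = 3x + 1
```

The training loop runs hundreds of updates over the whole dataset; inference is one multiply and one add. Scale that asymmetry up to billions of parameters and billions of requests and you get the economics described below.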
The economics of AI largely revolve around inference costs. Training GPT-4 reportedly cost over $100 million, but that's a one-time expense. Inference happens billions of times — every API call, every chatbot response, every auto-complete suggestion. At scale, inference costs dwarf training costs. This is why there's so much research into making inference faster and cheaper: quantization, distillation, specialized hardware (like Groq's LPU), and smaller models that run on phones.
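Of the techniques listed above, quantization is the easiest to see in miniature. The sketch below shows only the core idea, storing weights as 8-bit integers plus a scale factor; real toolchains are far more sophisticated.

```python
import numpy as np

# Post-training int8 quantization, reduced to its essence:
# map float32 weights onto the int8 range and keep one scale factor.
w = np.random.default_rng(0).normal(size=1000).astype(np.float32)

scale = np.abs(w).max() / 127.0            # maps [-max, max] onto [-127, 127]
w_q = np.round(w / scale).astype(np.int8)  # 4x smaller than float32

# At inference time, dequantize (or compute directly in int8 on
# hardware that supports it).
w_deq = w_q.astype(np.float32) * scale

print(w.nbytes, w_q.nbytes)  # 4000 vs 1000 bytes: a 4x memory saving
```

The quantized weights are a quarter the size, at the cost of a small rounding error (at most half the scale factor per weight). Smaller weights mean less memory bandwidth per token, which is usually the bottleneck in LLM inference.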
Inference speed matters for user experience too. A 30-second response feels broken; a sub-second response feels magical. Techniques like speculative decoding (using a small, fast model to draft tokens that a larger model verifies) and KV-caching (reusing computation from previous tokens) help make responses snappy. The push to run models on-device — on phones and laptops — is fundamentally about making inference fast and free of server round-trips.
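Speculative decoding can be sketched with stand-in "models". The functions below are toy assumptions, not real LLMs: each deterministically maps a context to its next token. The draft model proposes a cheap batch of tokens; the target model checks them and keeps the longest agreeing prefix, so several tokens can be accepted per target-model pass.

```python
def draft_next(context):
    # Small, fast draft model: guesses the next token cheaply.
    return (context[-1] + 1) % 10

def target_next(context):
    # Large, slow target model: authoritative, disagrees with the
    # draft whenever the next token would be 7.
    nxt = (context[-1] + 1) % 10
    return 0 if nxt == 7 else nxt

def speculative_step(context, k=4):
    # 1. Draft k tokens autoregressively with the cheap model.
    draft, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. Verify: the target model scores the drafted positions (in a
    #    real system, in one batched pass) and we accept the longest
    #    matching prefix, substituting the target's token on mismatch.
    ctx = list(context)
    for t in draft:
        correct = target_next(ctx)
        ctx.append(correct)
        if t != correct:
            break
    return ctx

print(speculative_step([3]))  # [3, 4, 5, 6, 0]: draft accepted until it guessed 7
```

The payoff: when the draft model guesses well, one expensive target-model pass yields several tokens instead of one, cutting latency without changing the output.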
"Training the model took three weeks on a GPU cluster, but inference is fast enough to serve 10,000 API requests per minute on a single server."
Related terms:

Training: The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.

Activation Function: A mathematical function applied to a neuron's output that introduces non-linearity into the network.

Adam: An optimization algorithm that combines the advantages of two other methods, AdaGrad and RMSProp.

AGI: Artificial General Intelligence.

Agent: An autonomous AI system that can perceive its environment, make decisions, and take actions to achieve goals.

AI Alignment: The research field focused on making sure AI systems do what humans actually want them to do.