Running a trained model to make predictions on new data. Distinct from training, which is where the model learns. Inference needs to be fast and cheap for real-world deployment. Companies spend enormous sums on inference infrastructure — it's often more expensive than training over a model's lifetime.
Inference is when a trained AI model actually processes input and produces output — the moment it does useful work. Training is the expensive, time-consuming phase where the model learns. Inference is the fast, repeated phase where it applies what it learned. Every time you send a message to ChatGPT or Claude and get a response, that's inference.
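The training/inference split can be made concrete with a toy example. This is a minimal sketch, not any real model: a tiny linear regression where "training" is an iterative loop that adjusts parameters, and "inference" is a single cheap forward pass with those parameters frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 1.0  # ground-truth relationship the model must learn

# --- Training: expensive, iterative, adjusts parameters ---
w, b = 0.0, 0.0
lr = 0.1
for _ in range(200):
    pred = w * X[:, 0] + b
    err = pred - y
    w -= lr * (err * X[:, 0]).mean()  # gradient step on the weight
    b -= lr * err.mean()              # gradient step on the bias

# --- Inference: cheap, single pass, parameters frozen ---
def infer(x):
    return w * x + b

print(infer(2.0))  # ≈ 7.0, since the model learned y = 3x + 1
```

The training loop runs hundreds of updates over the whole dataset; inference is one multiply and one add. Scale that asymmetry up to billions of parameters and billions of requests and you get the economics described below.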
The economics of AI largely revolve around inference costs. Training GPT-4 reportedly cost over $100 million, but that's a one-time expense. Inference happens billions of times — every API call, every chatbot response, every auto-complete suggestion. At scale, inference costs dwarf training costs. This is why there's so much research into making inference faster and cheaper: quantization, distillation, specialized hardware (like Groq's LPU), and smaller models that run on phones.
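Of the techniques listed above, quantization is the easiest to see in miniature. The sketch below shows only the core idea, storing weights as 8-bit integers plus a scale factor; real toolchains are far more sophisticated.

```python
import numpy as np

# Post-training int8 quantization, reduced to its essence:
# map float32 weights onto the int8 range and keep one scale factor.
w = np.random.default_rng(0).normal(size=1000).astype(np.float32)

scale = np.abs(w).max() / 127.0            # maps [-max, max] onto [-127, 127]
w_q = np.round(w / scale).astype(np.int8)  # 4x smaller than float32

# At inference time, dequantize (or compute directly in int8 on
# hardware that supports it).
w_deq = w_q.astype(np.float32) * scale

print(w.nbytes, w_q.nbytes)  # 4000 vs 1000 bytes: a 4x memory saving
```

The quantized weights are a quarter the size, at the cost of a small rounding error (at most half the scale factor per weight). Smaller weights mean less memory bandwidth per token, which is usually the bottleneck in LLM inference.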
Inference speed matters for user experience too. A 30-second response feels broken; a sub-second response feels magical. Techniques like speculative decoding (using a small, fast model to draft tokens that a larger model verifies) and KV-caching (reusing computation from previous tokens) help make responses snappy. The push to run models on-device — on phones and laptops — is fundamentally about making inference fast and free of server round-trips.
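Speculative decoding can be sketched with stand-in "models". The functions below are toy assumptions, not real LLMs: each deterministically maps a context to its next token. The draft model proposes a cheap batch of tokens; the target model checks them and keeps the longest agreeing prefix, so several tokens can be accepted per target-model pass.

```python
def draft_next(context):
    # Small, fast draft model: guesses the next token cheaply.
    return (context[-1] + 1) % 10

def target_next(context):
    # Large, slow target model: authoritative, disagrees with the
    # draft whenever the next token would be 7.
    nxt = (context[-1] + 1) % 10
    return 0 if nxt == 7 else nxt

def speculative_step(context, k=4):
    # 1. Draft k tokens autoregressively with the cheap model.
    draft, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. Verify: the target model scores the drafted positions (in a
    #    real system, in one batched pass) and we accept the longest
    #    matching prefix, substituting the target's token on mismatch.
    ctx = list(context)
    for t in draft:
        correct = target_next(ctx)
        ctx.append(correct)
        if t != correct:
            break
    return ctx

print(speculative_step([3]))  # [3, 4, 5, 6, 0]: draft accepted until it guessed 7
```

The payoff: when the draft model guesses well, one expensive target-model pass yields several tokens instead of one, cutting latency without changing the output.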
"Training the model took three weeks on a GPU cluster, but inference is fast enough to serve 10,000 API requests per minute on a single server."
Related terms:

Training: The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.

Activation Function: A mathematical function applied to a neuron's output that introduces non-linearity into the network.

Adam: An optimization algorithm that combines the advantages of two other methods, AdaGrad and RMSProp.

AGI: Artificial General Intelligence.

Agent: An autonomous AI system that can perceive its environment, make decisions, and take actions to achieve goals.

AI Alignment: The research field focused on making sure AI systems do what humans actually want them to do.