A parameter that controls the randomness of a language model's output. Low temperature (near 0) makes the model pick the most likely tokens, producing focused and deterministic text. High temperature (above 1) makes outputs more random and creative. A key tool for controlling generation behavior.
Temperature is a parameter that controls how random or creative a language model's outputs are. At temperature 0, the model always picks the most likely next token, producing deterministic, sometimes repetitive, but reliable output. At higher temperatures (0.7-1.0), the model samples more broadly, producing varied and creative but less predictable responses. Above 1.0, outputs grow increasingly erratic and can become incoherent.
Technically, temperature scales the logits (raw prediction scores) before the softmax function converts them to probabilities. Low temperature sharpens the probability distribution — the most likely token gets an even higher probability. High temperature flattens it — less likely tokens get more of a chance. It's a simple knob that has a huge effect on output character.
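To make that mechanism concrete, here is a minimal NumPy sketch of temperature-scaled softmax. The function name `softmax_with_temperature` is illustrative, not taken from any particular library:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by temperature, then apply softmax.

    temperature < 1 sharpens the distribution; > 1 flattens it.
    """
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # sharpened: top token dominates
print(softmax_with_temperature(logits, 1.5))  # flattened: more even spread
```

Running it with the same logits at different temperatures shows the effect directly: the top token's probability rises as temperature falls, while the relative ordering of tokens never changes.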
Choosing the right temperature depends on your use case. Coding, factual Q&A, and structured data extraction work best at low temperatures (0-0.3) where you want consistent, correct outputs. Creative writing, brainstorming, and generating diverse options benefit from higher temperatures (0.7-1.0). Most API defaults sit around 0.7 as a reasonable middle ground. Temperature interacts with other sampling parameters like top-p and top-k, so tuning them together gives you fine control over output behavior.
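As a sketch of how temperature and top-p interact, the illustrative `sample_token` function below (our own code, not any library's API) first temperature-scales the logits, then keeps only the nucleus of tokens whose cumulative probability reaches `top_p`, and finally samples from that reduced set:

```python
import numpy as np

def sample_token(logits, temperature=0.7, top_p=0.9, rng=None):
    """Sample a token index using temperature plus top-p (nucleus) filtering."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    # Sort tokens by probability (descending) and keep the smallest
    # prefix whose cumulative probability reaches top_p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()  # renormalize the nucleus
    return int(rng.choice(kept, p=kept_probs))
```

At very low temperature the nucleus collapses to a single token and sampling becomes effectively greedy, which is why low-temperature settings feel deterministic even when top-p is left at its default.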
"Set temperature to 0 for our data extraction pipeline — we need consistent, reproducible outputs. For the brainstorming tool, we crank it to 0.9."
The process of selecting the next token from the model's predicted probability distribution during text generation.
A text generation method (also called nucleus sampling) that only considers tokens whose cumulative probability exceeds a threshold P.
A function that converts a vector of numbers into a probability distribution — all values between 0 and 1 that sum to 1.
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
An optimization algorithm that combines the best parts of two other methods — AdaGrad and RMSProp.
Artificial General Intelligence: a hypothetical AI system with human-level competence across a wide range of tasks, rather than skill in one narrow domain.