A function that converts a vector of numbers into a probability distribution: every output value lies between 0 and 1, and the values sum to 1. It is used as the final layer in classification models and in the attention mechanism of transformers. In text generation, the 'temperature' parameter controls how sharp or flat the softmax distribution is: lower temperatures concentrate probability on the most likely tokens, and higher temperatures spread it more evenly.
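As a rough illustration of the definition above, here is a minimal Python sketch of softmax with a temperature parameter. The function name `softmax`, the `temperature` argument, and the sample `logits` values are assumptions chosen for this example and are not taken from any particular library. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn a list of raw scores into probabilities that sum to 1.

    `temperature` is a hypothetical parameter name used for illustration:
    values below 1 sharpen the distribution, values above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Divide each score by the temperature before exponentiating.
    scaled = [x / temperature for x in logits]
    # Subtract the maximum so math.exp never sees a large positive number.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative scores only; not drawn from a real model.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature=t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

Running the loop shows the largest probability shrinking as the temperature rises, which is the "sharp versus flat" behavior described above.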
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
A machine learning task where the model assigns input data to predefined categories.
A parameter that controls the randomness of a language model's output.
An optimization algorithm that combines ideas from two other methods, AdaGrad and RMSProp.
Artificial General Intelligence.
An autonomous AI system that can perceive its environment, make decisions, and take actions to achieve goals.