A function that converts a vector of numbers into a probability distribution: every output lies between 0 and 1, and the outputs sum to 1. It is used as the final layer in classification models and in the attention mechanism of transformers. In text generation, the 'temperature' parameter divides the scores before softmax is applied, so lower temperatures sharpen the distribution and higher temperatures flatten it.
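As a rough illustration of the definition, here is a minimal sketch of softmax with a temperature parameter in plain Python. The function name `softmax`, its `temperature` argument, and the sample scores are assumptions chosen for this example, not taken from any particular library. Subtracting the largest score before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; `temperature` rescales the scores first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by the temperature: values below 1 sharpen, values above 1 flatten.
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating to avoid overflow; the output is unchanged.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes (or three candidate tokens).
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

Running this should show the probability of the highest-scoring item growing as the temperature drops and shrinking as it rises, while each row still sums to 1.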
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
A machine learning task where the model assigns input data to predefined categories.
A parameter that controls the randomness of a language model's output.
An optimization algorithm that combines ideas from two earlier methods, AdaGrad and RMSProp, to adapt the learning rate for each parameter.
Artificial General Intelligence.
The research field focused on making sure AI systems do what humans actually want them to do.