The number of training examples processed together before the model updates its weights. Larger batches give more stable gradient estimates but need more memory. Smaller batches add noise that can actually help escape bad local minima. Finding the right batch size is part art, part science.
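A minimal sketch of how batch size shapes a training loop, using NumPy and made-up data: each weight update sees only `batch_size` examples at a time. The array names and the batch size of 32 are illustrative choices, not recommendations.

```python
import numpy as np

# Toy dataset: 1000 examples, 10 features each, with scalar targets.
X = np.random.randn(1000, 10)
y = np.random.randn(1000)

batch_size = 32  # illustrative choice; tune for memory and gradient noise

# Shuffle once per pass so each mini-batch is a fresh random sample.
indices = np.random.permutation(len(X))

for start in range(0, len(X), batch_size):
    batch_idx = indices[start:start + batch_size]
    X_batch, y_batch = X[batch_idx], y[batch_idx]
    # ...compute loss and gradients on this batch, then update the weights...
```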
The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.
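To make the idea concrete, here is a toy training run, assuming a simple linear model and mean squared error: the model sees the data, measures its error, and nudges its parameters to shrink that error. All names and values here are invented for illustration.

```python
import numpy as np

# Synthetic data: targets follow a known linear rule plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)   # model parameters, start from scratch
lr = 0.1          # learning rate

for step in range(500):
    preds = X @ w                          # expose the model to the data
    error = preds - y
    grad = 2 * X.T @ error / len(X)        # gradient of mean squared error
    w -= lr * grad                         # adjust parameters to reduce error

print(w)  # should land close to true_w
```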
The fundamental optimization algorithm used to train neural networks: it repeatedly nudges each weight in the direction that reduces the loss, following the negative of the gradient.
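A small sketch of the gradient descent update rule on a one-variable toy problem, assuming that is the algorithm meant here: step against the gradient, over and over, until the objective stops shrinking.

```python
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
# Update rule: w <- w - learning_rate * f'(w)
w = 0.0
learning_rate = 0.1

for _ in range(100):
    grad = 2 * (w - 3)        # derivative of (w - 3)^2
    w -= learning_rate * grad

print(w)  # converges toward 3.0
```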
One complete pass through the entire training dataset.
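A sketch of how epochs typically wrap around mini-batches, again with invented data: the outer loop counts full passes over the dataset, the inner loop walks through it one batch at a time.

```python
import numpy as np

X = np.random.randn(100, 4)
y = np.random.randn(100)
batch_size = 16
num_epochs = 10

for epoch in range(num_epochs):
    # One epoch = one full pass over all 100 examples.
    order = np.random.permutation(len(X))   # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        X_batch, y_batch = X[idx], y[idx]
        # ...forward pass, loss, gradient step on this batch...
    print(f"finished epoch {epoch + 1}")
```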
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
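For illustration, here are two common activation functions, ReLU and the sigmoid, implemented directly in NumPy and applied to a few raw neuron outputs. Without a non-linearity like these, stacked layers would collapse into a single linear transformation.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: passes positives through, zeroes out negatives."""
    return np.maximum(0, x)

def sigmoid(x):
    """Squashes any real input into the range (0, 1)."""
    return 1 / (1 + np.exp(-x))

# A neuron's raw (pre-activation) outputs for a few inputs.
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

print(relu(z))     # [0.   0.   0.   0.5  2. ]
print(sigmoid(z))  # values between 0 and 1
print(np.tanh(z))  # values between -1 and 1
```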
An optimization algorithm that combines the strengths of two earlier methods, AdaGrad and RMSProp, adapting the step size for each parameter individually using running estimates of the gradient and the squared gradient.
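A from-scratch sketch of the Adam update on the same kind of one-variable toy problem as above. The beta and epsilon values follow the usual defaults; the step size is larger than the usual default so this toy run settles quickly. A real project would use a library implementation rather than this hand-rolled version.

```python
import numpy as np

# Adam on f(w) = (w - 3)^2.
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

w = 0.0
m = 0.0   # first moment: running average of gradients (momentum-like)
v = 0.0   # second moment: running average of squared gradients (RMSProp/AdaGrad-like)

for t in range(1, 501):
    grad = 2 * (w - 3)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(w)  # approaches 3.0
```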
Artificial General Intelligence: a hypothetical AI system able to learn and perform any intellectual task a human can, rather than excelling only at narrow, specialized problems.