Information added to token embeddings to tell a transformer the order of elements in a sequence. Since attention treats all positions equally, without positional encoding the model couldn't distinguish 'dog bites man' from 'man bites dog.' Various methods exist: sinusoidal, learned, rotary (RoPE), and ALiBi.
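As an illustration, the original sinusoidal scheme can be sketched in a few lines of NumPy. This is a minimal sketch rather than any particular library's implementation; the function name is made up for the example, while the fixed base of 10000 follows the original transformer paper.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix that is added to token embeddings."""
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # (1, d_model / 2)
    angles = positions / (10000 ** (dims / d_model))  # one angle per (position, dim pair)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)                      # odd dimensions use cosine
    return pe

# Usage: add to the token embeddings before the first transformer layer,
# e.g. embeddings = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```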
The neural network architecture behind virtually all modern AI language models, built around self-attention layers that let every position in a sequence attend to every other position.
A dense numerical representation of data (words, images, etc.) as a vector of numbers, arranged so that items with similar meaning end up close together in the vector space.
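A minimal sketch of the idea for word tokens, assuming a toy vocabulary and randomly initialized vectors; in a real model the table is learned during training.

```python
import numpy as np

# Toy embedding table: one dense vector per token id (random here, learned in practice).
vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, embed_dim))

token_ids = np.array([7, 2, 5])              # a short sentence mapped to token ids
token_vectors = embedding_table[token_ids]   # shape (3, embed_dim)
print(token_vectors.shape)
```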
Rotary Position Embedding. A positional encoding method that rotates each query and key vector by an angle proportional to its position in the sequence, so relative position is captured directly inside the attention computation.
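A minimal sketch of the rotation, assuming the "split halves" pairing convention used by some implementations; it would be applied to query and key vectors before the attention dot product.

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply a rotary position embedding to x of shape (seq_len, d), with d even."""
    seq_len, d = x.shape
    half = d // 2
    freqs = 1.0 / (base ** (np.arange(half) / half))        # one frequency per dimension pair
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half): angle grows with position
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]                        # pair each dim in the first half with one in the second
    # Rotate every (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```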
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
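For instance, two common activation functions sketched with NumPy; without such non-linear functions, stacked layers would collapse into a single linear transformation.

```python
import numpy as np

def relu(x):
    """ReLU: keeps positive inputs, zeroes out negatives."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(relu(np.array([-2.0, 0.5, 3.0])))   # [0.  0.5 3. ]
print(sigmoid(np.array([0.0])))           # [0.5]
```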
An optimization algorithm (Adam) that combines the strengths of two earlier methods, AdaGrad and RMSProp, by adapting the learning rate of each parameter using running estimates of the mean and variance of its gradients.
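A minimal sketch of a single Adam update step; the function and variable names are illustrative, while the default hyperparameter values follow the original Adam paper.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running moment estimates, t counts steps from 1."""
    m = beta1 * m + (1 - beta1) * grad           # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment: running uncentered variance
    m_hat = m / (1 - beta1 ** t)                 # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return param, m, v
```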
Artificial General Intelligence. A hypothetical AI system capable of understanding, learning, and performing essentially any intellectual task a human can, rather than excelling only at narrow, specialized tasks.