OpenAI's open-source speech recognition model. Trained on 680,000 hours of multilingual audio, it copes well with many languages, accents, and background noise. It is available in several sizes, from tiny to large, that trade speed for accuracy, and it is widely used for transcription, subtitling, and voice interfaces.
Converting spoken audio into written text.
The AI company behind ChatGPT, GPT-4, DALL-E, and Whisper.
AI models that can understand and generate multiple types of data — text, images, audio, video.
A mathematical function applied to a neuron's output that introduces non-linearity into the network.
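A minimal sketch of the idea, using only the Python standard library: a single neuron computes a weighted sum and then passes it through an activation function. ReLU and sigmoid are shown as two common choices; the `neuron` helper and its argument names are illustrative and not taken from any particular framework.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias, activation):
    """Hypothetical single neuron: weighted sum followed by an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


if __name__ == "__main__":
    # Same weighted sum (-1.5), two different non-linearities.
    print(neuron([1.0, 2.0], [0.5, -1.0], 0.0, relu))
    print(neuron([1.0, 2.0], [0.5, -1.0], 0.0, sigmoid))
```

Without the activation step, stacking such neurons would only ever produce another linear function of the inputs, which is why the non-linearity matters.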
An optimization algorithm that combines ideas from two earlier methods, AdaGrad and RMSProp: it adapts the step size for each parameter using running averages of past gradients and their squares.
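As a rough illustration, one update step of this optimizer (commonly known as Adam) for a single scalar parameter might look like the sketch below. The function name `adam_step` is hypothetical, and the hyperparameter defaults are the commonly cited ones rather than values from this glossary. The running average of squared gradients gives the per-parameter scaling associated with AdaGrad and RMSProp; the running average of the gradient itself acts like momentum.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative sketch).

    m, v are the running averages carried between steps; t is the
    1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # correct the bias toward zero
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    # Minimize f(x) = x^2, whose gradient is 2x, starting from x = 1.0.
    x, m, v = 1.0, 0.0, 0.0
    for t in range(1, 101):
        x, m, v = adam_step(x, 2.0 * x, m, v, t, lr=0.01)
    print(x)
```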
Artificial General Intelligence.