State of AI Models — 2026
Live data on the AI model landscape. Benchmarks, pricing, and capability trends updated weekly. Use this data freely — just link back to Machine Brief.
Last updated: March 2026
Key Trends
| Metric | Value | Change |
|---|---|---|
| Avg. input cost (per 1M tokens) | $8.26 | -62% YoY |
| Avg. MMLU score (frontier) | 89.9 | +4.1 pts YoY |
| Max context window | 2M tokens | +15x YoY |
| Open-source models in top 10 | 3 | +2 YoY |
| Models with reasoning mode | 7/10 | New category |
| Avg. time to new frontier model | ~6 weeks | -50% YoY |
Frontier Model Benchmarks
| Model | Provider | MMLU | HumanEval | Context | Input/1M | Output/1M | Released |
|---|---|---|---|---|---|---|---|
| GPT-4o | OpenAI | 88.7 | 90.2 | 128K | $2.50 | $10.00 | May 2024 |
| GPT-4.5 | OpenAI | 90.8 | 92 | 128K | $75.00 | $150.00 | Feb 2025 |
| Claude 3.5 Sonnet | Anthropic | 88.7 | 92 | 200K | $3.00 | $15.00 | Jun 2024 |
| Claude 3.5 Opus | Anthropic | 91.2 | 93.1 | 200K | $15.00 | $75.00 | Mar 2025 |
| Gemini 2.0 Pro | Google | 89.5 | 88.4 | 2M | $1.25 | $5.00 | Dec 2024 |
| Gemini 2.5 Pro | Google | 90.3 | 91.5 | 1M | $1.25 | $10.00 | Mar 2025 |
| Llama 4 Maverick | Meta | 88.2 | 89.7 | 1M | Free | Free | Feb 2025 |
| DeepSeek R1 | DeepSeek | 90.8 | 97.3 | 128K | $0.55 | $2.19 | Jan 2025 |
| Grok 3 | xAI | 91.5 | 93.8 | 128K | $3.00 | $15.00 | Feb 2025 |
| Mistral Large 2 | Mistral | 84 | 92 | 128K | $2.00 | $6.00 | Jul 2024 |
Prices are API list prices in USD per 1M tokens as of March 2026. Context is the maximum context window in tokens. Benchmark scores come from official reports and independent evaluations.
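As a rough illustration of how per-1M-token list prices translate into the cost of a single request, here is a minimal Python sketch. The names `PRICES_PER_1M` and `estimate_cost` are hypothetical helpers, not part of any provider SDK, and the two price pairs are copied from the table above.

```python
from decimal import Decimal

# (input, output) list prices in USD per 1M tokens, copied from the table.
# This dictionary is an illustrative assumption, not a provider API.
PRICES_PER_1M = {
    "GPT-4o": (Decimal("2.50"), Decimal("10.00")),
    "DeepSeek R1": (Decimal("0.55"), Decimal("2.19")),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request at list price."""
    input_price, output_price = PRICES_PER_1M[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # 200K input tokens and 50K output tokens on each model.
    for name in PRICES_PER_1M:
        print(name, estimate_cost(name, 200_000, 50_000))
```

Decimal arithmetic is used so that small per-token amounts do not pick up floating-point rounding noise. Actual bills may differ from list-price estimates, for example where a provider applies caching or batch discounts.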
Cite This Data
You're free to use this data in articles, presentations, and research. Please cite:
Machine Brief. "State of AI Models 2026." machinebrief.com/data. Accessed March 2026.
Methodology
Benchmark scores come from official model cards, peer-reviewed evaluations, and independent testing platforms like LMSYS Chatbot Arena and Artificial Analysis.
Pricing data is sourced directly from provider API documentation. "Free" indicates open-weight models available for self-hosting.
Trend calculations use year-over-year comparisons against March 2025 data points. We track frontier models only (top performers per provider).
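A year-over-year comparison of this kind can be sketched as follows. This assumes the standard percent-change formula for prices and a simple difference for benchmark scores; the page does not spell out its exact calculation. The function names are hypothetical, and the inputs in the usage example are placeholder values, not Machine Brief data.

```python
def yoy_percent_change(current: float, previous: float) -> float:
    """Percent change from the prior-year value to the current value."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100


def yoy_point_change(current: float, previous: float) -> float:
    """Absolute change in points, e.g. for benchmark scores."""
    return current - previous


if __name__ == "__main__":
    # Placeholder inputs chosen only to show the arithmetic.
    print(f"{yoy_percent_change(5.0, 10.0):+.0f}% YoY")
    print(f"{yoy_point_change(90.0, 86.0):+.1f} pts YoY")
```

Percent change suits quantities such as price, while a point difference is easier to read for scores that are already on a 0 to 100 scale.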