Compare 17 leading AI models on pricing, benchmarks, and capabilities.
| Model | Provider | Released | Context (tokens) | Input $/M | Output $/M | MMLU | HumanEval | MATH | GPQA |
|---|---|---|---|---|---|---|---|---|---|
| Claude 3.5 Opus | Anthropic | Feb 2025 | 200K | $15.00 | $75.00 | 91.1 | 93 | 80.4 | 68 |
| Claude 3.5 Sonnet | Anthropic | Jun 2024 | 200K | $3.00 | $15.00 | 88.7 | 92 | 71.1 | 59.4 |
| Claude 4 Opus | Anthropic | Feb 2025 | 200K | $15.00 | $75.00 | 92 | 95.1 | 85.6 | 72.4 |
| Claude 4 Sonnet | Anthropic | Jan 2025 | 200K | $3.00 | $15.00 | 90.5 | 94.2 | 82.8 | 65.5 |
| DeepSeek R1 | DeepSeek | Jan 2025 | 128K | $0.55 | $2.19 | 90.8 | 85.5 | 97.3 | 71.5 |
| DeepSeek V3 | DeepSeek | Dec 2024 | 128K | $0.27 | $1.10 | 88.5 | 82.6 | 90.2 | 59.1 |
| Gemini 2.0 Flash | Google | Jan 2025 | 1M | $0.10 | $0.40 | 85.2 | 84 | 73.1 | 51.2 |
| Gemini 2.0 Pro | Google | Feb 2025 | 2M | $1.25 | $5.00 | 89.5 | 88 | 80.2 | 58.7 |
| Gemini 2.5 Pro | Google | Feb 2025 | 1M | $1.25 | $10.00 | 91.5 | 92.8 | 89.4 | 71.2 |
| GPT-4.5 | OpenAI | Feb 2025 | 128K | $75.00 | $150.00 | 90.8 | 88.6 | 78.2 | 65 |
| GPT-4o | OpenAI | May 2024 | 128K | $2.50 | $10.00 | 88.7 | 90.2 | 76.6 | 53.6 |
| GPT-o1 | OpenAI | Dec 2024 | 200K | $15.00 | $60.00 | 91.8 | 92.4 | 96.4 | 78 |
| GPT-o3 | OpenAI | Jan 2025 | 200K | $10.00 | $40.00 | 92.3 | 93.8 | 97.8 | 83.3 |
| Grok 2 | xAI | Aug 2024 | 128K | $2.00 | $10.00 | 87.5 | 88.4 | 76.1 | 56 |
| Llama 3.3 70B | Meta | Dec 2024 | 128K | $0.18 | $0.18 | 86 | 88.4 | 77 | 50.7 |
| Llama 4 405B | Meta | Feb 2025 | 128K | $0.80 | $0.80 | 90.2 | 91.6 | 82.8 | 61.3 |
| Mistral Large | Mistral | Nov 2024 | 128K | $2.00 | $6.00 | 84 | 92 | 69.1 | 52.3 |
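The per-million-token rates above translate directly into per-request costs. A minimal sketch of that arithmetic, with prices copied from the table (only a few models shown) and token counts that are purely illustrative:

```python
# Per-million-token API prices ($ input, $ output), taken from the table above.
PRICES = {
    "Claude 4 Sonnet": (3.00, 15.00),
    "DeepSeek V3": (0.27, 1.10),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-4o": (2.50, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens times rate, scaled per million."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2000, 500):.5f}")
```

At this hypothetical workload, GPT-4o works out to one cent per request while Gemini 2.0 Flash is well under a tenth of that, which is why the budget models dominate bulk-processing use cases.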
- **Claude 3.5 Opus:** Deep research and complex analysis. Anthropic's most capable 3.x-series model.
- **Claude 3.5 Sonnet:** Coding and analysis. Fast, capable, and cost-effective for most production workloads.
- **Claude 4 Opus:** Frontier reasoning and creative work. Top-tier for nuanced writing and hard problems.
- **Claude 4 Sonnet:** Best coding model on the market. Excellent at agentic tasks and tool use.
- **DeepSeek R1:** Reasoning on a budget. Matches GPT-o1 on math benchmarks at a fraction of the cost.
- **DeepSeek V3:** Incredible value for coding. MoE architecture delivers top-tier math at rock-bottom prices.
- **Gemini 2.0 Flash:** Cheapest capable model. Extremely fast, with a massive context window for bulk processing.
- **Gemini 2.0 Pro:** Long-context reasoning. A 2M-token window makes it unmatched for massive document analysis.
- **Gemini 2.5 Pro:** Google's thinking model. Competitive with GPT-o3 on reasoning benchmarks at lower cost.
- **GPT-4.5:** Creative writing and nuanced reasoning. OpenAI's largest unsupervised pre-training run.
- **GPT-4o:** Fast multimodal tasks. Great balance of speed, cost, and quality for everyday use.
- **GPT-o1:** Complex reasoning and math. A chain-of-thought reasoning model that excels at hard problems.
- **GPT-o3:** State-of-the-art reasoning. Currently OpenAI's most capable model on hard benchmarks.
- **Grok 2:** Real-time data access via the X platform. Less filtered than most competitors.
- **Llama 3.3 70B:** Best open-source value. Punches well above its weight for a 70B-parameter model.
- **Llama 4 405B:** Most capable open model. Competitive with frontier closed-source models across the board.
- **Mistral Large:** European alternative with strong multilingual skills. Solid for code and function calling.
In-depth articles comparing the biggest names in AI — with real opinions, not just spec sheets.
An honest comparison of ChatGPT and Claude across reasoning, coding, creative writing, and more. We tested both extensively — here's what we found.
GPT-4o and Gemini 2.0 Pro are the leading multimodal AI models. We compare them on vision, reasoning, speed, and real-world performance.
Claude 4 Opus and GPT-o3 are the most powerful reasoning models available. We compare them on hard problems, coding, analysis, and real-world performance.
Meta's Llama 4 and DeepSeek R1 are the two most capable open-source AI models. We compare them on performance, efficiency, and what they mean for the open-source AI movement.
Midjourney and DALL-E 3 are the top AI image generators. We compare them on image quality, style, prompting, pricing, and ease of use.
GitHub Copilot and Cursor are the two leading AI coding assistants. We compare features, model quality, pricing, and real-world coding experience.
Perplexity AI is challenging Google as the go-to search engine. We compare them on accuracy, speed, depth, and whether AI search is actually ready to replace traditional search.
OpenAI's Sora and Runway Gen-3 Alpha are the leading AI video generators. We compare them on quality, control, speed, pricing, and real-world usability.
Mistral Large and xAI's Grok 2 are two ambitious challengers to the AI establishment. We compare them on capabilities, performance, personality, and value.
ChatGPT Plus and Claude Pro both cost $20/month. We compare exactly what you get with each subscription — features, limits, model access, and which is the better value.
Benchmark data sourced from official reports and independent evaluations. Pricing reflects API rates as of Feb 2025.