Orak: The Game-Changing Benchmark for LLMs in Gaming
Orak offers a comprehensive framework for training and evaluating LLM agents across 12 video games. With a focus on real-world applicability, it addresses current benchmark gaps.
Large Language Models (LLMs) are no longer just text generators. They're reshaping gaming through intelligent character development. But here's the snag. Current benchmarks don't cut it. They overlook the diverse capabilities LLMs need across game genres and lack the fine-tuning datasets essential for building gaming agents.
Introducing Orak
Enter Orak. This new benchmark tackles these challenges head-on. It's built for training and evaluating LLM agents across 12 popular video games, covering all major genres. The real kicker? The Model Context Protocol (MCP) gives LLMs a plug-and-play interface to each game, making studies reproducible and systematic.
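To see why a shared interface matters, here's a minimal sketch of what a plug-and-play game contract could look like. The class and method names are illustrative assumptions, not Orak's actual MCP API; the point is that any game exposing this contract can be driven by any LLM agent.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Observation:
    """A single game-state snapshot handed to the LLM agent."""
    text: str                          # textual description of the game state
    screenshot: Optional[bytes] = None  # optional image modality


class GameEnv(Protocol):
    """Hypothetical plug-and-play game interface (not Orak's real API)."""
    def reset(self) -> Observation: ...
    def step(self, action: str) -> tuple[Observation, float, bool]: ...


def run_episode(env: GameEnv, agent) -> float:
    """Drive any conforming game with any agent that maps observation -> action."""
    obs, total_reward, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(agent.act(obs))
        total_reward += reward
    return total_reward
```

With a contract like this, swapping the game or the LLM behind the agent is a one-line change, which is exactly the reproducibility story MCP is meant to enable.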
Why does this matter? Because Orak isn't just about scoring points. It digs deeper, exploring agentic modules in varied game scenarios. This isn't just a theoretical exercise. It's about creating LLM agents that can genuinely compete with, and potentially outperform, human players.
Data-Driven Gameplay
Orak tops it off with a fine-tuning dataset of expert gameplay trajectories. Think of it as a playbook for turning generic LLMs into specialized game agents. These datasets span multiple genres, offering a rich training ground for LLMs to adapt and excel.
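As a rough illustration, one step of such a trajectory might pair a game state with the expert's reasoning and chosen action, then flatten into a supervised fine-tuning example. The field names and game below are assumptions for illustration, not Orak's published schema.

```python
import json

# Hypothetical shape of one expert gameplay step; the real Orak schema may differ.
record = {
    "game": "StarCraft II",
    "observation": "Enemy zerglings approaching the natural expansion...",
    "reasoning": "Defend the ramp first, then counter-attack with marines.",
    "action": "build_bunker(location='ramp')",
}


def to_sft_example(rec: dict) -> dict:
    """Turn a trajectory step into a prompt/completion pair for fine-tuning."""
    prompt = f"Game: {rec['game']}\nState: {rec['observation']}\nWhat do you do?"
    completion = f"{rec['reasoning']}\nAction: {rec['action']}"
    return {"prompt": prompt, "completion": completion}


print(json.dumps(to_sft_example(record), indent=2))
```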
The code lives on GitHub at krafton-ai/Orak, and the datasets are on Hugging Face. Clone the repo, run the tests, then form your own opinion.
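If you'd rather poke at the data first, a minimal sketch with the Hugging Face datasets library looks like this. The dataset identifier below is a placeholder; check the repository README for the real one.

```python
from datasets import load_dataset

# Placeholder dataset id -- consult the krafton-ai/Orak README for the actual path.
DATASET_ID = "krafton-ai/orak-expert-trajectories"

ds = load_dataset(DATASET_ID, split="train")
print(ds)      # inspect the schema before fine-tuning
print(ds[0])   # look at one expert gameplay step
```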
Why You Should Care
So, what's the big deal? Orak's unified evaluation framework includes game leaderboards and LLM battle arenas. It even supports ablation studies on input modalities and agentic strategies. It's like having a laboratory for gaming agents, set to revolutionize how we view AI in gaming.
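A rough sketch of what such an ablation grid could look like, with placeholder game ids, modalities, and strategies standing in for whatever Orak's harness actually exposes:

```python
import random
from itertools import product

# Hypothetical ablation axes -- Orak's real harness may name these differently.
MODALITIES = ["text_only", "text_plus_image"]
STRATEGIES = ["zero_shot", "reflection", "planning"]
GAMES = ["street_fighter_iii", "pokemon_red"]  # placeholder game ids


def evaluate(game: str, modality: str, strategy: str) -> float:
    """Placeholder: in practice this would launch the game, run the LLM agent,
    and return its final score. Here a random number stands in for that score."""
    return random.random()


def ablation_study() -> dict:
    """Score every (game, modality, strategy) combination for comparison."""
    return {
        (g, m, s): evaluate(g, m, s)
        for g, m, s in product(GAMES, MODALITIES, STRATEGIES)
    }


if __name__ == "__main__":
    for config, score in sorted(ablation_study().items(), key=lambda kv: -kv[1]):
        print(config, round(score, 3))
```

The value of a grid like this is that it isolates which agentic module or input modality actually moves the score, rather than crediting the base model for everything.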
Is Orak perfect? No. But it's a leap forward. With Orak, the future of AI in gaming doesn't just look promising, it looks inevitable.
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
Evaluation: The process of measuring how well an AI model performs on its intended task.
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
Hugging Face: The leading platform for sharing and collaborating on AI models, datasets, and applications.