SimulCost: Redefining Efficiency in Physics Simulations
SimulCost challenges traditional evaluation methods by measuring the real tool-use costs LLMs incur in physics simulations. The benchmark reveals a stark trade-off between speed and accuracy.
Large Language Models (LLMs) have become a linchpin in many scientific tasks, but their efficiency in physics simulations is under scrutiny. Enter SimulCost, a new benchmark that shifts the focus from mere token costs to the more consequential tool-use costs: simulation time and experimental resources.
Revisiting Metric Standards
Traditional metrics like pass@k fall short when faced with real-world budget constraints. SimulCost addresses this by evaluating cost-sensitive parameter tuning across 12 different physics simulators, spanning 2,916 single-round tasks and 1,900 multi-round tasks in fields like fluid dynamics and plasma physics.
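SimulCost's exact scoring rule isn't spelled out here, but the core idea of cost-sensitive evaluation can be sketched simply: a task only counts as solved if the tuned parameters hit the accuracy target *and* stay within a resource budget. The `Attempt` fields, tolerance, and budget below are illustrative assumptions, not the benchmark's actual API.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    error: float        # relative error vs. the target observable
    sim_seconds: float  # simulation time consumed to produce it

def cost_aware_success_rate(attempts, tol=0.05, budget_s=60.0):
    """Fraction of attempts that meet the accuracy tolerance
    without exceeding the simulation-time budget."""
    solved = sum(1 for a in attempts
                 if a.error <= tol and a.sim_seconds <= budget_s)
    return solved / len(attempts)

# Three hypothetical tuning attempts: only the first satisfies
# both the accuracy and the budget constraint.
attempts = [Attempt(0.02, 30.0), Attempt(0.01, 90.0), Attempt(0.10, 20.0)]
rate = cost_aware_success_rate(attempts)
```

Under a metric like this, an accurate-but-slow attempt scores no better than an inaccurate one, which is exactly the pressure pass@k fails to capture.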
Frontier LLMs aren't exactly setting the world ablaze here. In single-round mode, their success rates range from 46% to 64%. When high accuracy is required, those figures nosedive to 35-54%, raising the question: are initial LLM guesses reliable enough for high-precision tasks?
Multi-Round Mode: A Double-Edged Sword
Switching to multi-round mode improves success rates to a more respectable 71-80%, but at what cost? LLMs consume 1.5 to 2.5 times more simulation resources than traditional methods, rendering them uneconomical; efficiency can't be ignored.
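The multi-round trade-off comes from the structure of the interaction: each refinement round burns simulation time, so success and cost accrue together. A minimal sketch of such a loop follows; the simulator stub, the single `x` parameter, and the budget values are all hypothetical stand-ins, not SimulCost's interface.

```python
def run_simulation(params):
    # Stand-in for a real physics simulator: returns
    # (error vs. target, simulation seconds consumed).
    target = 3.0
    return abs(params["x"] - target), 10.0

def multi_round_tune(propose, tol=0.1, budget_s=60.0):
    """Refine parameters round by round until the error is within
    tolerance or the simulation-time budget is exhausted."""
    spent, history = 0.0, []
    while spent < budget_s:
        params = propose(history)          # model picks the next guess
        error, cost = run_simulation(params)
        spent += cost                      # every round costs real time
        history.append((params, error))
        if error <= tol:
            return params, spent
    return None, spent  # budget exhausted without success

def propose(history):
    # Toy proposer: step the best guess so far by a fixed increment.
    # An LLM agent would reason over `history` instead.
    if not history:
        return {"x": 0.0}
    best = min(history, key=lambda h: h[1])
    return {"x": best[0]["x"] + 1.0}

params, spent = multi_round_tune(propose)
```

A method that needs more rounds to converge pays proportionally more simulation time, which is why a 1.5-2.5x resource overhead can erase the headline success-rate gains.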
SimulCost also digs into correlations between parameter groups for potential knowledge transfer, and evaluates the impact of in-context examples and reasoning effort. These insights aren't just academic: they're practical guidelines for deploying and fine-tuning LLMs.
Open-Source and Extensible
SimulCost isn't just a static benchmark: it's an extensible toolkit aimed at improving cost-aware agentic designs for physics simulations, and a stepping stone for new simulation environments.
With code and data available on GitHub, SimulCost invites researchers to build on it. As physics simulations evolve, will LLMs adapt quickly enough to justify their cost?
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
Compute: The processing power needed to train and run AI models.
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
LLM: Large Language Model.