MerLean-Prover: Lean4's New Power Player in Theorem Proving

JUST IN: MerLean-Prover is here, and it's changing the game for theorem proving in Lean4. This isn't just another upgrade. It's a significant leap forward that makes the established players look like they're sitting at the kids' table.

A Leap Beyond Sorry Declarations

The new kid on the block, MerLean-Prover, ditches those pesky sorry declarations, Lean's placeholder for a proof step that hasn't actually been written, in favor of kernel-checkable proofs. It deploys a trio of agents, Planning, Check, and Lean, wrapped in a recursive outer loop focused on the proof plan itself. What's wild? There's no fine-tuning and no custom RL objective here. There isn't even any theorem-specific scaffolding. This is a lean machine, through and through.
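To make that three-agent loop concrete, here is a minimal Python sketch of how a plan-focused recursive harness could be wired together. It is not MerLean-Prover's actual code: every name in it (prove, planner, checker, lean_agent, budget) is hypothetical, the agents are toy stand-ins rather than model calls, and the check for a leftover sorry is a crude string match where a real harness would presumably ask the Lean compiler.

```python
from typing import Callable, Optional, Tuple

# All names here are hypothetical; none come from the MerLean-Prover release.
Planner = Callable[[str, str], str]                          # (theorem, feedback) -> plan
Checker = Callable[[str, str], Tuple[bool, str]]             # (theorem, plan) -> (ok, critique)
LeanAgent = Callable[[str, str], Tuple[Optional[str], str]]  # (theorem, plan) -> (proof, errors)


def prove(theorem: str, planner: Planner, checker: Checker,
          lean_agent: LeanAgent, feedback: str = "", budget: int = 5) -> Optional[str]:
    """Recursive outer loop: every failure is fed back into the next plan."""
    if budget <= 0:
        return None  # out of attempts
    plan = planner(theorem, feedback)
    ok, critique = checker(theorem, plan)
    if not ok:
        # The plan was rejected before any Lean code was written: re-plan.
        return prove(theorem, planner, checker, lean_agent, critique, budget - 1)
    proof, errors = lean_agent(theorem, plan)
    # Crude assumption: a proof counts only if it contains no "sorry".
    if proof is not None and "sorry" not in proof:
        return proof
    note = errors or "proof still contains sorry"
    return prove(theorem, planner, checker, lean_agent, note, budget - 1)


# Toy stand-ins so the sketch runs; real agents would call a model and Lean.
def demo_planner(theorem: str, feedback: str) -> str:
    return "induction on n" if feedback else "direct computation"


def demo_checker(theorem: str, plan: str) -> Tuple[bool, str]:
    if plan == "direct computation":
        return False, "plan does not cover the general case"
    return True, ""


def demo_lean_agent(theorem: str, plan: str) -> Tuple[Optional[str], str]:
    return "theorem demo : True := trivial", ""


result = prove("theorem demo : True", demo_planner, demo_checker, demo_lean_agent)
print(result)  # theorem demo : True := trivial
```

In this toy run the first plan gets bounced by the checker, the critique flows back to the planner, and the second plan goes through to the Lean step. The point of the shape is that feedback always lands on the plan, not on individual proof lines.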

Benchmark Beater

On FormalQualBench, a benchmark with 23 PhD-qualifying-exam theorems, MerLean-Prover nailed 10 out of 23. That's a step up from the previous open-source champion, OpenGauss, which managed only 8. And on Putnam2025, MerLean-Prover didn't just meet expectations. It smashed them, closing 12 out of 12 problems faster than any other system that's tackled the full set.
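For a sense of scale, the FormalQualBench counts above convert to solve rates with a few lines of Python. The only inputs are the figures reported here (10 and 8 out of 23); the variable and function names are just illustrative.

```python
TOTAL_THEOREMS = 23  # size of FormalQualBench, as reported above
solved = {"MerLean-Prover": 10, "OpenGauss": 8}


def solve_rate(n_solved: int, total: int) -> float:
    """Return the share of solved theorems as a percentage."""
    return 100.0 * n_solved / total


rates = {name: solve_rate(n, TOTAL_THEOREMS) for name, n in solved.items()}
for name, rate in rates.items():
    print(f"{name}: {solved[name]}/{TOTAL_THEOREMS} = {rate:.1f}%")

gap = rates["MerLean-Prover"] - rates["OpenGauss"]
print(f"Gap: {gap:.1f} percentage points")
# MerLean-Prover: 10/23 = 43.5%
# OpenGauss: 8/23 = 34.8%
# Gap: 8.7 percentage points
```

On a 23-theorem benchmark, each theorem is worth roughly 4.3 percentage points, so the two-theorem lead shows up as about 8.7 points.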

And just like that, the leaderboard shifts. The labs must be scrambling, seeing their benchmarks fall like dominoes. Who saw that coming?

The Harness Factor

So what's the secret sauce? The design of the harness holds the key. Sure, raw model capability matters. But MerLean-Prover's results suggest a less complicated harness can still punch above its weight. Smaller models like Sonnet and Haiku also got in on the action, each solving problems on FormalQualBench with surprising ease.

Is the harness design really the unsung hero here? It sure looks that way. The simplicity of MerLean-Prover's setup is a bold statement: effective theorem proving doesn't need to be a convoluted mess.

Why This Matters

Why should we care? Because this isn't just about ticking off some theorems. It's about redefining what's possible with AI in mathematics. As these tools become more refined, who knows how far they can push the boundaries? The potential applications in education, research, and beyond are massive.

And if a leaner, meaner approach like MerLean-Prover can outperform established methods, perhaps it's time to rethink our strategies. Are we overcomplicating AI development? This release suggests we might be.