Why LLMs Struggle with TLA+ Specs: A Deep Dive

Let's talk about TLA+, the formal specification language that giants like Amazon and Microsoft rely on. It's a powerhouse for industrial verification, but translating natural language into TLA+ is no walk in the park. It takes expertise, and that's a bottleneck. Enter LLMs, or Large Language Models, which are supposed to be the future of everything. But are they ready for this job?

The Current Landscape

A new study tested 30 LLMs to see if they could turn plain language into TLA+ specs. The evaluation covered 205 TLA+ specifications across various models and strategies. The results? Only 26.6% got the syntax right, and a mere 8.6% were semantically correct. It's a rough scene. If you're betting on LLMs to make TLA+ easier, you're backing the wrong horse, at least for now.
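To make the two headline numbers concrete, here is a minimal Python sketch of how syntax and semantic pass rates could be tallied from per-spec check results. It is not the study's pipeline: the `SpecResult` record, the `pass_rates` helper, and the sample outcomes are all hypothetical, and the sketch assumes each generated spec has already been run through some syntax check and some semantic check (for TLA+, a parser and a model checker would be the natural candidates).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecResult:
    """Outcome of checking one generated spec (hypothetical record)."""
    name: str
    parses: bool           # passed the syntax check
    semantically_ok: bool  # passed the semantic check; only counts if it parses


def pass_rates(results):
    """Return (syntax_rate, semantic_rate) as percentages of all results."""
    if not results:
        raise ValueError("no results to summarize")
    total = len(results)
    syntax_ok = sum(r.parses for r in results)
    # A spec that does not parse cannot be semantically correct.
    semantic_ok = sum(r.parses and r.semantically_ok for r in results)
    return 100 * syntax_ok / total, 100 * semantic_ok / total


# Made-up outcomes for illustration only; not data from the study.
demo = [
    SpecResult("counter", True, True),
    SpecResult("mutex", True, False),
    SpecResult("queue", False, False),
    SpecResult("lock", False, False),
]

syntax_rate, semantic_rate = pass_rates(demo)
print(f"syntax: {syntax_rate:.1f}%  semantic: {semantic_rate:.1f}%")
```

Because semantic correctness is only counted for specs that parse, the semantic rate can never exceed the syntax rate, which matches the shape of the reported figures.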

Big Doesn't Mean Better

Here's a twist: bigger isn't automatically better. DeepSeek r1:8b, a smaller model, outperformed its much larger sibling, the 70B variant. It's not about how big your model is but how well it aligns with reasoning in formal languages. Code-specialized models also faltered. They're bogged down by biases picked up from training on mainstream languages, showing that specialization without precision is a dead end.

Hallucinations and Biases

Five types of hallucinations cropped up, all tied back to training-data issues. It's like asking a painter who specializes in landscapes to draw technical blueprints. The results won't cut it without expert oversight. Should we be surprised? LLMs aren't magic wands. They need guidance and, frankly, a reality check.

Why This Matters

If you're in the tech world and counting on LLMs to write your TLA+ specs for you, think again. These models still need expert eyes on their output to ensure reliability. How long until LLMs can stand alone? If you haven't started aligning your expectations with reality, you're late.

The authors of the study have released their framework, code, and dataset. That's a call to arms for researchers aiming to bridge the gap. But until something fundamental shifts in how these models are trained, don't ditch your TLA+ experts just yet.