Why LLMs Are Still Lost in a Maze: Spatial Reasoning Woes
LLMs struggle with spatial reasoning, showing significant performance gaps. Gemini-2.5-Flash leads, yet fails in visual formats.
JUST IN: The spatial reasoning of foundation models is under the microscope, and it's not looking good. Researchers have put these models through their paces using maze tasks, aiming to evaluate their ability to construct internal spatial world models. The results? Let's just say the labs are scrambling.
The Maze Challenge
In a series of experiments involving models like Gemini-2.5-Flash, GPT-5-mini, Claude-Haiku-4.5, and DeepSeek-Chat, significant discrepancies in spatial reasoning became evident. Gemini-2.5-Flash managed to ace smaller mazes with 80-86% accuracy using tokenized adjacency representations. But switch to visual grid formats, and the performance plummets to 16-34%. That's a wild difference, showing these models' spatial reasoning is anything but format-agnostic.
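To make the format gap concrete, here is a minimal sketch of what the two representations of the *same* maze might look like. The grid layout, cell naming, and serialization are illustrative assumptions, not the researchers' actual prompt format:

```python
# Hypothetical example: two ways to serialize one small maze for an LLM prompt.
# '.' = open cell, '#' = wall, 'S' = start, 'G' = goal (layout is made up).
GRID = [
    "S.#",
    ".##",
    "..G",
]

def visual_format(grid):
    """Visual grid format: the maze rendered directly as rows of characters."""
    return "\n".join(grid)

def adjacency_format(grid):
    """Tokenized adjacency format: each open cell listed with its open neighbors."""
    rows, cols = len(grid), len(grid[0])
    open_cells = {(r, c) for r in range(rows) for c in range(cols)
                  if grid[r][c] != "#"}
    lines = []
    for r, c in sorted(open_cells):
        # Check the four orthogonal neighbors (up, down, left, right).
        nbrs = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if (r + dr, c + dc) in open_cells]
        lines.append(f"({r},{c}) -> " + ", ".join(f"({nr},{nc})" for nr, nc in nbrs))
    return "\n".join(lines)

print(visual_format(GRID))
print(adjacency_format(GRID))
```

Both strings encode identical connectivity, yet the findings suggest a model can solve the maze from the adjacency listing while failing on the character grid, which is exactly what "format-agnostic spatial reasoning" would rule out.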
Deeper Problems Uncovered
Here's the kicker: despite achieving 96-99% semantic coverage in their reasoning traces, these models can't translate that understanding into spatial computations. Each question is treated independently; they aren't building the cumulative spatial knowledge we'd hope for. So what's the point of near-perfect semantic understanding if it never carries over to the spatial task?
Why It Matters
This isn't just an academic exercise. The implications for deploying foundation models in tasks that require spatial abstraction are massive. If you're banking on these models for applications like autonomous navigation or complex planning, think again. The leaderboard in AI might be shifting, but not in the direction some expected.
Ultimately, these findings raise a big question: Are current foundation models fundamentally limited in their ability to develop reliable spatial world models? Until they can consistently handle spatial reasoning across formats, they're more like one-trick ponies than the universal solvers we dreamed of.
Key Terms Explained
Claude: Anthropic's family of AI assistants, including Claude Haiku, Sonnet, and Opus.
Gemini: Google's flagship multimodal AI model family, developed by Google DeepMind.
GPT: Generative Pre-trained Transformer.
Reasoning: The ability of AI models to draw conclusions, solve problems logically, and work through multi-step challenges.