Why AI Struggles with Real-World Questions
AI vision-language models stumble on informal queries. Real-world questions are less structured than benchmarks.
In a world where AI models are expected to answer our every query, there's a glaring gap between what these algorithms can handle and the messy reality of human communication. Enter HAERAE-Vision, a new benchmark revealing the struggle AI faces when dealing with unstructured questions.
The Korean Community Test
HAERAE-Vision dives into 653 real-world questions sourced from Korean online communities, a tiny fraction (0.76%) of the 86,000 initial candidates. Each question pairs with a clarified version, resulting in 1,306 total queries. This setup puts 39 vision-language models (VLMs) to the test, including the much-touted GPT-5 and Gemini 2.5 Pro. The results? Underwhelming, with even top models scoring below 50% on the original queries.
Now here's where things get interesting. By merely clarifying queries, model performance leaped by 8 to 22 points, with smaller models reaping the most benefit. It turns out that a little clarity goes a long way. But why is this important?
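The paired-query design described above is easy to sketch in code. The snippet below is a hypothetical illustration of how one might measure a "clarification gain" for a model on original-versus-clarified query pairs; the function names, data format, and toy model are my own assumptions, not the benchmark's actual harness.

```python
# Hypothetical sketch of a HAERAE-Vision-style paired evaluation:
# each item carries an original (informal) query, a clarified rewrite,
# and a reference answer. All names and data here are illustrative.

def accuracy(predictions, answers):
    """Fraction of predictions that match the reference answers."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def clarification_gain(model, items):
    """Score a model on original vs. clarified versions of each query."""
    orig_preds = [model(item["original"]) for item in items]
    clar_preds = [model(item["clarified"]) for item in items]
    answers = [item["answer"] for item in items]
    orig_acc = accuracy(orig_preds, answers)
    clar_acc = accuracy(clar_preds, answers)
    return {"original_acc": orig_acc,
            "clarified_acc": clar_acc,
            "gain": clar_acc - orig_acc}

# Toy "model" that only answers correctly when the query is specific.
toy_model = lambda q: "A" if "exactly" in q else "B"
items = [
    {"original": "what's this?",
     "clarified": "what exactly is this plant?", "answer": "A"},
    {"original": "how much?",
     "clarified": "how much exactly does it cost?", "answer": "A"},
]
print(clarification_gain(toy_model, items))
# → {'original_acc': 0.0, 'clarified_acc': 1.0, 'gain': 1.0}
```

The toy model exaggerates the effect, but the shape of the result mirrors the paper's finding: the same model, given the same underlying question, scores higher once the ambiguity is spelled out.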
The Reality of Human Queries
The crux of the matter is that users often leave much unsaid, relying on images and implicit context to fill in the blanks. In practice, users might assume that AI can 'read between the lines,' but that's not the case. What HAERAE-Vision shows us is that current AI retrieval systems can't make up for what users don't articulate.
Take a moment to consider how often you rely on context when you ask questions. Informal queries are the norm, not the exception, yet our AI systems lag behind this reality. And isn't that a problem if we're leaning more on these technologies?
The Bigger Picture
The story looks different from Nairobi. While Silicon Valley designs these models, the question is where they truly work. In emerging markets, where local languages and informal dialects reign, these findings highlight an important gap. Automation doesn't mean the same thing everywhere, and for AI, it seems, there's a lot of ground left to cover.
Ultimately, if AI is to be truly useful, it needs to adapt to the way real people communicate. It's not just about improving models; it's about understanding and integrating into the local context. Until then, we're left with a stark reminder that benchmarks aren't the be-all and end-all of AI capabilities.