Why Omni LLMs Can't Keep Us Safe Just Yet

By Lexi Tanaka, June 6, 2026

MCBench unveils the struggles of Omni Large Language Models in safety-critical assessments. They fumble with subtle risks, lacking solid cross-modal reasoning.

Omni Large Language Models are the hot topic in AI circles these days. They promise to process vision, audio, and text all at once. But when it comes to safety, these models are stumbling. Enter MCBench, a benchmark designed to expose their flaws. With 1196 scenarios across four safety categories, MCBench is putting these models to the test, and the results aren't pretty.

Behind the Scenarios

MCBench isn't just throwing random dangers at these models. It pairs each unsafe scenario with a safe counterpart that's barely different. The goal? To see if these models can truly tell the difference. Spoiler: They often can't. While state-of-the-art models shine when cues are obvious, they struggle with risks that aren't as blatant.
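To see why the pairing matters, here is a minimal Python sketch of how paired scenarios could be scored. It is an illustration only: the article does not say how MCBench computes its metrics, so the `PairPrediction` schema, the `score_pairs` function, and both metric names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PairPrediction:
    """Model verdicts for one scenario pair (hypothetical schema, not MCBench's)."""
    flagged_unsafe_variant: bool  # True if the model called the unsafe scenario unsafe
    flagged_safe_variant: bool    # True if the model called the safe scenario unsafe


def score_pairs(predictions):
    """Return per-scenario accuracy and the stricter pair-level accuracy."""
    if not predictions:
        raise ValueError("need at least one pair")
    n = len(predictions)
    # Unsafe variants that were correctly flagged.
    unsafe_hits = sum(p.flagged_unsafe_variant for p in predictions)
    # Safe variants that were correctly left unflagged.
    safe_hits = sum(not p.flagged_safe_variant for p in predictions)
    # Pairs where the model got both variants right.
    both_right = sum(
        p.flagged_unsafe_variant and not p.flagged_safe_variant
        for p in predictions
    )
    return {
        "scenario_accuracy": (unsafe_hits + safe_hits) / (2 * n),
        "pair_accuracy": both_right / n,
    }


# A model that flags everything as unsafe looks half right per scenario,
# yet it never actually tells a pair apart.
always_alarmed = [PairPrediction(True, True)] * 4
print(score_pairs(always_alarmed))
```

Under these assumptions, a model that cries wolf on every input scores 0.5 on individual scenarios and 0.0 on pairs. That gap is the kind of thing a barely-different safe counterpart is meant to expose.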

The Big Struggle

What does this all mean? Omni LLMs are having a hard time integrating cues from different modalities. Sure, they can pick up on specific details, but when it comes to piecing those details together into sound safety judgments, they fall short. It's like giving someone all the ingredients for a cake but no recipe to follow. You can't expect a masterpiece if the pieces don't come together.

What Needs to Change

The research is clear: current Omni LLMs aren't cut out for safety-critical tasks. This isn't just about tweaking a few algorithms. We need new architectures and training strategies. The AI community must shift focus if we want these models to be genuinely reliable.

This brings us to a critical question: Can we trust AI with our safety just yet? The answer, for now, seems to be no. We need to push for better integration of modalities and a deeper understanding of what safety truly entails.

The bottom line? Omni LLMs might be the future, but they're not ready for prime-time safety work. Until they can effectively combine the data they're fed, they'll keep missing the risks that matter most. And in this case, the stakes are way higher than a leaderboard.
