Why Large Multimodal Models Struggle with New Physics

Artificial intelligence has made impressive strides lately, especially with large multimodal models (LMMs) that can process visual data and answer complex questions. But there's a catch: these models are hitting a wall adapting to new physical environments. Enter InPhyRe, a groundbreaking benchmark designed to test LMMs on their ability to reason about physical scenarios they haven't seen before.

The Challenge of New Physics

While LMMs are trained with a lot of physical laws, their capabilities fall short when they encounter scenarios that weren't part of their training. Imagine asking your AI assistant to predict the outcome of a novel collision event. If that event follows a set of physics it hasn't learned, the AI's guesswork could be way off. In contrast, humans can adapt and learn from new examples quickly, a skill that's currently out of reach for most LMMs.

The builders never left the scene in AI development, but their creations seem to miss the mark inductive reasoning. InPhyRe is exposing this weakness by providing a visual question-answering benchmark that goes beyond typical parametric knowledge. It evaluates how LMMs handle collisions in synthetic videos, a test that's not just academic but key for real-world applications.

Exposing LMMs' Blind Spots

After putting over 13 LMMs, both open-source and proprietary, through the InPhyRe test, the results weren't exactly encouraging. First, these models struggle to apply even the universal physical laws they do know. Second, when new physical laws come into play, their inductive reasoning falters. Lastly, there's a troubling tendency for these models to rely on language cues more than visual data, raising questions about their reliability in processing visual inputs.

This is what onboarding actually looks like in AI research, where models need not just more data but a deeper understanding of it. If LMMs are to replace human agents in critical tasks, their reasoning needs a serious upgrade. Floor price is a distraction. Watch the utility of these models as they evolve.

Why Should We Care?

So, why should the average tech enthusiast or AI developer care about this? Well, consider the implications for safety-critical applications. Whether it's autonomous vehicles or industrial robotics, the ability to reason through new physics isn't just a nice-to-have. It's essential. Without it, we risk deploying systems that can't handle unexpected events, potentially leading to costly or even dangerous mistakes.

Gaming is AI's best Trojan horse, not just for entertainment but for advancing AI's reasoning capabilities. As we push these models further, benchmarks like InPhyRe will be key in ensuring that they're not just smarter, but wiser too. The meta shifted. Keep up.

Why Large Multimodal Models Struggle with New Physics

The Challenge of New Physics

Exposing LMMs' Blind Spots

Why Should We Care?

Key Terms Explained