Why Large Multimodal Models Struggle with New Physics
Large multimodal models (LMMs) have learned basic physics, but they stumble when facing unfamiliar physical scenarios. A new benchmark, InPhyRe, exposes these limitations.
Artificial intelligence has made impressive strides lately, especially with large multimodal models (LMMs) that can process visual data and answer complex questions. But there's a catch: these models struggle to adapt to new physical environments. Enter InPhyRe, a benchmark designed to test LMMs on their ability to reason about physical scenarios they haven't seen before.
The Challenge of New Physics
While LMMs are trained on data that reflects many physical laws, their capabilities fall short when they encounter scenarios that weren't part of their training. Imagine asking your AI assistant to predict the outcome of a novel collision event. If that event follows physical rules the model hasn't learned, its guess could be way off. In contrast, humans can adapt and learn from a few new examples quickly, a skill that's currently out of reach for most LMMs.
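To make the collision scenario concrete, here is a minimal sketch of a one-dimensional two-body collision. It uses the standard textbook formula with a coefficient of restitution, and then reruns the same event under an altered rule. The function name `collide_1d` and the altered rule (a restitution value above 1, so the collision adds energy) are assumptions chosen for illustration; they are not taken from InPhyRe itself.

```python
def collide_1d(m1, v1, m2, v2, restitution=1.0):
    """Post-collision velocities for a 1D collision between two bodies.

    Momentum is always conserved. restitution=1.0 is a perfectly elastic
    collision, 0.0 is perfectly inelastic (the bodies move together).
    """
    total_momentum = m1 * v1 + m2 * v2
    total_mass = m1 + m2
    u1 = (total_momentum + m2 * restitution * (v2 - v1)) / total_mass
    u2 = (total_momentum + m1 * restitution * (v1 - v2)) / total_mass
    return u1, u2


# Familiar physics: equal masses, elastic collision, velocities swap.
familiar = collide_1d(1.0, 2.0, 1.0, 0.0)

# Hypothetical unfamiliar rule (an assumption for illustration only):
# the collision adds energy, modeled here as restitution > 1.
unfamiliar = collide_1d(1.0, 2.0, 1.0, 0.0, restitution=1.5)

print("familiar:", familiar)
print("unfamiliar:", unfamiliar)
```

A model that has only memorized the familiar outcome would predict that the first ball stops. Under the altered rule it rebounds instead, which is the kind of shift a person can usually pick up from a handful of examples.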
Developers keep building more capable models, but their creations still miss the mark on inductive reasoning. InPhyRe exposes this weakness with a visual question-answering benchmark that probes more than parametric knowledge, meaning the facts a model absorbed during training. It evaluates how LMMs handle collisions in synthetic videos, a test that's not just academic but key for real-world applications.
Exposing LMMs' Blind Spots
After putting over 13 LMMs, both open-source and proprietary, through the InPhyRe test, the results weren't exactly encouraging. First, these models struggle to apply even the universal physical laws they do know. Second, when new physical laws come into play, their inductive reasoning falters. Lastly, there's a troubling tendency for these models to rely on language cues more than visual data, raising questions about their reliability in processing visual inputs.
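One simple way to probe the third finding, reliance on language cues, is to compare a model's accuracy with and without the video. The sketch below is only an illustration of that idea: the item layout, the model interface, and the names `accuracy`, `visual_gain`, and `text_only_model` are all hypothetical, and this is not a description of how InPhyRe scores models.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical item layout: (video_id, question, correct_answer).
Item = Tuple[str, str, str]
# Hypothetical model interface: optional video plus a question in, answer out.
Model = Callable[[Optional[str], str], str]


def accuracy(model: Model, items: Sequence[Item], use_video: bool = True) -> float:
    """Fraction of items answered correctly, optionally withholding the video."""
    correct = sum(
        model(video if use_video else None, question) == answer
        for video, question, answer in items
    )
    return correct / len(items)


def visual_gain(model: Model, items: Sequence[Item]) -> float:
    """Accuracy with video minus accuracy with the question alone.

    A gain near zero suggests the model answers from wording, not pixels.
    """
    return accuracy(model, items, use_video=True) - accuracy(model, items, use_video=False)


def text_only_model(video: Optional[str], question: str) -> str:
    # Toy stand-in that ignores the video and answers from the wording alone.
    return "yes" if "heavier" in question else "no"


items = [
    ("clip_a", "Does the heavier ball stop?", "yes"),
    ("clip_b", "Does the heavier ball stop?", "no"),
]
print(visual_gain(text_only_model, items))
```

Because the two toy questions are worded identically but have different answers, a model that leans on language alone cannot do better than chance here, and its visual gain stays at zero.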
These results suggest that models need not just more data but a deeper understanding of it. If LMMs are to replace human agents in critical tasks, their reasoning needs a serious upgrade. Hype is a distraction. Watch how useful these models actually prove as they evolve.
Why Should We Care?
So, why should the average tech enthusiast or AI developer care about this? Well, consider the implications for safety-critical applications. Whether it's autonomous vehicles or industrial robotics, the ability to reason through new physics isn't just a nice-to-have. It's essential. Without it, we risk deploying systems that can't handle unexpected events, potentially leading to costly or even dangerous mistakes.
Gaming is AI's best Trojan horse, not just for entertainment but for advancing AI's reasoning capabilities. As we push these models further, benchmarks like InPhyRe will be key in ensuring that they're not just smarter, but wiser too. The bar has moved, and models will have to keep up.
Key Terms Explained
Artificial intelligence: The science of creating machines that can perform tasks requiring human-like intelligence, such as reasoning, learning, perception, language understanding, and decision-making.
Benchmark: A standardized test used to measure and compare AI model performance.
Multimodal models: AI models that can understand and generate multiple types of data, including text, images, audio, and video.
Reasoning: The ability of AI models to draw conclusions, solve problems logically, and work through multi-step challenges.