Why LMMs Are Struggling with Physics in the Real World
Large multimodal models (LMMs) face challenges when applying learned physical laws to new scenarios. A new benchmark, InPhyRe, puts them to the test.
Large multimodal models, or LMMs, are the tech world's attempt to combine visual understanding with reasoning about physical laws. Trained on concepts like momentum conservation, they can answer questions about how a collision will play out from visual data alone. But is that enough when they encounter scenarios beyond their training?
The InPhyRe Challenge
Enter InPhyRe, the first benchmark designed to evaluate inductive physical reasoning in these models. By examining 13 open-source and proprietary LMMs, InPhyRe sheds light on their current limitations. The result? LMMs often falter when asked to apply learned knowledge to new, unseen physical environments.
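At its core, this kind of benchmark is an accuracy loop over scene-question-answer triples. The sketch below is purely illustrative, not the actual InPhyRe harness: `Example`, `evaluate`, and the `text_prior_model` stub are all hypothetical names, and a real model would consume video frames rather than a string.

```python
from dataclasses import dataclass

@dataclass
class Example:
    # One benchmark item: a visual scene plus a physics question.
    scene: str          # stand-in for pixels/video frames
    question: str
    answer: str

def evaluate(model, examples):
    """Return the fraction of items the model answers correctly."""
    correct = sum(model(ex.scene, ex.question) == ex.answer for ex in examples)
    return correct / len(examples)

# Toy "model" that always predicts the outcome it saw most in training,
# ignoring the scene entirely -- a caricature of the failures described here.
def text_prior_model(scene, question):
    return "the balls bounce apart"

items = [
    Example("elastic collision clip", "What happens after impact?",
            "the balls bounce apart"),
    Example("perfectly inelastic collision clip", "What happens after impact?",
            "the balls stick together"),
]
print(evaluate(text_prior_model, items))  # 0.5: right only when the prior matches
```

The toy model scores 50% by luck alone, which is exactly why a benchmark needs held-out scenarios the training prior cannot cover.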
Why should this concern you? Because LMMs are increasingly considered for roles in safety-critical applications. If they can't adapt their reasoning like humans when faced with new scenarios, can they truly replace human judgment where it matters most?
Where LMMs Trip Up
The InPhyRe analysis shows three major pitfalls for LMMs. First, they hold only limited parametric knowledge of universal physical laws, and they struggle to bring it to bear when reasoning. Second, their inductive reasoning is weak for scenarios outside their training wheelhouse. And third, language bias often skews their outputs, sometimes to the point of ignoring visual cues entirely. If LMMs can't trust their visual inputs, why should we trust their conclusions?
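That third pitfall, language bias, has a simple diagnostic: if a model gives the same answer with and without the real visual input, the answer came from its text prior, not the pixels. This is a hedged sketch of that idea, not InPhyRe's actual methodology; the `caption_biased_model` stub and `ignores_visuals` helper are hypothetical.

```python
def caption_biased_model(scene, question):
    # Hypothetical stub: answers from the question text alone,
    # never looking at the scene argument.
    return "momentum is conserved"

def ignores_visuals(model, scene, blank_scene, question):
    """True if the answer is identical with and without real visual input."""
    return model(scene, question) == model(blank_scene, question)

print(ignores_visuals(caption_biased_model,
                      "collision video frames", "all-black frames",
                      "What does the collision demonstrate?"))  # True
```

A model that passes this probe (answers change when the scene changes) is at least consulting its visual input; one that fails is reasoning from language alone.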
Speed Bumps on the Road to Trust
Let's not sugarcoat it: these models aren't ready for prime time. Their inability to process and adapt to new physical laws raises a red flag for anyone banking on them for critical applications. Are we overestimating their capabilities based on a selective understanding of what they can handle?
While the promise of AI systems like LMMs is tantalizing, the reality is clear: until they can demonstrate the same level of adaptable reasoning as humans, particularly in safety-sensitive areas, the tech isn't ready to take the wheel.
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
Bias: In AI, bias has two meanings: a learnable offset inside a model, and a systematic skew in a model's outputs. This article uses the second sense, as in a language bias toward text priors over visual evidence.
Multimodal models: AI models that can understand and generate multiple types of data: text, images, audio, video.
Reasoning: The ability of AI models to draw conclusions, solve problems logically, and work through multi-step challenges.