Can AI Really Think Like Us? The Struggle with Human-Like Decisions
Large language models impress with generative ability, yet struggle with nuanced judgment calls. Supervised fine-tuning shows promise in mimicking human reasoning.
Large language models (LLMs) were initially the darlings of generative AI. But now, the buzz is about their evolution into agentic AI systems. These systems can make decisions in complex real-world settings. However, while their generative capabilities have dazzled us, their decision-making chops lag behind. They often stumble over exceptions, a key part of any nuanced decision-making process.
Where Do LLMs Trip Up?
LLMs, even those that shine in reasoning tasks, tend to stick rigidly to set policies. This rigidity can lead to choices that are impractical or downright counterproductive. It's not surprising if you've ever fought with a chatbot that just didn't get your situation. The real story here is that these models struggle with exceptions because contracts and rules are inherently incomplete: no policy can spell out every situation in advance.
The team behind recent research tried three techniques to help AI handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning. Spoiler alert: ethical prompting didn't work. Chain-of-thought reasoning offered slight improvement. But it was supervised fine-tuning that stole the show. Especially when models were tuned using human explanations, not just yes-or-no labels.
The Power of Supervised Fine-Tuning
What really stood out was how supervised fine-tuning allowed models to generalize human-like decision-making to brand new situations. This isn't just transfer learning. It's about aligning AI decisions with human judgment across different contexts. Sure, the numbers back this up, but the implications for AI development are what's really fascinating.
If AI can start to make decisions that feel human, the potential is enormous. But let's not get ahead of ourselves. The models still need significant work to handle exceptions as fluidly as a human would, and the metrics showing where AI struggles tell us more than the hype about where we need to focus.
Why Should We Care?
So, why does this matter? The future of AI hinges not just on generative capabilities but on decision-making skills that can keep up with human intuition. Without this, we're left with machines that can produce text but not truly understand it. What matters is whether anyone's actually using this in real-world scenarios where judgment counts.
Here's a thought: if supervised fine-tuning with human explanations can really bridge the gap, should we be investing more in this area? The grind of making AI more human-like isn't just about better algorithms. It's about real-world usability: building AI solutions that people actually keep using in everyday applications.
As AI continues to evolve, let's not lose sight of the fundamental question: can these systems make decisions like us, or are they forever doomed to be mere calculators with a knack for language? The jury's still out, but the signs are promising.
Key Terms Explained
Agentic AI refers to AI systems that can autonomously plan, execute multi-step tasks, use tools, and make decisions with minimal human oversight.
A chatbot is an AI system designed to have conversations with humans through text or voice.
Fine-tuning is the process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
Generative AI refers to AI systems that create new content — text, images, audio, video, or code — rather than just analyzing or classifying existing data.