Why AI Still Struggles with Humor: The Comic Conundrum
Vision-language models face a major hurdle in understanding humor, especially when it involves complex narratives. New research explores AI's limitations with comics.
Understanding humor isn't just about getting the punchline; it's about grasping the nuances that make the setup work. For AI, this is a tall order. Vision-language models (VLMs) still struggle to interpret humor, particularly in comics, where contradictory narratives play a central role. It's a challenge that highlights AI's difficulty with human-like reasoning.
The YesBut (V2) Benchmark
Enter the YesBut (V2), a newly introduced benchmark aimed at dissecting this challenge. Comprising 1,262 comic images from a broad spectrum of multilingual and multicultural contexts, this benchmark serves as a testing ground for VLMs. It features detailed annotations designed to capture the intricate elements of narrative understanding, offering a comprehensive tool to assess AI's capabilities.
Testing these models across four distinct tasks, the study targeted both surface-level content comprehension and the deeper narrative reasoning required to piece together humor from contradictions. Unsurprisingly, the results showed that even the most advanced models still pale in comparison to human performance.
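To make the evaluation setup concrete, here is a minimal sketch of how scoring against a benchmark like this might work: each comic carries gold annotations for a given task, model predictions are compared against them, and per-task accuracy is reported. All names, task labels, and data below are illustrative assumptions, not details from the actual YesBut (V2) release.

```python
# Hypothetical benchmark scorer: compare model answers to gold annotations
# and report accuracy broken down by task type.
from collections import defaultdict

def score_by_task(examples, predictions):
    """Compute per-task accuracy.

    examples: {example_id: (task_name, gold_answer)}
    predictions: {example_id: model_answer}
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex_id, (task, gold) in examples.items():
        total[task] += 1
        if predictions.get(ex_id) == gold:
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}

# Toy data covering two illustrative task types, with one wrong prediction.
examples = {
    "comic_001": ("literal_description", "a man waters a plastic plant"),
    "comic_002": ("contradiction", "effort vs. artificial target"),
    "comic_003": ("contradiction", "generosity vs. self-interest"),
}
predictions = {
    "comic_001": "a man waters a plastic plant",
    "comic_002": "effort vs. artificial target",
    "comic_003": "kindness",
}

print(score_by_task(examples, predictions))
# {'literal_description': 1.0, 'contradiction': 0.5}
```

Splitting scores by task is what lets a study like this separate surface-level perception failures from deeper narrative-reasoning failures.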
Where AI Falls Short
The findings were stark. Models struggled with visual perception, identifying key elements, comparative analysis, and even suffered from hallucinations, a term denoting when AI produces output not grounded in the input data. These are critical weaknesses that hinder AI's understanding of cultural and creative expressions. This isn't just about missing a joke. It's about VLMs failing to engage with the layers of human communication that are essential for meaningful interaction.
Pathways to Improvement
So, where do we go from here? The research doesn't just outline AI's deficiencies; it also suggests ways to bridge the gap. By exploring text-based training strategies and social knowledge augmentation methods, there's potential to enhance model performance. But here's the catch: if AI struggles to understand humor, how can we trust it to interpret more nuanced cultural narratives? This is the broader question researchers must grapple with as they aim to develop context-aware models capable of deeper narrative understanding through comparative reasoning.
The convergence of technology and human expression remains a complex puzzle. Yet the pursuit of this understanding is vital. After all, if AI is to truly engage with society, it must first learn to laugh with us, not at us.
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
Reasoning: The ability of AI models to draw conclusions, solve problems logically, and work through multi-step challenges.
Training: The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.