Rethinking OOD Detection in Vision-Language Models

By Signe Eriksen, May 27, 2026

A new framework challenges the text-as-prototype approach to zero-shot out-of-distribution detection. Instead of relying on text embeddings of class names, it learns class prototypes directly in the visual feature space.

Out-of-distribution (OOD) detection is essential for ensuring that machine learning models can recognize unexpected inputs. The rise of pre-trained vision-language models (VLMs) has enabled zero-shot OOD detection, which requires no in-distribution (ID) training data. To date, these zero-shot methods have typically used the text embeddings of class names as class prototypes.
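To make the text-as-prototype baseline concrete, the sketch below scores an image by taking the maximum softmax probability over temperature-scaled cosine similarities between its embedding and each class-name text embedding, in the style of the maximum concept matching (MCM) score. A low score suggests the input is OOD. The toy 3-dimensional vectors, the function name `text_prototype_score`, and the temperature of 0.01 are illustrative assumptions standing in for real VLM embeddings, not details taken from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def text_prototype_score(image_emb: Sequence[float],
                         text_prototypes: Sequence[Sequence[float]],
                         temperature: float = 0.01) -> float:
    """Max softmax over image-to-text-prototype similarities (higher = more ID-like)."""
    sims = [cosine(image_emb, t) / temperature for t in text_prototypes]
    return max(softmax(sims))


# Toy stand-ins for the text embeddings of two class names (hypothetical values).
text_prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
id_score = text_prototype_score([0.9, 0.1, 0.0], text_prototypes)
ood_score = text_prototype_score([0.5, 0.5, 0.7], text_prototypes)
```

Here the first image sits close to class 0 and scores near 1, while the second is equally similar to both prototypes and scores exactly 0.5, the minimum possible with two classes.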

Challenging the Status Quo

The paper takes a bold stance against the prevailing text-as-prototype approach. The authors argue that off-the-shelf textual prototypes are misaligned with the optimal visual prototypes, and that this misalignment amounts to an intrinsic modality gap that prompt engineering alone cannot close. Their solution? An online pseudo-supervised framework that learns class prototypes directly in the visual feature space.

The Framework Unveiled

Crucially, the new approach uses unlabeled test-time data streams, together with soft predictions from the pre-trained VLM, to refine the visual prototypes. The paper also provides theoretical guarantees for the convergence of its online optimization procedure. These guarantees aren't mere academic exercises: they underpin a method that, according to the authors' experiments, achieves state-of-the-art results across a range of OOD detection setups.
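The article describes the framework only at a high level, so the following is a hypothetical illustration of the general idea, not the authors' algorithm, and it carries none of their convergence guarantees. Each unlabeled test embedding is soft-assigned to classes using the frozen text prototypes (standing in for the VLM's soft predictions), and each visual prototype is the normalized, soft-weighted sum of the embeddings it has absorbed. The class name `OnlineVisualPrototypes`, the `prior_weight` that seeds each sum with its text prototype, and the temperature are all assumptions made for this sketch.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (cosine similarity becomes a dot product)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class OnlineVisualPrototypes:
    """Hypothetical sketch, not the paper's method.

    Prototype k = normalize(prior_weight * text_k + sum_i p_ik * z_i), where z_i
    are unit-normalized test embeddings and p_ik are soft pseudo-labels.
    """

    def __init__(self, text_prototypes: Sequence[Sequence[float]],
                 temperature: float = 0.01, prior_weight: float = 1.0):
        if prior_weight <= 0.0:
            raise ValueError("prior_weight must be positive")
        self.text_prototypes = [l2_normalize(t) for t in text_prototypes]
        self.temperature = temperature
        # Seed each running sum with its scaled text prototype (an assumed prior).
        self.sums = [[prior_weight * x for x in t] for t in self.text_prototypes]

    def soft_predictions(self, image_emb: Sequence[float]) -> Vector:
        """Zero-shot class probabilities from the frozen text prototypes."""
        z = l2_normalize(image_emb)
        return softmax([dot(z, t) / self.temperature for t in self.text_prototypes])

    def update(self, image_emb: Sequence[float]) -> Vector:
        """Absorb one unlabeled test embedding; return its soft pseudo-labels."""
        z = l2_normalize(image_emb)
        probs = self.soft_predictions(z)
        for k, p in enumerate(probs):
            self.sums[k] = [s + p * x for s, x in zip(self.sums[k], z)]
        return probs

    def prototype(self, k: int) -> Vector:
        return l2_normalize(self.sums[k])

    def score(self, image_emb: Sequence[float]) -> float:
        """Max softmax over similarities to the learned *visual* prototypes."""
        z = l2_normalize(image_emb)
        logits = [dot(z, self.prototype(k)) / self.temperature
                  for k in range(len(self.sums))]
        return max(softmax(logits))


model = OnlineVisualPrototypes([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
# Hypothetical unlabeled stream: class-0 images offset from the class-0 text prototype.
for emb in [[0.8, 0.1, 0.6], [0.7, 0.1, 0.7], [0.8, 0.0, 0.6]]:
    model.update(emb)
```

Because the test stream is shifted away from the class-0 text prototype, the learned class-0 visual prototype rotates toward that cluster, while the class-1 prototype, which receives only negligible soft weight, stays anchored to its text prior. Without that prior, normalizing a sum built from near-zero weights would pull a rarely seen class toward other classes' samples, which is why the sketch includes it.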

Why Does It Matter?

The paper's key contribution isn't just about achieving better numbers. It's about questioning long-held assumptions in OOD detection. When was the last time we truly scrutinized the reliance on textual prototypes? This work compels us to look beyond the text and explore the visual feature space more aggressively.

But, as with any innovation, the question remains: how readily will the industry adapt to these findings? Will data scientists be willing to challenge their own workflows based on this framework's promise?

Code and data are available at the authors' repository, inviting further exploration and potential adoption by the broader community. As the field moves forward, this paper could be the catalyst needed to rethink the integration of VLMs in OOD detection.
