Vero: The Unlocked Potential of Visual Reasoning

Building a visual reasoner that can handle charts, science questions, and open-ended tasks has long been the domain of proprietary vision-language models (VLMs). Yet the secrecy surrounding their development has left many questions unanswered. Enter Vero, a family of fully open VLMs that matches, and often exceeds, existing open-weight models across a wide range of visual reasoning tasks.

Breaking Down the Dataset

Vero-600K is an ambitious dataset assembled from 59 source datasets, totaling a hefty 600,000 samples. This isn't just a numbers game. By scaling reinforcement learning (RL) data and rewards across six broad task categories, Vero builds far broader coverage of visual reasoning. That breadth is a key driver behind its ability to outperform models like Qwen3-VL-8B on 23 of 30 benchmarks.
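To make "scaling data across task categories" concrete, here is a minimal Python sketch of one common way to mix many source datasets grouped by category: pick a category uniformly at random, then pick a dataset within it in proportion to its size. This is an assumption for illustration, not Vero's documented sampling scheme. The helper names build_mixture and sample_sources, the dataset names, the counts, and the three categories shown (Vero-600K uses six) are all hypothetical.

```python
import random
from collections import defaultdict


def build_mixture(sources):
    """Group (name, category, size) tuples by category and give each dataset
    a within-category weight proportional to its size."""
    by_category = defaultdict(list)
    for name, category, size in sources:
        by_category[category].append((name, size))
    mixture = {}
    for category, items in by_category.items():
        total = sum(size for _, size in items)
        mixture[category] = [(name, size / total) for name, size in items]
    return mixture


def sample_sources(mixture, n, seed=0):
    """Draw n dataset names: choose a category uniformly at random, then a
    dataset inside it according to its size-proportional weight."""
    rng = random.Random(seed)
    categories = sorted(mixture)
    picks = []
    for _ in range(n):
        category = rng.choice(categories)
        names, weights = zip(*mixture[category])
        picks.append(rng.choices(names, weights=weights, k=1)[0])
    return picks


# Hypothetical registry: names, categories, and counts are placeholders.
SOURCES = [
    ("chart_qa_a", "charts", 30_000),
    ("chart_qa_b", "charts", 10_000),
    ("science_qa", "science", 20_000),
    ("open_ended_a", "open_ended", 5_000),
]

mixture = build_mixture(SOURCES)
print(mixture["charts"])  # [('chart_qa_a', 0.75), ('chart_qa_b', 0.25)]
print(sample_sources(mixture, 5))
```

Sampling categories uniformly keeps one large category from crowding out the smaller ones; weighting purely by dataset size would instead reproduce the raw proportions of the pool.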

Let's not sugarcoat it, though: running a model on rented GPUs is not a thesis in itself. The real work lies in how Vero routes task-specific rewards that can handle heterogeneous answer formats. The result is state-of-the-art performance: across four base models, it delivers average gains of 3.7 to 5.5 points on its VeroEval benchmarks. Impressive, but does it really push the envelope?
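Routing task-specific rewards can be pictured as a dispatch table keyed by answer format. The Python sketch below assumes three hypothetical formats (multiple choice, numeric, free text) with a simple verifier for each. Vero's actual reward functions, format labels, and tolerances are not described here, so every name and threshold in the code (route_reward, REWARD_FNS, rel_tol=1e-2, and so on) is illustrative.

```python
import math
import re


def _normalize(text):
    """Lowercase, strip, and collapse internal whitespace."""
    return " ".join(text.strip().lower().split())


def reward_multiple_choice(pred, gold):
    """1.0 if the last standalone uppercase option letter (A-E) in pred matches gold."""
    letters = re.findall(r"\b([A-E])\b", pred)
    return 1.0 if letters and letters[-1] == gold.strip().upper() else 0.0


def reward_numeric(pred, gold, rel_tol=1e-2):
    """1.0 if the last number in pred is within rel_tol (relative) of gold."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", pred.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if math.isclose(float(numbers[-1]), float(gold), rel_tol=rel_tol) else 0.0


def reward_text(pred, gold):
    """1.0 on a case- and whitespace-insensitive exact match."""
    return 1.0 if _normalize(pred) == _normalize(gold) else 0.0


# Hypothetical dispatch table: answer format -> verifier.
REWARD_FNS = {
    "multiple_choice": reward_multiple_choice,
    "numeric": reward_numeric,
    "text": reward_text,
}


def route_reward(sample, prediction):
    """Send the prediction to the verifier registered for the sample's format."""
    fn = REWARD_FNS.get(sample["answer_type"])
    if fn is None:
        raise ValueError(f"no reward registered for {sample['answer_type']!r}")
    return fn(prediction, sample["answer"])


print(route_reward({"answer_type": "numeric", "answer": "42.5"}, "The total is 42.6"))  # 1.0
print(route_reward({"answer_type": "multiple_choice", "answer": "C"}, "I pick (B)"))  # 0.0
print(route_reward({"answer_type": "text", "answer": "Paris"}, "  paris "))  # 1.0
```

Keeping the verifiers behind a single entry point means a new answer format needs only one new function and one table entry, while the RL loop that consumes the scalar reward stays unchanged.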

The Open-Source Edge

Vero's open-source release breaks the mold of closed RL pipelines built on non-public data. All data, code, and models are released, fostering a culture of transparency and innovation. Plenty of projects claim to be open; most don't deliver. Vero shows substance, proving that fully open projects can compete head-to-head with models built behind closed doors.

The systematic ablations reveal something interesting: different task categories elicit distinct reasoning patterns, and those patterns don't transfer well when each category is trained in isolation. That underscores the importance of broad data coverage for strong RL scaling. It's a reminder that in the quest for AI supremacy, a narrow focus might not get us there.

What's Next for Visual Reasoning?

Vero may be a watershed moment for open-source visual reasoning, but let's not forget the elephant in the room: inference costs. Show me the inference costs. Then we'll talk. As visual reasoning models scale up, the computational demands rise in tandem. It's one thing to have a model that performs well, and another altogether to have one that performs efficiently.
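A back-of-the-envelope estimate shows why inference cost climbs with scale. A common rule of thumb puts a dense transformer's forward pass at roughly 2N FLOPs per token for N parameters, so cost grows linearly with both model size and the number of tokens processed, long reasoning traces included. The Python sketch below applies that approximation; the function name, the 8B parameter count, and the token counts are hypothetical, and it ignores attention's context-length term and any separate vision encoder.

```python
def inference_flops(num_params, prompt_tokens, output_tokens):
    """Approximate forward-pass FLOPs for one request on a dense transformer,
    using the ~2 * N FLOPs-per-token rule of thumb. Ignores the attention
    term that grows with context length and any separate vision encoder."""
    return 2 * num_params * (prompt_tokens + output_tokens)


# Hypothetical request: 8B parameters, 1,000 prompt tokens (image + question),
# 2,000 generated reasoning tokens.
flops = inference_flops(8e9, 1_000, 2_000)
print(f"{flops:.1e}")  # 4.8e+13

# Doubling the generated tokens raises the estimate from 4.8e13 to 8.0e13.
print(f"{inference_flops(8e9, 1_000, 4_000):.1e}")  # 8.0e+13
```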

If an AI can hold a wallet, who writes the risk model? As these systems gain more autonomy, questions around ethical deployment and risk management loom larger than ever. Vero might be rewriting the playbook on visual reasoning, but the game is far from over.

Key Terms Explained