Exposing Multilingual Vulnerabilities in Vision-Language Models
A new benchmark reveals safety weaknesses in vision-language models across several languages. Harmful instructions encoded as flowchart images bypass safety measures in many of them, exposing gaps in current protections.
Vision-language models (VLMs) are advancing rapidly and perform well across numerous multimodal tasks. However, their safety robustness is under scrutiny, and new research exposes significant weaknesses. The paper's key contribution is a benchmark dubbed MLingualFC, which evaluates how vulnerable VLMs are to jailbreaks delivered as structured flowchart images in multiple languages.
Multilingual Safety Gaps
MLingualFC assesses VLMs such as Qwen2.5-VL, Gemma-4, and Pangea under a black-box threat model. The benchmark encodes harmful instructions into flowchart images across five languages: Hindi, Punjabi, Spanish, Romanian, and German. The findings? Stark multilingual safety discrepancies. Latin-script languages are especially susceptible, with flowchart-based attacks achieving high success rates. In contrast, languages with non-Latin scripts, like Punjabi, show lower attack success rates.
This suggests that the lower success rates on non-Latin scripts may have more to do with the models' visual text recognition than with their safety alignment: a model that cannot reliably read the text in a flowchart cannot follow the instruction it encodes. If so, a simple change of script could dramatically alter a model's apparent vulnerability without any change in how safe it actually is.
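The per-language comparison above comes down to an attack success rate: the fraction of attack attempts that a judge marks as successful, grouped by language or by script. The following is a minimal sketch of that bookkeeping, not the paper's evaluation code. The record layout, the `jailbroken` field, the `attack_success_rate` helper, and the toy records are all assumptions made for illustration. The values are placeholders, not results from MLingualFC.

```python
from collections import defaultdict

# The three Latin-script languages among the five named in the article.
# Hindi and Punjabi are treated as "non-Latin" in this sketch.
LATIN_SCRIPT_LANGUAGES = {"Spanish", "Romanian", "German"}


def script_group(language):
    """Map a language name to a coarse script group (illustrative only)."""
    return "Latin" if language in LATIN_SCRIPT_LANGUAGES else "non-Latin"


def attack_success_rate(records, key):
    """Return {group: fraction of attempts marked as successful}."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for record in records:
        group = key(record)
        attempts[group] += 1
        successes[group] += int(record["jailbroken"])
    return {group: successes[group] / attempts[group] for group in attempts}


# Placeholder records for illustration only, NOT data from the paper.
toy_records = [
    {"language": "Spanish", "jailbroken": True},
    {"language": "Spanish", "jailbroken": True},
    {"language": "German", "jailbroken": True},
    {"language": "German", "jailbroken": False},
    {"language": "Punjabi", "jailbroken": False},
    {"language": "Punjabi", "jailbroken": False},
    {"language": "Hindi", "jailbroken": True},
    {"language": "Hindi", "jailbroken": False},
]

by_language = attack_success_rate(toy_records, key=lambda r: r["language"])
by_script = attack_success_rate(
    toy_records, key=lambda r: script_group(r["language"])
)

for group, rate in sorted(by_script.items()):
    print(f"{group}: {rate:.2f}")
```

Grouping by script as well as by language makes it easier to see whether a gap follows the writing system rather than any single language, which is the pattern the article describes.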
Implications for VLM Development
The research highlights a key oversight: current safety mechanisms in VLMs fail to generalize across scripts and languages. This isn't just a technical curiosity. It's a major problem for real-world applications that aim to be truly global and inclusive. When safety depends heavily on the script, confidence in these models plummets.
But let's take it further. Is it acceptable for technology that's shaping the future of AI communication to have such glaring blind spots? The industry needs to address these shortcomings decisively. Safety boundaries that shift with the script a prompt is written in are a flawed foundation. The ablation study shows that visually encoding harmful content bypasses safety measures in many languages, raising the stakes for developers and researchers alike.
Looking Ahead
Linking safety to script recognition limits these models' applicability in diverse linguistic contexts. Developers must innovate beyond current safety alignments, ensuring strong protection mechanisms that transcend language barriers. Crucially, the dataset and code for MLingualFC are available for further exploration at https://github.com/Rishabhpm23/MLingualFC. As the field evolves, tackling these vulnerabilities head-on will be vital for the future of safe, multilingual VLM technologies.