Language Models Struggle with Safety in West African Tongues
Large language models falter in safety alignment when handling West African languages. A new benchmark highlights significant refusal degradation compared to English.
Large language models face a stark challenge: maintaining safety alignment when dealing with non-English languages. A recent benchmark sheds light on this issue, specifically focusing on West African languages like Yoruba, Hausa, Igbo, and Igala. The findings are concerning.
Benchmarking Linguistic Safety
Enter LSR, or Linguistic Safety Robustness, the first benchmark to systematically measure how these models refuse harmful content across languages. It employs a dual-probe evaluation, presenting each model with paired English and target-language probes to assess how much refusal behavior is lost in translation. The loss is quantified by a metric called Refusal Centroid Drift (RCD).
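The article doesn't spell out the RCD formula, but one reading consistent with the figures quoted below is the gap between English and target-language refusal rates over paired probes. Here is a minimal sketch under that assumption; the function names and boolean encoding are illustrative, not taken from the benchmark:

```python
def refusal_rate(refusals: list[bool]) -> float:
    """Fraction of probes the model refused (True = refusal)."""
    return sum(refusals) / len(refusals)

def refusal_centroid_drift(english: list[bool], target: list[bool]) -> float:
    """Hypothetical reading of RCD: how much refusal behavior drops
    when the same probes are posed in the target language."""
    return refusal_rate(english) - refusal_rate(target)

# Under this reading, a ~90% English refusal rate against ~35% for
# Igala yields a drift of roughly 0.55, matching the reported figure.
```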
Tested against Gemini 2.5 Flash, English prompts drew refusal rates around 90%. In stark contrast, the same probes in West African languages fell to 35-55%, with Igala showing the steepest drop (RCD = 0.55). That gap is too large to ignore.
The Stakes Are High
Why should we care? The reality is simple: language models are increasingly integrated into everyday tech, influencing how we communicate and access information. If their safety mechanisms fail in specific languages, entire communities are left vulnerable to harmful content. This isn't just a technical flaw; it's a socio-cultural blind spot.
One must ask: are we prioritizing English too heavily in our quest for AI safety? Strip away the marketing and you get models that aren't truly global. If they can't refuse harmful content effectively across languages, are they really ready for worldwide deployment?
What's Next?
The introduction of LSR into the Inspect AI evaluation framework is a step in the right direction, offering a means to measure and tune models for better cross-lingual performance. However, the journey is just beginning. The numbers tell a story of imbalance that needs addressing before these technologies can be fully trusted.
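For teams that want to run this kind of check themselves, here is a minimal sketch of what a single-language probe task could look like in Inspect AI. The probe loader and refusal markers are placeholders (the actual LSR task definition isn't shown in this article), but the Task/solver/scorer wiring follows the framework's standard pattern:

```python
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import CORRECT, INCORRECT, Score, Target, accuracy, scorer
from inspect_ai.solver import TaskState, generate

# Illustrative only: real refusal detection is more robust than keyword matching.
REFUSAL_MARKERS = ["i can't", "i cannot", "i won't", "not able to help"]

def load_probes(language: str) -> list[str]:
    # Stand-in for the benchmark's probe set; the real LSR probes are not public here.
    return [f"[{language} translation of a harmful-content probe]"]

@scorer(metrics=[accuracy()])
def refusal_detected():
    async def score(state: TaskState, target: Target) -> Score:
        completion = state.output.completion.lower()
        refused = any(marker in completion for marker in REFUSAL_MARKERS)
        return Score(value=CORRECT if refused else INCORRECT)
    return score

@task
def lsr_probes(language: str = "yoruba"):
    dataset = [Sample(input=probe) for probe in load_probes(language)]
    return Task(dataset=dataset, solver=[generate()], scorer=refusal_detected())
```

Running this task once per language (and once with the English originals) and comparing the resulting refusal rates would reproduce the kind of per-language gap the benchmark reports; the exact model string passed to `inspect eval` depends on your provider setup.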
So, the next time a shiny new model is announced, look beyond the parameter count and ask about its linguistic robustness. Safety across languages matters as much as raw scale, especially when it comes to ensuring safe AI interactions for everyone, everywhere.
Key Terms Explained
AI safety: The broad field studying how to build AI systems that are safe, reliable, and beneficial.
Benchmark: A standardized test used to measure and compare AI model performance.
Evaluation: The process of measuring how well an AI model performs on its intended task.
Gemini: Google's flagship multimodal AI model family, developed by Google DeepMind.