Cracking the Code of Cross-Lingual Consistency in...

Large language models (LLMs) often boast impressive world knowledge, yet their performance across different languages falls short. This phenomenon, known as cross-lingual factual inconsistency, presents a major hurdle. However, a new initiative, PolyFact, aims to address this issue with a multilingual factual QA dataset.

The PolyFact Initiative

The dataset introduced here's no small feat. PolyFact comprises 100,000 Wikidata-grounded facts spread across 12 typologically diverse languages. This initiative provides a significant resource for analyzing and improving LLMs' cross-lingual capabilities.

What can be done to enhance factual recall across languages? PolyFact explores three approaches: light continual pretraining (CPT), supervised fine-tuning (SFT), and a method called Group Relative Policy Optimization (GRPO). The benchmark results speak for themselves. GRPO consistently outperforms SFT, enhancing both cross-lingual consistency and generalization to languages not seen during training.

GRPO Takes the Lead

Why does GRPO excel where others falter? The data shows that GRPO reorganizes multilingual routing in a way that promotes shared cross-lingual representations. It reduces language specialization within the model's architecture, specifically within MLP layers and attention heads. This might sound technical, but the implication is clear: GRPO is paving the way for more universally valid language models.

Light CPT on parallel data, on the other hand, yields limited gains. The takeaway is that simply increasing the amount of data isn't enough. The method of integrating that data makes all the difference. It's a reminder that AI, smarter strategies often outweigh brute force.

Why This Matters

Western coverage has largely overlooked this, but the implications are significant. As AI models become integral to global technology, their ability to function consistently across languages is important. Can we afford to have models that are reliable only in English?

PolyFact sets a new standard for evaluating and enhancing cross-lingual abilities. By releasing their code, models, and dataset, the researchers are providing the tools for others to build on their work. It's a collaborative step forward.

In a space where innovation is relentless, choosing the most effective method for improvement is essential. GRPO's success suggests that the future of LLMs could be one where multilingual capabilities aren't just an afterthought but a fundamental part of their design.

Cracking the Code of Cross-Lingual Consistency in Language Models

The PolyFact Initiative

GRPO Takes the Lead

Why This Matters

Key Terms Explained