Bridging the Language Divide: Can Code-Mixing Bring Parity?

Large language models have become remarkably adept at recalling knowledge in English. However, when the same queries are posed in less-resourced languages, particularly Indian languages, the models falter. This inconsistency remains underexplored, especially given India's linguistic diversity.

The IndiKLAR Benchmark

IndiKLAR, an Indic extension of the KLAR-CLC benchmark, targets this crosslingual gap. It covers 18 of the 22 scheduled Indian languages and adds code-mixed variants for 11 language pairs. Native speakers verified both the monolingual and the code-mixed versions, giving a three-way alignment among English, native, and code-mixed inputs.

The numbers are stark. Across nine open-weight models, the accuracy gap between native-language and English inputs can reach about 0.50. Code-mixed inputs close most of that gap, bringing performance within about 0.05 of English without any change to the model.
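To make the "gap" concrete, here is a minimal sketch of how a per-form accuracy gap relative to English could be computed. The function names, the exact-match scoring rule, and the toy predictions are all assumptions for illustration; they are not the benchmark's evaluation code, and the toy numbers are not results.

```python
from typing import Dict, List


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions matching gold (case- and whitespace-insensitive).

    Exact match is an assumed scoring rule, not necessarily the benchmark's.
    """
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and aligned")
    hits = sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(predictions, gold))
    return hits / len(gold)


def gap_to_english(preds_by_form: Dict[str, List[str]],
                   gold: List[str]) -> Dict[str, float]:
    """English accuracy minus the accuracy of every other input form."""
    acc = {form: accuracy(preds, gold) for form, preds in preds_by_form.items()}
    return {form: acc["english"] - a
            for form, a in acc.items() if form != "english"}


# Toy, made-up predictions for four aligned questions (illustration only).
gold = ["paris", "delhi", "tokyo", "cairo"]
toy_predictions = {
    "english":    ["Paris", "Delhi", "Tokyo", "Cairo"],
    "native":     ["Paris", "Mumbai", "Kyoto", "Cairo"],
    "code_mixed": ["Paris", "Delhi", "Tokyo", "Giza"],
}
gaps = gap_to_english(toy_predictions, gold)
```

Because the benchmark aligns the same item across English, native, and code-mixed inputs, one gold list can score all three forms.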

Exploring Prompting Strategies

To tackle this disparity, several prompting strategies are tested: a two-stage translate-then-answer setup, a one-stage joint translation-and-answer prompt, and Translate-in-Thought (TinT). TinT is the intriguing one: the model converts the input internally and outputs only the final answer.
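The three strategies differ mainly in how many model calls they make and how much of the translation appears in the output. The sketch below illustrates that difference. The prompt wording, the function names, and the `generate` callable (a stand-in for any text-in, text-out model call) are hypothetical and are not the prompts used in the benchmark.

```python
from typing import Callable

# Hypothetical interface: takes a prompt string, returns the model's text.
Generate = Callable[[str], str]


def two_stage(question: str, generate: Generate) -> str:
    """Translate-then-answer: two separate model calls."""
    english = generate(
        "Translate the following question into English. "
        "Output only the translation.\n\nQuestion: " + question
    )
    return generate(
        "Answer the following question with a short phrase.\n\n"
        "Question: " + english.strip()
    ).strip()


def one_stage(question: str, generate: Generate) -> str:
    """Joint translation and answer in a single call; the answer is parsed out."""
    output = generate(
        "First translate the question into English, then answer it.\n"
        "Use this format:\nTranslation: <english question>\n"
        "Answer: <short answer>\n\nQuestion: " + question
    )
    # Keep only the text after the last "Answer:" marker.
    return output.rsplit("Answer:", 1)[-1].strip()


def tint(question: str, generate: Generate) -> str:
    """Translate-in-Thought: translation stays internal, only the answer is emitted."""
    return generate(
        "Silently translate the question into English before answering. "
        "Output only the final answer, with no translation or explanation.\n\n"
        "Question: " + question
    ).strip()
```

In this sketch, only `two_stage` exposes the intermediate English question as a separate artifact. `one_stage` emits it inline, and `tint` never emits it at all.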

These strategies reveal a consistent 'flip point', the transition from incorrect to correct predictions. This point lies between native and code-mixed inputs. Whether induced by input form or internal model conversion, this flip point suggests code-mixing has potential.
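One way to operationalize a flip point is sketched below: order the input forms from native to English and find where an item's prediction turns from incorrect to correct and stays correct. The ordering in `FORMS`, the function names, and the toy correctness flags are assumptions for illustration, not the paper's definition or data.

```python
from collections import Counter
from typing import Dict, List, Optional

# Assumed ordering of input forms, from least to most English-like.
FORMS = ["native", "code_mixed", "english"]


def flip_point(correct_by_form: Dict[str, bool]) -> Optional[str]:
    """Return the form at which the prediction flips from wrong to right.

    A flip is the first form that is correct, follows an incorrect form,
    and is followed only by correct forms. Returns None if no flip exists
    (for example, the item is always right or always wrong).
    """
    flags = [correct_by_form[form] for form in FORMS]
    for i in range(1, len(FORMS)):
        if not flags[i - 1] and all(flags[i:]):
            return FORMS[i]
    return None


def flip_distribution(items: List[Dict[str, bool]]) -> Counter:
    """Count how often the flip lands on each form across items."""
    return Counter(fp for fp in map(flip_point, items) if fp is not None)


# Toy, made-up correctness flags for three aligned items (illustration only).
toy_items = [
    {"native": False, "code_mixed": True, "english": True},
    {"native": False, "code_mixed": False, "english": True},
    {"native": True, "code_mixed": True, "english": True},
]
distribution = flip_distribution(toy_items)
```

Under this definition, a flip that lies between native and code-mixed inputs shows up as a count on `"code_mixed"`.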

Why It Matters

Here's what the benchmark actually shows: code-mixing can serve as a bridge, narrowing the gap without any change to the model itself. Why should we care? In a country as linguistically diverse as India, AI models need to perform consistently across languages.

Could code-mixing be the shortcut to linguistic parity in AI models? It's a provocative question. If a single adjustment in input form can deliver near-English performance, the implications for multilingual applications are vast.

The architecture matters more than the parameter count, and IndiKLAR highlights the need for more inclusive model design. This isn't just an academic exercise. It's about making AI accessible and reliable for millions in their native tongues.
