Bridging the Language Divide: Can Code-Mixing Bring Parity?
Large language models show striking disparities when handling English versus Indian languages. The IndiKLAR benchmark reveals code-mixing could bridge this gap.
Large language models have become remarkably adept at recalling knowledge in English. However, when the same queries are posed in less-resourced languages, particularly Indian languages, the models falter. This inconsistency remains underexplored, especially given India's linguistic diversity.
The IndiKLAR Benchmark
IndiKLAR, an Indic extension of the KLAR-CLC benchmark, targets this crosslingual gap. It covers 18 of the 22 scheduled Indian languages and adds code-mixed variants for 11 language pairs. Native speakers verified both the monolingual and the code-mixed versions, which gives a three-way alignment among English, native, and code-mixed inputs.
The numbers are stark. Across nine open-weight models, the accuracy gap between native-language and English inputs can reach about 0.50. Code-mixed inputs close most of that gap, bringing performance within about 0.05 of English without any change to the model.
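As a rough illustration of how gaps like these could be computed, here is a minimal Python sketch. It assumes exact-match scoring after trimming and case-folding, which may not be the metric IndiKLAR uses, and the function names and toy predictions are invented for the example rather than taken from the benchmark.

```python
from typing import Dict, List


def accuracy(predictions: List[str], answers: List[str]) -> float:
    """Fraction of predictions that match the gold answer exactly
    (after trimming whitespace and case-folding). Assumed metric."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one example")
    hits = sum(
        p.strip().casefold() == a.strip().casefold()
        for p, a in zip(predictions, answers)
    )
    return hits / len(answers)


def gaps_to_english(acc_by_form: Dict[str, float]) -> Dict[str, float]:
    """Accuracy shortfall of each input form relative to English."""
    english = acc_by_form["english"]
    return {
        form: round(english - acc, 4)
        for form, acc in acc_by_form.items()
        if form != "english"
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only (not benchmark results).
    gold = ["Paris", "Delhi", "Tokyo", "Cairo"]
    preds = {
        "english": ["Paris", "Delhi", "Tokyo", "Cairo"],
        "native": ["Paris", "Mumbai", "Tokyo", "Lagos"],
        "code_mixed": ["Paris", "Delhi", "Tokyo", "Lagos"],
    }
    acc = {form: accuracy(p, gold) for form, p in preds.items()}
    print(gaps_to_english(acc))
```

Reporting the gap as a difference from English, rather than raw accuracy, is what makes the native and code-mixed conditions directly comparable across models.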
Exploring Prompting Strategies
To tackle this disparity, several prompting strategies are tested: a two-stage translate-then-answer setup, a one-stage joint translation-and-answer prompt, and Translate-in-Thought (TinT). TinT is the intriguing one: the model converts the input internally and outputs only the final answer.
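To make the difference between the three strategies concrete, here is a hedged Python sketch. The prompt wording is hypothetical and is not the wording used in the IndiKLAR evaluation, and `Model` stands in for any function that maps a prompt string to a completion string.

```python
from typing import Callable

# Hypothetical interface: any callable that maps a prompt to a completion.
Model = Callable[[str], str]


def two_stage(question: str, model: Model) -> str:
    """Translate-then-answer: two separate model calls."""
    translation = model(
        "Translate the following question into English.\n"
        f"Question: {question}\nTranslation:"
    )
    return model(
        "Answer the question concisely.\n"
        f"Question: {translation.strip()}\nAnswer:"
    )


def one_stage_prompt(question: str) -> str:
    """Joint prompt: the model writes the translation and the answer
    in a single completion."""
    return (
        "First translate the question into English, then answer it.\n"
        f"Question: {question}\n"
        "Translation:\nAnswer:"
    )


def tint_prompt(question: str) -> str:
    """Translate-in-Thought style prompt: the conversion stays internal
    and only the answer is emitted."""
    return (
        "Silently translate the question into English in your head. "
        "Do not write the translation. Output only the final answer.\n"
        f"Question: {question}\nAnswer:"
    )
```

The practical trade-off is cost versus visibility: the two-stage setup needs two calls but exposes the translation for inspection, while the TinT-style prompt needs one call and leaves no intermediate text to check.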
These strategies reveal a consistent 'flip point', the transition from incorrect to correct predictions, and it lies between native and code-mixed inputs. Whether the shift is induced by changing the input form or by the model's internal conversion, it lands in the same place, which suggests that code-mixing is where knowledge recall recovers.
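One way to locate such a flip point per question is sketched below in Python. It assumes the input forms are ordered from native to code-mixed to English and that each question has a correct/incorrect flag per form; the ordering, the names, and the toy records are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List

# Assumed ordering of input forms, from farthest to closest to English.
FORMS = ("native", "code_mixed", "english")


def flip_point(correct: Dict[str, bool]) -> str:
    """Return the first form (in FORMS order) at which the prediction is
    correct. "native" means no flip was needed; "never" means the model
    was wrong on every form."""
    for form in FORMS:
        if correct[form]:
            return form
    return "never"


def flip_distribution(items: List[Dict[str, bool]]) -> Dict[str, float]:
    """Share of questions whose prediction first becomes correct at each form."""
    if not items:
        raise ValueError("need at least one item")
    counts = Counter(flip_point(item) for item in items)
    return {key: counts[key] / len(items) for key in FORMS + ("never",)}


if __name__ == "__main__":
    # Toy correctness records, invented for illustration only.
    records = [
        {"native": False, "code_mixed": True, "english": True},
        {"native": False, "code_mixed": True, "english": True},
        {"native": True, "code_mixed": True, "english": True},
        {"native": False, "code_mixed": False, "english": True},
    ]
    print(flip_distribution(records))
```

A large share at `code_mixed` would correspond to the pattern described above: predictions that are wrong on native input and turn correct once the input is code-mixed.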
Why It Matters
Here's what the benchmark actually shows: code-mixing can serve as a powerful bridge, narrowing the gap without major overhauls. But why should we care? In a country as linguistically diverse as India, AI models that perform consistently across languages are essential.
Could code-mixing be the shortcut to linguistic parity in AI models? It's a provocative question. If a single adjustment in input form can deliver near-English performance, the implications for multilingual applications are vast.
The architecture matters more than the parameter count, and IndiKLAR highlights the need for more inclusive model design. This isn't just an academic exercise. It's about making AI accessible and reliable for millions in their native tongues.
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
Parameter: A value the model learns during training, specifically the weights and biases in neural network layers.
Prompt: The text input you give to an AI model to direct its behavior.
Weight: A numerical value in a neural network that determines the strength of the connection between neurons.