Bridging the Indic Gap: The New Benchmark for Text-to-SQL Performance
IndicDB exposes the limitations of current language models in non-Western contexts, revealing a significant drop in performance across Indic languages.
Large Language Models (LLMs) have undeniably transformed the field of Text-to-SQL, but let's apply some rigor here. The current benchmarks are woefully inadequate for non-Western applications. Enter IndicDB, a newly introduced benchmark that promises to shake up the status quo by evaluating cross-lingual semantic parsing in a diverse array of Indic languages.
The Rigor of Realism
IndicDB isn't your run-of-the-mill benchmark. It's built on relational schemas drawn from open-data platforms like the National Data and Analytics Platform (NDAP) and the India Data Portal (IDP). This approach ensures that the complexity of real-world administrative data is captured, offering a more rigorous evaluation environment. Comprising 20 databases and 237 tables, IndicDB boasts a relational density of 11.85 tables per database, with join depths reaching up to six.
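To make the two headline schema statistics concrete, here is a minimal Python sketch. Relational density is simply total tables divided by databases (237 / 20). For join depth, the article gives no definition, so this sketch assumes it means the number of foreign-key joins on the shortest path between two tables and reports the largest such value in a schema. The administrative hierarchy below is a hypothetical toy, not an actual IndicDB schema, and all function names are illustrative.

```python
from collections import defaultdict, deque


def relational_density(num_tables: int, num_databases: int) -> float:
    """Average number of tables per database."""
    return num_tables / num_databases


def bfs_depths(graph, source):
    """Number of joins needed to reach every table reachable from `source`."""
    depths = {source: 0}
    queue = deque([source])
    while queue:
        table = queue.popleft()
        for neighbor in graph[table]:
            if neighbor not in depths:
                depths[neighbor] = depths[table] + 1
                queue.append(neighbor)
    return depths


def max_join_depth(fk_edges):
    """Longest shortest join path between any two tables (assumed definition)."""
    graph = defaultdict(set)
    for child, parent in fk_edges:
        # A join can traverse a foreign key in either direction.
        graph[child].add(parent)
        graph[parent].add(child)
    return max(max(bfs_depths(graph, t).values()) for t in list(graph))


# Hypothetical administrative hierarchy: (referencing table, referenced table).
fk_edges = [
    ("state", "country"),
    ("district", "state"),
    ("sub_district", "district"),
    ("block", "sub_district"),
    ("village", "block"),
    ("household", "village"),
]

density = relational_density(237, 20)  # figures reported for IndicDB
depth = max_join_depth(fk_edges)
print(f"{density:.2f} tables/database, max join depth {depth}")
```

A seven-level chain like this one already requires six joins to connect its ends, which gives a feel for the query depth the benchmark demands.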
To achieve this, an iterative three-agent framework, comprising an Architect, an Auditor, and a Refiner, was employed to convert denormalized government data into rich relational structures. This isn't just about numbers; it's about ensuring structural rigor. What remains unclear is how much this level of complexity, on its own, trips up existing models.
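The article names the three roles but not how they work, so the following is only a toy illustration of the control flow such a loop might follow: the Architect proposes a schema once, then the Auditor and Refiner alternate until the Auditor reports no issues. Every rule here (one table per entity group, surrogate `<table>_id` keys, a single `facts` table linking to each entity) is a hypothetical stand-in chosen for illustration, not the IndicDB framework's actual logic, and the column names are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: list[str]
    primary_key: str | None = None
    # Column name -> name of the referenced table.
    foreign_keys: dict[str, str] = field(default_factory=dict)


def architect(flat_columns: list[str], entities: dict[str, list[str]]) -> list[Table]:
    """Hypothetical Architect: one table per entity group plus a 'facts' table."""
    grouped = {c for cols in entities.values() for c in cols}
    tables = [Table(name, list(cols)) for name, cols in entities.items()]
    tables.append(Table("facts", [c for c in flat_columns if c not in grouped]))
    return tables


def auditor(tables: list[Table]) -> list[tuple[str, str]]:
    """Hypothetical Auditor: report (issue, table_name) pairs."""
    issues = [("missing_pk", t.name) for t in tables if t.primary_key is None]
    facts = next(t for t in tables if t.name == "facts")
    for t in tables:
        if t.name != "facts" and t.name not in facts.foreign_keys.values():
            issues.append(("unlinked", t.name))
    return issues


def refiner(tables: list[Table], issues: list[tuple[str, str]]) -> list[Table]:
    """Hypothetical Refiner: apply one mechanical fix per reported issue."""
    by_name = {t.name: t for t in tables}
    facts = by_name["facts"]
    for issue, name in issues:
        key = f"{name}_id"
        if issue == "missing_pk":
            table = by_name[name]
            if key not in table.columns:
                table.columns.insert(0, key)
            table.primary_key = key
        elif issue == "unlinked":
            if key not in facts.columns:
                facts.columns.append(key)
            facts.foreign_keys[key] = name
    return tables


def normalize(flat_columns, entities, max_rounds: int = 5) -> tuple[list[Table], int]:
    """Architect once, then alternate Auditor and Refiner until the audit passes."""
    tables = architect(flat_columns, entities)
    for round_no in range(1, max_rounds + 1):
        issues = auditor(tables)
        if not issues:
            return tables, round_no
        tables = refiner(tables, issues)
    raise RuntimeError("schema did not converge")


# Toy denormalized table with made-up column names.
flat = ["state", "state_population", "district", "district_area", "year", "crop_yield"]
entities = {"state": ["state", "state_population"], "district": ["district", "district_area"]}
tables, rounds = normalize(flat, entities)
for t in tables:
    print(t.name, t.columns, "PK:", t.primary_key, "FK:", t.foreign_keys)
```

The loop bound (`max_rounds`) reflects one sensible design choice for any audit-and-repair cycle: stop with an error rather than iterate forever if the fixes never satisfy the checks.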
The Indic Gap Exposed
IndicDB sets the stage for evaluating models like DeepSeek v3.2, MiniMax 2.7, LLaMA 3.3, and Qwen3 across English, Hindi, and five other Indic languages. The results? A glaring 9.00% performance drop when models shift from English to Indic languages, a phenomenon now dubbed the "Indic Gap." This gap isn't just a minor discrepancy. It highlights significant challenges in schema linking, increased structural ambiguity, and a dearth of external knowledge resources.
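The article doesn't specify the metric or the averaging behind the 9.00% figure. One plausible way to measure such a gap is execution accuracy (does the predicted SQL return the same rows as the gold SQL?) compared between English and the other languages. The sketch below assumes an order-insensitive row comparison and defines the gap as English accuracy minus the mean accuracy over the remaining languages; the toy table and the "model outputs" are invented for illustration and are not IndicDB results.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, gold_sql: str, pred_sql: str) -> bool:
    """Order-insensitive execution match: same multiset of result rows."""
    try:
        predicted = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to execute counts as incorrect
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn, pairs):
    """Fraction of (gold, predicted) SQL pairs whose results match."""
    return sum(execution_match(conn, g, p) for g, p in pairs) / len(pairs)


def indic_gap(accuracy_by_lang, reference="en"):
    """Reference-language accuracy minus the mean accuracy over all other languages."""
    others = [acc for lang, acc in accuracy_by_lang.items() if lang != reference]
    return accuracy_by_lang[reference] - sum(others) / len(others)


# Toy database; table, columns, and values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE district (name TEXT, state TEXT, population INTEGER);
INSERT INTO district VALUES
  ('district_a', 'S1', 100), ('district_b', 'S1', 200), ('district_c', 'S2', 300);
""")

GOLD_Q1 = "SELECT name FROM district WHERE state = 'S1'"
GOLD_Q2 = "SELECT SUM(population) FROM district"

# Hypothetical model outputs for the same two questions posed in English and Hindi.
predictions = {
    "en": [(GOLD_Q1, "SELECT name FROM district WHERE state = 'S1' ORDER BY name DESC"),
           (GOLD_Q2, "SELECT SUM(population) FROM district")],
    "hi": [(GOLD_Q1, "SELECT name FROM district WHERE state = 'S1'"),
           (GOLD_Q2, "SELECT SUM(populaton) FROM district")],  # misspelled column fails
}

accuracy = {lang: execution_accuracy(conn, pairs) for lang, pairs in predictions.items()}
gap = indic_gap(accuracy)
print(accuracy, gap)
```

Comparing results as multisets ignores row order, which is a simplification: benchmarks such as Spider treat `ORDER BY` queries more strictly. Whether a reported gap is an absolute difference (percentage points) or a relative drop also changes how it should be read.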
This isn't merely an academic exercise. If LLMs are to fulfill their promise of global applicability, they need to bridge this gap. Why are we accepting a world where language models are effective in some regions and not in others? The "Indic Gap" is a wake-up call.
The Path Forward
IndicDB isn't just a benchmark. It's a call to action. The significant dip in performance across Indic languages is a reminder that the AI community has some catching up to do. It's tempting to rest on the laurels of rising Text-to-SQL scores in Western contexts, but that complacency doesn't survive scrutiny in a globalized world.
So, what's next? Models need to adapt to diverse linguistic structures and expand their external knowledge bases. This isn't just about solving a technical problem; it's about inclusivity and ensuring that all languages are treated with the same level of rigor and importance.
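A toy example helps show why schema linking suffers and why external knowledge can help. In the sketch below, a naive exact-match linker finds the English schema terms in an English question but nothing in the equivalent Hindi question, until a small bilingual lexicon maps Hindi words onto column and table names. The linker, the two-entry lexicon, and the schema are all hypothetical simplifications; real systems use far richer matching.

```python
from __future__ import annotations


def link_schema(question: str, schema: dict[str, list[str]],
                lexicon: dict[str, str] | None = None) -> set[str]:
    """Exact-match schema linking: return tables and 'table.column' names mentioned."""
    lexicon = lexicon or {}
    # Whitespace splitting keeps Devanagari words (with their vowel signs) intact;
    # trailing punctuation, including the danda, is stripped.
    tokens = [tok.strip("?.,।").lower() for tok in question.split()]
    # Map tokens through the optional lexicon; unknown tokens pass through unchanged.
    terms = {lexicon.get(tok, tok) for tok in tokens}
    linked = set()
    for table, columns in schema.items():
        if table in terms:
            linked.add(table)
        linked.update(f"{table}.{col}" for col in columns if col in terms)
    return linked


SCHEMA = {"district": ["name", "state", "population"]}
# Hypothetical Hindi-to-English lexicon standing in for "external knowledge".
LEXICON = {"जिले": "district", "जनसंख्या": "population"}

en_question = "What is the population of each district?"
hi_question = "हर जिले की जनसंख्या क्या है?"  # the same question in Hindi

en_links = link_schema(en_question, SCHEMA)
hi_links_plain = link_schema(hi_question, SCHEMA)
hi_links_lexicon = link_schema(hi_question, SCHEMA, LEXICON)
print(en_links, hi_links_plain, hi_links_lexicon)
```

Note that the lexicon is keyed on the inflected form जिले; a question using the direct form जिला would slip through, a small hint of how morphology compounds the schema-linking problem in Indic languages.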
In the grand scheme of things, IndicDB serves as more than just a performance benchmark. It's a new benchmark for fairness in AI development. Are we up to the challenge?