Jackal Unleashed: The New Frontier in Text-to-JQL Translation
Jackal sets a new standard in converting natural language to Jira Query Language, revealing single-pass LLMs' limitations. Expect a shake-up in how we handle Jira queries.
JUST IN: The world of converting natural language into Jira Query Language (JQL) is getting a major upgrade with the introduction of Jackal. This isn't just any benchmark. It's the first large-scale execution-based text-to-JQL benchmark with a staggering 100,000 validated NL-JQL pairs on a live Jira instance. We're talking about over 200,000 issues at play here.
The Current Challenge
Let's get real. Single-pass large language models (LLMs) have been struggling. They can't see which categorical values actually exist in a specific Jira instance, and they can't verify their queries against live data. The result? Low accuracy on paraphrased or ambiguous requests.
Among the nine frontier LLMs tested, single-pass models only managed an average execution accuracy of 43.4% on short natural-language queries. That's a wild miss. The text-to-JQL puzzle remains unsolved, a challenge crying for a solution.
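To see why guessing hurts, remember that JQL matches categorical values exactly: a plausible-but-wrong value silently returns zero issues. A minimal sketch of that failure mode, with entirely hypothetical component names:

```python
# Single-pass models guess values; execution-grounded agents check them.
# All component names and JQL strings below are hypothetical.
instance_components = {"Mobile App (iOS)", "Mobile App (Android)", "Backend API"}

guessed_jql = 'component = "iOS" AND status = "Open"'                # plausible guess
grounded_jql = 'component = "Mobile App (iOS)" AND status = "Open"'  # verified value

def component_exists(jql: str, known: set[str]) -> bool:
    # Crude check: pull out the quoted component value and test membership.
    value = jql.split('component = "')[1].split('"')[0]
    return value in known

print(component_exists(guessed_jql, instance_components))   # False: query returns nothing
print(component_exists(grounded_jql, instance_components))  # True
```

An agent with live execution can catch the empty result and retry; a single-pass model never finds out it missed.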
Agentic Jackal: The Game Changer
Enter Agentic Jackal, a tool that takes LLMs and amps them up with live query execution via the Jira MCP server. Plus, it's got JiraAnchor, a semantic retrieval tool that nails down those elusive categorical values using embedding-based similarity search. The results? Outrageous improvements. We're talking a 9.0% relative gain on the most linguistically challenging queries for seven out of nine models.
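JiraAnchor's retrieval step can be pictured as a nearest-neighbor search over the instance's actual values. Here's a minimal sketch using toy character-trigram vectors in place of a learned embedding model; the component names are hypothetical, and the real system presumably uses proper sentence embeddings:

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy character-trigram "embedding"; a real retriever would use a
    # learned embedding model, but the similarity-search shape is the same.
    t = f"  {text.lower()}  "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def anchor(user_phrase: str, instance_values: list[str]) -> str:
    """Return the instance's actual categorical value closest to the phrase."""
    return max(instance_values, key=lambda v: cosine(embed(user_phrase), embed(v)))

# Hypothetical component names pulled from a live Jira instance:
components = ["Mobile App (iOS)", "Mobile App (Android)", "Backend API", "Web Frontend"]
print(anchor("the ios app", components))  # Mobile App (iOS)
```

The point: instead of asking the LLM to hallucinate an exact string, the agent maps the user's loose phrasing onto a value that is guaranteed to exist in the instance.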
Sources confirm: In a controlled ablation isolating JiraAnchor, we saw categorical-value accuracy leap from 48.7% to 71.7%, with component-field accuracy soaring from 16.9% to 66.2%. That’s not just progress. That’s a revolution in accuracy.
Looking Ahead
The labs are scrambling to catch up. And the real meat of the issue isn't value-resolution errors alone. It's the semantic ambiguities: disambiguating issue types, selecting the right text fields. That's where the future of this field lies. Who's going to crack the code?
And here's the kicker: Jackal isn't keeping all this under wraps. The benchmark is publicly released, along with all agent transcripts and evaluation code. It's a call to arms for developers and researchers everywhere. Will they rise to the challenge?
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
Embedding: A dense numerical representation of data (words, images, etc.) that lets machines compare items by similarity.
Evaluation: The process of measuring how well an AI model performs on its intended task.
Model Context Protocol (MCP) is an open standard created by Anthropic that lets AI models connect to external tools, data sources, and APIs through a unified interface.