The Unpredictable Impact of AI-Powered Query Rewriting on Dense Retrieval
New research highlights the erratic outcomes of using prompt-only query rewriting in dense retrieval systems. While it improves performance in some domains, it significantly hinders it in others.
In the world of AI and dense retrieval systems, a new study sheds light on the unpredictable results of using prompt-only, single-step query rewriting. This approach, where queries are rephrased by a language model without any retrieval feedback, is widely used in production Retrieval-Augmented Generation (RAG) pipelines. However, its impact on dense retrieval remains a mixed bag.
Domain-Specific Outcomes
The research, conducted across three BEIR benchmarks with two dense retrievers, reveals starkly different outcomes depending on the domain. For instance, in the finance-focused FiQA dataset, query rewriting led to a notable 9% decrease in nDCG@10 scores, a standard measure of retrieval effectiveness. This suggests that rewriting can actually degrade performance when it replaces well-matched, domain-specific terms with less relevant ones.
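For readers unfamiliar with the metric, nDCG@10 rewards rankings that place relevant documents near the top of the first ten results. A minimal sketch of the standard formula (not code from the study) looks like this:

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: relevance discounted by log2 of rank position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (descending-relevance) ordering,
    # so a perfect ranking scores 1.0.
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# A relevant document pushed from rank 2 to rank 4 lowers the score:
print(ndcg_at_k([1, 1, 0, 0, 0]))  # perfect ordering -> 1.0
print(ndcg_at_k([1, 0, 0, 1, 0]))  # demoted relevant doc -> ~0.877
```

This is why even small term mismatches introduced by rewriting can move the metric noticeably: demoting a single relevant document by a few ranks costs several points of nDCG@10.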
On the flip side, the TREC-COVID dataset saw a 5.1% improvement, indicating that in certain cases, rewriting helps align queries with corpus-preferred terminology. However, in the SciFact dataset, no significant change was observed, highlighting the inconsistency of this method.
Understanding the Mechanism
So, what's driving these disparate outcomes? The study points to lexical substitution as a consistent factor. It occurs in 95% of rewrites, affecting performance based on whether the new terms align better or worse with the relevant documents. When rewriting veers queries away from well-suited terms, performance drops. But when it nudges them toward preferred nomenclature, it can enhance retrieval quality.
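The lexical-substitution effect can be illustrated with a toy overlap check (a hypothetical example, not data from the study): a paraphrase that drops domain-specific jargon can lose the exact terms that the relevant documents actually use.

```python
def term_overlap(query: str, doc: str) -> float:
    # Fraction of query terms that literally appear in the document.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

relevant_doc = "Q3 EBITDA margin guidance for fiscal 2024"
original = "EBITDA margin guidance"           # domain terms match the doc exactly
rewritten = "expected profitability outlook"  # fluent paraphrase, zero overlap

print(term_overlap(original, relevant_doc))   # -> 1.0
print(term_overlap(rewritten, relevant_doc))  # -> 0.0
```

Dense retrievers are not purely lexical, but their embeddings are still shaped by surface vocabulary, so a rewrite that swaps corpus terminology for generic phrasing can pull the query vector away from the relevant documents.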
Interestingly, attempts to selectively rewrite queries showed mixed results. While feature-based gating might mitigate some of the worst regressions, it doesn't consistently outperform a strategy of never rewriting. Even with optimal selection, the gains remain modest, raising an important question: is query rewriting more trouble than it's worth in certain contexts?
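To make "feature-based gating" concrete, here is a deliberately simple sketch of the idea, with thresholds and features that are purely illustrative, not taken from the study: rewrite only queries that look short and free of domain-specific tokens.

```python
def should_rewrite(query: str, min_len: int = 4) -> bool:
    # Hypothetical gate: rewrite only short queries that contain no
    # jargon-like tokens (all-caps acronyms or tokens with digits).
    # Features and thresholds here are illustrative assumptions.
    terms = query.split()
    has_jargon = any(t.isupper() or any(c.isdigit() for c in t) for t in terms)
    return len(terms) < min_len and not has_jargon

for q in ["EBITDA margin guidance", "covid symptoms"]:
    print(q, "->", "rewrite" if should_rewrite(q) else "keep as-is")
```

The study's finding is precisely that even more sophisticated versions of such gates struggle to beat the trivial baseline of never rewriting, which is what makes the selective-rewriting result sobering.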
The Safer Route
The findings suggest that instead of blindly relying on query rewriting, a more cautious approach could be domain-adaptive post-training. This strategy might be safer, especially when supervision or implicit feedback is available, as it allows for more tailored adjustments to enhance retrieval accuracy.
For businesses and developers employing these systems, this study is a wake-up call. Blind faith in AI-driven rewriting can lead to unpredictable outcomes. Rather than treating rewriting as a universal fix, per-domain evaluation and domain-specific adjustment appear to be the key to consistent performance improvements.