Streamlining Multi-hop QA with Cost-effective Routing
RASER introduces efficient routing in multi-hop question-answering systems, slashing token costs without sacrificing accuracy. It challenges the need for costly retrieval on every question.
Multi-hop question-answering (QA) systems often suffer from high costs due to repeated calls to large language models (LLMs). The typical approach involves decomposing questions, executing several retrieval rounds, or exploring bridge entities for accurate answers. This strategy, however, can be prohibitively expensive when the LLM budget is limited. Is there a more efficient way?
RASER: A New Approach
Enter RASER (Recoverability-Aware Selective Escalation Router), a family of routers that promises to simplify the process. Instead of triggering extra retrieval on every question, RASER starts from a single one-shot retrieval-augmented generation (RAG) pass and escalates only when that pass is unlikely to be enough. The paper's analysis indicates that a significant number of multi-hop questions don't require multiple retrieval steps and can be answered correctly with this simplified approach.
The two configurations, RASER-2 and RASER-3, operate without additional LLM calls. RASER-2 makes a binary decision: whether a question needs further retrieval or not. RASER-3 makes a three-way choice that trades cost against accuracy, selecting between one-shot RAG, the extra-retrieval action PRUNE, and iterative retrieval with IRCoT.
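To make the two routing shapes concrete, here is a minimal sketch in Python. It is not the paper's implementation: the `score` input (a hypothetical estimate, computed without an LLM call, that the one-shot answer is already sufficient), the threshold values, and the choice of PRUNE as RASER-2's escalation target are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    ONE_SHOT_RAG = "one_shot_rag"  # keep the answer from the single RAG pass
    PRUNE = "prune"                # escalate to the extra-retrieval action
    IRCOT = "ircot"                # escalate to iterative retrieval


def route_two_way(score: float, threshold: float = 0.5) -> Action:
    """Binary routing in the spirit of RASER-2.

    `score` and `threshold` are hypothetical; escalating to PRUNE is
    an assumption, since the article only says "further retrieval".
    """
    return Action.ONE_SHOT_RAG if score >= threshold else Action.PRUNE


def route_three_way(score: float, low: float = 0.3, high: float = 0.7) -> Action:
    """Three-way routing in the spirit of RASER-3.

    High scores keep the cheap one-shot answer, mid scores pay for
    PRUNE, and low scores pay for IRCoT. The cut-offs `low` and `high`
    are illustrative, not values from the paper.
    """
    if score >= high:
        return Action.ONE_SHOT_RAG
    if score >= low:
        return Action.PRUNE
    return Action.IRCOT


if __name__ == "__main__":
    for s in (0.9, 0.5, 0.1):
        print(s, route_two_way(s).value, route_three_way(s).value)
```

Moving the cut-offs is what would shift such a router along the cost-accuracy curve: raising `high` sends more questions to the expensive actions, while lowering it keeps more of them on one-shot RAG.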
Efficiency Without Compromise
The key finding is that both routers achieve F1 scores competitive with state-of-the-art (SOTA) baselines while using only 41-49% of the tokens that the always-prune baseline (applying PRUNE to every question) would consume. This positions RASER as a cost-effective solution that doesn't compromise on accuracy.
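The token figure is a ratio: tokens spent by the routed system divided by tokens spent when every question is sent to PRUNE. The sketch below shows how such a ratio could be computed. The per-action token costs and the routing counts are invented for illustration and are not taken from the paper.

```python
def total_tokens(counts: dict[str, int], cost_per_question: dict[str, int]) -> int:
    """Total tokens when `counts[action]` questions are routed to each action."""
    return sum(n * cost_per_question[action] for action, n in counts.items())


def relative_cost(counts: dict[str, int],
                  cost_per_question: dict[str, int],
                  baseline: str = "prune") -> float:
    """Routed token usage as a fraction of sending every question to `baseline`."""
    n_questions = sum(counts.values())
    baseline_tokens = n_questions * cost_per_question[baseline]
    return total_tokens(counts, cost_per_question) / baseline_tokens


# Hypothetical numbers, for illustration only.
cost_per_question = {"one_shot_rag": 1000, "prune": 4000, "ircot": 6000}
counts = {"one_shot_rag": 80, "prune": 15, "ircot": 5}

ratio = relative_cost(counts, cost_per_question)
```

With these made-up inputs the routed system spends 170,000 tokens against 400,000 for always-prune, a ratio of 0.425. The saving comes almost entirely from how many questions stay on the cheap one-shot path.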
So, why does this matter? In a field where reducing computational cost is important, RASER offers a path forward. It challenges the notion that more retrieval equals better performance, showing that smarter routing can achieve comparable results at lower cost.
The Future of Multi-hop QA
With the increasing reliance on LLMs, it's vital to address budget constraints without losing the reliability of answers. RASER's approach could redefine how systems handle complex queries, especially in resource-limited environments. However, the real test will be its adaptability across different domains and datasets.
RASER builds on existing retrieval methods such as PRUNE and IRCoT rather than replacing them; its innovation lies in the simplicity and effectiveness of deciding when each one is worth its cost. The paper's key contribution is demonstrating that less can indeed be more in multi-hop question-answering. Are we on the brink of a new efficiency standard in QA systems?