Revamping COVID-19 Literature Search: The Battle of Hybrid Retrieval Systems
A new hybrid retrieval system outperforms existing methods in COVID-19 literature search, leveraging a mix of dense and sparse configurations for better relevance and efficiency.
Searching through vast amounts of scientific literature is like finding a needle in a haystack, but recent advances in hybrid retrieval systems might change that. In COVID-19 research, a new system is challenging the benchmarks we have come to expect. Evaluated on the TREC-COVID benchmark, it searches a corpus of 171,332 papers with 50 expert-written queries, aiming for both higher precision and lower latency.
The Power of Fusion
This retrieval system isn't your run-of-the-mill search engine. It implements six retrieval configurations that combine sparse and dense methods, rank-level fusion, and a projection-based vector fusion approach known as B5. The rank-level fusion configuration (Reciprocal Rank Fusion, RRF) leads on relevance with an nDCG@10 of 0.828 on expert queries, while B5 trails at 0.678. Here's the kicker: B5 answers a query in 847 milliseconds versus 1,271 milliseconds for RRF, roughly 33% less time. In a field where both speed and relevance matter, that gap can't be overlooked.
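Rank-level fusion is easy to see in code. The sketch below implements standard Reciprocal Rank Fusion, which scores each document as the sum of 1 / (k + rank) over every ranked list it appears in. The constant k = 60 comes from the original RRF formulation and is an assumption here, since the article does not report the value this system uses; the function name and document IDs are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed default, not the system's value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (highest first); break ties by doc ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]

# Hypothetical result lists from a sparse (keyword) and a dense (embedding) retriever.
sparse_hits = ["d3", "d1", "d7", "d2"]
dense_hits = ["d1", "d5", "d3", "d9"]
print(reciprocal_rank_fusion([sparse_hits, dense_hits], top_n=5))
# ['d1', 'd3', 'd5', 'd7', 'd2']
```

Documents retrieved by both methods (d1 and d3) rise to the top even though the two lists disagree on their order. As for the latency figures above, (1271 - 847) / 1271 is about 0.33, which is where the 33% figure comes from.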
B5 vs. RRF: The Relevance Debate
While RRF wins on overall relevance, B5 delivers its largest gains on keyword-heavy queries, with an 8.8% improvement. This raises a critical question: is it better to have a system that is more relevant overall, or one that excels on specific, high-value queries and answers faster? Either way, the hybrid system looks ready to change how we approach scientific literature retrieval.
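The article does not explain how B5 projects vectors, so the following is not B5. It sketches a different, simpler form of vector-level fusion, convex-combination weighting: the dense part of the query is scaled by alpha and the sparse part by 1 - alpha, so a single inner product against an unscaled document yields alpha times the dense score plus (1 - alpha) times the sparse score. One search then replaces the separate searches and merge step that rank-level fusion typically needs, which illustrates in general terms why vector-level fusion can be cheaper. The function names, the dict-based sparse format, and the alpha value are all assumptions for illustration.

```python
def scale_query(dense, sparse, alpha):
    """Weight a hybrid query so one inner product fuses dense and sparse scores.

    dense: list of floats (embedding); sparse: dict {term_id: weight}.
    Returns (alpha * dense, (1 - alpha) * sparse). Documents stay unscaled.
    This is a hypothetical convex-combination scheme, not the B5 method.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [alpha * x for x in dense], {t: (1.0 - alpha) * w for t, w in sparse.items()}

def hybrid_score(q_dense, q_sparse, d_dense, d_sparse):
    """Inner product over the combined dense + sparse representation."""
    dense_part = sum(a * b for a, b in zip(q_dense, d_dense))
    sparse_part = sum(w * d_sparse.get(t, 0.0) for t, w in q_sparse.items())
    return dense_part + sparse_part

# Hypothetical query and document: dense dot product = 0.96, sparse dot product = 2.0.
q_dense, q_sparse = [0.6, 0.8], {3: 1.0, 7: 0.5}
d_dense, d_sparse = [0.8, 0.6], {3: 2.0}

sq_dense, sq_sparse = scale_query(q_dense, q_sparse, alpha=0.75)
print(round(hybrid_score(sq_dense, sq_sparse, d_dense, d_sparse), 4))
# 1.22  (0.75 * 0.96 + 0.25 * 2.0)
```

Lowering alpha shifts weight toward the sparse keyword score, the kind of signal that matters most for keyword-heavy queries.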
Speed vs. Diversity
The system also tackles diversity in search results. On expert queries, MMR (Maximal Marginal Relevance) reranking raises intra-list diversity by up to 24.5% but lowers nDCG@10 by roughly 25%. Both fusion pipelines consistently stay under the two-second latency target, which suggests that speed and diversity can coexist, provided users accept some loss in relevance.
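MMR reranking fits in a few lines. The greedy loop below is standard Maximal Marginal Relevance: at each step it picks the candidate with the highest lam * relevance - (1 - lam) * redundancy, where redundancy is the candidate's maximum cosine similarity to documents already selected. Intra-list diversity is measured here as the mean pairwise cosine distance of the result list. That definition, the lam values, and the toy vectors are assumptions, since the article does not specify how the system configures MMR or computes diversity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mmr(query_vec, doc_vecs, lam=0.5, top_n=10):
    """Greedy Maximal Marginal Relevance over a dict {doc_id: vector}."""
    candidates = list(doc_vecs)
    selected = []
    while candidates and len(selected) < top_n:
        def mmr_score(doc_id):
            relevance = cosine(query_vec, doc_vecs[doc_id])
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

def intra_list_diversity(doc_ids, doc_vecs):
    """Mean pairwise cosine distance (1 - similarity) within a result list."""
    pairs = [(a, b) for i, a in enumerate(doc_ids) for b in doc_ids[i + 1:]]
    if not pairs:
        return 0.0
    return sum(1 - cosine(doc_vecs[a], doc_vecs[b]) for a, b in pairs) / len(pairs)

# Toy example: "a" and "b" are near-duplicates; "c" points in a different direction.
query = [1.0, 0.0]
docs = {"a": [1.0, 0.2], "b": [1.0, 0.25], "c": [1.0, -0.5]}

relevance_only = mmr(query, docs, lam=1.0, top_n=2)
diversified = mmr(query, docs, lam=0.5, top_n=2)
print(relevance_only)  # ['a', 'b']
print(diversified)     # ['a', 'c']
```

With lam = 1.0 the reranker ignores redundancy and returns the near-duplicates a and b; with lam = 0.5 it swaps b for the less relevant but more distinct c. That swap mirrors the trade-off reported above: higher diversity, lower nDCG@10.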
Deployed as a Streamlit web application backed by Pinecone serverless indices, the system shows what a practical hybrid retrieval stack can deliver. Plenty of projects at this intersection of AI and search never make it past the prototype stage, so one that works end to end is likely to set a precedent.