Breaking Positional Bias in Dense Retrieval Without Retraining
Positional bias in dense retrieval models is a known issue. New techniques can mitigate this bias during inference, improving retrieval without retraining.
Dense retrieval models have long struggled with positional bias: retrieval performance drops when the essential information appears later in a passage. The research by Zeng et al., 2025, highlights this ongoing challenge. The latest approaches, however, suggest the bias can be addressed at inference time, with no need to retrain the model. That is a promising development for anyone who wants a fix without the cost of retraining.
Adapting Attention Calibration
Enter inference-time attention calibration. Originally introduced by Schuhmacher et al., 2026, this technique is now being applied to retrieval bias. The extension adds a strength coefficient, lambda, that sets the balance between the original attention distribution and the fully calibrated one. This isn't just theoretical. Across three embedding models, on datasets including SQuAD-PosQ and FineWeb-PosQ, partial calibration frequently outperformed full calibration.
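To make the role of lambda concrete, here is a minimal Python sketch. It assumes the balance is a simple linear interpolation between the two attention distributions, which the article does not confirm; the function name blend_attention and the example weights are hypothetical and are not taken from the released codebase.

```python
def blend_attention(original, calibrated, lam):
    """Blend two attention distributions over the same tokens.

    Assumption: lam = 0 returns the original attention, lam = 1 returns the
    fully calibrated attention, and values in between mix them linearly.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    if len(original) != len(calibrated):
        raise ValueError("distributions must cover the same tokens")
    mixed = [(1.0 - lam) * o + lam * c for o, c in zip(original, calibrated)]
    total = sum(mixed)
    # Renormalise so the blended weights still sum to 1.
    return [m / total for m in mixed]


# Hypothetical attention weights over three tokens.
original = [0.7, 0.2, 0.1]      # attention concentrated on the early token
calibrated = [0.4, 0.3, 0.3]    # attention after a full (assumed) calibration
partial = blend_attention(original, calibrated, lam=0.5)
```

Under this assumption, partial calibration with lambda set to 0.5 moves each weight halfway toward its calibrated value, so some of the model's original attention pattern is preserved.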
Real-World Impact
Let's break this down. A single configuration (basket size of 128, lambda set to 0.5, and 50% layer depth) improved the harmonic mean of nDCG@10 across positional groups for all models on FineWeb-PosQ. The configuration needs no per-model tuning, and it applies across pooling strategies, including last-token-pooled architectures. Do these results transfer to other contexts? The evidence suggests they do. Testing on PosIR, which spans 10 languages and 31 domains, showed a reduced Position Sensitivity Index across various combinations while maintaining or improving nDCG@10. That is a meaningful gain for a method that requires no retraining.
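The harmonic mean is a useful summary here because it penalises a model that scores well on one positional group and poorly on another. The Python sketch below illustrates the idea; the group names and nDCG@10 values are hypothetical placeholders, not results from the paper, and positional_harmonic_mean is an illustrative helper rather than part of the released code.

```python
from statistics import harmonic_mean


def positional_harmonic_mean(ndcg_by_group):
    """Harmonic mean of per-group nDCG@10 scores.

    ndcg_by_group maps a positional group name (where in the passage the
    relevant information sits) to that group's nDCG@10 score.
    """
    scores = list(ndcg_by_group.values())
    if not scores:
        raise ValueError("at least one positional group is required")
    return harmonic_mean(scores)


# Hypothetical scores: the model does worse when the answer appears late.
biased = {"early": 0.8, "middle": 0.6, "late": 0.4}
# Hypothetical scores with the same arithmetic mean but no positional gap.
balanced = {"early": 0.6, "middle": 0.6, "late": 0.6}

biased_score = positional_harmonic_mean(biased)
balanced_score = positional_harmonic_mean(balanced)
```

Both hypothetical models have an arithmetic mean of 0.6, but the biased one gets a lower harmonic mean, so the metric only rises when performance improves without leaving a positional group behind.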
Why It Matters
Strip away the marketing and you get a technique that can improve retrieval effectiveness at scale without retraining. Why is this important? Retrieval quality affects countless applications, from search engines to personal assistants, so every bit of improvement matters. Why settle for less-effective searches when an adjustment at inference time can make them sharper?
Looking Forward
So, where do we go from here? The release of the extended codebase at https://github.com/impresso/fair-sentence-transformers offers a starting point for broader adoption. For these improvements, the architecture matters more than the parameter count. As more developers adopt these tools, expect retrieval systems to become both more effective and fairer.