CoHyDE: The Future of Tool Retrieval in API Catalogs

Tool retrieval in vast API catalogs is a headache for LLM agents. User queries tend to be written in everyday language and are often vague, while API catalogs describe tools in technical jargon. This mismatch is a bottleneck that existing training approaches struggle to overcome.

Existing Methods Fall Short

Two main approaches dominate the field: contrastive fine-tuning of a dense encoder, and HyDE-style query expansion with a frozen LLM. Both have clear weaknesses. The fine-tuned encoder performs well when queries already use the catalog's terminology, but it degrades sharply when they don't. HyDE handles under-specified queries better, but when a query is already precise, the hypothetical tool descriptions it generates can miss the mark.
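To make the contrast concrete, here is a toy sketch of HyDE-style retrieval. It is not how any production system works. The bag-of-words `embed` function stands in for a dense encoder, the hard-coded `frozen_rewriter` stands in for a frozen LLM, and the three catalog entries are hypothetical. The sketch embeds either the raw query or the rewriter's hypothetical tool description, then ranks tools by cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy catalog, for illustration only.
CATALOG = {
    "weather.get_forecast": "retrieve hourly weather forecast for a geographic location",
    "currency.convert": "convert an amount between two currency codes using exchange rates",
    "calendar.create_event": "create a calendar event with title start time and attendees",
}

def embed(text: str) -> Counter:
    """Stand-in for a dense encoder: a bag-of-words term-count vector (assumption)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def frozen_rewriter(query: str) -> str:
    """Hypothetical stand-in for a frozen LLM that writes a plausible tool description."""
    if "rain" in query or "umbrella" in query:
        return "retrieve weather forecast for a location"
    return query  # fall back to the raw query

def retrieve(text: str, k: int = 1) -> list[str]:
    """Rank catalog tools by similarity to the embedded text and return the top k names."""
    q = embed(text)
    ranked = sorted(CATALOG, key=lambda name: cosine(q, embed(CATALOG[name])), reverse=True)
    return ranked[:k]

query = "do I need an umbrella tomorrow"
print(retrieve(query))                   # raw query: embedded directly
print(retrieve(frozen_rewriter(query)))  # HyDE-style: embed the hypothetical description
```

In this deliberately tiny example, the raw query shares only the word "an" with the currency tool and nothing with the weather tool, so it retrieves the wrong tool. The hypothetical description bridges the vocabulary gap. A real system would use learned embeddings. The same rewriting step is also what can hurt when the original query was already precise.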

Enter CoHyDE

This is the gap CoHyDE targets. It is an iterative procedure that trains the dense encoder and the LLM rewriter jointly rather than in isolation. In each round, the encoder is retrained with an InfoNCE contrastive loss on hypothetical descriptions produced by the rewriter. The rewriter is then aligned via DPO, using the encoder's retrieval scores as the preference signal, so the two components co-evolve. Before the first round, both components are adapted to the tool catalog, so the loop starts from catalog-aware models rather than generic ones.
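The two training signals can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CoHyDE's implementation. `info_nce` scores one hypothetical-description embedding against its gold tool and a list of negative tools. `dpo_loss` is the standard DPO objective over sequence log-probabilities. `make_preference_pair` is a hypothetical rule that turns retrieval scores into a chosen/rejected pair of rewrites. The `temperature` and `beta` defaults are illustrative, not values from the paper.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def info_nce(query_vec, pos_vec, neg_vecs, temperature=0.05):
    """InfoNCE for one example: -log softmax probability of the positive among positive + negatives.

    Assumption: query_vec embeds a rewriter-generated hypothetical description,
    pos_vec embeds the gold tool, neg_vecs embed other catalog tools.
    """
    logits = [dot(query_vec, pos_vec) / temperature]
    logits += [dot(query_vec, n) / temperature for n in neg_vecs]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities
    under the trainable policy and a frozen reference model."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)

def make_preference_pair(rewrites, retrieval_score):
    """Hypothetical pairing rule (assumption): the rewrite whose description scores
    highest for the gold tool is 'chosen', the lowest-scoring one is 'rejected'."""
    ranked = sorted(rewrites, key=retrieval_score, reverse=True)
    return ranked[0], ranked[-1]
```

One plausible way to wire these into a co-training loop is to alternate two phases. First, generate rewrites and retrain the encoder with `info_nce` on them. Second, score fresh rewrites with the updated encoder, build pairs with `make_preference_pair`, and update the rewriter with `dpo_loss`. In practice, both losses would be computed in batches with an autodiff framework rather than scalar Python.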

Strong Results Speak Volumes

On a subset of roughly 10,000 tools from the ToolBench catalog, CoHyDE outperformed the best baseline by 2.5 percentage points on standard queries and 6.3 percentage points on vague ones. On the hardest cases, the gain reached 8 percentage points. The ablations point to co-training as the key ingredient: training either component in isolation falls well short of CoHyDE and loses up to 8 percentage points on vague queries.

Why This Matters

For developers and companies relying on API catalogs, this is a meaningful step. Strip away the marketing and you get a retriever that matches everyday, often vague queries to the right tools more accurately than either existing approach on its own. As APIs continue to proliferate, the demand for accurate retrieval will only grow. CoHyDE's approach could set the standard, pushing others to rethink how they bridge the gap between everyday language and technical specificity.

But here's the big question: will other systems catch up, or will CoHyDE redefine the landscape for good? The ablations suggest that training the encoder and rewriter together matters more than improving either one alone, and CoHyDE's co-training strategy might just be the blueprint others follow.