Enhancing Language Model Reasoning with TRACE

By Signe Eriksen, June 10, 2026

TRACE improves language model reasoning by sharpening reward contrast in multi-turn rollouts, raising Qwen3-14B's average Multi-Hop QA accuracy by 2.8 points at the same sampling cost.

Reinforcement learning with verifiable rewards (RLVR) isn't a new concept, but its use for improving reasoning in large language models is gaining traction. The challenge? Reward contrast. When a prompt is too easy or too hard, the sampled rollouts tend to all succeed or all fail, which leaves traditional policy optimization with little signal to learn from.
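To make the reward-contrast problem concrete, here is a minimal sketch. It assumes a group-normalized advantage of the kind used in GRPO-style methods; the article does not say which policy optimizer TRACE builds on, so the function name `group_advantages` and the reward lists are purely illustrative.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group (assumed GRPO-style, not from the article)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical binary outcome rewards for four rollouts of the same prompt.
too_easy = [1, 1, 1, 1]   # every rollout succeeds
too_hard = [0, 0, 0, 0]   # every rollout fails
balanced = [1, 0, 1, 0]   # mixed outcomes

print(group_advantages(too_easy))   # all zeros: no learning signal
print(group_advantages(too_hard))   # all zeros: no learning signal
print(group_advantages(balanced))   # roughly +1 / -1: useful contrast
```

In this toy setting, only the group with mixed outcomes produces nonzero advantages, which is the kind of contrast the article says standard rollouts often lack.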

Introducing TRACE

Enter Tree Rollout Allocation for Contrastive Exploration, or TRACE. Rather than allocating rollout budget only at the root prompts, the framework also allocates it to intermediate prefixes within multi-turn rollouts. What does this mean? Each ReAct-style thought-action-observation step is treated as its own node, so branching from those nodes naturally forms tree-structured rollouts and improves the reward contrast.
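One way to picture the tree structure is the sketch below. It is not TRACE's implementation: the `StepNode` class, its fields, and the example steps are hypothetical, chosen only to show how branching from a shared intermediate prefix yields sibling rollouts whose outcomes can be compared.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class StepNode:
    """One thought-action-observation step (hypothetical structure, not from the article)."""
    thought: str
    action: str
    observation: str
    parent: Optional["StepNode"] = field(default=None, repr=False)
    children: List["StepNode"] = field(default_factory=list)
    reward: Optional[float] = None  # outcome-only reward, set on finished leaves

    def add_child(self, thought: str, action: str, observation: str) -> "StepNode":
        child = StepNode(thought, action, observation, parent=self)
        self.children.append(child)
        return child

    def prefix(self) -> List["StepNode"]:
        """Return the step history from the root down to this node."""
        steps, node = [], self
        while node is not None:
            steps.append(node)
            node = node.parent
        return list(reversed(steps))

def leaf_rewards(node: StepNode) -> List[float]:
    """Collect outcome rewards from every finished leaf below a node."""
    if not node.children:
        return [] if node.reward is None else [node.reward]
    rewards: List[float] = []
    for child in node.children:
        rewards.extend(leaf_rewards(child))
    return rewards

# Illustrative two-hop question; all step contents are made up.
root = StepNode("read question", "none", "prompt text")
hop1 = root.add_child("find the author", "search[book]", "author found")
good = hop1.add_child("find birthplace", "search[author]", "city found")
bad = hop1.add_child("guess birthplace", "answer[unknown]", "no evidence")
good.reward, bad.reward = 1.0, 0.0

print(len(hop1.prefix()))   # 2 steps of shared history: root and hop1
print(leaf_rewards(hop1))   # [1.0, 0.0]
```

Because both continuations share the same prefix, the difference in their rewards can be attributed to what happened after the branch point.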

Crucially, TRACE isn't just about distributing resources. It employs a shared, generalizable predictor to estimate the conditional success probability at candidate anchors from their prefix histories. This keeps the allocation informed rather than arbitrary, and it extracts more learning signal from outcome-only feedback.
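The article does not spell out how predicted success probabilities turn into a budget, so the following is only a sketch under a stated assumption: each anchor is weighted by p * (1 - p), the variance of a binary outcome, which is largest when success and failure are equally likely. The function `allocate_budget`, the anchor names, and the probabilities are all hypothetical stand-ins for the predictor's output.

```python
from typing import Dict

def allocate_budget(success_probs: Dict[str, float], total_budget: int) -> Dict[str, int]:
    """Split a rollout budget across anchors, favoring uncertain ones.

    Assumption (not from the article): weight each anchor by p * (1 - p).
    """
    weights = {a: p * (1.0 - p) for a, p in success_probs.items()}
    total = sum(weights.values())
    if total == 0.0:
        # Every anchor is predicted certain; fall back to a uniform split.
        weights = {a: 1.0 for a in success_probs}
        total = float(len(weights))

    shares = {a: total_budget * w / total for a, w in weights.items()}
    alloc = {a: int(s) for a, s in shares.items()}  # floor of each share

    # Hand out the rollouts lost to flooring, largest fractional part first.
    leftover = total_budget - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda a: shares[a] - alloc[a], reverse=True)
    for a in by_remainder[:leftover]:
        alloc[a] += 1
    return alloc

# Hypothetical predictor outputs for three anchors in one rollout tree.
predicted = {"root_prompt": 0.95, "after_first_search": 0.5, "after_wrong_hop": 0.05}
allocation = allocate_budget(predicted, total_budget=16)
print(allocation)
```

Under this assumed rule, the anchor with a predicted success rate near 0.5 receives most of the rollouts, while near-certain anchors receive few, and the total sampling cost stays fixed.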

Why TRACE Matters

The key finding here is the performance gain TRACE achieves without extra cost. On typical agentic benchmarks, TRACE has been shown to improve Qwen3-14B's average Multi-Hop QA accuracy by 2.8 points compared to competitive baselines, all while maintaining the same sampling cost.

But why should anyone outside of academia care? Simple. Language models are influencing an increasing number of applications, from chatbots to complex problem-solving AI. Enhanced reasoning capabilities mean better, more reliable outputs across the board. Isn't that something worth investing in?

Looking Ahead

While TRACE marks a significant step forward, it's not the end of the journey. The ablation study reveals areas ripe for further research and refinement. One open question remains: can TRACE's approach be generalized beyond language models to other AI domains?

Ultimately, TRACE's contribution to refining RL in language models is noteworthy. It builds on prior work by making multi-turn agentic rollouts more efficient and insightful. As AI continues to integrate deeper into daily life, advancements like TRACE will set the pace for what these systems can achieve.
