Rethinking Reward in Reinforcement Learning with Terminal Representation

By Signe Eriksen, June 1, 2026

A new approach in reinforcement learning, the Terminal Representation, targets the computational overhead of existing representation-learning methods. It bypasses eigenvector computations, promising efficiency without sacrificing performance.

Reinforcement learning (RL) has long relied on representation learning for efficient decision-making, with the successor representation (SR) and the default representation (DR) being popular choices. The SR encodes each state by the states an agent expects to visit afterward; the DR additionally takes the rewards along those trajectories into account. Now a new contender, the Terminal Representation (TR), promises to shake things up.
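For readers who want to see what these two objects look like, here is a minimal sketch in NumPy. It uses the standard closed form of the SR for a fixed policy, M = (I - gamma P)^-1, and the DR form commonly given in the linear-RL literature, Z = (diag(exp(-r / lambda)) - P)^-1. The DR formula, the function names, and the 3-state chain are assumptions made for illustration; none of them come from the paper discussed here, and the TR itself is not implemented.

```python
import numpy as np


def successor_representation(P, gamma=0.9):
    """Closed-form SR for a fixed policy: M = (I - gamma * P)^-1.

    M[s, t] is the expected discounted number of visits to t starting from s.
    """
    P = np.asarray(P, dtype=float)
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)


def default_representation(P, r, lam=1.0):
    """DR in the form assumed here: Z = (diag(exp(-r / lam)) - P)^-1.

    Rewards enter through the diagonal term, so Z depends on r as well as P.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    return np.linalg.inv(np.diag(np.exp(-r / lam)) - P)


# Hypothetical 3-state chain under some fixed policy (each row sums to 1).
P = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
])
# Hypothetical per-state rewards: a cost of 1 everywhere.
r = np.array([-1.0, -1.0, -1.0])

M = successor_representation(P, gamma=0.9)
Z = default_representation(P, r, lam=1.0)

# Each row of M sums to 1 / (1 - gamma) because P is row-stochastic.
print(M.sum(axis=1))
```

Both representations are full state-by-state matrices in this form, which is the kind of object the article contrasts with a lower-dimensional one.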

The Terminal Representation Advantage

The TR also encodes reward-weighted trajectories, but with a twist: unlike its predecessors, it can be learned as a lower-dimensional object. The key benefit is that it removes the computationally expensive step of eigenvector calculation. That alone could be a significant shift for areas like option discovery and transfer learning.

Why does this matter? In the fast-paced world of RL, efficiency is everything. Less computational overhead means faster processing and the potential for real-time applications. But the question remains: can TR truly replace the well-established SR and DR?

Bypassing Symmetry Assumptions

The reliance on eigendecomposition in SR and DR comes with an assumption of symmetric transition dynamics. TR sidesteps this limitation. Because it does not depend on that constraint, TR offers more flexible and potentially more accurate modeling of environments where such symmetry doesn't naturally occur.
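One way to see why symmetry matters for eigendecomposition is to compare the spectra of a symmetric and an asymmetric transition matrix. A real symmetric matrix always has real eigenvalues, while an asymmetric one can have complex eigenvalues, which makes "the top eigenvectors" harder to interpret and use. The sketch below is an illustration only; the two 3-state matrices and the helper `has_real_spectrum` are hypothetical and not taken from the paper.

```python
import numpy as np


def has_real_spectrum(P, tol=1e-9):
    """Return True if every eigenvalue of P is real (up to tol)."""
    eigvals = np.linalg.eigvals(np.asarray(P, dtype=float))
    return bool(np.all(np.abs(np.imag(eigvals)) < tol))


# Hypothetical symmetric dynamics: a 3-state line with self-loops at the ends.
P_sym = np.array([
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
])

# Hypothetical asymmetric dynamics: a one-way ring 0 -> 1 -> 2 -> 0.
P_ring = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

sym_ok = has_real_spectrum(P_sym)
ring_ok = has_real_spectrum(P_ring)
print(sym_ok, ring_ok)
```

The one-way ring is about the simplest environment where movement in one direction is not mirrored by movement in the other, which is the kind of case the symmetry assumption rules out.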

This fits a familiar pattern in RL research: the assumptions built into an algorithm often limit where it can be applied. With TR, researchers can explore environments that were previously too complex or too asymmetric for older methods.

Theoretical Foundations and Practical Implications

The paper's key contribution is the theoretical groundwork for TR. It covers the derivation of TR, the convergence of the learning algorithm, and equivalences between reward formulations. The ablation study suggests that the TR is embedded in the top DR eigenvector, capturing the same knowledge without the heavy computational lifting.

But, as with any theoretical advancement, empirical testing is essential. The results are promising, showing TR as a viable alternative with less strain on computational resources. Yet, one can't help but wonder: will TR become the new standard, or is it simply another tool in the RL toolbox?
