Redefining Reward: Reinforcement Learning’s Turbulent Challenge

Reinforcement learning has long promised the moon in optimizing outcomes for physical control systems. But the reality? Often there is a yawning gap between what designers want and what these agents achieve. The divergence becomes glaring in applications like drag reduction in wall turbulence.

The Reward Dilemma

The problem starts with rewards. They are intended to guide agents toward desired outcomes. In physical control, however, these rewards frequently fall short. In drag reduction, for instance, rewards linked to pressure gradients can be misleading: an agent can reduce drag at the cost of increased power usage through the wall. It's a classic case of winning the battle but losing the war.

Two agents have demonstrated significant drag reductions. Yet the total dissipation increased. So, are these impressive figures just masking a more wasteful flow? The answer, unfortunately, is yes. Once total dissipation is counted, the reported success looks quite different.

Tackling the Flaws

Fixing these issues isn't impossible, but it requires a fundamental shift. Introducing a differentiable projection restores the credit each agent earns, so that the policy gradient, a key component in reinforcement learning, works as intended. Furthermore, a recurrent policy with an expanded sensing range can address the slow cycles near the wall that a memoryless policy can't handle.
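To make the projection idea concrete, here is a minimal sketch. The article does not say which constraint the projection enforces, so this example assumes a hypothetical one: the agents' wall actuation must sum to zero (blowing and suction cancel). The function names and the numbers are illustrative only. The point is that the projection has a well-defined backward pass, so a gradient on the applied action can be carried back to each agent's raw action.

```python
import numpy as np

def project_zero_net_flux(raw_actions):
    """Project raw per-agent actions onto the zero-sum subspace.

    Assumed constraint (not from the article): total actuation is zero.
    """
    raw_actions = np.asarray(raw_actions, dtype=float)
    return raw_actions - raw_actions.mean()

def project_vjp(grad_wrt_applied):
    """Backward pass (vector-Jacobian product) of the projection.

    The Jacobian is I - (1/n) * ones((n, n)), which is symmetric, so the
    backward pass has the same form as the forward pass.
    """
    grad_wrt_applied = np.asarray(grad_wrt_applied, dtype=float)
    return grad_wrt_applied - grad_wrt_applied.mean()

# Raw actions proposed by four agents (made-up values).
raw = np.array([0.3, -0.1, 0.5, 0.1])
applied = project_zero_net_flux(raw)

# Gradient of the return with respect to the applied actions (made-up values).
g_applied = np.array([1.0, 0.0, -1.0, 2.0])

# Gradient with respect to each agent's raw action, i.e. its share of credit.
g_raw = project_vjp(g_applied)
```

If the constraint were instead enforced inside the simulator as a non-differentiable correction, each agent would be scored on an action it never proposed, which is one plausible reading of how credit gets lost.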

But the biggest change is linking rewards to the true power across the wall, not just nominal drag reduction. This approach, while more conservative, creates an honest accounting system. The result? A controller that achieves a respectable 17% improvement without sacrificing efficiency.
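The difference between the two accounting schemes can be sketched in a few lines. This is an illustration under stated assumptions, not the reward used in the work described: the function names are hypothetical, all quantities are taken to be normalized powers in the same units, and the numbers are made up to show the failure mode rather than to reproduce any reported result.

```python
def drag_only_reward(drag_baseline, drag_controlled):
    """Fractional drag reduction relative to the uncontrolled flow."""
    return (drag_baseline - drag_controlled) / drag_baseline

def net_power_reward(pump_baseline, pump_controlled, wall_power):
    """Fractional saving in total power.

    Total power in the controlled case is the pumping power plus the
    power spent through the wall by the actuation.
    """
    total_controlled = pump_controlled + wall_power
    return (pump_baseline - total_controlled) / pump_baseline

# Made-up, normalized numbers: drag falls by 30%, but the actuation
# costs 0.4 units of power through the wall.
nominal = drag_only_reward(1.0, 0.7)
honest = net_power_reward(1.0, 0.7, wall_power=0.4)
```

With these inputs the drag-only reward is positive while the net-power reward is negative: the agent is rewarded under the first scheme for a flow that costs more to run overall.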

The Path Forward

Why does this matter? Because the current trajectory of reinforcement learning in physical systems is unsustainable: the systems we rely on continue to be deployed with hidden inefficiencies. We can't afford to ignore these discrepancies. Revisiting the core principles of reward and policy in AI systems isn't just a technical necessity; it's an ethical imperative.

So, what's the takeaway? Reinforcement learning systems must be designed with accountability and transparency at their core. This means not just tweaking algorithms but overhauling how we define success in AI. If we don't, we're setting ourselves up for a future where advances in AI come with a steep, hidden cost. After all, accountability requires transparency.
