RoleCDE: Challenging the Role-Playing Limits of Language Models

Role-playing agents (RPAs) have become a staple for steering large language models (LLMs) toward behavior that's consistent with predefined roles. Until now, benchmarks have focused on surface-level fidelity, offering little insight into how these models handle conflicts between role-specific values and alignment constraints. Enter RoleCDE, a benchmark designed to fill this gap.

RoleCDE: A New Benchmark

RoleCDE isn't your typical evaluation tool. It presents cognitive dilemma scenarios that force LLMs to navigate structured conflicts, evaluating not just role-scenario grounding but also value conflict resolution and decision tendencies. This isn't small scale either. RoleCDE covers approximately 8,000 diverse role profiles and scenarios, with nearly 24,000 dilemma instances distributed across three difficulty levels and eight role categories.

The 'Role Value Decoupling' Phenomenon

Tested on several mainstream LLMs, RoleCDE reveals a consistent pattern: the 'Role Value Decoupling' phenomenon. When role-specific values clash with moral, alignment-consistent ones, RPAs often default to the alignment-consistent decision. This tendency remains largely unchanged by the dilemma's difficulty but varies significantly across role categories. So, what's the takeaway? Are LLMs too alignment-biased for nuanced role-playing?
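
To make that pattern concrete, here is a minimal sketch of how such a decision tendency could be tallied. Everything in it is an assumption for illustration: the record fields, the "role" and "alignment" labels, the category names, and the toy records are hypothetical and are not taken from RoleCDE's actual schema or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DilemmaResult:
    # Hypothetical fields; the benchmark's real schema may differ.
    role_category: str  # stand-in for one of the eight role categories
    difficulty: int     # stand-in for one of the three difficulty levels
    choice: str         # "role" or "alignment" (assumed labels)


def alignment_rate(results, key):
    """Share of decisions that sided with alignment values, grouped by key(result)."""
    totals = defaultdict(int)
    aligned = defaultdict(int)
    for result in results:
        group = key(result)
        totals[group] += 1
        if result.choice == "alignment":
            aligned[group] += 1
    return {group: aligned[group] / totals[group] for group in totals}


# Synthetic placeholder records, NOT benchmark data.
toy_results = [
    DilemmaResult("category_a", 1, "alignment"),
    DilemmaResult("category_a", 2, "alignment"),
    DilemmaResult("category_a", 3, "role"),
    DilemmaResult("category_b", 1, "role"),
    DilemmaResult("category_b", 2, "role"),
    DilemmaResult("category_b", 3, "alignment"),
]

by_category = alignment_rate(toy_results, key=lambda r: r.role_category)
by_difficulty = alignment_rate(toy_results, key=lambda r: r.difficulty)
```

In this toy data the rate is the same at every difficulty level but differs between the two categories, which is the shape of the pattern described above. The real analysis may well use a different metric.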

Fine-Tuning for Better Balance

RoleCDE doesn't just expose shortcomings; it offers solutions. Fine-tuning on RoleCDE scenarios effectively mitigates this decoupling, enhancing the agents' ability to reason through trade-offs between conflicting values. Crucially, this process maintains general role-playing fidelity and overall reasoning performance. The paper's key contribution: a benchmark that not only identifies but also addresses the limitations of current LLMs in role-consistent decision-making.

With code available at RoleCDE on GitHub, the toolkit is open for researchers to explore and improve. The ablation study reveals further potential for refining role-play capabilities in LLMs.

In a rapidly evolving field, RoleCDE sets a new standard for evaluating and improving RPAs. But it raises an essential question: Can LLMs ever fully embody role-specific values without sacrificing alignment principles? This benchmark initiates the conversation, but the journey is just beginning.
