Unpacking Rhetorical Questions in Large Language Models
Rhetorical questions challenge LLMs, revealing complex internal representations. Discover how these nuances impact AI comprehension in diverse contexts.
Rhetorical questions, often used for persuasion rather than information gathering, present a unique challenge for large language models (LLMs). Recent research dives into how these questions are processed internally by LLMs, revealing intriguing findings about their representations.
Early Emergence and Stability
The study finds that rhetorical signals emerge early in the model's processing stages and are most stably captured by last-token representations. Even though rhetorical questions are complex in intent, they turn out to be linearly separable from straightforward information-seeking questions within a given dataset. What's significant here? Simple linear probes can distinguish rhetorical from informational queries, achieving AUROC scores between 0.7 and 0.8.
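A probing setup like the one described can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline: the embeddings here are synthetic stand-ins for last-token hidden states, with an artificial linear signal injected so the probe has something to find; dimensions, sample counts, and labels are all assumptions.

```python
# Sketch of a linear probe over last-token hidden states.
# All data below is synthetic; in the real setup, X would hold
# hidden states extracted from an LLM and y the rhetorical labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-in embeddings: 200 questions x 768 hidden dimensions.
# Label 1 = rhetorical, 0 = information-seeking.
X = rng.normal(size=(200, 768))
y = rng.integers(0, 2, size=200)

# Inject a weak linear signal so the classes are separable,
# mimicking a rhetorical "direction" in representation space.
direction = rng.normal(size=768)
X += np.outer(y - 0.5, direction)

# Train the probe on the first 150 examples, evaluate on the rest.
probe = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
scores = probe.predict_proba(X[150:])[:, 1]
print(f"held-out AUROC: {roc_auc_score(y[150:], scores):.2f}")
```

Linear separability is exactly what a logistic-regression probe tests: if a single hyperplane in hidden-state space can split the two question types, AUROC rises well above chance (0.5).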
Transferability: A Double-Edged Sword
While probes do transfer across different discourse contexts, the results show that transferability isn't synonymous with a shared representation. Probes trained on different datasets produce different rankings when applied to the same target corpus, and the overlap among their top-ranked instances is strikingly low, often below 0.2. This discrepancy raises questions about the consistency of LLMs across contexts. If they're not sharing representations, what exactly are they doing?
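The top-ranked-overlap comparison can be made concrete with a small sketch. The scores below are synthetic stand-ins for two probes' outputs on the same corpus; the corpus size, k, and correlation strength are all illustrative assumptions, not values from the paper.

```python
# Sketch of comparing two probes' rankings on one target corpus:
# score every instance with each probe, take each probe's top-k,
# and measure the fraction of shared instances.
import numpy as np

def topk_overlap(scores_a, scores_b, k):
    """Fraction of instances shared by the two probes' top-k rankings."""
    top_a = set(np.argsort(scores_a)[-k:])
    top_b = set(np.argsort(scores_b)[-k:])
    return len(top_a & top_b) / k

rng = np.random.default_rng(0)
n = 1000  # corpus size (illustrative)

# Two probes that agree only weakly: a small shared component
# plus independent noise, mimicking dataset-specific directions.
shared = rng.normal(size=n)
scores_a = 0.3 * shared + rng.normal(size=n)
scores_b = 0.3 * shared + rng.normal(size=n)

print(f"overlap@100: {topk_overlap(scores_a, scores_b, 100):.2f}")
```

An overlap near k/n is what pure chance would give; the paper's reported values below 0.2 sit close to that floor, which is why the authors read it as evidence against a single shared representation.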
Divergent Rhetorical Phenomena
Qualitative analyses show that these divergences correspond to distinct rhetorical phenomena. Some probes capture the rhetorical stance embedded in extended argumentation; others pick up on localized, syntax-driven interrogative acts. This suggests that within LLMs, rhetorical questions are encoded along multiple linear directions, each emphasizing different cues. The paper (published in Japanese) reveals a more fragmented internal landscape than one might expect.
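One simple way to test the "multiple linear directions" reading is to compare the weight vectors of probes trained on different datasets: if they encoded the same direction, their cosine similarity would be high. The vectors below are random stand-ins (the variable names `w_argument` and `w_syntax` are hypothetical), so this only illustrates the measurement, not the paper's result.

```python
# Sketch: compare two probes' learned directions via cosine similarity.
# Low similarity suggests the probes encode distinct linear directions.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
# Stand-ins for probe weights learned on two different datasets.
w_argument = rng.normal(size=768)  # e.g. extended-argumentation probe
w_syntax = rng.normal(size=768)    # e.g. syntax-driven-question probe

print(f"cosine similarity: {cosine(w_argument, w_syntax):.2f}")
```

In high-dimensional spaces, unrelated directions have cosine similarity near zero, so a near-zero value between two working probes is consistent with the fragmented picture the paper describes.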
Why It Matters
These findings are important for the future development of AI language models. Understanding how rhetorical questions are processed can improve LLMs' ability to interact in more human-like ways. But here's the catch: can we train models to recognize and interpret the subtleties of human discourse, or will they remain forever bound by their training data's limitations? The answer could redefine the boundaries of AI comprehension.