PromptPrint: Unmasking Identity through Short-Form Texts
PromptPrint explores how brief interactions with large language models can reveal a user's distinct linguistic signature. The study challenges traditional authorship attribution by focusing on prompts rather than long-form texts, highlighting important implications for security and privacy.
Authorship attribution has traditionally focused on lengthy, expressive texts. PromptPrint shifts the spotlight to the brief, task-oriented prompts that users write for large language models (LLMs). The paper, published in Japanese, asks an intriguing question: do these short prompts still carry a unique, identifiable fingerprint of their author?
Findings: Lexical Stability and Identity
The study draws on a dataset of 20,680 prompts from 1,034 users and reports three main findings. First, lexical representations outperform semantic encoders, which supports the 'lexical stability hypothesis': a user's identity is bound more tightly to word choice than to underlying intent. Western coverage has largely overlooked this aspect, perhaps because AI discussions tend to emphasize semantics over syntax.
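To make the idea of a lexical fingerprint concrete, here is a minimal sketch of attribution from surface features alone. It is not the study's method: the character trigram features, the cosine scoring, the function names, and the toy prompts are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(prompts_by_user, n=3):
    """Sum each user's n-gram counts into a single profile per user."""
    profiles = {}
    for user, prompts in prompts_by_user.items():
        total = Counter()
        for prompt in prompts:
            total.update(char_ngrams(prompt, n))
        profiles[user] = total
    return profiles


def attribute(prompt, profiles, n=3):
    """Return the user whose profile is most similar to the prompt."""
    query = char_ngrams(prompt, n)
    return max(profiles, key=lambda user: cosine(query, profiles[user]))


# Hypothetical toy data: two users with different phrasing habits.
toy_prompts = {
    "alice": ["pls fix this python func, thx",
              "pls explain this error msg, thx"],
    "bob": ["Could you kindly refactor the following function?",
            "Could you kindly summarise the following article?"],
}
profiles = build_profiles(toy_prompts)
guess = attribute("pls rewrite this sql query, thx", profiles)
```

In this toy setup the query concerns a different task (SQL rather than Python), yet it shares habitual phrasing such as "pls" and "thx" with one user's profile. That is the intuition behind word choice being a more stable signal than intent.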
Exploring the Uniqueness-Consistency Paradox
Second, the study highlights a 'uniqueness-consistency paradox': users are distinctive across a broad spectrum, yet their style is inconsistent from one context to another. This raises a compelling question: how do we balance individuality with adaptability in digital behaviors?
Security and Privacy Implications at Scale
Finally, the study's adversarial analysis uncovers a vulnerability spectrum: identity signals withstand minor lexical perturbations but degrade significantly under semantic paraphrasing. According to the study, the benchmark results show that prompt-based identity can serve as a behavioral biometric. This has profound implications for security and privacy, especially given the potential for misuse or surveillance on digital platforms.
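The contrast between the two kinds of perturbation can be illustrated with a rough sketch. It assumes a character trigram overlap (Jaccard) score as a stand-in for an identity signal; the score, the function names, and the example prompts are hypothetical and are not taken from the study's adversarial setup.

```python
def trigram_set(text, n=3):
    """Set of character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Share of character n-grams that two texts have in common."""
    set_a, set_b = trigram_set(a), trigram_set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


original = "pls fix this python func, thx"
typo_variant = "pls fix this pyhton func, thx"              # minor lexical perturbation
paraphrase = "Could you repair the code below? Thank you."  # same intent, new wording

sim_typo = jaccard(original, typo_variant)
sim_paraphrase = jaccard(original, paraphrase)
print(f"typo: {sim_typo:.2f}  paraphrase: {sim_paraphrase:.2f}")
```

A single typo leaves most of the surface features intact, while a paraphrase keeps the intent and replaces nearly all of them. Under this assumption, rewording is the stronger way to weaken a lexical fingerprint.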
As PromptPrint articulates a new perspective on user modeling in LLM interactions, it poses a direct challenge to existing notions of privacy and security in the digital age. With data and code set to be released upon acceptance of the paper, the study's impact is poised to extend far beyond academic circles.