Can Language Models Truly Forget? Multilingual...

Large language models (LLMs) have shown an uncanny ability to memorize information, including sensitive data that perhaps we'd prefer they forget. This has led to a burgeoning field of research into unlearning methods, aiming to surgically excise specific knowledge from these models without resorting to the costly and time-consuming process of retraining them from scratch. While this area has been largely dominated by English-language studies, researchers are now expanding their focus to multilingual contexts.

Expanding the Unlearning Frontier

Recent efforts have taken the TOFU benchmark, renowned for its work in this space, and expanded it to encompass five languages. By fine-tuning, unlearning, and querying models in various language permutations, researchers have discovered some fascinating insights into how unlearning transfer operates, or fails to. Intriguingly, the ability of a model to ‘forget’ information in languages other than the one it was unlearned in is highly inconsistent. It appears strongest between languages that share scripts and language families.

What does this tell us? The choice of language used for unlearning significantly affects which languages most effectively exhibit forgetting. It seems that unlearning doesn’t wipe the slate clean but rather repositions knowledge into a state of temporary amnesia. It's like erasing a whiteboard with a damp cloth, residual marks remain, often ready to reappear when prompted in certain ways.

The Illusion of Forgetting

Layer-wise analysis of these language models reveals that the shared cross-lingual latent space remains largely untouched in early layers. The real action happens in the later decoding layers, where unlearning primarily operates. This suggests that unlearning is more about superficial suppression than genuine erasure of knowledge. Color me skeptical, but this raises concerns about the effectiveness of these methods in truly safeguarding sensitive data across multiple languages.

Further findings show that with a single inference-time steering direction, much of this suppression can be reversed, recovering as much as 50% of unlearned knowledge in some models, and an eyebrow-raising 90% in others. What they're not telling you is that this reversible nature of unlearning could leave models vulnerable to unintended data exposure, especially in a multilingual setting.

Why This Matters

So, why should we care? In a world where data privacy is key, ensuring that language models can genuinely and securely ‘forget’ sensitive information is critical. The variable success of multilingual unlearning highlights the complexity of this challenge. It raises the question: Are we truly ready to trust these models with sensitive multilingual data? The findings suggest we tread carefully, aware that the semblance of forgetting might just be an illusion, easily dispelled.

Can Language Models Truly Forget? Multilingual Unlearning Put to the Test

Expanding the Unlearning Frontier

The Illusion of Forgetting

Why This Matters

Key Terms Explained