Redefining Edits: Autoregressive vs Masked Diffusion Models
Knowledge editing in language models sees new challenges as it moves from autoregressive to masked diffusion models. Here's how edits play out differently.
Knowledge editing means updating or correcting specific facts stored in a language model. Traditionally, this has been done with the locate-then-edit method in autoregressive models (ARMs). But what about masked diffusion models (MDMs), which generate text by iterative denoising? That's the latest challenge researchers are tackling.
The ARM Dominance
Let's start with the current landscape. Autoregressive models like LLaMA and Qwen have set the standard for knowledge editing. They generate text by predicting the next token in a sequence. The locate-then-edit method has been their forte: it first localizes where a fact is stored, then edits the model's weights at that location.
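To make the "edit" half of locate-then-edit concrete, here is a minimal sketch of a rank-one weight update on a single linear layer. It assumes the located layer can be treated as a key-to-value map, which is a common simplification and not necessarily the exact update used in the work discussed here. The names `matvec`, `rank_one_edit`, `W`, `k`, and `v_new` are hypothetical and chosen only for illustration.

```python
def matvec(W, k):
    """Multiply matrix W (list of rows) by vector k."""
    return [sum(w * x for w, x in zip(row, k)) for row in W]


def rank_one_edit(W, k, v_target):
    """Return W' with W' k == v_target.

    Uses the smallest (Frobenius-norm) change to W that satisfies the
    constraint: W' = W + (v_target - W k) k^T / (k^T k).
    Assumption: a plain least-change update, without the covariance
    weighting that published editing methods often add.
    """
    kk = sum(x * x for x in k)
    if kk == 0:
        raise ValueError("key vector must be non-zero")
    residual = [t - c for t, c in zip(v_target, matvec(W, k))]
    return [
        [w + r * x / kk for w, x in zip(row, k)]
        for row, r in zip(W, residual)
    ]


# Toy 2x3 layer; k stands in for the subject's key, v_new for the new fact.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
k = [1.0, 2.0, 0.0]
v_new = [3.0, -1.0]
W_edited = rank_one_edit(W, k, v_new)
```

In this simplified form, the edited layer maps `k` to `v_new`, while any key orthogonal to `k` produces the same output as before, which is the sense in which such an edit is localized.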
Look past the headline numbers and the architecture matters more than the parameter count. These ARMs have a proven track record of handling both single-token edits and longer, multi-token targets.
Masked Diffusion Models: A New Frontier
Now, enter the masked diffusion models like LLaDA and Dream. They operate differently, modeling text bidirectionally and generating output through iterative denoising. The same locate-then-edit method has been transferred to these models, but the numbers tell a different story.
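As a rough illustration of what iterative denoising means, the toy loop below starts from a fully masked sequence and reveals a few positions per step. Everything here is an assumption made for illustration: `toy_denoiser` is a hypothetical stand-in that simply copies tokens from a reference sequence and invents a confidence score, and it does not reflect how LLaDA or Dream actually score or schedule tokens.

```python
import math

MASK = "<mask>"


def toy_denoiser(seq, reference):
    """Hypothetical stand-in for a model.

    Proposes (token, confidence) for every masked slot. Confidence grows
    with the number of unmasked neighbours on either side, a crude nod to
    bidirectional context. Tokens are copied from `reference`.
    """
    proposals = {}
    for i, tok in enumerate(seq):
        if tok != MASK:
            continue
        neighbours = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
        visible = sum(1 for n in neighbours if n != MASK)
        proposals[i] = (reference[i], visible + 1.0 / (i + 1))
    return proposals


def denoise(reference, per_step=2):
    """Unmask `per_step` positions per iteration; return every intermediate state."""
    if per_step < 1:
        raise ValueError("per_step must be at least 1")
    seq = [MASK] * len(reference)
    trajectory = [list(seq)]
    while MASK in seq:
        proposals = toy_denoiser(seq, reference)
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:per_step]:
            seq[i] = proposals[i][0]
        trajectory.append(list(seq))
    return trajectory


for state in denoise(["The", "capital", "of", "France", "is"], per_step=2):
    print(state)
```

The point of the sketch is the shape of the trajectory: between the fully masked start and the finished text, the model passes through states where only some tokens are visible, and those positions need not be filled left to right.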
The editing location does transfer across the two paradigms: causal tracing highlights the same layers in both. The outcomes don't. Single-token edits work fine. But as we push for longer targets, MDMs struggle. Why? Because the edited fact has to pass through partially unmasked intermediate states that the edit was never optimized for.
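A small counting exercise suggests why target length matters. The sketch below enumerates, for a target span of n tokens, the left-to-right prefixes an ARM conditions on and every state of the span in which at least one token is still masked. The helper names and the example tokens are hypothetical, and the enumeration says nothing about which states a given MDM sampler actually visits or which states a particular edit was optimized on.

```python
from itertools import combinations

MASK = "<mask>"


def arm_contexts(target):
    """Prefixes an autoregressive model conditions on while emitting the target."""
    return [tuple(target[:i]) for i in range(len(target))]


def mdm_states(target):
    """All states of the target span with at least one token still masked."""
    n = len(target)
    states = []
    for n_masked in range(n, 0, -1):
        for masked in combinations(range(n), n_masked):
            states.append(
                tuple(MASK if i in masked else target[i] for i in range(n))
            )
    return states


for target in (["Paris"], ["New", "York", "City"]):
    print(len(target), len(arm_contexts(target)), len(mdm_states(target)))
```

For a single-token target both counts are 1, since the only state is the fully masked one. For n tokens the ARM sees n prefixes while the span has 2^n - 1 possible partially masked states, so an edit tuned on only a few of them leaves most intermediate states uncovered as targets get longer.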
Solutions and Implications
What's the takeaway here? It's a call to optimize edits for these transitional states in MDMs. Researchers have already introduced a correction that significantly restores multi-token performance. But this raises an important question: Are MDMs ready to take on ARMs in real-world applications?
For now, ARMs' predictability and efficiency still make them the go-to choice for rigorous editing tasks. However, MDMs hold promise if they can overcome these transitional hurdles. As the technology evolves, so too will these models, potentially reshaping the future of knowledge editing.
In this tug-of-war between model types, one thing's certain: understanding the intricacies of each architecture is key to unlocking their full potential.