Why Multilingual Model Merging Falls Short
Merging fine-tuned models without original data is tempting but often fails in multilingual contexts. Let's unpack why neurons resist merging magic.
Weight-space model merging might sound like a dream: combine independently fine-tuned models without the hassle of original training data. Yet in multilingual machine translation, this approach hits a wall. So what gives?
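To make the setup concrete, here is a minimal sketch of the simplest weight-space merging strategy, uniform (or weighted) averaging of checkpoint parameters. The dictionary-of-lists representation and the `merge_weights` helper are illustrative assumptions, not the paper's actual implementation, which would operate on real model state dicts.

```python
def merge_weights(models, coeffs=None):
    """Average parameter dictionaries element-wise -- the simplest
    weight-space merging strategy (a toy sketch, not the paper's code).

    models: list of dicts mapping parameter name -> flat list of floats.
    coeffs: optional per-model weights; defaults to a uniform average.
    """
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    merged = {}
    for name in models[0]:
        merged[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models))
            for i in range(len(models[0][name]))
        ]
    return merged

# Toy example: two hypothetical bilingual fine-tunes of the same base model.
m_de = {"layer0.w": [1.0, 2.0]}
m_fr = {"layer0.w": [3.0, 6.0]}
print(merge_weights([m_de, m_fr]))  # {'layer0.w': [2.0, 4.0]}
```

The key assumption this makes, and the one the article argues breaks down, is that the two fine-tunes stay close enough in weight space that their average still lies in a good region for both tasks.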
The Multilingual Misstep
Fine-tuning language models on bilingual corpora and then merging them with standard strategies might work in multitask environments. But in multilingual translation, the outcome is less than stellar. Our experiments show that merging degrades performance significantly when target languages diverge. Which raises the question: why are multilingual contexts so resistant?
Neurons Aren't Cooperating
To crack this puzzle, we dove into the neural networks' internal representations. Using span-conditioned neuron selectivity and layer-wise centered kernel alignment, we found something intriguing. Neurons specific to each language tend to cluster in the embedding layers and upper transformer blocks. Intermediate layers, however, stay largely shared. The rub? Fine-tuning redistributes language selectivity.
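Layer-wise centered kernel alignment compares how similarly two models represent the same inputs at each layer. The sketch below implements standard linear CKA in pure Python for small activation matrices; the matrix layout and helper names are my own, and a real analysis would run this on per-layer activations from the actual fine-tuned checkpoints.

```python
import math

def center(X):
    # Column-center a matrix given as a list of rows (n samples x d features).
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def frob_sq_of_product(A, B):
    # ||A^T B||_F^2 for row-major matrices A (n x p) and B (n x q).
    total = 0.0
    for i in range(len(A[0])):
        for j in range(len(B[0])):
            dot = sum(A[r][i] * B[r][j] for r in range(len(A)))
            total += dot * dot
    return total

def linear_cka(X, Y):
    """Linear CKA between two activation matrices over the same n inputs:
    ||Yc^T Xc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    Xc, Yc = center(X), center(Y)
    num = frob_sq_of_product(Xc, Yc)
    den = math.sqrt(frob_sq_of_product(Xc, Xc) * frob_sq_of_product(Yc, Yc))
    return num / den
```

CKA is invariant to orthogonal transforms and isotropic scaling of either representation, which is why it is a common choice for comparing layers across independently trained models: high CKA in intermediate layers and low CKA in upper layers is exactly the "shared middle, divergent top" pattern described above.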
Instead of homing in on language-specific neurons, fine-tuning makes them less exclusive for supervised and related languages. Meanwhile, neurons for unsupervised languages grow more isolated. This redistribution amplifies divergence in higher layers, precisely where generation is governed. Not exactly the recipe for merging success.
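One way to picture this redistribution: give each neuron a selectivity score per language and watch how the scores shift after fine-tuning. The contrast-style metric below is a hypothetical illustration of the idea behind span-conditioned selectivity, not the paper's exact definition.

```python
def selectivity(acts_by_lang, lang):
    """Toy selectivity score in [-1, 1]: how much more a neuron activates
    on spans of `lang` than on spans of all other languages.
    (Hypothetical contrast metric, not the paper's definition.)"""
    in_acts = acts_by_lang[lang]
    mu_in = sum(in_acts) / len(in_acts)
    others = [a for l, v in acts_by_lang.items() if l != lang for a in v]
    mu_out = sum(others) / len(others)
    return (mu_in - mu_out) / (abs(mu_in) + abs(mu_out) + 1e-9)

# A neuron that fires only on German spans is highly German-selective.
acts = {"de": [1.0, 0.9], "fr": [0.0, 0.1]}
```

Under this lens, the finding is that fine-tuning pushes scores for supervised and related languages toward zero (less exclusive) while scores for unsupervised languages move toward the extremes, and it does so most strongly in the upper layers.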
Geometry of Fine-Tuning
What does this mean for weight-space merging? The geometry of fine-tuning reshapes language model architecture in ways that undermine compatibility with merging assumptions. The shared layers aren’t doing the heavy lifting you’d expect. So, if you're banking on merging to solve your multilingual model woes, you might want to reconsider.
Slapping a merged checkpoint onto a rented GPU isn't a convergence thesis. It's time we accept that multilingual fine-tuning requires more nuanced strategies than the blunt instrument of merging. Show me the inference costs. Then we'll talk about innovation.
Key Terms Explained
Embedding: A dense numerical representation of data (words, images, etc.).
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
GPU: Graphics Processing Unit.
Inference: Running a trained model to make predictions on new data.