Llamion: A Leap Forward in Language Models

In the fast-moving field of language models, Llamion stands out as a formidable contender. It is a 14-billion-parameter model built by converting Orion-14B into a format consistent with the Llama architecture, using a method called Efficient Knowledge Preservation for Transformation (KEPT).

Breaking Down KEPT

KEPT is a recipe with three elements. Normal Parameter Mapping (NPM) handles the modules that are unchanged between the two architectures, while Optimized Parameter Mapping (OPM) introduces a LayerNorm-to-RMSNorm initialization. That initialization is designed for the near-zero-mean activations that weight decay induces, a regime in which LayerNorm's mean-subtraction step has little effect and RMSNorm can approximate it closely. Perhaps the most intriguing element is Cross-architecture Knowledge Distillation (XKD), which aligns the converted model's outputs with the source model's across diverse input distributions.
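To make the last two elements concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the mapping shown (copy the LayerNorm gain into the RMSNorm weight and drop the bias) is only the simplest initialization one could assume, and the distillation loss is a generic teacher-to-student KL divergence standing in for XKD. All function names are hypothetical.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Standard LayerNorm: subtract the mean, divide by the standard deviation.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: no mean subtraction and no bias, only a root-mean-square rescale.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return weight * x / rms

def naive_rmsnorm_init(gamma):
    # ASSUMPTION: simplest possible mapping, not necessarily what OPM does.
    return gamma.copy()

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(teacher_logits, student_logits):
    # Generic distillation loss: mean KL(teacher || student) over positions.
    log_p_t = log_softmax(teacher_logits)
    log_p_s = log_softmax(student_logits)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl.mean())

rng = np.random.default_rng(0)
hidden = 4096
gamma = 1.0 + 0.1 * rng.normal(size=hidden)
beta = np.zeros(hidden)
weight = naive_rmsnorm_init(gamma)

# Activations whose per-row mean is close to zero (here exactly 1e-3).
x = rng.normal(size=(4, hidden))
x_near_zero_mean = x - x.mean(axis=-1, keepdims=True) + 1e-3
# The same activations shifted far away from zero mean.
x_shifted = x_near_zero_mean + 2.0

gap_small = np.abs(layer_norm(x_near_zero_mean, gamma, beta)
                   - rms_norm(x_near_zero_mean, weight)).max()
gap_large = np.abs(layer_norm(x_shifted, gamma, beta)
                   - rms_norm(x_shifted, weight)).max()
print("max gap, near-zero mean:", gap_small)
print("max gap, shifted mean:  ", gap_large)

# The distillation loss is zero when the outputs match and positive otherwise.
teacher = rng.normal(size=(8, 100))
student = teacher + 0.5 * rng.normal(size=(8, 100))
print("KD loss, identical outputs:", kd_loss(teacher, teacher))
print("KD loss, perturbed student:", kd_loss(teacher, student))
```

With near-zero-mean inputs the two normalizations almost coincide, while shifting the mean opens a large gap. That is the intuition for why the activation regime matters to the initialization; any error left over after the parameter mapping is the kind of mismatch a distillation step could then reduce.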

Performance That Speaks Volumes

Llamion closely mirrors the behavior of Orion on H6, MT-Bench, and, notably, KoMMLU. With 66.87% on KoMMLU, Llamion-Base surpasses the nearest competitor on the Open Ko LLM Leaderboard by over 7 percentage points. This was achieved using approximately 123 million tokens processed on a single A100 GPU in just four days. Such efficiency isn't just commendable; it's transformational.
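For a sense of scale, the figures above can be turned into back-of-the-envelope numbers. The sketch below uses only the values quoted in this article, and assumes the four days were continuous wall-clock time, which the article does not confirm.

```python
# Figures quoted in the article.
TOKENS = 123_000_000      # approximate tokens processed
DAYS = 4                  # ASSUMPTION: continuous wall-clock days on one A100
KOMMLU_SCORE = 66.87      # Llamion-Base, percent
MARGIN = 7.0              # lead over the nearest competitor, percentage points

SECONDS_PER_DAY = 24 * 60 * 60

def tokens_per_second(tokens, days):
    # Average throughput implied by the token count and duration.
    return tokens / (days * SECONDS_PER_DAY)

rate = tokens_per_second(TOKENS, DAYS)
# A lead of "over 7 points" puts the runner-up below this score.
runner_up_upper_bound = KOMMLU_SCORE - MARGIN

print(f"average throughput: {rate:.0f} tokens/s")
print(f"runner-up scored below: {runner_up_upper_bound:.2f}%")
```

That works out to an average of roughly 356 tokens per second, and it places the nearest competitor below 59.87% on KoMMLU.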

Preserving Unique Capabilities

What English-language coverage has largely overlooked is the model's ability to retain capabilities that were not represented in the transfer corpus. Llamion continues to handle Python programming and 200K-token contexts. This resilience during an architectural transition is rare and points to the robustness of the transformation process.

Why Llamion Matters

Why should researchers and developers care about Llamion? Because it sets a new benchmark for transforming models with minimal resource expenditure while maintaining, and in places even enhancing, performance. In a domain where computational costs are a constant concern, Llamion's achievements might herald a new era of efficient model transformation. Three checkpoints, Base, Chat, and LongChat, are now available through the Hugging Face Transformers library. They promise to be invaluable for developers keen to experiment with the latest models without prohibitive computational demands.
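Loading one of these checkpoints would presumably look like loading any other Llama-format model. The sketch below is illustrative only: the repository id is a placeholder, since this article does not give the actual names, and `load_llamion` is a hypothetical helper. The `from_pretrained` calls are the standard Transformers entry points.

```python
# PLACEHOLDER: substitute the real repository id of the Base, Chat,
# or LongChat checkpoint. This value is not the actual name.
CHECKPOINT_ID = "your-org/llamion-14b-base"

def load_llamion(repo_id=CHECKPOINT_ID):
    """Load a tokenizer and causal LM through the generic Auto classes."""
    # Imported lazily so the module can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint rather than
    # upcasting to float32, which matters for a 14-billion-parameter model.
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model
```

Calling `tokenizer, model = load_llamion("<real repository id>")` would download the weights, so expect a large transfer and a GPU with enough memory for a 14-billion-parameter model.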

Is this the dawn of a new standard for AI development? Llamion's impact on how we transform and optimize language models could be profound. As the AI community continues to strive for efficiency and performance, Llamion's innovative approach sets a compelling precedent.