SkMTEB: A New Benchmark Elevates Slovak in AI Text Embedding

By Nadia Osei, June 12, 2026

SkMTEB introduces a benchmark for Slovak text embeddings that is nearly four times the size of previous Slovak efforts. It highlights the inadequacy of existing Slovak-specific models and aims to change how low-resource language applications are built.

In AI, low-resource languages often get the short end of the stick. Yet, if you're interested in text embeddings for Slovak, the game has changed with SkMTEB. This new benchmark covers 31 datasets across 7 task types, making it nearly four times larger than previous Slovak efforts. That's not just a step forward; it's a leap.

Multilingual Models Take the Lead

When the researchers evaluated 31 embedding models, one truth stood out: large, instruction-tuned multilingual models outperform their Slovak-specific counterparts. Models designed for Natural Language Understanding (NLU) don't transfer well to embedding tasks. This is a serious oversight in model development. If you're renting a GPU, loading an off-the-shelf model, and expecting good Slovak embeddings, think again. The depth and breadth of SkMTEB prove that tailored approaches matter.

Locally Deployable Solutions

In response to the need for more efficient Slovak embeddings, researchers introduced two models: e5-sk-small with 45 million parameters and e5-sk-large boasting 365 million. By trimming vocabulary and fine-tuning Multilingual E5 models, these new versions cut size by up to 62% without sacrificing performance. More impressively, they compete with proprietary APIs in tasks like semantic search and retrieval-augmented generation (RAG).
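To make the idea of vocabulary trimming concrete, here is a minimal toy sketch in Python. It is not the researchers' actual procedure, which the article does not describe. It only illustrates the general principle: keep the embedding rows for tokens that occur in a target-language corpus, drop the rest, and remap token ids. The function name `trim_vocabulary`, the toy vocabulary, and the two-dimensional vectors are all hypothetical.

```python
from typing import Dict, List, Tuple


def trim_vocabulary(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    corpus_tokens: List[str],
    always_keep: Tuple[str, ...] = ("<pad>", "<unk>"),
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Keep only embedding rows for tokens seen in the corpus, plus special tokens."""
    seen = set(corpus_tokens) | set(always_keep)
    # Preserve the original id order so the new mapping is deterministic.
    kept = sorted((tok for tok in vocab if tok in seen), key=vocab.__getitem__)
    new_vocab = {tok: new_id for new_id, tok in enumerate(kept)}
    new_embeddings = [embeddings[vocab[tok]] for tok in kept]
    return new_vocab, new_embeddings


# Hypothetical multilingual vocabulary: 6 tokens with 2-dimensional embeddings.
vocab = {"<pad>": 0, "<unk>": 1, "dobrý": 2, "deň": 3, "hello": 4, "世界": 5}
embeddings = [
    [0.0, 0.0], [0.1, 0.1], [0.2, 0.3],
    [0.4, 0.5], [0.6, 0.7], [0.8, 0.9],
]

# Tokens observed in a (tiny, made-up) Slovak corpus.
new_vocab, new_embeddings = trim_vocabulary(vocab, embeddings, ["dobrý", "deň"])

# Fraction of embedding rows removed in this toy example.
reduction = 1 - len(new_embeddings) / len(embeddings)
print(new_vocab, f"{reduction:.0%} of embedding rows removed")
```

In a real multilingual model the token embedding table can account for a large share of the parameters, which is why dropping unused tokens can shrink the model substantially. The trimmed model would then typically be fine-tuned, as the article notes.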

Why Should You Care?

Here's the kicker: everything is open-source, from the models to the datasets and code. This isn't just about Slovak. It's a roadmap for other under-resourced languages. If you're working in AI, you can't ignore the implications. Is it time to rethink how we approach language models? Absolutely.

Slovak might not dominate global headlines, but the SkMTEB project underscores a growing need for diverse, deployable solutions in AI. The intersection is real. Ninety percent of the projects aren't. This one, though, is a big deal.
