Inside the Battle of Transformers: Which Encoder Reigns Supreme?

By Isaac Torres, June 5, 2026

Transformer encoders face off in a synthetic and real-data showdown. Spoiler: the classic approach still holds its ground. Let's dissect why.

In AI, transformers are like the Swiss Army knives of machine learning. When the input is a multi-channel scalar signal, the model has to embed the several values observed at each step into a single vector. That's the job of the input encoder. But with eight different types tested, which one's the champ?

The Contenders

We've got a lineup that includes per-channel linear projections, orthogonality regularizers, nonlinear MLP stems, and more. The encoders were put through a synthetic benchmark designed to push their limits. And for a reality check, they also faced off on a real-world dataset, ETTh1. The measure? Next-step negative log-likelihood (NLL), where lower is better.
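To make the metric concrete, here is a minimal sketch of a next-step NLL. The article does not say which predictive distribution the experiments use, so the Gaussian form below is an assumption, and the function names are hypothetical. For a prediction with mean \mu and standard deviation \sigma, the NLL of an observed value y is 0.5 * log(2 * pi * sigma^2) + (y - mu)^2 / (2 * sigma^2).

```python
import math


def gaussian_nll(y, mean, std):
    """NLL (in nats) of y under a Normal(mean, std^2) prediction.

    Assumption: a Gaussian predictive distribution. The article only
    names the metric, not the distribution behind it.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    var = std * std
    return 0.5 * math.log(2.0 * math.pi * var) + (y - mean) ** 2 / (2.0 * var)


def mean_next_step_nll(targets, means, stds):
    """Average the per-step NLL over a sequence of next-step predictions."""
    if not (len(targets) == len(means) == len(stds)):
        raise ValueError("targets, means and stds must have the same length")
    if not targets:
        raise ValueError("need at least one prediction")
    total = sum(gaussian_nll(y, m, s) for y, m, s in zip(targets, means, stds))
    return total / len(targets)


# A prediction that lands on the target scores lower (better) than one that misses.
on_target = gaussian_nll(0.0, 0.0, 1.0)
off_target = gaussian_nll(1.0, 0.0, 1.0)
```

Because the score is a log-probability, it penalizes both a wrong mean and a badly calibrated spread, which is why it is a common choice for comparing probabilistic forecasters.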

The results? Close calls across the board. The classic nn.Linear(C, d_model) holds its ground, showing near-equivalence with most contenders. It's like the tortoise in the race, steady and reliable. But two encoders fell flat: the shared-scalar baseline, which crumbled under information-theoretic pressure, and the channel-independent PatchTST-spirit baseline, which couldn’t handle the benchmarks.
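One way to see why results like these are plausible is a small PyTorch sketch. The article does not define the variants, so the two classes below are assumed readings, with hypothetical names: a "per-channel" encoder with one Linear(1, d_model) per channel whose outputs are summed, and a "shared-scalar" encoder that reuses a single Linear(1, d_model) for every channel. Under those assumptions, the per-channel encoder computes the same family of functions as nn.Linear(C, d_model), while the shared-scalar encoder only ever sees the sum of the channels.

```python
import torch
from torch import nn


class PerChannelLinear(nn.Module):
    """Assumed variant: one Linear(1, d_model) per channel, outputs summed."""

    def __init__(self, num_channels, d_model):
        super().__init__()
        self.proj = nn.ModuleList(
            [nn.Linear(1, d_model) for _ in range(num_channels)]
        )

    def forward(self, x):  # x: (batch, time, C)
        return sum(p(x[..., c:c + 1]) for c, p in enumerate(self.proj))


class SharedScalar(nn.Module):
    """Assumed variant: a single Linear(1, d_model) reused for every channel."""

    def __init__(self, d_model):
        super().__init__()
        self.proj = nn.Linear(1, d_model)

    def forward(self, x):  # x: (batch, time, C)
        return sum(self.proj(x[..., c:c + 1]) for c in range(x.shape[-1]))


torch.manual_seed(0)
C, d_model = 3, 8
x = torch.randn(2, 5, C)

# Copy the per-channel weights into one joint nn.Linear(C, d_model).
per_channel = PerChannelLinear(C, d_model)
joint = nn.Linear(C, d_model)
with torch.no_grad():
    joint.weight.copy_(torch.cat([p.weight for p in per_channel.proj], dim=1))
    joint.bias.copy_(sum(p.bias for p in per_channel.proj))
same_function = torch.allclose(per_channel(x), joint(x), atol=1e-5)

# Reversing the channel order leaves the shared-scalar output unchanged,
# so the encoder cannot tell which channel carried which value.
shared = SharedScalar(d_model)
x_flipped = torch.flip(x, dims=[-1])
collapses = torch.allclose(shared(x), shared(x_flipped), atol=1e-5)
joint_distinguishes = not torch.allclose(joint(x), joint(x_flipped), atol=1e-5)
```

If the benchmarked variants resemble these sketches, near-equivalence with the classic layer is expected for the first, and the information loss in the second follows directly from its shared weights.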

Why It Matters

Now let's talk practical. If you're in the business of using transformers, stick with nn.Linear(C, d_model) unless you’ve got a compelling reason to switch it up. Why risk the unknown when the tried-and-true delivers?

Look, I talked to the people this affects. Here's what they said: “Why reinvent the wheel when the existing one rolls just fine?” It's like buying a flashy new gadget that promises the world but barely outperforms your old reliable tool.

The Final Take

In the race of encoders, it’s not always about the latest tech buzz. Sometimes, the classics hold a surprising edge. So, the next time you're coding up a transformer, remember: the plain nn.Linear(C, d_model) is a hard default to beat.

For those who want to dive deeper, the code and data for every experiment are available at the project’s GitHub page. It’s transparency in action, something we could use a bit more of in the tech world.
