Reimagining Supervised Fine-Tuning: A Fresh Take on Token-Level Targets

By Signe Eriksen, June 10, 2026

A new framework redefines supervised fine-tuning by focusing on token-level target distributions rather than strict one-hot objectives. This approach, known as Target-SFT, shows superior performance across multiple datasets, challenging traditional methods.

Supervised fine-tuning (SFT) often feels like a rigid exercise in fitting square pegs into round holes. The traditional approach maximizes the likelihood of each token, treating it as a one-hot target. But here's the catch: tokens can be noisy, non-unique, or simply misaligned with a model's prior knowledge. That's a problem.
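For reference, the traditional objective described above can be written out explicitly. The notation here is assumed for illustration and is not taken from the paper: x is the prompt, y_t is the observed token at position t, p_θ is the model, and V is the vocabulary.

```latex
\mathcal{L}_{\text{SFT}}(\theta)
  = -\sum_{t=1}^{T} \log p_\theta(y_t \mid x, y_{<t})
  = -\sum_{t=1}^{T} \sum_{v \in V} \mathbf{1}[v = y_t] \, \log p_\theta(v \mid x, y_{<t})
```

The second form makes the one-hot target visible: at every position, all of the target probability sits on the observed token and none on any alternative.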

Introducing the Q-target Framework

The team behind this research proposes a fresh perspective. Instead of obsessing over the loss objective, they focus on the target distribution that the loss aims to match. Enter the Q-target framework. It breaks down SFT supervision into two key decisions: how much to trust the observed token, and how to distribute the leftover probability among alternatives. This isn't just a neat trick; it's a strategic overhaul that could change the game.
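To make the two decisions concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the `trust` parameter, and the two `leftover` options ("uniform" and "model") are assumptions chosen for illustration. The sketch builds a target distribution q for a single token position, then scores the model against it with cross-entropy.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one position's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_target(probs, observed, trust, leftover="model"):
    """Build a target distribution q for one position (hypothetical helper).

    probs:    model probabilities over the vocabulary
    observed: index of the observed token
    trust:    decision 1, the mass in [0, 1] placed on the observed token
    leftover: decision 2, how the remaining mass is spread:
              "uniform" (equal shares) or "model" (model's own ranking)
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    if leftover == "uniform":
        alt = [1.0] * len(probs)
    elif leftover == "model":
        alt = list(probs)
    else:
        raise ValueError("leftover must be 'uniform' or 'model'")

    alt[observed] = 0.0  # alternatives exclude the observed token
    total = sum(alt)
    if total <= 0.0:
        # No alternative carries any mass: fall back to the one-hot target.
        q = [0.0] * len(probs)
        q[observed] = 1.0
        return q

    q = [(1.0 - trust) * a / total for a in alt]
    q[observed] = trust
    return q


def target_cross_entropy(logits, q):
    # Cross-entropy H(q, p) between the target q and the model's softmax.
    # In real training, q would be treated as a constant (no gradient).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(qi * (x - log_z) for qi, x in zip(q, logits))


logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
q_uniform = build_target(probs, observed=0, trust=0.8, leftover="uniform")
q_model = build_target(probs, observed=0, trust=0.8, leftover="model")
loss = target_cross_entropy(logits, q_model)
```

With `trust=1.0` the target collapses to the usual one-hot vector and the loss reduces to the standard negative log-likelihood. With `leftover="uniform"` the sketch behaves like label smoothing restricted to the non-observed tokens. With `leftover="model"` the spare mass follows the model's own preferences among alternatives.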

Why Target-SFT Matters

Target-SFT, an approach based on this new framework, builds the training objective directly from the desired target distribution. The results? Consistent outperformance across ten different reasoning datasets and model settings. That's not something to ignore. This isn't just a tweak; it's a potential shift in how we think about SFT.

The paper's key contribution is its focus on the target distribution rather than strict adherence to observed tokens. It's a move that acknowledges the complexity of language and the richness of pretrained models. Why settle for a narrow focus when a broader target can yield better results?

The Bigger Picture

This approach opens up a wider search space for SFT objectives. It's a reminder that sometimes, stepping back and rethinking our assumptions can lead to substantial gains. The ablation study reveals that by better allocating probability mass, we can align models more closely with their innate strengths.

So, what's the takeaway? The rigid one-hot target of yesteryear might be on its way out. In its place comes a more nuanced, target-based approach that capitalizes on a model's inherent capabilities. This work doesn't just build on prior frameworks; it challenges them to evolve.
