TED: Smarter Knowledge Transfer Without Training
TED redefines knowledge distillation by focusing on context rather than parameters. This approach improves performance with less data and cost, offering a new path for AI development.
In AI, knowledge distillation has often meant cramming a teacher model's expertise into a student's parameters. But what if you could sidestep the heavy lifting of parameter updates and training data? That's exactly what TED, a new context-based distillation framework, aims to do.
Rethinking Distillation
TED shifts the focus from a student's parameters to its in-context experience. Instead of tweaking weights, it injects reasoning experiences directly into prompts. For each input, the student generates multiple reasoning trajectories while the teacher crafts its solution independently. The teacher then compares the student's approaches against its own reasoning and the ground-truth answer.
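To make the loop concrete, here is a minimal sketch of one distillation step under the description above. This is not TED's actual implementation; the `ExperienceBank` structure and the `student`, `teacher`, and `extract` callables are hypothetical placeholders for the models and the teacher-side comparison.

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceBank:
    """In-context store of distilled reasoning experiences (hypothetical)."""
    experiences: list = field(default_factory=list)

    def as_prompt_prefix(self) -> str:
        # Experiences are injected into the student's prompt, not its weights.
        return "\n".join(f"- {e}" for e in self.experiences)

def ted_step(question, answer, student, teacher, extract, bank, n_trajectories=4):
    # Student samples several reasoning trajectories for the same input,
    # conditioned on the current experience bank.
    prompt = bank.as_prompt_prefix() + "\n" + question
    student_trajs = [student(prompt) for _ in range(n_trajectories)]
    # Teacher solves the problem independently of the student.
    teacher_traj = teacher(question)
    # Teacher-side comparison distills a generalized experience from the
    # student attempts, its own reasoning, and the ground-truth answer.
    experience = extract(student_trajs, teacher_traj, answer)
    if experience:
        bank.experiences.append(experience)
    return bank
```

In a real setting the three callables would wrap model API calls; the key design point the sketch shows is that all learning lands in the prompt-side bank, so no gradient updates touch the student.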
The magic happens when TED extracts generalized experiences capturing effective reasoning patterns. These experiences are continuously refined, but here's the catch: context-based distillation risks endless growth and noise. TED tackles this with an experience compression mechanism, smartly merging, rewriting, or removing low-utility experiences based on usage statistics.
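A compression pass might look something like the sketch below. The paper's mechanism merges, rewrites, or removes experiences; this simplified version only drops rarely-used entries and caps the bank size, and the `min_uses` and `max_size` thresholds are illustrative assumptions, not TED's actual policy.

```python
def compress_experiences(experiences, usage_counts, min_uses=2, max_size=50):
    """Prune low-utility experiences by usage statistics (simplified policy).

    Keeps only experiences used at least `min_uses` times, then caps the
    bank at `max_size` entries, most-used first, so the prompt stays bounded.
    """
    kept = [e for e in experiences if usage_counts.get(e, 0) >= min_uses]
    kept.sort(key=lambda e: usage_counts.get(e, 0), reverse=True)
    return kept[:max_size]
```

The point of the cap is the failure mode the paragraph names: without it, the in-context bank grows without bound and drowns useful patterns in noise.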
Proven Results
Let's talk numbers. TED's effectiveness shines through in experiments on MathVision and VisualPuzzles, two multimodal reasoning benchmarks. On MathVision, TED boosts Qwen3-VL-8B's performance from 0.627 to 0.702, and on VisualPuzzles, it jumps from 0.517 to 0.561 with only 100 training samples. It's a significant leap without the traditional training burden.
Why This Matters
TED achieves performance competitive with fully trained models while cutting training costs by more than five times. That means resource-constrained environments can afford meaningful knowledge transfer, opening new doors for AI development.
If AI can learn from context without being bogged down by data and training, what stops us from rethinking how machines reason? TED's approach might just be a key step toward smarter, more efficient AI systems.
Key Terms Explained
Knowledge distillation: A technique where a smaller 'student' model learns to mimic a larger 'teacher' model, training the smaller model to replicate the behavior of the larger one.
Multimodal: AI models that can understand and generate multiple types of data — text, images, audio, video.
Parameter: A value the model learns during training — specifically, the weights and biases in neural network layers.