Why TESSY Could Be a Breakthrough for AI Model Training
Synthetic data generation often fails to enhance reasoning in AI models. But a new framework, TESSY, shows promise by aligning stylistic elements with learning targets, leading to significant performance improvements.
In the quest for superior AI models, synthetic data generation has been a popular strategy. However, for emerging reasoning models like Qwen3-8B, this approach often backfires. Instead of enhancing abilities, it can lead to notable performance declines. So, what's the problem?
The Problem with Synthetic Data
It turns out that the stylistic disconnect between data generated by stronger models (teachers) and the target models (students) significantly hampers the effectiveness of supervised fine-tuning (SFT). The numbers are stark: when Qwen3-8B was fine-tuned on data from GPT-OSS-120B, its performance dropped by 3.25% on LiveCodeBench-Pro and a staggering 10.02% on OJBench.
The reality is, style matters more than we thought. The divergence between teacher-generated data and student model distributions is a major roadblock. And it's not just a minor stylistic quirk; it's a fundamental issue impacting performance.
Introducing TESSY: A Fresh Approach
Enter TESSY, a Teacher-Student Cooperation Data Synthesis framework that offers a novel solution. Instead of merely handing down data from teacher to student, TESSY interleaves the models to alternately generate style and non-style tokens. This dual approach allows synthetic sequences to retain the advanced reasoning capabilities of the teacher while aligning stylistically with the student.
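The interleaving idea can be sketched in a few lines. The exact routing rule TESSY uses is not spelled out here, so this toy version makes assumptions: stub `teacher_next` and `student_next` functions stand in for real models, and a simple lexical check decides whether the upcoming token is a "style" token (routed to the student) or a reasoning token (kept from the teacher).

```python
# Toy sketch of teacher-student interleaved generation (hypothetical API).
# Assumption: style tokens are detected lexically; real systems would use
# a learned router or the student's own token probabilities.

STYLE_TOKENS = {"so", "then", "thus", "ok", "first,"}

# Stub teacher: replays a fixed reasoning script, one token per step.
TEACHER_SCRIPT = ["so", "x=1", "then", "x+=2", "thus", "x==3"]

def teacher_next(context):
    # A strong teacher proposes the next token given the sequence so far.
    return TEACHER_SCRIPT[len(context) % len(TEACHER_SCRIPT)]

def student_next(context):
    # Stub student: rewrites stylistic connectives in its own voice.
    return "therefore"

def is_style_token(proposed_token):
    # Hypothetical router: is the teacher's proposal merely stylistic?
    return proposed_token.lower() in STYLE_TOKENS

def synthesize(prompt_tokens, max_new=6):
    """Alternate between models: teacher proposes each token; style
    tokens are replaced by the student's own phrasing, so the final
    sequence keeps the teacher's reasoning in the student's style."""
    seq = list(prompt_tokens)
    for _ in range(max_new):
        proposal = teacher_next(seq)
        if is_style_token(proposal):
            seq.append(student_next(seq))  # student supplies the style token
        else:
            seq.append(proposal)           # teacher supplies the reasoning token
    return seq
```

Running `synthesize([], 6)` on the stub script yields a sequence where the teacher's reasoning steps (`x=1`, `x+=2`, `x==3`) survive intact while the connectives are rendered in the student's voice, which is the core intuition behind the framework.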
The results are impressive. TESSY managed to improve Qwen3-8B's performance by 11.25% on LiveCodeBench-Pro and 6.68% on OJBench. That's a substantial turnaround, especially in an area where improvement was elusive.
Why TESSY Matters
Why should we care about TESSY? Because it underscores a critical insight: how training data is synthesized matters as much as the raw capability of the teacher that produces it. By focusing on the synthesis process, TESSY provides a framework where AI models can learn more effectively, potentially reshaping how we think about model training.
But here's the big question: will other models follow suit? As AI continues to evolve, adopting frameworks like TESSY might not just be an option, but a necessity. As we strip away the marketing, what remains is a clear vision of how blending style with substance can lead to real progress in AI training.
Key Terms Explained
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
GPT: Generative Pre-trained Transformer.
Parameter: A value the model learns during training — specifically, the weights and biases in neural network layers.
Reasoning: The ability of AI models to draw conclusions, solve problems logically, and work through multi-step challenges.