LLMs Are Stuck in the Past: It's Time to Evolve
The current obsession with post-training LLMs mirrors an outdated paradigm. Why are we stuck in the past? Innovation demands a new approach.
The current landscape for training large language models (LLMs) is eerily reminiscent of a bygone era. We find ourselves tethered to a 'pre-train then fine-tune' methodology, echoing the BERT era's old playbook. This time, it comes with an extensive post-training phase, including Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). The question is, are we just spinning our wheels?
A Trip Down Memory Lane
The history of LLMs is a story of cycles. We've ping-ponged from era to era, yet somehow we've returned to familiar ground. Task performance still leans heavily on fitting models to in-distribution datasets. If that's the benchmark, what's really changed? Recent empirical comparisons between pre-trained models and those initialized from scratch reveal an unsettling truth. Models initialized from scratch and then put through the same post-training can hold their own on modern reasoning datasets, and even do well on competitive math and code benchmarks.
If the AI can hold a wallet, who writes the risk model? Slapping a model on a GPU rental isn't a convergence thesis, yet that's where many in the industry seem to be heading. But the real kicker is that neither approach, starting from a pre-trained model or starting from scratch, guarantees a generally capable system. So why cling to these methods?
The Limit of Distribution Fitting
Our current methodologies serve primarily as distribution-fitting mechanisms. It's akin to trying to fit a square peg into a round hole. The models perform well on specific tasks because they're tailored for those particular benchmarks. But this doesn't inherently make them adaptable or truly intelligent. If we're serious about developing AI that 'learns how to learn,' we need to break free from these constraints.
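To make the distribution-fitting point concrete, here is a toy sketch rather than anything resembling LLM training. It assumes a hypothetical target function (x squared), a straight-line model, and hand-picked input ranges, all chosen purely for illustration. The line is fitted on inputs between 0 and 1, then scored on held-out inputs from the same range and on inputs between 2 and 3.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def mse(a, b, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def target(x):
    """Hypothetical 'true' task, chosen only for illustration."""
    return x * x


# Training inputs cover the range [0, 1].
train_x = [i / 20 for i in range(21)]
# Held-out inputs from the same range (in-distribution).
in_dist_x = [i / 20 + 0.025 for i in range(20)]
# Inputs from a range the model never saw (out-of-distribution).
out_dist_x = [2 + i / 20 for i in range(21)]

a, b = fit_line(train_x, [target(x) for x in train_x])
in_err = mse(a, b, in_dist_x, [target(x) for x in in_dist_x])
out_err = mse(a, b, out_dist_x, [target(x) for x in out_dist_x])

print(f"in-distribution MSE:     {in_err:.4f}")
print(f"out-of-distribution MSE: {out_err:.4f}")
```

The fitted line tracks the target closely where it was trained and misses badly outside that range. It is the same pattern, in miniature, as a model that scores well on the benchmarks it was tailored to without becoming more adaptable.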
Decentralized compute sounds great until you benchmark the latency. The same applies here. Unless we innovate, we're bound to the latency of outdated methods. The industry's infatuation with post-training tweaks is a distraction from the larger goal. The intersection is real. Ninety percent of the projects aren't.
Beyond Predefined Behaviors
If we want LLMs to move beyond being souped-up parrots, the focus needs to shift. The training procedures should emphasize a model's ability to learn and adapt, not just perform a set of predefined tasks. This is where the real transformation lies. Show me the inference costs. Then we'll talk about real investment in the future of AI.
So, are we ready to innovate, or will we keep rerunning the same old scripts? For those of us who've endured enough of the BERT era's ghost, it's time to stop reminiscing and start evolving.
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
BERT: Bidirectional Encoder Representations from Transformers.
Compute: The processing power needed to train and run AI models.
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.