Can AI Truly Innovate? New Experiments Put LLMs to the Test
Can large language models like Qwen3-4B-Thinking-2507 truly innovate? New research explores their ability to reinvent foundational algorithms, revealing both potential and limitations.
The growing excitement around large language models (LLMs) and their potential to drive scientific breakthroughs is well-founded, yet it hinges on an important question: can these models go beyond mere replication and truly innovate? Recent experiments have illuminated this quandary with fascinating results, specifically examining whether LLMs can reinvent foundational algorithms from scratch.
Testing the Boundaries of Innovation
In this innovative study, researchers deployed an 'Unlearn-and-Reinvent' strategy. The approach involves first erasing a specific foundational algorithm, like Dijkstra's or Euclid's, from an LLM's pretrained knowledge. The model is then tested to see if it can independently rediscover the algorithm within a controlled setting. The study implemented a GRPO-based, on-policy unlearning method across 10 target algorithms, involving three strong open-weight models, each tested under varying degrees of hints.
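To make the targets concrete: Euclid's algorithm, one of the foundational algorithms named above, is the kind of compact procedure the models were asked to rediscover. A minimal reference version looks like this (the study's own evaluation harness is not shown here):

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)
    until the remainder is zero; the surviving value is the GCD."""
    while b:
        a, b = b, a % b
    return a

print(gcd(48, 18))  # 6
print(gcd(17, 5))   # 1 (coprime inputs)
```

The challenge for an "unlearned" model is to arrive at this remainder-replacement insight without retrieving it from memorized training data.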
Results That Turn Heads
The findings are illuminating. The standout model, Qwen3-4B-Thinking-2507, demonstrated strong performance, successfully reinventing 50% of the algorithms independently, 70% with minimal hints, and as much as 90% when given moderate guidance. While this suggests an impressive aptitude for innovation, the failure of even step-by-step hints on more complex algorithms paints a telling picture of current limitations.
The incorporation of test-time reinforcement learning enabled the successful reinvention of the Strassen algorithm when moderate hints were provided. It's evident these LLMs possess a budding capacity for innovation, yet they continue to struggle with more intricate challenges.
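For context on why Strassen's algorithm is a demanding target: its key insight is multiplying two 2x2 matrices with seven scalar products instead of the naive eight, via a non-obvious set of intermediate terms. A minimal sketch of the base case (not the study's code):

```python
def strassen_2x2(A, B):
    """Strassen's 2x2 base case: 7 multiplications (p1..p7) instead of 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    # Recombine the seven products into the four entries of A @ B.
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Discovering these seven products from scratch, rather than recalling them, is precisely the kind of leap that required both moderate hints and test-time reinforcement learning.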
A Critical Role for Generative Verification
One notable discovery was the role of generative verification during the reinvention phase. This proved essential in maintaining the models' reasoning capabilities, effectively preventing what researchers termed a 'thought collapse.' This aspect of the experiments underscores the delicate balance LLMs must maintain to innovate, hinting at the future potential for AI-driven discoveries.
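One way to picture generative verification, as a hypothetical sketch rather than the researchers' actual implementation: candidate solutions are only rewarded if they reproduce expected outputs on held-out cases, which keeps the model's reasoning anchored to checkable behavior. All names below are illustrative assumptions.

```python
# Hypothetical sketch of a verification gate (not the paper's code):
# a candidate function is accepted only if it passes every held-out case.
def verify_candidate(candidate_fn, test_cases):
    """Return True iff the candidate reproduces all expected outputs."""
    return all(candidate_fn(*args) == expected for args, expected in test_cases)

# Stand-in for model-generated code proposing a GCD routine.
def proposed_gcd(a, b):
    while b:
        a, b = b, a % b
    return a

held_out = [((48, 18), 6), ((17, 5), 1), ((0, 7), 7)]
print(verify_candidate(proposed_gcd, held_out))  # True
```

Gating rewards on verified outputs, rather than on plausible-looking text, is one plausible mechanism for the 'thought collapse' prevention the researchers describe.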
But the question now is whether these models can break free from their current confines. Will we witness the day when LLMs independently pioneer entirely new concepts, or are they destined to remain as powerful tools that still require human guidance?
In essence, while these experiments hint at the tantalizing possibility of AI innovation, they also soberly remind us of the challenges ahead. The potential is there, but the path to true innovation will be neither quick nor straightforward.
Key Terms Explained
LLM: Large Language Model.
Reasoning: The ability of AI models to draw conclusions, solve problems logically, and work through multi-step challenges.
Reinforcement learning: A learning approach where an agent learns by interacting with an environment and receiving rewards or penalties.
Weight: A numerical value in a neural network that determines the strength of the connection between neurons.