Google Ditches Custom Licenses for Gemma 4 AI Models

Google's new Gemma 4 AI models offer four sizes optimized for local use and drop the restrictive custom license. With improved latency and quality, these models could reshape AI development for local machines.
Google's latest AI offering, the Gemma 4 models, marks a significant shift in AI development strategy. Launched recently, these models come in four sizes specifically optimized for local machine usage. The most noteworthy update? Google's decision to abandon its custom Gemma license in favor of a more developer-friendly approach.
Local AI with Powerful Hardware
The two larger Gemma variants, the 26B Mixture of Experts and 31B Dense, have been engineered to run unquantized on an 80GB Nvidia H100 GPU. This isn't just any local hardware; we're talking about a $20,000 AI powerhouse. Yet for those without deep pockets, there's a silver lining: when quantized to lower precision, these models fit on more accessible consumer GPUs, democratizing advanced AI capabilities.
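To see why quantization matters here, a back-of-envelope estimate of weight memory at different precisions is useful. This is an illustrative sketch, not an official sizing guide: it counts weights only, ignoring activations and the KV cache, and the byte-per-parameter figures are standard assumptions (bf16 = 2 bytes, int8 = 1, int4 = 0.5).

```python
def model_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Rough VRAM needed just to hold the weights (ignores activations, KV cache)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# Illustrative: the 31B dense model at common precisions
for label, bytes_pp in [("bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"{label}: ~{model_memory_gb(31, bytes_pp):.0f} GB")
```

By this rough math, the unquantized model needs more than the 24GB found on high-end consumer cards, while 4-bit quantization brings the weights down to a size that fits.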
Performance and Flexibility
Google's focus on reducing latency is evident in the 26B Mixture of Experts model. By activating only 3.8 billion of its 26 billion parameters during inference, this model achieves significantly higher tokens-per-second than its peers. Meanwhile, the 31B Dense model leans more towards quality, offering developers a strong tool for fine-tuning specific applications. For many teams, though, the real bottleneck isn't the model itself but the serving infrastructure around it.
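The mechanism behind that speedup can be sketched with a toy Mixture-of-Experts layer: a gating network scores every expert, but only the top-k experts actually run per token, so per-token compute scales with k rather than with the total expert count. This is a minimal illustrative sketch, not Gemma's actual architecture; the shapes and expert count are arbitrary.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Toy MoE layer: route each token to its top-k experts.

    x: (tokens, dim); gate_w: (dim, n_experts);
    experts: list of (dim, dim) weight matrices.
    """
    logits = x @ gate_w                           # (tokens, n_experts) gate scores
    topk = np.argsort(logits, axis=1)[:, -k:]     # top-k expert indices per token
    sel = np.take_along_axis(logits, topk, axis=1)
    w = np.exp(sel - sel.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)             # softmax over selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                   # only k experts run per token
        for j in range(k):
            out[t] += w[t, j] * (x[t] @ experts[topk[t, j]])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
gate = rng.standard_normal((8, 4))
experts = [rng.standard_normal((8, 8)) for _ in range(4)]
y = moe_forward(x, gate, experts)
print(y.shape)  # (4, 8)
```

With k=2 of 4 experts, each token touches only half the expert parameters; the reported 3.8B-of-26B figure implies a similarly sparse activation pattern, which is where the tokens-per-second advantage comes from.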
A Step Forward in AI Licensing
The licensing shift is a breakthrough. By moving away from a custom license, Google is opening the door for more flexible and innovative uses of the Gemma 4 models. It raises the question: will we see a surge in AI development as a result? The removal of restrictive terms could lead to more widespread adoption and experimentation, eventually driving down inference costs at scale.
At volume, inference costs substantially less when developers have the freedom to optimize and deploy models on their own terms. By acknowledging past frustrations with the custom license and addressing them head-on, Google might set a new standard for AI accessibility.
Key Terms Explained
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
GPU: Graphics Processing Unit.
Inference: Running a trained model to make predictions on new data.
Mixture of Experts: An architecture where multiple specialized sub-networks (experts) share a model, but only a few activate for each input.