Google’s Gemini models have gotten seriously good over the past year, but they’re still locked into Google’s ecosystem. If you want to run something on your own hardware without asking for permission, you’ve been stuck with the Gemma line. Gemma 3 launched over a year ago, and in AI time that’s practically ancient. Today, Google is finally shipping Gemma 4, and they’ve made a licensing move that actually addresses what developers have been grumbling about for years.
Four sizes, all designed for local machines. That’s the pitch, and it’s more nuanced than it sounds. The two big ones are a 26B Mixture of Experts model and a 31B Dense model. Google says the 26B MoE can run unquantized in bfloat16 on a single 80GB Nvidia H100 GPU. Sure, that’s a $20,000 card, but it’s still local hardware you can own and control. If you’re willing to quantize down to lower precision, both big models will fit on consumer GPUs. That’s where things get interesting for people who don’t have data center budgets.
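To put rough numbers on the memory claims, here is a back-of-the-envelope sketch. It assumes weights dominate memory use, that bfloat16 costs 2 bytes per parameter, and that 8-bit and 4-bit quantization cost 1 and 0.5 bytes per parameter. The function and table names are mine, not anything from Google's tooling, and the estimate ignores the KV cache, activations, and the extra metadata that quantized formats carry, so real usage will be higher.

```python
# Rough weight-only memory estimate. Ignores KV cache, activations,
# quantization metadata, and runtime overhead (all assumptions of this sketch).
BYTES_PER_PARAM = {"bfloat16": 2.0, "int8": 1.0, "int4": 0.5}

# Parameter counts as stated in the article.
MODELS = {"26B MoE": 26e9, "31B Dense": 31e9}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def footprint_table() -> dict:
    """Weight footprint for every model/dtype pair."""
    return {
        name: {dtype: weight_memory_gb(n, dtype) for dtype in BYTES_PER_PARAM}
        for name, n in MODELS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        pretty = ", ".join(f"{dtype}: {gb:.1f} GB" for dtype, gb in row.items())
        print(f"{name}: {pretty}")
```

By this estimate the 26B MoE needs about 52 GB for weights in bfloat16, which leaves headroom on an 80GB card, and both big models drop to the 13 to 16 GB range at 4-bit. Note that an MoE model generally still has to keep all of its experts in memory, so the sparse activation helps speed rather than footprint.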
The 26B MoE is the speed demon here. It only activates 3.8 billion of its 26 billion parameters during inference. That works out to roughly 7x fewer parameters doing work per token (26 / 3.8 ≈ 6.8), which should translate to much higher tokens-per-second than similarly sized dense models. Google says they focused on reducing latency to make local processing actually feel responsive. The 31B Dense variant takes the opposite approach: more quality, less speed. It’s meant for fine-tuning on specific tasks where you want every bit of accuracy you can squeeze out.
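The arithmetic behind that comparison is simple enough to check yourself. The sketch below uses only the parameter counts quoted above. The helper names are hypothetical, and treating per-token compute as proportional to active parameters is a simplifying assumption that leaves out routing overhead and memory bandwidth.

```python
# Parameter counts as stated in the article.
MOE_TOTAL_PARAMS = 26e9
MOE_ACTIVE_PARAMS = 3.8e9
DENSE_PARAMS = 31e9


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def reduction_factor(active: float, total: float) -> float:
    """How many times fewer parameters are active than are stored."""
    return total / active


if __name__ == "__main__":
    frac = active_fraction(MOE_ACTIVE_PARAMS, MOE_TOTAL_PARAMS)
    moe_factor = reduction_factor(MOE_ACTIVE_PARAMS, MOE_TOTAL_PARAMS)
    # Assumption: a dense model uses all of its parameters on every token.
    vs_dense = reduction_factor(MOE_ACTIVE_PARAMS, DENSE_PARAMS)
    print(f"Active share of the MoE: {frac:.1%}")
    print(f"Reduction vs its own total: {moe_factor:.1f}x")
    print(f"Active parameters vs the 31B Dense: {vs_dense:.1f}x fewer")
```

So the 26B MoE touches under 15% of its weights per token, and about 8x fewer parameters than the 31B Dense model does. Whether that turns into a matching speedup depends on hardware and the inference runtime.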
But the real news here is the license change. Google has been using a custom Gemma license that annoyed a lot of developers. It had restrictions that made some commercial uses awkward and created uncertainty around redistribution. With Gemma 4, they’re switching to Apache 2.0. That’s the standard open-source license used by TensorFlow, Kubernetes, and countless other projects. No weird clauses, no ambiguity. You can use it, modify it, sell products with it, and redistribute it, as long as you keep the license text and attribution notices, without worrying about Google coming back with some hidden restriction.
I’ve been saying for a while that custom AI licenses are a trap. They look permissive until you read the fine print, and then you discover you can’t do something obvious like run the model on a competing cloud platform. Apache 2.0 solves that. It’s not viral like GPL, and it’s not restrictive like some of the early AI model licenses. It’s just clean, well-understood, and battle-tested. Google deserves credit for making this move, even if it took them three generations to get there.
The smaller two variants haven’t been detailed yet, but I’d expect them to target edge devices and mobile. Google’s pattern with previous Gemma releases has been to cover everything from phones to workstations, so Gemma 4 likely follows the same playbook. The developer documentation should clarify the full lineup soon.
One thing I’m keeping an eye on is how Gemma 4 compares to Llama 4 and Mistral’s latest offerings. Meta and Mistral have both been aggressive with open-weight releases, and Google has been playing catch-up in the developer community. The Apache 2.0 license removes one major disadvantage, but model quality and ecosystem support will determine whether developers actually switch. Google has TensorFlow and Keras, but most of the community has moved to PyTorch. That’s a hurdle they still need to address.
For now, Gemma 4 looks like a solid update with a genuinely pro-developer licensing change. The 26B MoE’s efficiency numbers are impressive on paper, and the 31B Dense should appeal to people building specialized applications. If Google keeps iterating at this pace and maintains the Apache 2.0 commitment, they could reclaim some of the goodwill they lost with the early Gemini rollout.