Introducing Gemma 3n (developers.googleblog.com)
I still don't understand the difference between Gemma and Gemini for on-device, since neither needs network access. From https://developer.android.com/ai/gemini-nano :

"Gemini Nano allows you to deliver rich generative AI experiences without needing a network connection or sending data to the cloud." -- replace Gemini with Gemma and the sentence still valid.

I suspect the difference is in the training data. Gemini is much more locked down, and if it tries to repeat something from the training data verbatim you will get a 'recitation error'.
Licensing. You can't use the Gemini Nano weights directly (at least commercially) and must interact with them through Android ML Kit or similar Google-approved runtimes.

You can use Gemma commercially using whatever runtime or framework you can get to run it.

Gemma is open source and Apache 2.0 licensed. If you want to include it with an app, you have to package it yourself.

Gemini Nano is an Android API that you don't control at all.

Perplexity.ai gave an easier-to-understand response than Gemini 2.5, afaict.

Gemini Nano is for Android only.

Gemma is available for other platforms and has multiple size options.

So it seems like Gemini Nano might be a very focused Gemma, to follow the biology metaphor rather than the Italian-name interpretation.

The fact that you need HN and competitors to explain your offering should make Google reflect …
Made some GGUFs if anyone wants to run them!

./llama.cpp/llama-cli -hf unsloth/gemma-3n-E4B-it-GGUF:UD-Q4_K_XL -ngl 99 --jinja --temp 0.0

./llama.cpp/llama-cli -hf unsloth/gemma-3n-E2B-it-GGUF:UD-Q4_K_XL -ngl 99 --jinja --temp 0.0

I'm also working on an inference + finetuning Colab demo! I'm very impressed since Gemma 3N has audio, text and vision! https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-...
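If you want to hit the model from an app rather than an interactive session, the same GGUF can also be served over llama.cpp's OpenAI-compatible HTTP endpoint with llama-server (just a sketch; the port is arbitrary):

./llama.cpp/llama-server -hf unsloth/gemma-3n-E4B-it-GGUF:UD-Q4_K_XL -ngl 99 --jinja --port 8080

Then point any OpenAI-style client at http://localhost:8080/v1/chat/completions.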

Literally was typing out "Unsloth, do your thing!!" but you are way ahead of me. You rock <3 <3 <3

Thank you!

LM Studio has MLX variants of the model out: http://huggingface.co/lmstudio-community/gemma-3n-E4B-it-MLX...

However, it's still 8B parameters and there are no quantized models just yet.
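If you'd rather try the MLX conversion from the command line than through LM Studio, here is a minimal sketch using the mlx-lm package; the model id is a placeholder, since the full repo name is truncated above:

pip install mlx-lm

mlx_lm.generate --model lmstudio-community/<gemma-3n-mlx-repo> --prompt "Explain the difference between Gemma and Gemini Nano" --max-tokens 200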

This looks amazing given the parameter sizes and capabilities (audio, visual, text). I like the idea of keeping simple tasks local. I’ll be curious to see if this can be run on an M1 machine…
Sure it can. The easiest way is to get Ollama, then `ollama run gemma3n`. You can pair it with tools like simonw's LLM to pipe stuff to it.
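For the piping part, a minimal sketch assuming Ollama is running locally and the llm-ollama plugin (which exposes local Ollama models to simonw's LLM tool) is installed; notes.txt is just a stand-in file:

ollama pull gemma3n

llm install llm-ollama

cat notes.txt | llm -m gemma3n 'Summarize this in three bullet points'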
This should run fine on most hardware - CPU inference of the E2B model on my Pixel 8 Pro gives me ~9 tok/second of decode speed.