Three sizes, native vision, and an Apple Silicon glow-up
Google has been losing the open-weights vibe war for a year. Gemma 3 is the comeback. Three sizes, native multimodal, and the 27B variant runs at usable speeds on an M4 Mac. Yes really.
The Setup
Gemma 3 technically ships in four sizes, but the 1B is text-only; the three that matter here are 4B (phone), 12B (laptop), and 27B (workstation). All three speak image-in, text-out. The 12B is the sweet spot — fast on a 32GB Mac, great at screenshot triage and document QA.
ollama pull gemma3:12b
# multimodal prompt: put the image path in the prompt and ollama attaches it
ollama run gemma3:12b "What is in this image? ./photo.jpg"
# or grab the full weights for fine-tuning
huggingface-cli download google/gemma-3-12b-it

The Money Pattern
Image understanding through the transformers pipeline. I'm using this locally for a tagging step on Aidxn portfolio assets — runs on the Mac, no API key, no rate limit.
from transformers import pipeline
from PIL import Image
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-12b-it",
    device_map="auto",   # needs accelerate; picks MPS on Apple Silicon
    torch_dtype="auto",
)
img = Image.open("./design-mockup.png")
result = pipe(
    images=img,
    text="List every UI component visible in this screenshot as JSON.",
    max_new_tokens=512,
)
print(result[0]["generated_text"])
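
One gotcha before you wire this into anything: you asked for JSON, but chat models love to wrap it in markdown fences or lead with a sentence of preamble. Here's a defensive parse to put between the pipeline and everything downstream; the extract_json helper is my sketch, not part of transformers:

import json
import re

def extract_json(raw: str):
    # Strip a ```json ... ``` fence if the model added one
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    candidate = (fenced.group(1) if fenced else raw).strip()
    # Jump to the first { or [ so leading prose doesn't break the parse
    starts = [i for i in (candidate.find("{"), candidate.find("[")) if i != -1]
    start = min(starts) if starts else 0
    # raw_decode tolerates trailing prose after the JSON value
    obj, _ = json.JSONDecoder().raw_decode(candidate[start:])
    return obj

components = extract_json(result[0]["generated_text"])

The Catch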
The Gemma license is not Apache. It's mostly permissive, but there's a Prohibited Use Policy you have to read, and Google reserves the right to update it. For most commercial use it's fine — but read it before you ship a product on top. This is not the license you skim.
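
A practical consequence: the weights on Hugging Face are gated behind that license. You accept the terms on the model page once, then authenticate before transformers or huggingface-cli will download anything. Minimal sketch; the token value is a placeholder:

from huggingface_hub import login

# Accept the Gemma terms on the model page first, then paste a
# read-scoped token from huggingface.co/settings/tokens
login(token="hf_xxx")  # placeholder; or just run `huggingface-cli login` once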
The Verdict
For local multimodal on Apple Silicon, Gemma 3 12B is the new pick. Beats LLaVA, beats MiniCPM-V, runs faster than both. If you need truly permissive licensing, look at an Apache-2.0 vision model like Qwen2.5-VL-7B instead. If you just want the best thing your laptop can run tonight, this is it.
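
Postscript: if you'd rather skip the transformers stack entirely, the same tagging call works through Ollama's Python client (pip install ollama) against the model pulled earlier. A sketch, assuming the Ollama server is running locally:

import ollama

# Same screenshot-tagging prompt, routed through the local Ollama server
response = ollama.chat(
    model="gemma3:12b",
    messages=[{
        "role": "user",
        "content": "List every UI component visible in this screenshot as JSON.",
        "images": ["./design-mockup.png"],  # local file path; the client handles encoding
    }],
)
print(response["message"]["content"])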