Gemma 4 Multimodal

Featured

A local multimodal model family for image input, document understanding, reasoning, coding, and text generation.

Vision Models
Tags: Gemma 4, vision model, image input, document AI

Gemma 4 is a strong default for local multimodal workflows because the family spans both compact edge models and larger workstation models, so the same workflow can be sized to the hardware available.

Install Commands

ollama run gemma4:e4b
ollama run gemma4:12b
ollama run gemma4:26b

Best Fit

  • Image question answering.
  • Document screenshots.
  • Visual reasoning.
  • Local assistants that need both text and image input.
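
For image question answering outside the interactive prompt, a local Ollama server can also be called over HTTP. The sketch below is a minimal illustration, not an official client. It assumes Ollama is listening on its default local port and sends one base64-encoded image with the prompt. The function names and the file name screenshot.png are hypothetical, and the model tag is taken from the install commands on this page.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server on its default address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_image_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming generate payload carrying one base64 image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def ask_about_image(model: str, prompt: str, image_path: str) -> str:
    """Send the image and prompt to the server and return the answer text."""
    with open(image_path, "rb") as f:
        payload = build_image_request(model, prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example call (needs a running server and a real image file):
# print(ask_about_image("gemma4:e4b", "What does this screenshot show?", "screenshot.png"))
```

Keeping the payload builder separate from the network call makes it easy to check what is sent before a model is loaded.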

Hardware Notes

Image input consumes memory on top of what the text prompt needs. If a text-only prompt runs fine but the same prompt with an image crashes or slows down, reduce the context length or switch to a smaller Gemma 4 size.
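
The fallback described above can be sketched as an ordered list of settings to try. This is only an illustration under stated assumptions: the context lengths are made-up example values, the model order assumes the tags on this page run from largest to smallest, and fallback_plan and first_working are hypothetical helpers. The num_ctx key is Ollama's option for context length, and attempt stands for whatever function actually sends the image request.

```python
from typing import Callable, Iterable, Iterator

# Assumption: tags from this page, ordered largest to smallest.
MODELS_LARGE_TO_SMALL = ["gemma4:26b", "gemma4:12b", "gemma4:e4b"]
# Illustrative context lengths only; tune them for the actual hardware.
CONTEXT_LENGTHS = [8192, 4096, 2048]


def fallback_plan(models: Iterable[str], context_lengths: Iterable[int]) -> Iterator[dict]:
    """Yield settings to try: shrink the context first, then the model."""
    lengths = list(context_lengths)
    for model in models:
        for num_ctx in lengths:
            yield {"model": model, "options": {"num_ctx": num_ctx}}


def first_working(attempt: Callable[[dict], str], plan: Iterable[dict]):
    """Return (settings, result) for the first attempt that does not raise."""
    last_error = None
    for settings in plan:
        try:
            return settings, attempt(settings)
        except Exception as err:  # e.g. an out-of-memory failure reported by the server
            last_error = err
    raise RuntimeError("no configuration succeeded") from last_error
```

Trying a shorter context before a smaller model keeps the larger model's quality when only the context was too big. Reversing the two loops gives the opposite priority.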