Gemma 4 Multimodal
A local multimodal model family for image input, document understanding, reasoning, coding, and text generation.
Gemma 4 is a strong default for local multimodal workflows because the family spans both compact edge models and larger workstation models.
Install Commands
ollama run gemma4:e4b
ollama run gemma4:12b
ollama run gemma4:26b
Best Fit
- Image question answering.
- Document screenshots.
- Visual reasoning.
- Local assistants that need both text and image input.
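For the use cases listed above, a local assistant needs to hand the model a prompt together with an image. The following is a minimal sketch of one way to do that through the HTTP API of a locally running Ollama server, not a definitive integration. It assumes the server listens at its default address and that the image is sent as a base64 string in the `images` field of a generate request. The helper names `build_image_request` and `ask_about_image` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_image_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming generate request carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_about_image(model: str, prompt: str, image_path: str) -> str:
    """Send the image and prompt to a running Ollama server and return the reply text."""
    with open(image_path, "rb") as f:
        payload = build_image_request(model, prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

A call such as `ask_about_image("gemma4:e4b", "Summarize this screenshot.", "screenshot.png")` would cover the document screenshot case, where `screenshot.png` is a placeholder path. Keeping the payload builder separate from the network call makes the request easy to inspect before anything is sent.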
Hardware Notes
Image input consumes extra memory on top of the text context. If a text-only prompt runs but adding an image causes crashes or slowdowns, reduce the context length or switch to a smaller Gemma 4 size.
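The advice above can be sketched as a simple fallback order: try shorter context lengths on the current model first, then step down to the next smaller size. This is only an illustration under stated assumptions. The context lengths are example values rather than tuned recommendations, the helper names `fallback_plan` and `with_context` are hypothetical, and the context length is assumed to be set through the `num_ctx` option of an Ollama request.

```python
# Model tags come from the install commands above, ordered largest to smallest.
MODEL_SIZES = ["gemma4:26b", "gemma4:12b", "gemma4:e4b"]

# Assumption: illustrative context lengths, not tuned recommendations.
CONTEXT_LENGTHS = [8192, 4096, 2048]


def fallback_plan(start_model: str) -> list[tuple[str, int]]:
    """List (model, num_ctx) attempts: shrink context first, then drop a model size."""
    start = MODEL_SIZES.index(start_model)
    return [(model, ctx) for model in MODEL_SIZES[start:] for ctx in CONTEXT_LENGTHS]


def with_context(payload: dict, num_ctx: int) -> dict:
    """Return a copy of an Ollama request payload with options.num_ctx set."""
    options = {**payload.get("options", {}), "num_ctx": num_ctx}
    return {**payload, "options": options}
```

A caller would walk the plan in order, apply `with_context` to its request for each attempt, and stop at the first combination that runs acceptably on the available hardware.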