OL
Ollama
FeaturedA simple local model runner for pulling, running, and serving popular open-weight models from the command line.
Local Runtimes#Ollama#local LLM runtime#model install#OpenAI compatible
Ollama
Ollama is the easiest default runtime for most local AI users. It installs on macOS, Windows, and Linux, pulls quantized model packages, and exposes a local HTTP API.
Best Fit
- First local model setup.
- Apple Silicon laptops and desktops.
- NVIDIA GPU workstations.
- Local chat, coding, reasoning, vision, and embedding experiments.
- Apps that can connect to
http://localhost:11434.
Install Commands
ollama run qwen3.5:9b
ollama run gemma4:12b
ollama run deepseek-r1:8b
ollama run nomic-embed-text
Hardware Notes
Use small models such as Qwen3.5 4B, Qwen3.5 9B, DeepSeek-R1 8B, or Gemma 4 E4B on laptops. Use 24B to 35B models on systems with 32 GB plus memory or 24 GB plus VRAM.
Watch Outs
The model package size is not the full memory requirement. Long context, image input, and full GPU offload can raise memory use.