Ollama

Ollama is the easiest default runtime for most local AI users. It installs on macOS, Windows, and Linux, pulls quantized model packages, and exposes a local HTTP API.

Best Fit

First local model setup.
Apple Silicon laptops and desktops.
NVIDIA GPU workstations.
Local chat, coding, reasoning, vision, and embedding experiments.
Apps that can connect to http://localhost:11434.
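
As a rough illustration of what connecting to that local address can look like, the sketch below sends one prompt to the HTTP API using only the Python standard library. It assumes the server is already running on the default port and that the /api/generate endpoint accepts a JSON body with model, prompt, and stream fields. The helper names build_generate_request and generate are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address mentioned above


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text.

    Assumes a running Ollama server and a model that is already downloaded.
    """
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, calling generate("qwen3.5:9b", "Say hello in one sentence.") should return the reply as a string. Setting stream to False keeps the example short. A real app would more likely stream tokens as they arrive.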

Install Commands

Each command downloads the model on first use and then runs it.

ollama run qwen3.5:9b
ollama run gemma4:12b
ollama run deepseek-r1:8b
ollama run nomic-embed-text
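
Once an embedding model such as nomic-embed-text is available locally, embeddings can be requested over the same HTTP API. The sketch below assumes the /api/embed endpoint, which takes a model name and an input list and returns an embeddings array. Older Ollama versions may expose a different embeddings endpoint, so treat the path as an assumption to check against your installed version. The helper names are hypothetical.

```python
import json
import math
import urllib.request


def build_embed_request(
    model: str, texts: list[str], base_url: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build a POST request for the (assumed) /api/embed endpoint."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(model: str, texts: list[str], timeout: float = 60.0) -> list[list[float]]:
    """Return one embedding vector per input text (needs a running server)."""
    request = build_embed_request(model, texts)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["embeddings"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A typical experiment embeds two sentences with embed("nomic-embed-text", [first, second]) and compares the resulting vectors with cosine_similarity.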

Hardware Notes

Use small models such as Qwen3.5 4B, Qwen3.5 9B, DeepSeek-R1 8B, or Gemma 4 E4B on laptops. Use 24B to 35B models on systems with at least 32 GB of system memory or at least 24 GB of VRAM.

Watch Outs

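The model package size is only part of the memory requirement. Long context enlarges the KV cache, image input adds extra tokens and encoder overhead, and full GPU offload places the whole model in VRAM, so actual memory use can be well above the download size.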
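
To show why long context matters, here is a back-of-the-envelope estimate that adds a KV cache term to the package size. It uses the usual approximation for a standard transformer: two tensors (keys and values) per layer, each of size KV heads × head dimension × context length. The architecture numbers in the usage note and the 1 GiB overhead default are illustrative assumptions, not measurements of any model listed here. Runtimes that quantize the cache or use sliding-window attention will come out lower.

```python
GIB = 2**30


def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: int = 2,  # 2 bytes assumes a 16-bit cache
) -> int:
    """Approximate KV cache size: keys plus values for every layer and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


def estimate_memory_gib(
    package_gib: float,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: int = 2,
    overhead_gib: float = 1.0,  # placeholder for runtime buffers; an assumption
) -> float:
    """Rough total: model package + KV cache + a fixed overhead allowance."""
    cache = kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value)
    return package_gib + cache / GIB + overhead_gib
```

For a hypothetical model with 32 layers, 8 KV heads, and a head dimension of 128, the cache works out to 1 GiB at 8,192 tokens of context and 4 GiB at 32,768 tokens, so a 5 GiB package would need roughly 10 GiB at the longer context under these assumptions.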