The best first local LLM is not the biggest one that barely loads. On 8GB RAM, the useful question is whether the model still feels responsive while your browser, notes, and operating system are open. For most people, that means 0.5B to 3B models first, then careful experiments with 7B later.
Install first: qwen2.5:0.5b. Fastest in the 8GiB CPU-only test at 58.02 tokens/s.
Comfortable small assistant: gemma3:1b. A step up in quality that still stays responsive at 30.70 tokens/s.
Upper practical range: qwen2.5:3b. Usable at 16.77 tokens/s, but it no longer feels instant.
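To get comparable tokens/s numbers on your own machine, one option is to ask a local Ollama server for its own timing data. The sketch below is a minimal example, not necessarily how the figures above were measured. It assumes Ollama is listening on its default local address and that the model has already been pulled. It relies on the `eval_count` and `eval_duration` (nanoseconds) fields that `/api/generate` returns. The helper names `tokens_per_second` and `measure`, and the prompt text, are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's eval_count and eval_duration (nanoseconds) to tokens/s."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str = "Explain what RAM is in two sentences.") -> float:
    """Run one non-streaming generation and return the generation speed."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    return tokens_per_second(result["eval_count"], result["eval_duration"])
```

Calling `measure("qwen2.5:0.5b")` with a browser and other everyday apps open gives a more honest picture than an idle machine. A single run is noisy, so average a few runs before comparing models.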
AI Jupyter also ran six Ollama models in a Docker container capped at 8GiB RAM with no GPU passthrough. The test includes raw Docker proof, Ollama CPU logs, model list screenshots, and test JSON.
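A similar memory-capped, CPU-only setup can be approximated with Docker's `--memory` flag and the `ollama/ollama` image. The sketch below only builds and prints the command. The container name, volume name, and helper function are hypothetical, and the exact flags used in the AI Jupyter test are not stated here, so treat this as an assumption about a reasonable configuration.

```python
import shlex


def ollama_docker_command(memory: str = "8g", name: str = "ollama-8g") -> list[str]:
    """Build a `docker run` argv for a CPU-only, memory-capped Ollama container."""
    return [
        "docker", "run", "-d",
        "--name", name,                 # hypothetical container name
        f"--memory={memory}",           # hard RAM cap for the container
        "-v", "ollama:/root/.ollama",   # keep pulled models between runs
        "-p", "11434:11434",            # expose the Ollama API on the host
        "ollama/ollama",                # no --gpus flag is passed, so no GPU passthrough
    ]


# Print the command so it can be reviewed before running it in a shell.
print(shlex.join(ollama_docker_command()))
```

Leaving out `--gpus` is what keeps the container on the CPU. The memory cap applies to the whole container, so it covers the Ollama server as well as the loaded model.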