Hardware-specific local LLM guide

Best Local LLMs for 8GB RAM in 2026: Ollama, LM Studio, and Model Picks

A practical 8GB RAM local LLM guide for Ollama and LM Studio users, with model picks, setup limits, 7B tradeoffs, and CPU-only test evidence.

Best first download

Gemma 3 4B

Model rows

76

local model rows

Updated

Jul 3, 2026

metrics snapshot

Families

15

model families

Choose a quick starting point

Use one common setup, then adjust exact RAM, GPU memory, and workload below.

Your current answer

Try Gemma 3 4B first

8 GB RAM / no dedicated GPU gives about 4 GB usable model memory. This pick fits now.

Local recommendation uses this configuration.

Models to test

76

Fits now

17

Fits or stretch

18

Popularity metrics refreshed Jul 3, 2026

Recommendation source: AI Jupyter local recommendation data

Gemma logo
Fits

Gemma 3 4B

GoogleGemma terms

Small multimodal assistant

A compact option when image input matters but hardware is limited.

Parameters

4B

Q4 size

3 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

52/100

Pulls

38.2M

chatvisionWorkload match

Fit order

Performance + adoption + fit

#1

Match score

67/100

Adoption

95/100

Install hint

ollama run gemma3:4b
Google Gemma docs
Qwen logo
Fits

Qwen3 4B

AlibabaApache 2.0

Balanced small-device assistant

Good first model for 8 GB to 16 GB machines when speed matters.

Parameters

4B

Q4 size

2.8 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

51/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#2

Match score

66/100

Adoption

94/100

Install hint

ollama run qwen3:4b
Qwen3 official release
Meta logo
Fits

Llama 3.2 3B

MetaLlama license

Small general assistant

A lightweight Meta open-weight option for simple local use.

Parameters

3B

Q4 size

2 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

45/100

Pulls

75M

chatcodingWorkload match

Fit order

Performance + adoption + fit

#3

Match score

64/100

Adoption

99/100

Install hint

ollama run llama3.2:3b
Meta Llama release notes
Microsoft logo
Fits

Phi-4 Mini Instruct

MicrosoftMIT

Small English reasoning

A compact model for local assistants, classification, and structured reasoning prompts.

Parameters

3.8B

Q4 size

2.6 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

50/100

Pulls

1.3M

chatreasoningWorkload match

Fit order

Performance + adoption + fit

#4

Match score

62/100

Adoption

77/100

Install hint

ollama run phi4-mini
Microsoft Phi product page
Qwen logo
Fits

Qwen3 1.7B

AlibabaApache 2.0

Low-memory local assistant

A better floor than sub-1B models when you need basic coding help on small devices.

Parameters

1.7B

Q4 size

1.3 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

39/100

Pulls

31.7M

chatcodingWorkload match

Fit order

Performance + adoption + fit

#5

Match score

59/100

Adoption

94/100

Install hint

ollama run qwen3:1.7b
Qwen3 official release
Meta logo
Fits

Llama 3.2 1B

MetaLlama license

Tiny local chat

Runs almost everywhere and is useful for local automation tests.

Parameters

1B

Q4 size

0.9 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

35/100

Pulls

75M

chatWorkload match

Fit order

Performance + adoption + fit

#6

Match score

58/100

Adoption

99/100

Install hint

ollama run llama3.2:1b
Meta Llama release notes
IBM Granite logo
Fits

Granite 4.0 Tiny

IBMApache 2.0

Next-generation Granite evaluation

Track this for Apache-licensed business deployment tests.

Parameters

Tiny MoE

Q4 size

4 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

49/100

Downloads

144.2K

chatcodingWorkload match

Fit order

Performance + adoption + fit

#7

Match score

58/100

Adoption

65/100

Install hint

huggingface-cli download ibm-granite/granite-4.0-tiny-preview
IBM Granite
Gemma logo
Fits

Gemma 3 1B

GoogleGemma terms

Tiny Google open model tests

Useful for mobile-adjacent experiments and low-memory chat.

Parameters

1B

Q4 size

0.8 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

34/100

Pulls

38.2M

chatWorkload match

Fit order

Performance + adoption + fit

#8

Match score

57/100

Adoption

95/100

Install hint

ollama run gemma3:1b
Google Gemma docs
IBM Granite logo
Fits

Granite 3.3 2B Instruct

IBMApache 2.0

Small enterprise-friendly assistant

Apache-licensed model for teams that care about permissive licensing.

Parameters

2B

Q4 size

1.5 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

41/100

Pulls

1M

chatcodingWorkload match

Fit order

Performance + adoption + fit

#9

Match score

56/100

Adoption

75/100

Install hint

ollama run granite3.3:2b
IBM Granite
Qwen logo
Fits

DeepSeek-R1 Distill Qwen 1.5B

DeepSeekMIT

Tiny reasoning experiments

Good for testing reasoning prompts locally before moving to larger R1 distills.

Parameters

1.5B

Q4 size

1.1 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

29/100

Pulls

89M

reasoningchatWorkload match

Fit order

Performance + adoption + fit

#10

Match score

55/100

Adoption

100/100

Install hint

ollama run deepseek-r1:1.5b
DeepSeek R1 on Hugging Face
Qwen logo
Fits

Qwen3 0.6B

AlibabaApache 2.0

Tiny local chat and quick smoke tests

Runs on almost any laptop, but keep expectations modest for coding and reasoning.

Parameters

0.6B

Q4 size

0.6 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

31/100

Pulls

31.7M

chatWorkload match

Fit order

Performance + adoption + fit

#11

Match score

55/100

Adoption

94/100

Install hint

ollama run qwen3:0.6b
Qwen3 official release
Hunyuan logo
Fits

Tencent Hunyuan 1.8B Instruct

TencentTencent Hunyuan license

Tiny Hunyuan local tests

Useful when you want a very small Chinese-English local model.

Parameters

1.8B

Q4 size

1.4 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

28/100

Downloads

406

chatWorkload match

Fit order

Performance + adoption + fit

#12

Match score

38/100

Adoption

33/100

Install hint

huggingface-cli download tencent/Hunyuan-1.8B-Instruct
Tencent Hunyuan Hugging Face
Nomic logo
Fits

nomic-embed-text

NomicApache 2.0

Local semantic search

Use for private RAG indexes, document search, and lightweight retrieval.

Parameters

137M

Q4 size

0.3 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Pulls

76.9M

embedding

Fit order

Performance + adoption + fit

#13

Match score

34/100

Adoption

99/100

Install hint

ollama pull nomic-embed-text
Ollama embedding models
Mixedbread logo
Fits

mxbai-embed-large

MixedbreadApache 2.0

Higher-quality local embeddings

Good default when embedding quality matters more than model size.

Parameters

335M

Q4 size

0.7 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Pulls

12.1M

embedding

Fit order

Performance + adoption + fit

#14

Match score

31/100

Adoption

89/100

Install hint

ollama pull mxbai-embed-large
Ollama embedding models
BAAI logo
Fits

BGE-M3

BAAIMIT

Multilingual RAG retrieval

A strong embedding pick for multilingual search and hybrid retrieval workflows.

Parameters

568M

Q4 size

1.2 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Pulls

5M

embedding

Fit order

Performance + adoption + fit

#15

Match score

30/100

Adoption

84/100

Install hint

ollama pull bge-m3
Ollama embedding models
Jina AI logo
Fits

Jina Embeddings v3

Jina AICC BY-NC 4.0

Multilingual document embeddings

Check license terms before commercial usage; useful for local multilingual evaluation.

Parameters

572M

Q4 size

1.2 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Downloads

2.8M

embedding

Fit order

Performance + adoption + fit

#16

Match score

29/100

Adoption

81/100

Install hint

huggingface-cli download jinaai/jina-embeddings-v3
Ollama embedding models
Snowflake logo
Fits

Snowflake Arctic Embed L

SnowflakeApache 2.0

Enterprise-style local retrieval

A permissive embedding model for private search pipelines.

Parameters

334M

Q4 size

0.8 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Downloads

32.6K

embedding

Fit order

Performance + adoption + fit

#17

Match score

24/100

Adoption

57/100

Install hint

huggingface-cli download Snowflake/snowflake-arctic-embed-l
Ollama embedding models
Mistral logo
Stretch

Mistral 7B Instruct

Mistral AIApache 2.0

Classic lightweight local assistant

A stable baseline model for general local chat and tool experiments.

Parameters

7B

Q4 size

4.4 GB

RAM floor

8 GB

VRAM target

6 GB

Performance

59/100

Pulls

30.7M

chatcodingWorkload match

Fit order

Performance + adoption + fit

#18

Match score

68/100

Adoption

94/100

Install hint

ollama run mistral:7b
Mistral Small release notes
Meta logo
Upgrade

Llama 3.3 70B

MetaLlama license

Large general open-weight assistant

A common baseline for strong local text performance on large rigs.

Parameters

70B

Q4 size

43 GB

RAM floor

128 GB

VRAM target

48 GB

Performance

82/100

Pulls

4M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#19

Match score

75/100

Adoption

83/100

Install hint

ollama run llama3.3:70b
Meta Llama release notes
Mistral logo
Upgrade

Mixtral 8x7B

Mistral AIApache 2.0

MoE general assistant

Still useful for local MoE experiments and multilingual workloads.

Parameters

46.7B MoE

Q4 size

26 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

82/100

Pulls

2.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#20

Match score

74/100

Adoption

81/100

Install hint

ollama run mixtral:8x7b
Mistral Small release notes
Gemma logo
Upgrade

Gemma 3 27B

GoogleGemma terms

High-quality local multimodal work

Best reserved for 24 GB GPUs, high-memory Macs, or larger systems.

Parameters

27B

Q4 size

17 GB

RAM floor

48 GB

VRAM target

24 GB

Performance

74/100

Pulls

38.2M

chatvisionreasoningWorkload match

Fit order

Performance + adoption + fit

#21

Match score

73/100

Adoption

95/100

Install hint

ollama run gemma3:27b
Google Gemma docs
Qwen logo
Upgrade

Qwen3 32B

AlibabaApache 2.0

Workstation-grade open model

A serious local upgrade for coding, agent workflows, and difficult reasoning tasks.

Parameters

32B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

74/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#22

Match score

73/100

Adoption

94/100

Install hint

ollama run qwen3:32b
Qwen3 official release
Qwen logo
Upgrade

Qwen3 30B-A3B

AlibabaApache 2.0

Efficient MoE reasoning

Mixture-of-experts design can offer strong quality without activating every parameter.

Parameters

30B MoE

Q4 size

18 GB

RAM floor

48 GB

VRAM target

16 GB

Performance

74/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#23

Match score

73/100

Adoption

94/100

Install hint

ollama run qwen3:30b-a3b
Qwen3 official release
Mistral logo
Upgrade

Mistral Small 3.2 24B

Mistral AIApache 2.0

Updated 24B multimodal assistant

A newer Small-family option for instruction following, vision, and tool use.

Parameters

24B

Q4 size

15 GB

RAM floor

32 GB

VRAM target

24 GB

Performance

82/100

Downloads

653.1K

chatcodingvisionWorkload match

Fit order

Performance + adoption + fit

#24

Match score

73/100

Adoption

73/100

Install hint

huggingface-cli download mistralai/Mistral-Small-3.2-24B-Instruct-2506
LM Studio model catalog
Gemma logo
Upgrade

Gemma 4 31B

GoogleGemma terms

Frontier local workstation model

A larger Gemma option for capable desktops and local multimodal workflows.

Parameters

31B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

74/100

Pulls

16.6M

reasoningcodingvisionchatWorkload match

Fit order

Performance + adoption + fit

#25

Match score

72/100

Adoption

91/100

Install hint

ollama run gemma4:31b
LM Studio model catalog
Gemma logo
Upgrade

Gemma 4 26B-A4B

GoogleGemma terms

Efficient larger multimodal model

Useful when you want a larger Gemma family model without activating every parameter.

Parameters

26B MoE

Q4 size

16.5 GB

RAM floor

48 GB

VRAM target

24 GB

Performance

74/100

Pulls

16.6M

chatreasoningvisionWorkload match

Fit order

Performance + adoption + fit

#26

Match score

72/100

Adoption

91/100

Install hint

ollama run gemma4:26b-a4b
LM Studio model catalog
Gemma logo
Upgrade

Gemma 4 E4B

GoogleGemma terms

Edge-friendly multimodal assistant

Choose this when image input matters and you still want a compact model.

Parameters

E4B

Q4 size

9.6 GB

RAM floor

16 GB

VRAM target

10 GB

Performance

74/100

Pulls

16.6M

chatreasoningvisioncodingWorkload match

Fit order

Performance + adoption + fit

#27

Match score

72/100

Adoption

91/100

Install hint

ollama run gemma4:e4b
LM Studio model catalog
Mistral logo
Upgrade

Mistral Small 3.1 24B

Mistral AIApache 2.0

High-quality local assistant

Strong when you need low-latency function calling and multimodal input.

Parameters

24B

Q4 size

15 GB

RAM floor

32 GB

VRAM target

24 GB

Performance

82/100

Downloads

159.6K

chatcodingvisionWorkload match

Fit order

Performance + adoption + fit

#28

Match score

71/100

Adoption

65/100

Install hint

huggingface-cli download mistralai/Mistral-Small-3.1-24B-Instruct-2503
Mistral Small release notes
Qwen logo
Upgrade

Qwen3 235B-A22B

AlibabaApache 2.0

Server-class open-weight deployment

Track this for cluster or multi-GPU servers rather than normal desktops.

Parameters

235B MoE

Q4 size

150 GB

RAM floor

256 GB

VRAM target

80 GB

Performance

74/100

Downloads

848.7K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#29

Match score

68/100

Adoption

75/100

Install hint

huggingface-cli download Qwen/Qwen3-235B-A22B
Qwen3 official release
Meta logo
Upgrade

Llama 4 Scout

MetaLlama license

Long-context multimodal servers

Open-weight Llama 4 model for server-side local deployment planning.

Parameters

17B active MoE

Q4 size

90 GB

RAM floor

192 GB

VRAM target

80 GB

Performance

74/100

Downloads

747.3K

chatvisionreasoningWorkload match

Fit order

Performance + adoption + fit

#30

Match score

68/100

Adoption

74/100

Install hint

huggingface-cli download meta-llama/Llama-4-Scout-17B-16E-Instruct
Meta Llama release notes
Qwen logo
Upgrade

Qwen3 14B

AlibabaApache 2.0

Higher-quality local reasoning

Useful when 8B is not consistent enough and you still want practical local speed.

Parameters

14B

Q4 size

9 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

65/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#31

Match score

67/100

Adoption

94/100

Install hint

ollama run qwen3:14b
Qwen3 official release
Mistral logo
Upgrade

Mistral NeMo 12B

Mistral AIApache 2.0

Efficient multilingual local chat

A good mid-size Mistral option for broad language coverage.

Parameters

12B

Q4 size

7.5 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

69/100

Pulls

5.2M

chatcodingWorkload match

Fit order

Performance + adoption + fit

#32

Match score

67/100

Adoption

84/100

Install hint

ollama run mistral-nemo:12b
Mistral Small release notes
Gemma logo
Upgrade

Gemma 3 12B

GoogleGemma terms

Balanced multimodal local model

Good on Apple Silicon with enough unified memory or a 12 GB GPU.

Parameters

12B

Q4 size

8.2 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

63/100

Pulls

38.2M

chatvisionreasoningWorkload match

Fit order

Performance + adoption + fit

#33

Match score

66/100

Adoption

95/100

Install hint

ollama run gemma3:12b
Google Gemma docs
Qwen logo
Upgrade

Qwen2.5-Coder 14B

AlibabaApache 2.0

Local repository edits

A useful middle tier when 7B misses framework details but 32B is too heavy.

Parameters

14B

Q4 size

9 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

65/100

Pulls

18.1M

codingchatWorkload match

Fit order

Performance + adoption + fit

#34

Match score

66/100

Adoption

91/100

Install hint

ollama run qwen2.5-coder:14b
Ollama model library
Qwen logo
Upgrade

Qwen3 8B

AlibabaApache 2.0

Default open local assistant

Strong everyday pick for multilingual chat, coding, and reasoning on consumer hardware.

Parameters

8B

Q4 size

5.2 GB

RAM floor

16 GB

VRAM target

6 GB

Performance

62/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#35

Match score

65/100

Adoption

94/100

Install hint

ollama run qwen3:8b
Qwen3 official release
Gemma logo
Upgrade

Gemma 4 E2B

GoogleGemma terms

Efficient edge multimodal use

Track this for compact visual assistants and quick local prototypes.

Parameters

E2B

Q4 size

5.5 GB

RAM floor

12 GB

VRAM target

6 GB

Performance

63/100

Pulls

16.6M

chatvisionWorkload match

Fit order

Performance + adoption + fit

#36

Match score

65/100

Adoption

91/100

Install hint

ollama run gemma4:e2b
LM Studio model catalog
Microsoft logo
Upgrade

Phi-4 14B

MicrosoftMIT

Compact English reasoning

A practical option when you want stronger reasoning than tiny models.

Parameters

14B

Q4 size

9.1 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

65/100

Pulls

7.6M

reasoningchatcodingWorkload match

Fit order

Performance + adoption + fit

#37

Match score

65/100

Adoption

87/100

Install hint

ollama run phi4:14b
Microsoft Phi product page
Kimi logo
Upgrade

Kimi K2 Instruct

Moonshot AIModified MIT

Large open agentic model

Track this for hosted local infrastructure, coding agents, and long-context analysis.

Parameters

1T MoE

Q4 size

600 GB

RAM floor

768 GB

VRAM target

240 GB

Performance

70/100

Downloads

420.3K

codingreasoningchatWorkload match

Fit order

Performance + adoption + fit

#38

Match score

65/100

Adoption

71/100

Install hint

huggingface-cli download moonshotai/Kimi-K2-Instruct
Moonshot Kimi K2 repository
Gemma logo
Upgrade

Gemma 4 12B

GoogleGemma terms

Fast workstation multimodal model

Good on Apple Silicon with enough unified memory or 12 GB plus GPUs.

Parameters

12B

Q4 size

7.6 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

62/100

Pulls

16.6M

chatreasoningvisioncodingWorkload match

Fit order

Performance + adoption + fit

#39

Match score

64/100

Adoption

91/100

Install hint

ollama run gemma4:12b
LM Studio model catalog
Meta logo
Upgrade

Llama 4 Maverick

MetaLlama license

Very large local AI infrastructure

Treat this as a data-center candidate, not a normal desktop install.

Parameters

17B active MoE

Q4 size

250 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

74/100

Downloads

45.5K

chatvisionreasoningWorkload match

Fit order

Performance + adoption + fit

#40

Match score

64/100

Adoption

59/100

Install hint

huggingface-cli download meta-llama/Llama-4-Maverick-17B-128E-Instruct
Meta Llama release notes
Qwen logo
Upgrade

Qwen2.5-Coder 7B

AlibabaApache 2.0

Small coding assistant

A practical code-focused model for local autocomplete, review, and refactor prompts.

Parameters

7B

Q4 size

4.7 GB

RAM floor

16 GB

VRAM target

6 GB

Performance

60/100

Pulls

18.1M

codingchatWorkload match

Fit order

Performance + adoption + fit

#41

Match score

63/100

Adoption

91/100

Install hint

ollama run qwen2.5-coder:7b
Ollama model library
MiniMax logo
Upgrade

MiniMax M2

MiniMaxModified MIT

Open-weight server assistant

Evaluate license and deployment constraints before production use.

Parameters

MoE

Q4 size

280 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

70/100

Downloads

114.7K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#42

Match score

63/100

Adoption

64/100

Install hint

huggingface-cli download MiniMaxAI/MiniMax-M2
MiniMax M2 repository
LLaVA logo
Upgrade

LLaVA 13B

LLaVAApache 2.0

Stronger local image chat

Use this when the 7B model misses details and you have enough memory.

Parameters

13B

Q4 size

8 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

59/100

Pulls

14.3M

visionchatWorkload match

Fit order

Performance + adoption + fit

#43

Match score

62/100

Adoption

90/100

Install hint

ollama run llava:13b
Ollama LLaVA page
Meta logo
Upgrade

Llama 3.2 Vision 11B

MetaLlama license

Local image understanding

A practical option for screenshots, visual Q&A, and document images.

Parameters

11B

Q4 size

7.5 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

61/100

Pulls

4.8M

visionchatWorkload match

Fit order

Performance + adoption + fit

#44

Match score

62/100

Adoption

84/100

Install hint

ollama run llama3.2-vision:11b
Meta Llama release notes
Z.ai logo
Upgrade

GLM-4.7 Flash

Z.aiMIT

Efficient GLM deployment

A lighter GLM-family option for local servers and larger workstations.

Parameters

30B-A3B MoE

Q4 size

18 GB

RAM floor

48 GB

VRAM target

24 GB

Performance

62/100

Downloads

2.5M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#45

Match score

62/100

Adoption

80/100

Install hint

huggingface-cli download zai-org/GLM-4.7-Flash
Z.ai GLM-4.5 repository
DeepSeek logo
Upgrade

DeepSeek-R1 Distill Llama 8B

DeepSeekMIT

Reasoning on common GPUs

A popular local R1 size for RTX 3060-class hardware.

Parameters

8B

Q4 size

5.2 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

54/100

Pulls

89M

reasoningcodingchatWorkload match

Fit order

Performance + adoption + fit

#46

Match score

61/100

Adoption

100/100

Install hint

ollama run deepseek-r1:8b
DeepSeek R1 on Hugging Face
MiniMax logo
Upgrade

MiniMax M2.7

MiniMaxModified MIT

Newest MiniMax open-weight candidate

Use as a research and infrastructure planning entry until local runtimes mature.

Parameters

MoE

Q4 size

280 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

62/100

Downloads

1.5M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#47

Match score

61/100

Adoption

78/100

Install hint

huggingface-cli download MiniMaxAI/MiniMax-M2.7
MiniMax M2 repository
Qwen logo
Upgrade

DeepSeek-R1 Distill Qwen 7B

DeepSeekMIT

Small local reasoning

Use for math, logic, and step-by-step problem solving on consumer hardware.

Parameters

7B

Q4 size

4.7 GB

RAM floor

16 GB

VRAM target

6 GB

Performance

52/100

Pulls

89M

reasoningcodingchatWorkload match

Fit order

Performance + adoption + fit

#48

Match score

60/100

Adoption

100/100

Install hint

ollama run deepseek-r1:7b
DeepSeek R1 on Hugging Face
IBM Granite logo
Upgrade

Granite 3.3 8B Instruct

IBMApache 2.0

Permissive local business model

Good candidate for private enterprise assistants and RAG evaluation.

Parameters

8B

Q4 size

5 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

61/100

Pulls

1M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#49

Match score

60/100

Adoption

75/100

Install hint

ollama run granite3.3:8b
IBM Granite
MiniMax logo
Upgrade

MiniMax M2.5

MiniMaxModified MIT

Updated MiniMax local server track

A large open-weight candidate for organizations with inference clusters.

Parameters

MoE

Q4 size

280 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

62/100

Downloads

674.7K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#50

Match score

60/100

Adoption

73/100

Install hint

huggingface-cli download MiniMaxAI/MiniMax-M2.5
MiniMax M2 repository
Z.ai logo
Upgrade

GLM-4.5 Air

Z.aiMIT

Large open agent model

A high-capability open model for servers with strong memory budgets.

Parameters

106B MoE

Q4 size

65 GB

RAM floor

128 GB

VRAM target

80 GB

Performance

62/100

Downloads

389.6K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#51

Match score

60/100

Adoption

70/100

Install hint

huggingface-cli download zai-org/GLM-4.5-Air
Z.ai GLM-4.5 repository
Qwen logo
Upgrade

Qwen2.5-VL 7B

AlibabaApache 2.0

Local visual question answering

Use when screenshots, charts, receipts, or documents matter more than pure text speed.

Parameters

7B

Q4 size

5.5 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

55/100

Downloads

9.9M

visionchatWorkload match

Fit order

Performance + adoption + fit

#52

Match score

59/100

Adoption

88/100

Install hint

huggingface-cli download Qwen/Qwen2.5-VL-7B-Instruct
Qwen3 official release
Z.ai logo
Upgrade

GLM-4.5

Z.aiMIT

Cluster-class open agent model

Useful for tracking frontier open-weight systems, but not ordinary local desktops.

Parameters

355B MoE

Q4 size

220 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

62/100

Downloads

161.1K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#53

Match score

59/100

Adoption

66/100

Install hint

huggingface-cli download zai-org/GLM-4.5
Z.ai GLM-4.5 repository
IBM Granite logo
Upgrade

Granite 4.0 Small

IBMApache 2.0

Business-grade open model testing

A larger Granite candidate for teams that want an Apache-licensed model as a default.

Parameters

Small MoE

Q4 size

12 GB

RAM floor

32 GB

VRAM target

16 GB

Performance

71/100

Downloads

n/a

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#54

Match score

57/100

Adoption

35/100

Install hint

huggingface-cli download ibm-granite/granite-4.0-small-preview
IBM Granite
Mistral logo
Upgrade

Pixtral 12B

Mistral AIApache 2.0

Mistral local vision assistant

Good for open multimodal experiments with the Mistral ecosystem.

Parameters

12B

Q4 size

7.8 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

70/100

Pulls

n/a

visionchatWorkload match

Fit order

Performance + adoption + fit

#55

Match score

56/100

Adoption

35/100

Install hint

ollama run pixtral:12b
LM Studio model catalog
LLaVA logo
Upgrade

LLaVA 7B

LLaVAApache 2.0

Classic local image chat

A widely supported visual assistant for screenshots and simple image questions.

Parameters

7B

Q4 size

4.7 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

48/100

Pulls

14.3M

visionchatWorkload match

Fit order

Performance + adoption + fit

#56

Match score

55/100

Adoption

90/100

Install hint

ollama run llava:7b
Ollama LLaVA page
Microsoft logo
Upgrade

Phi-4 Multimodal

MicrosoftMIT

Small multimodal Microsoft model

Supports text, vision, and audio-style multimodal workflows depending on runtime support.

Parameters

5.6B

Q4 size

4.3 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

50/100

Downloads

532.8K

visionchatWorkload match

Fit order

Performance + adoption + fit

#57

Match score

53/100

Adoption

72/100

Install hint

huggingface-cli download microsoft/Phi-4-multimodal-instruct
Microsoft Phi product page
Ai2 logo
Upgrade

OLMo 2 13B Instruct

Ai2Apache 2.0

Transparent open-model evaluation

A good comparison point for fully open training pipelines.

Parameters

13B

Q4 size

8.5 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

60/100

Downloads

8.5K

chatreasoningWorkload match

Fit order

Performance + adoption + fit

#58

Match score

53/100

Adoption

49/100

Install hint

huggingface-cli download allenai/OLMo-2-1124-13B-Instruct
Ai2 OLMo
Ai2 logo
Upgrade

OLMo 3 32B

Ai2Apache 2.0

Fully open large-model research

A larger transparent model family entry for reproducible AI research.

Parameters

32B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

62/100

Downloads

n/a

chatreasoningWorkload match

Fit order

Performance + adoption + fit

#59

Match score

51/100

Adoption

35/100

Install hint

huggingface-cli download allenai/OLMo-3-32B
Ai2 OLMo
Hunyuan logo
Upgrade

Tencent Hunyuan Large

TencentTencent Hunyuan license

Large local infrastructure reference

Track for server deployments and China-market open model coverage.

Parameters

389B MoE

Q4 size

240 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

62/100

Downloads

233

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#60

Match score

50/100

Adoption

30/100

Install hint

huggingface-cli download tencent/Tencent-Hunyuan-Large
Tencent Hunyuan Hugging Face
Ai2 logo
Upgrade

OLMo 2 7B Instruct

Ai2Apache 2.0

Fully open research baseline

Choose OLMo when training data and model transparency matter.

Parameters

7B

Q4 size

4.8 GB

RAM floor

16 GB

VRAM target

6 GB

Performance

49/100

Downloads

49.5K

chatWorkload match

Fit order

Performance + adoption + fit

#61

Match score

49/100

Adoption

59/100

Install hint

huggingface-cli download allenai/OLMo-2-1124-7B-Instruct
Ai2 OLMo
Ai2 logo
Upgrade

Molmo 7B-D

Ai2Apache 2.0

Open visual reasoning research

A transparent vision-language model family for local multimodal tests.

Parameters

7B

Q4 size

5.5 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

43/100

Downloads

26.8K

visionchatWorkload match

Fit order

Performance + adoption + fit

#62

Match score

45/100

Adoption

56/100

Install hint

huggingface-cli download allenai/Molmo-7B-D-0924
Ai2 OLMo
DeepSeek logo
Upgrade

DeepSeek-R1 Distill Llama 70B

DeepSeekMIT

Large local reasoning servers

Better suited to multi-GPU rigs or high-memory Apple Silicon systems.

Parameters

70B

Q4 size

43 GB

RAM floor

128 GB

VRAM target

48 GB

Performance

50/100

Pulls

89M

reasoningcoding

Fit order

Performance + adoption + fit

#63

Match score

56/100

Adoption

100/100

Install hint

ollama run deepseek-r1:70b
DeepSeek R1 on Hugging Face
Qwen logo
Upgrade

DeepSeek-R1 Distill Qwen 32B

DeepSeekMIT

Serious local reasoning

Use when answer quality matters more than speed and you have workstation memory.

Parameters

32B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

50/100

Pulls

89M

reasoningcoding

Fit order

Performance + adoption + fit

#64

Match score

56/100

Adoption

100/100

Install hint

ollama run deepseek-r1:32b
DeepSeek R1 on Hugging Face
Qwen logo
Upgrade

Qwen2.5-Coder 32B

AlibabaApache 2.0

High-quality local coding

One of the strongest widely used open coding models for local workstations.

Parameters

32B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

50/100

Pulls

18.1M

codingreasoning

Fit order

Performance + adoption + fit

#65

Match score

54/100

Adoption

91/100

Install hint

ollama run qwen2.5-coder:32b
Ollama model library
Meta logo
Upgrade

Llama 3.2 Vision 90B

MetaLlama license

Large local vision servers

Use this only on serious multi-GPU or high-memory systems.

Parameters

90B

Q4 size

56 GB

RAM floor

128 GB

VRAM target

80 GB

Performance

50/100

Pulls

4.8M

visionreasoning

Fit order

Performance + adoption + fit

#66

Match score

52/100

Adoption

84/100

Install hint

ollama run llama3.2-vision:90b
Meta Llama release notes
LLaVA logo
Upgrade

LLaVA 34B

LLaVAApache 2.0

Large local visual assistant

A workstation-class LLaVA model for more demanding visual tasks.

Parameters

34B

Q4 size

21 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

46/100

Pulls

14.3M

visionreasoning

Fit order

Performance + adoption + fit

#67

Match score

51/100

Adoption

90/100

Install hint

ollama run llava:34b
Ollama LLaVA page
Qwen logo
Upgrade

DeepSeek-R1 Distill Qwen 14B

DeepSeekMIT

Better local math and logic

A sensible upgrade when 7B and 8B distills are too brittle.

Parameters

14B

Q4 size

9 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

41/100

Pulls

89M

reasoningcoding

Fit order

Performance + adoption + fit

#68

Match score

50/100

Adoption

100/100

Install hint

ollama run deepseek-r1:14b
DeepSeek R1 on Hugging Face
Mistral logo
Upgrade

Codestral 22B

Mistral AIMistral research license

Code completion and generation

Use for code-specific workflows; check license terms before commercial deployment.

Parameters

22B

Q4 size

13 GB

RAM floor

32 GB

VRAM target

16 GB

Performance

49/100

Pulls

1.3M

coding

Fit order

Performance + adoption + fit

#69

Match score

50/100

Adoption

77/100

Install hint

ollama run codestral:22b
Mistral Small release notes
Qwen logo
Upgrade

Qwen2.5-VL 32B

AlibabaApache 2.0

Large local multimodal analysis

A workstation-class option for documents, diagrams, UI review, and visual reasoning.

Parameters

32B

Q4 size

22 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

50/100

Downloads

445.8K

visionreasoning

Fit order

Performance + adoption + fit

#70

Match score

49/100

Adoption

71/100

Install hint

huggingface-cli download Qwen/Qwen2.5-VL-32B-Instruct
Qwen3 official release
DeepSeek logo
Upgrade

DeepSeek-R1

DeepSeekMIT

Cluster-scale open reasoning

Reference the full model for server planning, not single-desktop installation.

Parameters

671B MoE

Q4 size

404 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

38/100

Downloads

7.8M

reasoningcoding

Fit order

Performance + adoption + fit

#71

Match score

45/100

Adoption

87/100

Install hint

huggingface-cli download deepseek-ai/DeepSeek-R1
DeepSeek R1 on Hugging Face
Kimi logo
Upgrade

Kimi K2.6

Moonshot AIModified MIT

Native multimodal agentic model

A server-scale candidate for teams tracking open agent models.

Parameters

1T MoE

Q4 size

600 GB

RAM floor

768 GB

VRAM target

240 GB

Performance

38/100

Downloads

2.3M

codingreasoningvision

Fit order

Performance + adoption + fit

#72

Match score

44/100

Adoption

80/100

Install hint

huggingface-cli download moonshotai/Kimi-K2.6
Moonshot Kimi K2 repository
Mistral logo
Upgrade

Devstral Small 24B

Mistral AIApache 2.0

Agentic coding tasks

A code-agent oriented open model for repository-level work.

Parameters

24B

Q4 size

15 GB

RAM floor

32 GB

VRAM target

24 GB

Performance

50/100

Downloads

6.2K

coding

Fit order

Performance + adoption + fit

#73

Match score

44/100

Adoption

48/100

Install hint

huggingface-cli download mistralai/Devstral-Small-2505
LM Studio model catalog
Mistral logo
Upgrade

E5-Mistral 7B Instruct

MicrosoftMIT

Large instruction embeddings

Use when retrieval quality matters more than embedding throughput.

Parameters

7B

Q4 size

4.8 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

37/100

Downloads

407.5K

embedding

Fit order

Performance + adoption + fit

#74

Match score

41/100

Adoption

71/100

Install hint

huggingface-cli download intfloat/e5-mistral-7b-instruct
Ollama embedding models
Microsoft logo
Upgrade

Phi-4 Reasoning

MicrosoftMIT

Reasoning-focused Phi workflows

Pick this over base Phi-4 when step-by-step problem solving is the main job.

Parameters

14B

Q4 size

9.3 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

42/100

Downloads

23.7K

reasoningcoding

Fit order

Performance + adoption + fit

#75

Match score

40/100

Adoption

55/100

Install hint

huggingface-cli download microsoft/Phi-4-reasoning
Microsoft Phi product page
Ai2 logo
Upgrade

Molmo 72B

Ai2Apache 2.0

Large open visual reasoning

Use on high-memory systems when smaller vision models lack accuracy.

Parameters

72B

Q4 size

45 GB

RAM floor

128 GB

VRAM target

48 GB

Performance

38/100

Downloads

3.6K

visionreasoning

Fit order

Performance + adoption + fit

#76

Match score

36/100

Adoption

45/100

Install hint

huggingface-cli download allenai/Molmo-72B-0924
Ai2 OLMo
Last updated

2026-07-03

This page is maintained as a hardware-specific install path, not a static model catalog.

Latest maintenance note

Added the 8GiB CPU-only Ollama test record as supporting evidence.

Made the practical starting range clearer: try 0.5B to 3B before testing 7B.

Tightened the Ollama vs LM Studio advice for low-memory laptops.

Editorial checks

This guide orders local models by usable fit, not just model size.

Local LLM choices are easy to overstate. AI Jupyter treats a hardware page as a practical install order: what to try first, what to avoid, when to step down, and when the job should move to a stronger machine or hosted API.

Start with the machine

The model choice begins with RAM, VRAM, runtime, quantization, and whether the computer can stay usable while the model answers.

Separate load from comfort

A model that loads once is not automatically a good daily choice. The page favors models that still feel practical with normal apps open.

Prefer realistic limits

Large context, repository-wide coding, and long-document tasks are treated as separate workload limits instead of being hidden inside a single score.

Link to real test records

Where AI Jupyter has a real machine test, the guide links to screenshots, raw notes, and test JSON so the model pick can be checked.

Hardware decision profile

How I would treat a 8 GB RAM laptop before installing models

The most useful local LLM choice is rarely the biggest model in the list. This profile turns the hardware tier into a practical decision: what to try first, what it is good for, what not to force, and when to move up.

Practical fit

Start with

0.5B to 1B models

Use the first install to prove the runtime and laptop are comfortable before judging local AI quality.

Use it for

Private scratchpad work

Good for offline notes, short rewrites, quick checks, and learning local inference without depending on a cloud API.

Do not force

Long context or 7B daily use

If the machine swaps, heats up, or makes the browser annoying, the model is too heavy even if it technically loads.

Upgrade when

The model becomes the bottleneck

Move to 16GB, 32GB, a GPU box, or a hosted API when the work needs stronger reasoning or longer documents.

Quick answer

If your laptop has 8GB RAM, start smaller than you want to.

The best first local LLM is not the biggest one that barely loads. On 8GB RAM, the useful question is whether the model still feels responsive while your browser, notes, and operating system are open. For most people, that means 0.5B to 3B models first, then careful experiments with 7B later.

Install first

qwen2.5:0.5b

Fastest in the 8GiB CPU-only test at 58.02 tokens/s.

Comfortable small assistant

gemma3:1b

A better quality step while still staying responsive at 30.70 tokens/s.

Upper practical range

qwen2.5:3b

Usable at 16.77 tokens/s, but it no longer feels instant.

Tested evidence

Real 8GiB CPU-only test

AI Jupyter also ran six Ollama models in a Docker container capped at 8GiB RAM with no GPU passthrough. The test includes raw Docker proof, Ollama CPU logs, model list screenshots, and test JSON.

Read the 8GB test
Common 8GB questions

Plain answers before you install anything large

Can I run a local LLM on 8GB RAM?

Yes, but treat 0.5B to 3B as the normal range if you want the computer to stay pleasant. A 7B model may load, but it is usually a limit test on an 8GB CPU-only laptop.

What is the best Ollama model for 8GB RAM?

Start with qwen2.5:0.5b to confirm the setup, then try gemma3:1b for a more useful small assistant. Move to qwen2.5:3b only if the laptop still feels responsive.

Is LM Studio usable on an 8GB laptop?

Yes, if you keep one small model loaded at a time and avoid oversized context windows. LM Studio is easiest when you want a desktop chat workflow and manual unload controls.

Should I pick the biggest model that fits?

No. On 8GB RAM, the better default is the strongest small model that still lets you keep your browser, notes, and normal desktop apps open.

How do I know a model is too heavy for 8GB RAM?

If the browser starts lagging, later turns slow down sharply, the fan stays high, or you need to close normal apps just to keep the model running, step down to a smaller model.

Comfort signals

The model only counts if the computer still feels usable.

On an 8GB laptop, the pass/fail test is not whether a model loads once. It is whether you can keep a normal desktop open and still ask a few useful questions without waiting so long that you stop using it.

Green light

Keep this model

Replies start quickly, the laptop still feels normal, and a second or third turn does not suddenly become much slower.

Yellow light

Shorten context or step down

The first answer is fine, but longer chats, larger prompts, or keeping the browser open makes the machine feel busy.

Red light

Do not make it your default

The system swaps, other apps freeze, the runtime unloads the model, or every short answer takes enough time to break your flow.

Realistic 8GB decision table

What 8GB RAM is actually good for

If you want

You want private offline chat

Try

qwen2.5:0.5b or gemma3:1b

Verdict

Good 8GB fit

Use this for quick questions, notes, simple rewrites, and checking that your local setup works.

If you want

You want better answers and can wait

Try

qwen2.5:3b

Verdict

Test before defaulting

This is the practical upper range I would try before the machine starts feeling less casual.

If you want

You want long documents or repository-wide coding

Try

Hosted AI or a larger machine

Verdict

Usually too much for 8GB

The model may run, but context, tools, editor memory, and repeated turns make the experience fragile.

If you want

You want to experiment with 7B

Try

Keep it as a limit test

Verdict

Not the daily default

Try it once if you are curious, but do not judge local AI by the slowest model your laptop can barely hold.

10-minute sanity test

Test comfort before you test limits.

1

Close the heavy apps you do not need, but keep your normal browser or notes app open.

2

Run qwen2.5:0.5b first and ask one real prompt, not a synthetic test prompt.

3

Watch whether the machine still feels usable while the answer streams.

4

Move to gemma3:1b, then qwen2.5:3b only if the previous step still feels comfortable.

5

If the laptop swaps, heats up, or becomes annoying, step down instead of chasing a bigger model.

Which one should I actually install?

Match the model to the amount of patience you have.

Choose 0.5B when you want proof it works

Use qwen2.5:0.5b when you are checking whether Ollama, drivers, memory, and basic chat are working. It is the least frustrating first install.

Choose 1B when you want a small daily assistant

Use gemma3:1b when 0.5B feels too thin but you still care about quick replies on a normal laptop with other apps open.

Choose 3B when quality matters more than speed

Use qwen2.5:3b when you can wait a little and want better answers. Keep context short and test your real prompt before calling it your default.

Use hosted AI when the job is bigger than the machine

For long documents, repository-wide coding, heavy reasoning, or production reliability, an 8GB local model is usually a private scratchpad, not the final engine.

Ollama or LM Studio on 8GB?

Pick the runtime that keeps you testing, not guessing.

Use Ollama if you want repeatable tests

It is better for copy-paste commands, quick model swaps, repeated test runs, and keeping notes about exactly which model tag you tested.

Use LM Studio if you want a desktop workflow

It is better when you want model search, a chat UI, manual unload controls, and a clearer view of what is currently loaded.

Do not install everything at once

On 8GB RAM, disk space and memory pressure both matter. Install one small model, test it, then decide whether a larger download is worth it.

Install order

The order I would try on a normal 8GB laptop

First check

ollama run qwen2.5:0.5b

Use this to confirm your machine, runtime, and memory settings are sane.

Daily lightweight chat

ollama run gemma3:1b

A reasonable second install when you want a small assistant that still answers quickly.

Capability stretch

ollama run qwen2.5:3b

Try this if 1B feels too weak and you can tolerate slower replies.

What to avoid

The common 8GB mistakes

Do not start with 7B

A 7B model can load, but on an 8GB CPU-only setup it is usually a patience test, not a comfortable daily assistant.

Do not max out context

Large context windows eat memory quickly. Start near 2048 tokens and raise it only after the machine stays responsive.

Do not judge it with every app open

Browsers, IDEs, sync clients, and screen recorders matter on 8GB RAM. Close heavy apps before judging a model.

Why this page exists

A machine-first local LLM guide for 8 GB RAM laptop

Start with 0.5B to 3B models if you want replies that still feel interactive.

7B models can load on some 8GB machines, but they are usually the wrong daily default.

Keep the context window short and leave memory for the browser, editor, and operating system.

Default query
Device
8 GB RAM laptop
RAM
8 GB
GPU memory
Unified / none
Updated
2026-07-03
Next decision

What to check after this hardware guide

Local model choice usually changes after one of three checks: measured hardware comfort, API fallback cost, or whether the task actually needs a stronger hosted model.

Keep the machine check honest
Review the local model scoring method
Real-world fit

What 8GB RAM is actually comfortable doing

Good fit

  • Private drafts, short offline chat, quick rewrites, and simple checks with one small model loaded.
  • Testing whether Ollama or LM Studio belongs in your workflow before downloading larger models.
  • Learning how local inference feels without turning the whole laptop into a test rig.

Be careful

  • Long documents, repository-wide coding, high context settings, and multitasking with heavy apps open.
  • Using a 7B model as the daily default just because it can technically load.
  • Judging local AI by the slowest model an 8GB laptop can barely hold.
When to step up

Signals this hardware tier is no longer enough.

1

You regularly need longer context or stronger reasoning than 0.5B to 3B models can provide.

2

You want the browser, notes, editor, and model open at the same time without constant slowdown.

3

The local model needs to be a reliable work engine, not a private scratchpad.

Setup notes

How to use this guide

1

Install Ollama or LM Studio, then test a tiny model before downloading anything large.

2

Try qwen2.5:0.5b, gemma3:1b, or llama3.2:1b first; move to 3B only if speed still feels fine.

3

Use Q4 quantized packages, keep context near 2048 tokens, and close heavy desktop apps.

4

Treat 7B as an experiment on 8GB RAM, not the model you recommend to a non-technical friend.

Hardware FAQ

Practical answers before you install

Can an 8GB RAM laptop run a local LLM?

Yes. The practical range is small models, usually 0.5B to 3B if you want a smooth chat experience. Larger 7B models may load, but they often feel slow and leave very little memory headroom.

What is the safest first model for 8GB RAM?

Start with qwen2.5:0.5b or gemma3:1b. In AI Jupyter's 8GiB CPU-only test, qwen2.5:0.5b was the fastest first install, while gemma3:1b stayed comfortably usable.

Should I run a 7B model on 8GB RAM?

Only if you are testing limits. A 7B model can work, but it is usually slow on CPU-only 8GB machines and can push the system close to swapping once normal apps are open.

Is Ollama or LM Studio better for 8GB RAM?

Ollama is usually the fastest way to try command-line installs and repeatable tests. LM Studio is easier if you want a desktop interface, model search, and manual control over loaded models.

What if 3B models still feel too weak on 8GB RAM?

Use 8GB local models for private drafts, quick offline chat, and simple helpers, then switch to a hosted API or a larger machine when you need stronger reasoning, long documents, coding across a repository, or reliable production answers.

How much free memory should I leave before running a local LLM on 8GB RAM?

Leave enough memory for the operating system, browser, notes, and runtime overhead. If memory pressure is already high before the model starts, choose a smaller model or close heavy apps before judging speed.

More local LLM hardware guides

Best Local LLMs for 16GB RAM in 2026: Ollama and LM Studio Picks

Balanced local LLMs for 16 GB laptops and MacBooks.

Best Local LLMs for 32GB RAM in 2026: 7B, 8B, and 14B Picks

Stronger local LLMs for 32 GB RAM systems.

Best Local LLMs for RTX 5090 in 2026: 32 GB VRAM Picks

Top-end single-GPU local LLM picks for 32 GB VRAM RTX 5090 builds.

Best Local LLMs for RTX 5080 in 2026: 16 GB VRAM Picks

Practical local LLM picks for 16 GB VRAM RTX 5080 systems.

Best Local LLMs for RTX 5070 Ti in 2026: 16 GB VRAM Picks

Balanced 16 GB VRAM local LLM picks for RTX 5070 Ti machines.

Best Local LLMs for RTX 5070 in 2026: 12 GB VRAM Picks

12 GB VRAM local LLM picks for RTX 5070 desktop builds.

Best Local LLMs for RTX 5060 Ti 16GB in 2026: 16 GB VRAM Picks

16 GB VRAM local LLM picks for RTX 5060 Ti 16GB systems.

Best Local LLMs for RTX 5060 Ti 8GB in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 5060 Ti 8GB systems.

Best Local LLMs for RTX 5060 in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 5060 systems.

Best Local LLMs for RTX 4090 in 2026: 24GB VRAM Picks

High-performance local LLMs for 24 GB VRAM RTX 4090 builds.

Best Local LLMs for RTX 4080 in 2026: 16 GB VRAM Picks

16 GB VRAM local LLM picks for RTX 4080 and RTX 4080 Super systems.

Best Local LLMs for RTX 4070 Ti Super in 2026: 16 GB VRAM Picks

16 GB VRAM local LLM picks for RTX 4070 Ti Super desktops.

Best Local LLMs for RTX 4070 Ti in 2026: 12 GB VRAM Picks

12 GB VRAM local LLM picks for RTX 4070 Ti builds.

Best Local LLMs for RTX 4070 in 2026: 12 GB VRAM Picks

12 GB VRAM local LLM picks for RTX 4070 and RTX 4070 Super systems.

Best Local LLMs for RTX 4060 Ti 16GB in 2026: 16 GB VRAM Picks

16 GB VRAM local LLM picks for RTX 4060 Ti 16GB systems.

Best Local LLMs for RTX 4060 Ti 8GB in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 4060 Ti 8GB systems.

Best Local LLMs for RTX 4060 in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 4060 systems.

Best Local LLMs for RTX 3090 in 2026: 24 GB VRAM Picks

24 GB VRAM local LLM picks for RTX 3090 workstations.

Best Local LLMs for RTX 3080 in 2026: 10 GB VRAM Picks

10 GB VRAM local LLM picks for RTX 3080 systems.

Best Local LLMs for RTX 3070 in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 3070 systems.

Best Local LLMs for RTX 3060 Ti in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 3060 Ti systems.

Best Local LLMs for RTX 3060 in 2026: 12 GB VRAM Picks

12 GB VRAM local LLM picks for RTX 3060 systems.

Best Local LLMs for MacBook in 2026: Apple Silicon Picks

MacBook-friendly local LLMs for Apple Silicon unified memory.