Find a local LLM by computer, model, or install path.

Fits

Mistral NeMo 12B

Mistral AIApache 2.0

Efficient multilingual local chat

A good mid-size Mistral option for broad language coverage.

Parameters

12B

Q4 size

7.5 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

69/100

Pulls

5.2M

chatcodingWorkload match

Fit order

Performance + adoption + fit

Match score

75/100

Adoption

84/100

Install hint

ollama run mistral-nemo:12b

Fits

Gemma 3 12B

GoogleGemma terms

Balanced multimodal local model

Good on Apple Silicon with enough unified memory or a 12 GB GPU.

Parameters

12B

Q4 size

8.2 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

63/100

Pulls

38.2M

chatvisionreasoningWorkload match

Fit order

Performance + adoption + fit

Match score

74/100

Adoption

95/100

Install hint

ollama run gemma3:12b

Fits

Qwen3 8B

AlibabaApache 2.0

Default open local assistant

Strong everyday pick for multilingual chat, coding, and reasoning on consumer hardware.

Parameters

Q4 size

5.2 GB

RAM floor

16 GB

VRAM target

6 GB

Performance

62/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

Match score

73/100

Adoption

94/100

Install hint

ollama run qwen3:8b

Fits

Gemma 4 E2B

GoogleGemma terms

Efficient edge multimodal use

Track this for compact visual assistants and quick local prototypes.

Parameters

E2B

Q4 size

5.5 GB

RAM floor

12 GB

VRAM target

6 GB

Performance

63/100

Pulls

16.5M

chatvisionWorkload match

Fit order

Performance + adoption + fit

Match score

73/100

Adoption

91/100

Install hint

ollama run gemma4:e2b

Fits

Phi-4 14B

MicrosoftMIT

Compact English reasoning

A practical option when you want stronger reasoning than tiny models.

Parameters

14B

Q4 size

9.1 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

65/100

Pulls

7.6M

reasoningchatcodingWorkload match

Fit order

Performance + adoption + fit

Match score

73/100

Adoption

87/100

Install hint

ollama run phi4:14b

Fits

Mistral 7B Instruct

Mistral AIApache 2.0

Classic lightweight local assistant

A stable baseline model for general local chat and tool experiments.

Parameters

Q4 size

4.4 GB

RAM floor

8 GB

VRAM target

6 GB

Performance

59/100

Pulls

30.7M

chatcodingWorkload match

Fit order

Performance + adoption + fit

Match score

72/100

Adoption

94/100

Install hint

ollama run mistral:7b

Fits

Gemma 4 12B

GoogleGemma terms

Fast workstation multimodal model

Good on Apple Silicon with enough unified memory or 12 GB plus GPUs.

Parameters

12B

Q4 size

7.6 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

62/100

Pulls

16.5M

chatreasoningvisioncodingWorkload match

Fit order

Performance + adoption + fit

Match score

72/100

Adoption

91/100

Install hint

ollama run gemma4:12b

Fits

Qwen2.5-Coder 7B

AlibabaApache 2.0

Small coding assistant

A practical code-focused model for local autocomplete, review, and refactor prompts.

Parameters

Q4 size

4.7 GB

RAM floor

16 GB

VRAM target

6 GB

Performance

60/100

Pulls

18M

codingchatWorkload match

Fit order

Performance + adoption + fit

Match score

71/100

Adoption

91/100

Install hint

ollama run qwen2.5-coder:7b

Ollama model library

Fits

DeepSeek-R1 Distill Llama 8B

DeepSeekMIT

Reasoning on common GPUs

A popular local R1 size for RTX 3060-class hardware.

Parameters

Q4 size

5.2 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

54/100

Pulls

88.9M

reasoningcodingchatWorkload match

Fit order

Performance + adoption + fit

#10

Match score

70/100

Adoption

100/100

Install hint

ollama run deepseek-r1:8b

Fits

Llama 3.2 Vision 11B

MetaLlama license

Local image understanding

A practical option for screenshots, visual Q&A, and document images.

Parameters

11B

Q4 size

7.5 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

61/100

Pulls

4.8M

visionchatWorkload match

Fit order

Performance + adoption + fit

#11

Match score

70/100

Adoption

84/100

Install hint

ollama run llama3.2-vision:11b

Fits

DeepSeek-R1 Distill Qwen 7B

DeepSeekMIT

Small local reasoning

Use for math, logic, and step-by-step problem solving on consumer hardware.

Parameters

Q4 size

4.7 GB

RAM floor

16 GB

VRAM target

6 GB

Performance

52/100

Pulls

88.9M

reasoningcodingchatWorkload match

Fit order

Performance + adoption + fit

#12

Match score

69/100

Adoption

100/100

Install hint

ollama run deepseek-r1:7b

Fits

Gemma 3 4B

GoogleGemma terms

Small multimodal assistant

A compact option when image input matters but hardware is limited.

Parameters

Q4 size

3 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

52/100

Pulls

38.2M

chatvisionWorkload match

Fit order

Performance + adoption + fit

#13

Match score

68/100

Adoption

95/100

Install hint

ollama run gemma3:4b

Fits

Qwen2.5-VL 7B

AlibabaApache 2.0

Local visual question answering

Use when screenshots, charts, receipts, or documents matter more than pure text speed.

Parameters

Q4 size

5.5 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

55/100

Downloads

9.9M

visionchatWorkload match

Fit order

Performance + adoption + fit

#14

Match score

68/100

Adoption

88/100

Install hint

huggingface-cli download Qwen/Qwen2.5-VL-7B-Instruct

Fits

Granite 3.3 8B Instruct

IBMApache 2.0

Permissive local business model

Good candidate for private enterprise assistants and RAG evaluation.

Parameters

Q4 size

5 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

61/100

Pulls

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#15

Match score

68/100

Adoption

75/100

Install hint

ollama run granite3.3:8b

Fits

Qwen3 4B

AlibabaApache 2.0

Balanced small-device assistant

Good first model for 8 GB to 16 GB machines when speed matters.

Parameters

Q4 size

2.8 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

51/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#16

Match score

67/100

Adoption

94/100

Install hint

ollama run qwen3:4b

Fits

Llama 3.2 3B

MetaLlama license

Small general assistant

A lightweight Meta open-weight option for simple local use.

Parameters

Q4 size

2 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

45/100

Pulls

74.8M

chatcodingWorkload match

Fit order

Performance + adoption + fit

#17

Match score

64/100

Adoption

99/100

Install hint

ollama run llama3.2:3b

Fits

LLaVA 7B

LLaVAApache 2.0

Classic local image chat

A widely supported visual assistant for screenshots and simple image questions.

Parameters

Q4 size

4.7 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

48/100

Pulls

14.3M

visionchatWorkload match

Fit order

Performance + adoption + fit

#18

Match score

64/100

Adoption

90/100

Install hint

ollama run llava:7b

Ollama LLaVA page

Fits

Pixtral 12B

Mistral AIApache 2.0

Mistral local vision assistant

Good for open multimodal experiments with the Mistral ecosystem.

Parameters

12B

Q4 size

7.8 GB

RAM floor

16 GB

VRAM target

12 GB

Performance

70/100

Pulls

n/a

visionchatWorkload match

Fit order

Performance + adoption + fit

#19

Match score

64/100

Adoption

35/100

Install hint

ollama run pixtral:12b

Fits

Phi-4 Mini Instruct

MicrosoftMIT

Small English reasoning

A compact model for local assistants, classification, and structured reasoning prompts.

Parameters

3.8B

Q4 size

2.6 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

50/100

Pulls

1.3M

chatreasoningWorkload match

Fit order

Performance + adoption + fit

#20

Match score

62/100

Adoption

77/100

Install hint

ollama run phi4-mini

Fits

Phi-4 Multimodal

MicrosoftMIT

Small multimodal Microsoft model

Supports text, vision, and audio-style multimodal workflows depending on runtime support.

Parameters

5.6B

Q4 size

4.3 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

50/100

Downloads

528.7K

visionchatWorkload match

Fit order

Performance + adoption + fit

#21

Match score

61/100

Adoption

72/100

Install hint

huggingface-cli download microsoft/Phi-4-multimodal-instruct

Fits

Qwen3 1.7B

AlibabaApache 2.0

Low-memory local assistant

A better floor than sub-1B models when you need basic coding help on small devices.

Parameters

1.7B

Q4 size

1.3 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

39/100

Pulls

31.7M

chatcodingWorkload match

Fit order

Performance + adoption + fit

#22

Match score

60/100

Adoption

94/100

Install hint

ollama run qwen3:1.7b

Fits

Granite 4.0 Tiny

IBMApache 2.0

Next-generation Granite evaluation

Track this for Apache-licensed business deployment tests.

Parameters

Tiny MoE

Q4 size

4 GB

RAM floor

8 GB

VRAM target

CPU / unified

Performance

49/100

Downloads

144.1K

chatcodingWorkload match

Fit order

Performance + adoption + fit

#23

Match score

59/100

Adoption

65/100

Install hint

huggingface-cli download ibm-granite/granite-4.0-tiny-preview

Fits

Llama 3.2 1B

MetaLlama license

Tiny local chat

Runs almost everywhere and is useful for local automation tests.

Parameters

Q4 size

0.9 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

35/100

Pulls

74.8M

chatWorkload match

Fit order

Performance + adoption + fit

#24

Match score

58/100

Adoption

99/100

Install hint

ollama run llama3.2:1b

Fits

Gemma 3 1B

GoogleGemma terms

Tiny Google open model tests

Useful for mobile-adjacent experiments and low-memory chat.

Parameters

Q4 size

0.8 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

34/100

Pulls

38.2M

chatWorkload match

Fit order

Performance + adoption + fit

#25

Match score

57/100

Adoption

95/100

Install hint

ollama run gemma3:1b

Fits

Granite 3.3 2B Instruct

IBMApache 2.0

Small enterprise-friendly assistant

Apache-licensed model for teams that care about permissive licensing.

Parameters

Q4 size

1.5 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

41/100

Pulls

chatcodingWorkload match

Fit order

Performance + adoption + fit

#26

Match score

57/100

Adoption

75/100

Install hint

ollama run granite3.3:2b

Fits

OLMo 2 7B Instruct

Ai2Apache 2.0

Fully open research baseline

Choose OLMo when training data and model transparency matter.

Parameters

Q4 size

4.8 GB

RAM floor

16 GB

VRAM target

6 GB

Performance

49/100

Downloads

49.5K

chatWorkload match

Fit order

Performance + adoption + fit

#27

Match score

57/100

Adoption

59/100

Install hint

huggingface-cli download allenai/OLMo-2-1124-7B-Instruct

Fits

DeepSeek-R1 Distill Qwen 1.5B

DeepSeekMIT

Tiny reasoning experiments

Good for testing reasoning prompts locally before moving to larger R1 distills.

Parameters

1.5B

Q4 size

1.1 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

29/100

Pulls

88.9M

reasoningchatWorkload match

Fit order

Performance + adoption + fit

#28

Match score

55/100

Adoption

100/100

Install hint

ollama run deepseek-r1:1.5b

Fits

Qwen3 0.6B

AlibabaApache 2.0

Tiny local chat and quick smoke tests

Runs on almost any laptop, but keep expectations modest for coding and reasoning.

Parameters

0.6B

Q4 size

0.6 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

31/100

Pulls

31.7M

chatWorkload match

Fit order

Performance + adoption + fit

#29

Match score

55/100

Adoption

94/100

Install hint

ollama run qwen3:0.6b

Fits

Molmo 7B-D

Ai2Apache 2.0

Open visual reasoning research

A transparent vision-language model family for local multimodal tests.

Parameters

Q4 size

5.5 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

43/100

Downloads

23K

visionchatWorkload match

Fit order

Performance + adoption + fit

#30

Match score

53/100

Adoption

55/100

Install hint

huggingface-cli download allenai/Molmo-7B-D-0924

Tencent Hunyuan Hugging Face

Fits

Tencent Hunyuan 1.8B Instruct

TencentTencent Hunyuan license

Tiny Hunyuan local tests

Useful when you want a very small Chinese-English local model.

Parameters

1.8B

Q4 size

1.4 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

28/100

Downloads

446

chatWorkload match

Fit order

Performance + adoption + fit

#31

Match score

39/100

Adoption

33/100

Install hint

huggingface-cli download tencent/Hunyuan-1.8B-Instruct

Fits

E5-Mistral 7B Instruct

MicrosoftMIT

Large instruction embeddings

Use when retrieval quality matters more than embedding throughput.

Parameters

Q4 size

4.8 GB

RAM floor

16 GB

VRAM target

8 GB

Performance

37/100

Downloads

403.8K

embedding

Fit order

Performance + adoption + fit

#32

Match score

49/100

Adoption

71/100

Install hint

huggingface-cli download intfloat/e5-mistral-7b-instruct

Fits

nomic-embed-text

NomicApache 2.0

Local semantic search

Use for private RAG indexes, document search, and lightweight retrieval.

Parameters

137M

Q4 size

0.3 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Pulls

76.8M

embedding

Fit order

Performance + adoption + fit

#33

Match score

34/100

Adoption

99/100

Install hint

ollama pull nomic-embed-text

Fits

mxbai-embed-large

MixedbreadApache 2.0

Higher-quality local embeddings

Good default when embedding quality matters more than model size.

Parameters

335M

Q4 size

0.7 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Pulls

12.1M

embedding

Fit order

Performance + adoption + fit

#34

Match score

31/100

Adoption

89/100

Install hint

ollama pull mxbai-embed-large

Fits

BGE-M3

BAAIMIT

Multilingual RAG retrieval

A strong embedding pick for multilingual search and hybrid retrieval workflows.

Parameters

568M

Q4 size

1.2 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Pulls

4.9M

embedding

Fit order

Performance + adoption + fit

#35

Match score

30/100

Adoption

84/100

Install hint

ollama pull bge-m3

Fits

Jina Embeddings v3

Jina AICC BY-NC 4.0

Multilingual document embeddings

Check license terms before commercial usage; useful for local multilingual evaluation.

Parameters

572M

Q4 size

1.2 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Downloads

2.8M

embedding

Fit order

Performance + adoption + fit

#36

Match score

29/100

Adoption

81/100

Install hint

huggingface-cli download jinaai/jina-embeddings-v3

Fits

Snowflake Arctic Embed L

SnowflakeApache 2.0

Enterprise-style local retrieval

A permissive embedding model for private search pipelines.

Parameters

334M

Q4 size

0.8 GB

RAM floor

4 GB

VRAM target

CPU / unified

Performance

0/100

Downloads

36.3K

embedding

Fit order

Performance + adoption + fit

#37

Match score

24/100

Adoption

57/100

Install hint

huggingface-cli download Snowflake/snowflake-arctic-embed-l

Upgrade

Llama 3.3 70B

MetaLlama license

Large general open-weight assistant

A common baseline for strong local text performance on large rigs.

Parameters

70B

Q4 size

43 GB

RAM floor

128 GB

VRAM target

48 GB

Performance

82/100

Pulls

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#38

Match score

75/100

Adoption

83/100

Install hint

ollama run llama3.3:70b

Upgrade

Mixtral 8x7B

Mistral AIApache 2.0

MoE general assistant

Still useful for local MoE experiments and multilingual workloads.

Parameters

46.7B MoE

Q4 size

26 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

82/100

Pulls

2.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#39

Match score

74/100

Adoption

81/100

Install hint

ollama run mixtral:8x7b

Upgrade

Gemma 3 27B

GoogleGemma terms

High-quality local multimodal work

Best reserved for 24 GB GPUs, high-memory Macs, or larger systems.

Parameters

27B

Q4 size

17 GB

RAM floor

48 GB

VRAM target

24 GB

Performance

74/100

Pulls

38.2M

chatvisionreasoningWorkload match

Fit order

Performance + adoption + fit

#40

Match score

73/100

Adoption

95/100

Install hint

ollama run gemma3:27b

Upgrade

Qwen3 32B

AlibabaApache 2.0

Workstation-grade open model

A serious local upgrade for coding, agent workflows, and difficult reasoning tasks.

Parameters

32B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

74/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#41

Match score

73/100

Adoption

94/100

Install hint

ollama run qwen3:32b

Upgrade

Qwen3 30B-A3B

AlibabaApache 2.0

Efficient MoE reasoning

Mixture-of-experts design can offer strong quality without activating every parameter.

Parameters

30B MoE

Q4 size

18 GB

RAM floor

48 GB

VRAM target

16 GB

Performance

74/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#42

Match score

73/100

Adoption

94/100

Install hint

ollama run qwen3:30b-a3b

Upgrade

Mistral Small 3.2 24B

Mistral AIApache 2.0

Updated 24B multimodal assistant

A newer Small-family option for instruction following, vision, and tool use.

Parameters

24B

Q4 size

15 GB

RAM floor

32 GB

VRAM target

24 GB

Performance

82/100

Downloads

663.4K

chatcodingvisionWorkload match

Fit order

Performance + adoption + fit

#43

Match score

73/100

Adoption

73/100

Install hint

huggingface-cli download mistralai/Mistral-Small-3.2-24B-Instruct-2506

Upgrade

Gemma 4 31B

GoogleGemma terms

Frontier local workstation model

A larger Gemma option for capable desktops and local multimodal workflows.

Parameters

31B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

74/100

Pulls

16.5M

reasoningcodingvisionchatWorkload match

Fit order

Performance + adoption + fit

#44

Match score

72/100

Adoption

91/100

Install hint

ollama run gemma4:31b

Upgrade

Gemma 4 26B-A4B

GoogleGemma terms

Efficient larger multimodal model

Useful when you want a larger Gemma family model without activating every parameter.

Parameters

26B MoE

Q4 size

16.5 GB

RAM floor

48 GB

VRAM target

24 GB

Performance

74/100

Pulls

16.5M

chatreasoningvisionWorkload match

Fit order

Performance + adoption + fit

#45

Match score

72/100

Adoption

91/100

Install hint

ollama run gemma4:26b-a4b

Upgrade

Mistral Small 3.1 24B

Mistral AIApache 2.0

High-quality local assistant

Strong when you need low-latency function calling and multimodal input.

Parameters

24B

Q4 size

15 GB

RAM floor

32 GB

VRAM target

24 GB

Performance

82/100

Downloads

161.7K

chatcodingvisionWorkload match

Fit order

Performance + adoption + fit

#46

Match score

71/100

Adoption

66/100

Install hint

huggingface-cli download mistralai/Mistral-Small-3.1-24B-Instruct-2503

Upgrade

Qwen3 235B-A22B

AlibabaApache 2.0

Server-class open-weight deployment

Track this for cluster or multi-GPU servers rather than normal desktops.

Parameters

235B MoE

Q4 size

150 GB

RAM floor

256 GB

VRAM target

80 GB

Performance

74/100

Downloads

825.3K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#47

Match score

68/100

Adoption

74/100

Install hint

huggingface-cli download Qwen/Qwen3-235B-A22B

Upgrade

Llama 4 Scout

MetaLlama license

Long-context multimodal servers

Open-weight Llama 4 model for server-side local deployment planning.

Parameters

17B active MoE

Q4 size

90 GB

RAM floor

192 GB

VRAM target

80 GB

Performance

74/100

Downloads

744.1K

chatvisionreasoningWorkload match

Fit order

Performance + adoption + fit

#48

Match score

68/100

Adoption

74/100

Install hint

huggingface-cli download meta-llama/Llama-4-Scout-17B-16E-Instruct

Upgrade

Qwen3 14B

AlibabaApache 2.0

Higher-quality local reasoning

Useful when 8B is not consistent enough and you still want practical local speed.

Parameters

14B

Q4 size

9 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

65/100

Pulls

31.7M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#49

Match score

67/100

Adoption

94/100

Install hint

ollama run qwen3:14b

Moonshot Kimi K2 repository

Upgrade

Qwen2.5-Coder 14B

AlibabaApache 2.0

Local repository edits

A useful middle tier when 7B misses framework details but 32B is too heavy.

Parameters

14B

Q4 size

9 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

65/100

Pulls

18M

codingchatWorkload match

Fit order

Performance + adoption + fit

#50

Match score

66/100

Adoption

91/100

Install hint

ollama run qwen2.5-coder:14b

Ollama model library

Upgrade

Kimi K2 Instruct

Moonshot AIModified MIT

Large open agentic model

Track this for hosted local infrastructure, coding agents, and long-context analysis.

Parameters

1T MoE

Q4 size

600 GB

RAM floor

768 GB

VRAM target

240 GB

Performance

70/100

Downloads

436K

codingreasoningchatWorkload match

Fit order

Performance + adoption + fit

#51

Match score

65/100

Adoption

71/100

Install hint

huggingface-cli download moonshotai/Kimi-K2-Instruct

Upgrade

Llama 4 Maverick

MetaLlama license

Very large local AI infrastructure

Treat this as a data-center candidate, not a normal desktop install.

Parameters

17B active MoE

Q4 size

250 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

74/100

Downloads

46.5K

chatvisionreasoningWorkload match

Fit order

Performance + adoption + fit

#52

Match score

64/100

Adoption

59/100

Install hint

huggingface-cli download meta-llama/Llama-4-Maverick-17B-128E-Instruct

Upgrade

MiniMax M2

MiniMaxModified MIT

Open-weight server assistant

Evaluate license and deployment constraints before production use.

Parameters

MoE

Q4 size

280 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

70/100

Downloads

114K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#53

Match score

63/100

Adoption

64/100

Install hint

huggingface-cli download MiniMaxAI/MiniMax-M2

MiniMax M2 repository

Upgrade

LLaVA 13B

LLaVAApache 2.0

Stronger local image chat

Use this when the 7B model misses details and you have enough memory.

Parameters

13B

Q4 size

8 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

59/100

Pulls

14.3M

visionchatWorkload match

Fit order

Performance + adoption + fit

#54

Match score

62/100

Adoption

90/100

Install hint

ollama run llava:13b

Ollama LLaVA page

Upgrade

GLM-4.7 Flash

Z.aiMIT

Efficient GLM deployment

A lighter GLM-family option for local servers and larger workstations.

Parameters

30B-A3B MoE

Q4 size

18 GB

RAM floor

48 GB

VRAM target

24 GB

Performance

62/100

Downloads

2.4M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#55

Match score

62/100

Adoption

80/100

Install hint

huggingface-cli download zai-org/GLM-4.7-Flash

Z.ai GLM-4.5 repository

Upgrade

MiniMax M2.7

MiniMaxModified MIT

Newest MiniMax open-weight candidate

Use as a research and infrastructure planning entry until local runtimes mature.

Parameters

MoE

Q4 size

280 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

62/100

Downloads

1.6M

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#56

Match score

61/100

Adoption

78/100

Install hint

huggingface-cli download MiniMaxAI/MiniMax-M2.7

MiniMax M2 repository

Upgrade

MiniMax M2.5

MiniMaxModified MIT

Updated MiniMax local server track

A large open-weight candidate for organizations with inference clusters.

Parameters

MoE

Q4 size

280 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

62/100

Downloads

658.8K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#57

Match score

60/100

Adoption

73/100

Install hint

huggingface-cli download MiniMaxAI/MiniMax-M2.5

MiniMax M2 repository

Upgrade

GLM-4.5 Air

Z.aiMIT

Large open agent model

A high-capability open model for servers with strong memory budgets.

Parameters

106B MoE

Q4 size

65 GB

RAM floor

128 GB

VRAM target

80 GB

Performance

62/100

Downloads

394.3K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#58

Match score

60/100

Adoption

70/100

Install hint

huggingface-cli download zai-org/GLM-4.5-Air

Z.ai GLM-4.5 repository

Upgrade

GLM-4.5

Z.aiMIT

Cluster-class open agent model

Useful for tracking frontier open-weight systems, but not ordinary local desktops.

Parameters

355B MoE

Q4 size

220 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

62/100

Downloads

160.8K

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#59

Match score

58/100

Adoption

65/100

Install hint

huggingface-cli download zai-org/GLM-4.5

Z.ai GLM-4.5 repository

Upgrade

Granite 4.0 Small

IBMApache 2.0

Business-grade open model testing

A larger Granite candidate for teams that want an Apache-licensed model as a default.

Parameters

Small MoE

Q4 size

12 GB

RAM floor

32 GB

VRAM target

16 GB

Performance

71/100

Downloads

n/a

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#60

Match score

57/100

Adoption

35/100

Install hint

huggingface-cli download ibm-granite/granite-4.0-small-preview

Upgrade

OLMo 2 13B Instruct

Ai2Apache 2.0

Transparent open-model evaluation

A good comparison point for fully open training pipelines.

Parameters

13B

Q4 size

8.5 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

60/100

Downloads

8.2K

chatreasoningWorkload match

Fit order

Performance + adoption + fit

#61

Match score

53/100

Adoption

49/100

Install hint

huggingface-cli download allenai/OLMo-2-1124-13B-Instruct

Upgrade

OLMo 3 32B

Ai2Apache 2.0

Fully open large-model research

A larger transparent model family entry for reproducible AI research.

Parameters

32B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

62/100

Downloads

n/a

chatreasoningWorkload match

Fit order

Performance + adoption + fit

#62

Match score

51/100

Adoption

35/100

Install hint

huggingface-cli download allenai/OLMo-3-32B

Tencent Hunyuan Hugging Face

Upgrade

Tencent Hunyuan Large

TencentTencent Hunyuan license

Large local infrastructure reference

Track for server deployments and China-market open model coverage.

Parameters

389B MoE

Q4 size

240 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

62/100

Downloads

228

chatcodingreasoningWorkload match

Fit order

Performance + adoption + fit

#63

Match score

50/100

Adoption

30/100

Install hint

huggingface-cli download tencent/Tencent-Hunyuan-Large

Upgrade

DeepSeek-R1 Distill Llama 70B

DeepSeekMIT

Large local reasoning servers

Better suited to multi-GPU rigs or high-memory Apple Silicon systems.

Parameters

70B

Q4 size

43 GB

RAM floor

128 GB

VRAM target

48 GB

Performance

50/100

Pulls

88.9M

reasoningcoding

Fit order

Performance + adoption + fit

#64

Match score

56/100

Adoption

100/100

Install hint

ollama run deepseek-r1:70b

Upgrade

DeepSeek-R1 Distill Qwen 32B

DeepSeekMIT

Serious local reasoning

Use when answer quality matters more than speed and you have workstation memory.

Parameters

32B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

50/100

Pulls

88.9M

reasoningcoding

Fit order

Performance + adoption + fit

#65

Match score

56/100

Adoption

100/100

Install hint

ollama run deepseek-r1:32b

Upgrade

Qwen2.5-Coder 32B

AlibabaApache 2.0

High-quality local coding

One of the strongest widely used open coding models for local workstations.

Parameters

32B

Q4 size

20 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

50/100

Pulls

18M

codingreasoning

Fit order

Performance + adoption + fit

#66

Match score

54/100

Adoption

91/100

Install hint

ollama run qwen2.5-coder:32b

Ollama model library

Upgrade

Llama 3.2 Vision 90B

MetaLlama license

Large local vision servers

Use this only on serious multi-GPU or high-memory systems.

Parameters

90B

Q4 size

56 GB

RAM floor

128 GB

VRAM target

80 GB

Performance

50/100

Pulls

4.8M

visionreasoning

Fit order

Performance + adoption + fit

#67

Match score

52/100

Adoption

84/100

Install hint

ollama run llama3.2-vision:90b

Upgrade

LLaVA 34B

LLaVAApache 2.0

Large local visual assistant

A workstation-class LLaVA model for more demanding visual tasks.

Parameters

34B

Q4 size

21 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

46/100

Pulls

14.3M

visionreasoning

Fit order

Performance + adoption + fit

#68

Match score

51/100

Adoption

90/100

Install hint

ollama run llava:34b

Ollama LLaVA page

Upgrade

DeepSeek-R1 Distill Qwen 14B

DeepSeekMIT

Better local math and logic

A sensible upgrade when 7B and 8B distills are too brittle.

Parameters

14B

Q4 size

9 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

41/100

Pulls

88.9M

reasoningcoding

Fit order

Performance + adoption + fit

#69

Match score

50/100

Adoption

100/100

Install hint

ollama run deepseek-r1:14b

Upgrade

Codestral 22B

Mistral AIMistral research license

Code completion and generation

Use for code-specific workflows; check license terms before commercial deployment.

Parameters

22B

Q4 size

13 GB

RAM floor

32 GB

VRAM target

16 GB

Performance

49/100

Pulls

1.3M

coding

Fit order

Performance + adoption + fit

#70

Match score

50/100

Adoption

77/100

Install hint

ollama run codestral:22b

Upgrade

Qwen2.5-VL 32B

AlibabaApache 2.0

Large local multimodal analysis

A workstation-class option for documents, diagrams, UI review, and visual reasoning.

Parameters

32B

Q4 size

22 GB

RAM floor

64 GB

VRAM target

24 GB

Performance

50/100

Downloads

449K

visionreasoning

Fit order

Performance + adoption + fit

#71

Match score

49/100

Adoption

71/100

Install hint

huggingface-cli download Qwen/Qwen2.5-VL-32B-Instruct

Upgrade

DeepSeek-R1

DeepSeekMIT

Cluster-scale open reasoning

Reference the full model for server planning, not single-desktop installation.

Parameters

671B MoE

Q4 size

404 GB

RAM floor

512 GB

VRAM target

160 GB

Performance

38/100

Downloads

7.7M

reasoningcoding

Fit order

Performance + adoption + fit

#72

Match score

45/100

Adoption

87/100

Install hint

huggingface-cli download deepseek-ai/DeepSeek-R1

Moonshot Kimi K2 repository

Upgrade

Kimi K2.6

Moonshot AIModified MIT

Native multimodal agentic model

A server-scale candidate for teams tracking open agent models.

Parameters

1T MoE

Q4 size

600 GB

RAM floor

768 GB

VRAM target

240 GB

Performance

38/100

Downloads

2.3M

codingreasoningvision

Fit order

Performance + adoption + fit

#73

Match score

44/100

Adoption

80/100

Install hint

huggingface-cli download moonshotai/Kimi-K2.6

Upgrade

Devstral Small 24B

Mistral AIApache 2.0

Agentic coding tasks

A code-agent oriented open model for repository-level work.

Parameters

24B

Q4 size

15 GB

RAM floor

32 GB

VRAM target

24 GB

Performance

50/100

Downloads

5.9K

coding

Fit order

Performance + adoption + fit

#74

Match score

44/100

Adoption

47/100

Install hint

huggingface-cli download mistralai/Devstral-Small-2505

Upgrade

Phi-4 Reasoning

MicrosoftMIT

Reasoning-focused Phi workflows

Pick this over base Phi-4 when step-by-step problem solving is the main job.

Parameters

14B

Q4 size

9.3 GB

RAM floor

24 GB

VRAM target

12 GB

Performance

42/100

Downloads

23.1K

reasoningcoding

Fit order

Performance + adoption + fit

#75

Match score

40/100

Adoption

55/100

Install hint

huggingface-cli download microsoft/Phi-4-reasoning

Upgrade

Molmo 72B

Ai2Apache 2.0

Large open visual reasoning

Use on high-memory systems when smaller vision models lack accuracy.

Parameters

72B

Q4 size

45 GB

RAM floor

128 GB

VRAM target

48 GB

Performance

38/100

Downloads

3.6K

visionreasoning

Fit order

Performance + adoption + fit

#76

Match score

36/100

Adoption

45/100

Install hint

huggingface-cli download allenai/Molmo-72B-0924

Use the semantic tool page

Common next steps

Use the result to open the right guide, not another generic list.

The calculator answers fit. These paths answer what to do next: install one Ollama model, open hardware guides, check target model pages, or read measured tests.

Calculator first, guide second

Which local LLM can I run?

I know my RAM and GPU, but I do not know which model to install first.

Enter RAM, VRAM, machine type, and workload. Start with the top Fits rows before testing Stretch models.

Which Ollama model should I install?

I want one Ollama command that will not waste a huge download.

Filter by RAM and workload, then copy the install hint from a model marked Fits comfortably.

Open the 8GB Ollama path

Should I buy a bigger GPU for local AI?

I am comparing RTX 3060, RTX 4060 Ti 16GB, RTX 4090, RTX 5090, or a 48GB workstation.

Open the closest hardware guide first, then compare model size, VRAM headroom, and real test records before buying.

Check the RTX 4090 guide

What hardware do I need for one model?

I want to run Qwen3 32B, DeepSeek-R1 Distill Qwen 32B, Gemma 27B, or Llama 70B.

Open the target model requirement page and check RAM floor, VRAM target, Q4 size, and install notes.

DeepSeek-R1 Distill Qwen 32B

A 24GB VRAM local reasoning path for DeepSeek-R1 Distill Qwen 32B.

64GB RAM floor / 24GB VRAM target

70B model

Llama 3.3 70B

A large-model planning guide for running Llama 3.3 70B locally.

128GB RAM floor / 48GB VRAM target

Device and model scenarios

Start here when your search includes both a GPU or laptop and a model name.

These pages answer the questions people ask right before spending time or money: whether RTX 4060 Ti 16GB should try Qwen3 32B, whether RTX 3060 12GB is enough for 14B, whether 32GB RAM can handle 32B, and which Qwen or DeepSeek model belongs on a MacBook.

Cross-intent pages

RTX 4060 Ti 16GB desktop

RTX 4060 Ti 16GB + Qwen / DeepSeek

A practical Qwen3 and DeepSeek install path for RTX 4060 Ti 16GB systems.

RTX 3060 12GB desktop

RTX 3060 12GB + 14B models

A 12GB VRAM install path for Qwen3 14B and DeepSeek local models.

32GB RAM desktop or laptop

32GB RAM + 32B local models

A realistic guide for 32GB RAM users tempted by 32B local models.

Apple Silicon MacBook

MacBook + Qwen / DeepSeek

A MacBook-focused Qwen3 and DeepSeek local model path.

Start here for 8GB laptops

Most low-memory local LLM mistakes start with choosing a model that is too big.

If your machine has 8GB RAM and no dedicated GPU, begin with the focused 8GB guide. It explains the practical Ollama and LM Studio picks, why 7B is usually a slow boundary, and which small models stayed responsive in a CPU-only test record.

Closest real test

Best Local LLMs for 8GB RAM in 2026

Practical local LLM picks for 8GB laptops and CPU-only machines.

Open 8GB guide Read CPU-only test

Choose by machine first

Choose by machine, then by job.

Local LLM choice is usually decided by memory pressure before any score. The best first model is the one that stays responsive on your real laptop or workstation, with your normal apps open and your actual prompts in the loop.

Check the 8GB CPU-only test

1. Leave memory headroom

Do not choose the largest model that barely loads. Leave room for the operating system, browser, editor, documents, and a practical context window.

2. Match the job before the size

A small coding model can beat a larger general chat model for repository work. The same is true for embeddings, vision, and short daily chat.

3. Pick the runtime path

Use Ollama for repeatable command-line tests. Use LM Studio when you want a desktop interface, model search, and manual control over loaded models.

4. Test your own prompt

Run one real prompt from your normal work before making the model your default. Check speed, memory pressure, answer quality, license, and whether it stays usable after a few turns.

Most useful starting points

Open the guide closest to your machine first. If you want measured speed, screenshots, and memory behavior, open the CPU-only or GPU-fit test records.

8GB RAM laptop

Ollama and LM Studio picks for low-memory machines. Start with 0.5B to 3B before testing 7B.

8GB CPU-only test

Measured Ollama speed, memory pressure, and slow boundary on an 8GiB CPU-only setup.

8GB VRAM GPU test

Measured tokens per second, nvidia-smi screenshots, and peak memory changes under an 8GB VRAM budget.

16GB machine

Good for small assistants, coding helpers, and careful 7B to 8B experiments.

32GB desktop

A practical range for stronger 7B to 14B models and larger context tests.

Apple Silicon MacBook

Unified memory changes the fit calculation. Check memory headroom before model size.

RTX 4090 workstation

Open the page closest to your machine. Each page starts with the hardware limit first, then lets you adjust RAM, VRAM, workload, and model search.

Updated 2026-07-02

8 GB RAM laptop

Best Local LLMs for 8GB RAM in 2026: Ollama, LM Studio, and Model Picks

Practical local LLM picks for 8GB laptops and CPU-only machines.

8 GB RAMNo VRAM

16 GB RAM machine

Best Local LLMs for 16GB RAM in 2026: Ollama and LM Studio Picks

Balanced local LLMs for 16 GB laptops and MacBooks.

16 GB RAMNo VRAM

32 GB RAM desktop

Best Local LLMs for 32GB RAM in 2026: 7B, 8B, and 14B Picks

Stronger local LLMs for 32 GB RAM systems.

32 GB RAMNo VRAM

RTX 5090 workstation

Best Local LLMs for RTX 5090 in 2026: 32 GB VRAM Picks

Top-end single-GPU local LLM picks for 32 GB VRAM RTX 5090 builds.

64 GB RAM32 GB VRAM

RTX 5080 workstation

Best Local LLMs for RTX 5080 in 2026: 16 GB VRAM Picks

Practical local LLM picks for 16 GB VRAM RTX 5080 systems.

RTX 5070 Ti workstation

Best Local LLMs for RTX 5070 Ti in 2026: 16 GB VRAM Picks

Balanced 16 GB VRAM local LLM picks for RTX 5070 Ti machines.

RTX 5070 workstation

Best Local LLMs for RTX 5070 in 2026: 12 GB VRAM Picks

12 GB VRAM local LLM picks for RTX 5070 desktop builds.

RTX 5060 Ti 16GB workstation

Best Local LLMs for RTX 5060 Ti 16GB in 2026: 16 GB VRAM Picks

16 GB VRAM local LLM picks for RTX 5060 Ti 16GB systems.

32 GB RAM16 GB VRAM

RTX 5060 Ti 8GB workstation

Best Local LLMs for RTX 5060 Ti 8GB in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 5060 Ti 8GB systems.

RTX 5060 workstation

Best Local LLMs for RTX 5060 in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 5060 systems.

RTX 4090 workstation

Best Local LLMs for RTX 4090 in 2026: 24GB VRAM Picks

High-performance local LLMs for 24 GB VRAM RTX 4090 builds.

64 GB RAM24 GB VRAM

RTX 4080 workstation

Best Local LLMs for RTX 4080 in 2026: 16 GB VRAM Picks

16 GB VRAM local LLM picks for RTX 4080 and RTX 4080 Super systems.

RTX 4070 Ti Super workstation

Best Local LLMs for RTX 4070 Ti Super in 2026: 16 GB VRAM Picks

16 GB VRAM local LLM picks for RTX 4070 Ti Super desktops.

RTX 4070 Ti workstation

Best Local LLMs for RTX 4070 Ti in 2026: 12 GB VRAM Picks

12 GB VRAM local LLM picks for RTX 4070 Ti builds.

RTX 4070 workstation

Best Local LLMs for RTX 4070 in 2026: 12 GB VRAM Picks

12 GB VRAM local LLM picks for RTX 4070 and RTX 4070 Super systems.

RTX 4060 Ti 16GB workstation

Best Local LLMs for RTX 4060 Ti 16GB in 2026: 16 GB VRAM Picks

16 GB VRAM local LLM picks for RTX 4060 Ti 16GB systems.

32 GB RAM16 GB VRAM

RTX 4060 Ti 8GB workstation

Best Local LLMs for RTX 4060 Ti 8GB in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 4060 Ti 8GB systems.

RTX 4060 workstation

Best Local LLMs for RTX 4060 in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 4060 systems.

RTX 3090 workstation

Best Local LLMs for RTX 3090 in 2026: 24 GB VRAM Picks

24 GB VRAM local LLM picks for RTX 3090 workstations.

64 GB RAM24 GB VRAM

RTX 3080 workstation

Best Local LLMs for RTX 3080 in 2026: 10 GB VRAM Picks

10 GB VRAM local LLM picks for RTX 3080 systems.

32 GB RAM10 GB VRAM

RTX 3070 workstation

Best Local LLMs for RTX 3070 in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 3070 systems.

RTX 3060 Ti workstation

Best Local LLMs for RTX 3060 Ti in 2026: 8 GB VRAM Picks

8 GB VRAM local LLM picks for RTX 3060 Ti systems.

RTX 3060 workstation

Best Local LLMs for RTX 3060 in 2026: 12 GB VRAM Picks

12 GB VRAM local LLM picks for RTX 3060 systems.