Weighted ranking report

Best AI for Coding

Covers agentic coding, web development, repository fixes, and production engineering tasks. This report blends public leaderboard signals into one task-specific composite score, then highlights the practical cautions to weigh before you choose a model.

Use this coding ranking to shortlist AI models for repository edits, agentic coding, bug fixes, code review, and web application generation.

Page value: Repository fixes and agentic coding shortlist.

Data basis: 5 public sources · 86 models

Updated: 2026-06-12

Current winner: Claude Fable 5 (adjusted score 99.9)

Snapshot: 2026-06-12

Top contenders

The leading five models in this composite (86 total models)

#1 Claude Fable 5: Model 99.9 · Confidence 100% · Adjusted score 99.9

#2 Claude Opus 4.8: Model 92.7 · Confidence 100% · Adjusted score 92.7

#3 Claude Opus 4.7 Thinking: Model 94 · Confidence 69% · Adjusted score 89.9

#4 Claude Opus 4.8 Thinking: Model 94 · Confidence 69% · Adjusted score 89.9

#5 Claude Opus 4.7: Model 89.9 · Confidence 100% · Adjusted score 89.9

Full-source models: 5

Average coverage: 39%

Top-five spread: 10
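The summary figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the top-five adjusted scores are copied from the contender list, while the coverage list rests on an assumption that the 81 models without full coverage each sit at 35% confirmed coverage, as every partial row shown in this snapshot does.

```python
# Adjusted scores of the five leading models, copied from the contender list.
top_five_adjusted = [99.9, 92.7, 89.9, 89.9, 89.9]

# Top-five spread: gap between the best and fifth-best adjusted score.
spread = round(max(top_five_adjusted) - min(top_five_adjusted), 1)

# Assumption (not stated on the page): 5 full-coverage models at 100% and
# the remaining 81 models at 35% confirmed coverage each.
coverages = [1.00] * 5 + [0.35] * 81
average_coverage = sum(coverages) / len(coverages)

print(f"spread={spread}, average coverage={average_coverage:.0%}")
```

Under that assumption the script reproduces both published figures, which suggests the average is a plain mean of per-model coverage rather than a weighted one.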

All ranked models

Complete composite model ranking

Showing all 86 models with at least one confirmed source row in this category. Models with no category source coverage are excluded. Confirmed rows are ordered by Bayesian-smoothed adjusted score; missing source rows stay n/a instead of counting as zero.
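A minimal sketch of how such a composite could be computed, with missing sources skipped rather than scored as zero. The page does not publish its weights or smoothing formula, so everything named here is an assumption: the `WEIGHTS` values were chosen because they reproduce, to within rounding, the Model scores of the five full-coverage rows and the 35% coverage shown on Arena-only rows, and the `confidence=0.685` and `prior=81.0` arguments are hand-fitted guesses for a simple shrink-toward-prior form. The actual method may differ.

```python
from typing import Optional

# Hypothetical weights, inferred from this snapshot; not published by the page.
WEIGHTS = {
    "Code Arena": 0.35,
    "Vals SWE-bench Verified": 0.25,
    "Vals Vibe Code Bench": 0.20,
    "Vellum": 0.10,
    "AA Index": 0.10,
}


def model_score(scores: dict[str, Optional[float]]) -> tuple[float, float]:
    """Return (weighted mean over confirmed sources, confirmed coverage)."""
    # n/a rows are dropped instead of being counted as zero.
    confirmed = {name: s for name, s in scores.items() if s is not None}
    if not confirmed:
        raise ValueError("no confirmed source rows; the model would be excluded")
    coverage = sum(WEIGHTS[name] for name in confirmed)
    weighted = sum(WEIGHTS[name] * s for name, s in confirmed.items())
    return weighted / coverage, coverage


def adjusted_score(model: float, confidence: float, prior: float) -> float:
    """Shrink the model score toward a prior; lower confidence pulls harder."""
    return confidence * model + (1.0 - confidence) * prior


# Rows copied from the ranking: one full-coverage, one Arena-only.
fable = {
    "Code Arena": 100,
    "Vals SWE-bench Verified": 100,
    "Vals Vibe Code Bench": 100,
    "Vellum": 99,
    "AA Index": 100,
}
arena_only = {
    "Code Arena": 94,
    "Vals SWE-bench Verified": None,
    "Vals Vibe Code Bench": None,
    "Vellum": None,
    "AA Index": None,
}

full_score, full_cov = model_score(fable)
partial_score, partial_cov = model_score(arena_only)
# Assumed smoothing parameters (hand-fitted, hypothetical).
partial_adjusted = adjusted_score(partial_score, confidence=0.685, prior=81.0)

print(round(full_score, 1), round(full_cov, 2))
print(round(partial_score, 1), round(partial_cov, 2), round(partial_adjusted, 1))
```

The shrinkage step is why an Arena-only row with a Model score of 94 can rank below a full-coverage row with a lower raw score: thin evidence is pulled toward the prior.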

#1 Claude Fable 5

Anthropic · Proprietary API

Highest-confidence coding and autonomous engineering tasks.

Caution: Premium model; test cost and access limits before building workflow dependency.

Source coverage: 5/5 · 100% confirmed coverage · 100% confidence

Code Arena 100 · Vals SWE-bench Verified 100 · Vals Vibe Code Bench 100 · Vellum 99 · AA Index 100

Adjusted score: 99.9 · Model: 99.9

#2 Claude Opus 4.8

Anthropic · Proprietary API

Large code changes where review quality matters.

Caution: Strong but expensive; use Sonnet-class models for routine edits if budget matters.

Source coverage: 5/5 · 100% confirmed coverage · 100% confidence

Code Arena 93 · Vals SWE-bench Verified 93 · Vals Vibe Code Bench 91 · Vellum 93 · AA Index 94

Adjusted score: 92.7 · Model: 92.7

#3 Claude Opus 4.7 Thinking

Anthropic · Proprietary API

High-end coding when extended reasoning mode is useful.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 94 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 89.9 · Model: 94

#4 Claude Opus 4.8 Thinking

Anthropic · Proprietary API

Large code changes that benefit from slower thinking-mode review.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 94 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 89.9 · Model: 94

#5 Claude Opus 4.7

Anthropic · Proprietary API

Stable high-end coding when Fable or Opus 4.8 is unavailable.

Caution: Newer Anthropic models usually score better on current leaderboards.

Source coverage: 5/5 · 100% confirmed coverage · 100% confidence

Code Arena 94 · Vals SWE-bench Verified 86 · Vals Vibe Code Bench 87 · Vellum 92 · AA Index 89

Adjusted score: 89.9 · Model: 89.9

#6 Claude Opus 4.6 Thinking

Anthropic · Proprietary API

Extended reasoning on large repositories and multi-file refactors.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 93 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 89.2 · Model: 93

#7 Claude Opus 4.6

Anthropic · Proprietary API

Premium coding assistance with strong instruction following.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 92 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 88.5 · Model: 92

#8 Qwen3.7 Max

Alibaba · Proprietary API

Cost-aware coding workflows where Alibaba model access is preferred.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 92 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 88.5 · Model: 92

#9 GLM-5.1

Z.ai · MIT

Open-weight friendly coding experiments and self-hosted evaluations.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 92 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 88.5 · Model: 92

#10 Claude Sonnet 4.6

Anthropic · Proprietary API

Daily coding edits, review, and lower-cost agentic workflows.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 91 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 87.9 · Model: 91

#11 MiniMax M3

MiniMax · Proprietary API

Alternative coding assistant testing across web development prompts.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 91 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 87.9 · Model: 91

#12 Kimi K2.6

Moonshot · Modified MIT

Long-context code reading and open-weight oriented comparison.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 91 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 87.9 · Model: 91

#13 Muse Spark

Meta · Proprietary API

Experimental web development generation and UI coding prompts.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 91 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 87.9 · Model: 91

#14 GPT-5.5 xhigh

OpenAI · Proprietary API

Highest-effort OpenAI coding harness runs and difficult repository tasks.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 90 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 87.2 · Model: 90

#15 GPT-5.5

OpenAI · Proprietary API

Coding plus broad product, data, and tool-use work.

Caution: Use task-specific harnesses for agentic coding; model variants matter.

Source coverage: 5/5 · 100% confirmed coverage · 100% confidence

Code Arena 87 · Vals SWE-bench Verified 87 · Vals Vibe Code Bench 83 · Vellum 87 · AA Index 92

Adjusted score: 86.7 · Model: 86.7

#16 Claude Opus 4.5 Thinking

Anthropic · Proprietary API

Careful multi-step coding where thinking-mode behavior is preferred.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 89 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 86.5 · Model: 89

#17 Qwen3.6 Max Preview

Alibaba · Proprietary API

Preview-model coding tests before standardizing on Qwen releases.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 89 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 86.5 · Model: 89

#18 GPT-5.5 High

OpenAI · Proprietary API

Higher-effort OpenAI coding runs with strong tool-use potential.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 89 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 86.5 · Model: 89

#19 Mimo V2.5 Pro

Xiaomi · MIT

Open-weight coding model tests and lower-cost deployment planning.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 88 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 85.8 · Model: 88

#20 Claude Opus 4.5

Anthropic · Proprietary API

High-quality coding and review when newer Opus versions are unavailable.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 88 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 85.8 · Model: 88

#21 Qwen3.6 Plus

Alibaba · Proprietary API

Balanced Qwen coding workloads and product prototyping.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 88 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 85.8 · Model: 88

#22 DeepSeek V4 Pro Thinking

DeepSeek · MIT

Reasoning-heavy coding comparisons with open-weight deployment options.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 88 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 85.8 · Model: 88

#23 GPT-5.4 High

OpenAI · Proprietary API

High-effort coding assistance and tool-call workflows.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 88 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 85.8 · Model: 88

#24 Gemini 3.1 Pro Preview

Google · Proprietary API

Google ecosystem coding tests and preview-model evaluation.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 87 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 85.1 · Model: 87

#25 GLM-4.7

Z.ai · MIT

Open-weight coding and local evaluation candidates.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 86 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 84.4 · Model: 86

#26 Gemini 3 Pro

Google · Proprietary API

General coding, research, and Google-integrated development workflows.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 86 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 84.4 · Model: 86

#27 GPT-5.4 Medium

OpenAI · Proprietary API

Balanced OpenAI coding runs where latency and cost matter.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 86 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 84.4 · Model: 86

#28 Gemini 3 Flash

Google · Proprietary API

Fast web development assistance and lightweight coding workflows.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 86 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 84.4 · Model: 86

#29 Mimo V2.5

Xiaomi · MIT

Open-weight coding shortlist testing.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 86 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 84.4 · Model: 86

#30 GLM-5

Z.ai · MIT

Open-weight coding model evaluation with Z.ai releases.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 86 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 84.4 · Model: 86

#31 Mimo V2 Pro

Xiaomi · Proprietary API

Xiaomi coding model comparison for web development tasks.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 86 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 84.4 · Model: 86

#32 Kimi K2.5 Thinking

Moonshot · Modified MIT

Reasoning-forward coding and long-context repository prompts.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 86 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 84.4 · Model: 86

#33 Gemini 3.5 Flash

Google · Proprietary API

Fast coding assistance and lower-latency product workflows.

Caution: Check quality on your own repository before using it for autonomous changes.

Source coverage: 5/5 · 100% confirmed coverage · 100% confidence

Code Arena 90 · Vals SWE-bench Verified 83 · Vals Vibe Code Bench 76 · Vellum 80 · AA Index 88

Adjusted score: 84.3 · Model: 84.3

#34 Kimi K2.5 Instant

Moonshot · Modified MIT

Faster Kimi coding runs and draft implementation work.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 85 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.7 · Model: 85

#35 GPT-5.3 Codex

OpenAI · Proprietary API

Codex-style coding workflows and repository automation.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 85 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.7 · Model: 85

#36 GPT-5.2

OpenAI · Proprietary API

General OpenAI coding support where newer variants are unavailable.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 84 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.1 · Model: 84

#37 GPT-5.4 Mini High

OpenAI · Proprietary API

Smaller high-effort OpenAI coding tasks.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 84 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.1 · Model: 84

#38 MiniMax M2.7

MiniMax · Modified MIT

Open-ish MiniMax coding model tests and web UI generation.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 84 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.1 · Model: 84

#39 Qwen3.5 397B A17B

Alibaba · Apache 2.0

Open-source Qwen coding evaluation and self-hosted experiments.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 84 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.1 · Model: 84

#40 GPT-5 Medium

OpenAI · Proprietary API

Baseline OpenAI coding assistance.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 84 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.1 · Model: 84

#41 GPT-5.4

OpenAI · Proprietary API

General OpenAI coding and product engineering tasks.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 84 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.1 · Model: 84

#42 MiniMax M2.1 Preview

MiniMax · MIT

Preview-model web development comparison.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 84 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.1 · Model: 84

#43 GPT-5.1 Medium

OpenAI · Proprietary API

Older OpenAI coding benchmark comparison.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 84 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 83.1 · Model: 84

#44 Grok 4.20 Beta Reasoning

xAI · Proprietary API

Reasoning-heavy xAI coding tests and alternative model comparison.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 83 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 82.4 · Model: 83

#45 Claude Sonnet 4.5 Thinking

Anthropic · Proprietary API

Thinking-mode coding and careful edit planning.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 83 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 82.4 · Model: 83

#46 Gemini 3 Flash Thinking Minimal

Google · Proprietary API

Low-latency Gemini coding with minimal thinking behavior.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 83 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 82.4 · Model: 83

#47 Claude Opus 4.1

Anthropic · Proprietary API

Legacy Opus-class coding and review comparisons.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 83 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 82.4 · Model: 83

#48 Claude Sonnet 4.5

Anthropic · Proprietary API

Routine coding, bug fixes, and code review.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 83 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 82.4 · Model: 83

#49 MiniMax M2.5

MiniMax · Modified MIT

MiniMax coding model evaluation on web tasks.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 83 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 82.4 · Model: 83

#50 Gemma 4 31B

Google · Apache 2.0

Open-source local or hosted coding experiments.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 83 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 82.4 · Model: 83

#51 GPT-5.3 Codex

OpenAI · Proprietary API

Codex-style coding workflows and repository automation at the lower Arena score band.

Caution: Arena.ai currently lists this Codex harness entry with a separate score band; verify the exact harness setting before comparing.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 82 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 81.7 · Model: 82

#52 Grok 4.3

xAI · Proprietary API

xAI coding assistant evaluation.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 82 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 81.7 · Model: 82

#53 DeepSeek V3.2 Thinking

DeepSeek · MIT

Reasoning-oriented open-weight coding tests.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 82 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 81.7 · Model: 82

#54 Qwen3.5 122B A10B

Alibaba · Apache 2.0

Smaller Qwen self-hosted coding experiments.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 82 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 81.7 · Model: 82

#55 Hunyuan HY3 Preview

Tencent · Tencent Hunyuan Community

Tencent preview-model coding comparison.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 82 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 81.7 · Model: 82

#56 Gemma 4 26B A4B

Google · Apache 2.0

Smaller open-source coding and local deployment tests.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 82 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 81.7 · Model: 82

#57 Qwen3.5 27B

Alibaba · Apache 2.0

Lower-resource Qwen coding experiments.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 82 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 81.7 · Model: 82

#58 GLM-4.6

Z.ai · MIT

Open-weight coding model comparison.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 81 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 81 · Model: 81

#59 GPT-5.1

OpenAI · Proprietary API

Legacy OpenAI coding benchmark reference.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 80 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 80.3 · Model: 80

#60 Mimo V2 Flash Non-Thinking

Xiaomi · MIT

Fast open-weight Xiaomi coding tasks.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 80 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 80.3 · Model: 80

#61 GPT-5.2 Codex

OpenAI · Proprietary API

Codex-style repository automation on older OpenAI releases.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 80 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 80.3 · Model: 80

#62 DeepSeek V3.2

DeepSeek · MIT

Open-weight coding and local agent comparisons.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 80 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 80.3 · Model: 80

#63 Kimi K2 Thinking Turbo

Moonshot · Modified MIT

Fast reasoning-focused Kimi coding workflows.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 80 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 80.3 · Model: 80

#64 GPT-5.1 Codex

OpenAI · Proprietary API

Older Codex-style coding workflows.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 80 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 80.3 · Model: 80

#65 Claude Haiku 4.5

Anthropic · Proprietary API

Fast, lower-cost coding assistance and code review drafts.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 80 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 80.3 · Model: 80

#66 MiniMax M2

MiniMax · Apache 2.0

Open-source MiniMax coding baseline comparisons.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 78 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 78.9 · Model: 78

#67 Mimo V2 Flash Thinking

Xiaomi · MIT

Fast Xiaomi coding model tests with thinking behavior.

Caution: Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5 · 35% confirmed coverage · 69% confidence

Code Arena 78 · Vals SWE-bench Verified n/a · Vals Vibe Code Bench n/a · Vellum n/a · AA Index n/a

Adjusted score: 78.9 · Model: 78

68

DeepSeek V3.2 Exp

DeepSeek · MIT

Experimental DeepSeek coding model comparison.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 77 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

78.3

#68

Model

77

Confidence

69%

69

Qwen3 Coder 480B A35B Instruct

Alibaba · Apache 2.0

Coder-specialized Qwen deployment and benchmark testing.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 77 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

78.3

#69

Model

77

Confidence

69%

70

Mistral Medium 3.5

Mistral · Modified MIT

Mistral coding benchmark comparison and EU-provider shortlist work.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 76 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

77.6

#70

Model

76

Confidence

69%

71

KAT-Coder-Pro-V1

KwaiKAT · Proprietary API

Coder-specialized comparison across web development tasks.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 76 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

77.6

#71

Model

76

Confidence

69%

72

Qwen3.5 35B A3B

Alibaba · Apache 2.0

Smaller Qwen coding deployments and local evaluation.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 75 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

76.9

#72

Model

75

Confidence

69%

73

Gemini 3.1 Flash Lite Preview

Google · Proprietary API

Low-latency Gemini coding assistant comparison.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 75 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

76.9

#73

Model

75

Confidence

69%

74

Trinity Large Thinking

Arcee AI · Apache 2.0

Open-source thinking-model coding tests.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 75 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

76.9

#74

Model

75

Confidence

69%

75

GPT-5.1 Codex Mini

OpenAI · Proprietary API

Lower-cost Codex-style coding support.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 74 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

76.2

#75

Model

74

Confidence

69%

76

Qwen3.5 Flash

Alibaba · Proprietary API

Fast Qwen coding assistance and draft implementation.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 74 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

76.2

#76

Model

74

Confidence

69%

77

Grok 4.1 Fast Reasoning

xAI · Proprietary API

Fast xAI reasoning on coding tasks.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 74 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

76.2

#77

Model

74

Confidence

69%

78

Mistral Large 3

Mistral · Apache 2.0

Mistral coding and reasoning comparison.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 73 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

75.5

#78

Model

73

Confidence

69%

79

Grok 4.1 Thinking

xAI · Proprietary API

xAI thinking-mode coding tests.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 73 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

75.5

#79

Model

73

Confidence

69%

80

Gemini 2.5 Pro

Google · Proprietary API

Legacy Gemini coding benchmark reference.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 72 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

74.8

#80

Model

72

Confidence

69%

81

Granite 4.1 8B

IBM · Apache 2.0

Small open-source coding model experiments.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 72 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

74.8

#81

Model

72

Confidence

69%

82

Devstral 2

Mistral · Modified MIT

Developer-focused Mistral coding benchmark comparison.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 72 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

74.8

#82

Model

72

Confidence

69%

83

Mercury 2

Inception AI · Proprietary API

Alternative model coding evaluation.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 70 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

73.5

#83

Model

70

Confidence

69%

84

Grok 4 Fast Reasoning

xAI · Proprietary API

Fast Grok reasoning for coding prompts.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 69 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

72.8

#84

Model

69

Confidence

69%

85

Grok Code Fast 1

xAI · Proprietary API

Code-specialized Grok assistant comparison.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 68 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

72.1

#85

Model

68

Confidence

69%

86

Devstral Medium 2507

Mistral · Proprietary API

Lower-ranked Mistral developer-model baseline.

Caution

Only the Arena.ai Code Arena source is confirmed for this row in this snapshot; compare it with your own repository benchmark before adoption.

Source coverage: 1/5

35% confirmed coverage · 69% confidence

Code Arena: 66 · Vals SWE-bench Verified: n/a · Vals Vibe Code Bench: n/a · Vellum: n/a · AA Index: n/a

Adjusted score

70.7

#86

Model

66

Confidence

69%

Decision guide

How to choose from this Best AI for Coding ranking

Snapshot 2026-06-12

Best for

  • Agentic coding workflows that edit files, run tests, and explain diffs.
  • Web development, refactoring, debugging, and framework-specific implementation tasks.
  • Teams comparing coding assistants before choosing an IDE plugin, API, or coding agent.

Evaluate

  • Run each finalist on a private repository test set with real issues and failing tests.
  • Compare diff quality, tool reliability, latency, price, context length, and rollback behavior.
  • Check whether the model follows security constraints and avoids changing unrelated files.
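
The evaluation steps above can be sketched as a small scoring helper. This is a minimal illustration, not a harness the report provides: the `TaskResult` record, its field names, and the three summary metrics are all assumptions chosen for the example. It records one attempt per repository task, flags changed files that fall outside the directories the task was expected to touch, and summarizes pass rate, clean-diff rate, and mean latency for one model.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class TaskResult:
    """One model attempt at one private repository task (hypothetical record)."""
    task_id: str
    tests_passed: bool          # did the previously failing tests pass afterwards?
    changed_files: list[str]    # repo-relative paths touched by the model's diff
    allowed_dirs: list[str]     # directories the task is expected to touch
    latency_s: float            # wall-clock time for the attempt


def unrelated_changes(result: TaskResult) -> list[str]:
    """Return changed files that sit outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in result.allowed_dirs]
    return [
        f for f in result.changed_files
        if not any(PurePosixPath(f).is_relative_to(d) for d in allowed)
    ]


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Aggregate one model's results into a few comparable numbers."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to summarize")
    passed = sum(1 for r in results if r.tests_passed)
    clean = sum(1 for r in results if not unrelated_changes(r))
    return {
        "pass_rate": passed / n,
        "clean_diff_rate": clean / n,
        "mean_latency_s": sum(r.latency_s for r in results) / n,
    }


# Illustrative data only; these tasks and paths are made up for the sketch.
example = [
    TaskResult("t1", True, ["src/app/main.py"], ["src/app"], 10.0),
    TaskResult("t2", False, ["src/app/x.py", "README.md"], ["src/app"], 30.0),
]
summary = summarize(example)
```

Running the same task set through each finalist and comparing the resulting summaries side by side keeps the comparison on your own repository rather than on leaderboard rows. Price, context length, and rollback behavior would need their own fields.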

Avoid

  • Letting any model make autonomous production changes without human review.
  • Choosing only by general chat quality when the task is code execution and repository repair.
  • Treating missing leaderboard rows as proof that a model is weak; missing rows mean uncertainty.

Questions

Best AI for Coding FAQ

What is the best AI for coding?

The top row is the blended pick for this snapshot, but the right coding model still depends on your repository, tool access, latency target, and budget.

Why are missing benchmark sources not scored as zero?

A missing source row is uncertainty, not a failed benchmark. The adjusted score blends confirmed quality with a confidence penalty instead of assuming zero performance.
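
To make the difference concrete, here is a minimal sketch that contrasts zero-filling with a confidence-weighted blend. The report does not publish its exact formula, so the linear shrinkage toward a prior mean used here is an assumption, and the prior value of 81 is an illustrative number chosen for the example, not a published figure.

```python
from typing import Optional


def zero_filled_mean(scores: list[Optional[float]]) -> float:
    """Naive average that (wrongly) treats every missing source as a score of 0."""
    return sum(s if s is not None else 0.0 for s in scores) / len(scores)


def adjusted_score(model_score: float, confidence: float, prior_mean: float) -> float:
    """Shrink the confirmed score toward a prior mean as confidence drops.

    Assumed form: confidence * model_score + (1 - confidence) * prior_mean.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return confidence * model_score + (1.0 - confidence) * prior_mean


# A row with one confirmed source (80) and four missing sources.
row = [80.0, None, None, None, None]

naive = zero_filled_mean(row)                                            # 16.0
blended = adjusted_score(80.0, confidence=0.69, prior_mean=81.0)         # about 80.31
```

Under zero-filling, a model with a single strong confirmed result would drop to 16. Under the assumed blend, it stays near its confirmed score and is only pulled toward the prior in proportion to the missing confidence.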

Should the highest coding score be used automatically?

No. Use the ranking as a shortlist, then test the top models on your own repo tasks before standardizing.

Method note

Treat the ranking as a shortlist, not a final procurement decision

The top model is the best blended pick for this query snapshot, but model choice should still account for price, latency, privacy, context length, tool access, safety settings, and your own benchmark prompts. Use this page to reduce the search space, then run a small evaluation on your real tasks before standardizing. See the methodology and editorial policy for source selection and correction standards.