Weighted ranking report

Best AI for Writing

This category covers creative writing, everyday prose, brand voice, emails, long-form drafts, and tone control. The report blends public leaderboard signals into one task-specific composite score, then highlights the practical cautions that matter before you choose a model.

Current winner: Claude Fable 5

Adjusted score: 97.4

Snapshot: 2026-06-12

Top contenders

The leading five models in this composite

20 total models
1. Claude Fable 5: Model 97.4 · Confidence 100% · Adjusted score 97.4

2. Gemini 3 Pro: Model 97.1 · Confidence 100% · Adjusted score 97.1

3. Claude Opus 4.7 Thinking: Model 99 · Confidence 79% · Adjusted score 96.7

4. Claude Opus 4.6 Thinking: Model 96.2 · Confidence 100% · Adjusted score 96.2

5. Claude Opus 4.7: Model 98.3 · Confidence 79% · Adjusted score 96.1

Full-source models: 5

Average coverage: 58%

Top-five spread: 1.3
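The three summary figures can be cross-checked against the rows in the full ranking. The sketch below is illustrative only: it assumes that "full-source" means 100% confirmed coverage, that average coverage is an unweighted mean of each row's confirmed coverage, and that the spread is the first adjusted score minus the fifth. The report does not state these definitions, although they do agree with its numbers.

```python
from statistics import mean

# (adjusted score, confirmed coverage %) for the 20 ranked rows, in rank order,
# copied from the full ranking in this report.
ROWS = [
    (97.4, 100), (97.1, 100), (96.7, 55), (96.2, 100), (96.1, 55),
    (96.1, 55), (95.9, 55), (95.7, 100), (95.3, 55), (95.3, 55),
    (95.3, 55), (95.3, 55), (95.3, 55), (95.3, 55), (95.2, 100),
    (94.7, 15), (94.7, 15), (94.7, 15), (94.7, 15), (94.1, 40),
]

# ASSUMPTION: "full-source" means every one of the four sources is confirmed.
full_source_models = sum(1 for _, coverage in ROWS if coverage == 100)

# ASSUMPTION: unweighted mean of per-row confirmed coverage.
average_coverage = mean(coverage for _, coverage in ROWS)

# ASSUMPTION: spread is rank 1 minus rank 5 on the adjusted score.
top_five_spread = round(ROWS[0][0] - ROWS[4][0], 1)

print(full_source_models, average_coverage, top_five_spread)
```

Under these assumptions the mean coverage comes out at 57.5, which the report shows rounded as 58%.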

All ranked models

Complete composite model ranking

Showing all 20 models with at least one confirmed source row in this category. Models with no category source coverage are excluded. Confirmed rows are ordered by Bayesian-smoothed adjusted score; missing source rows stay n/a instead of counting as zero.
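To make the "n/a instead of zero" rule concrete, here is a minimal sketch of a weighted mean that skips missing sources and renormalizes over the confirmed ones. The weights are an assumption: the report does not publish them, and the values below were inferred from the full-coverage rows and the coverage percentages. The function name `model_score` is hypothetical. The Bayesian smoothing step that turns the model score into the adjusted score is not reproduced, because the report does not state its prior.

```python
from typing import Optional

# ASSUMPTION: these weights are NOT published in the report. They were inferred
# from the full-coverage rows and the per-row coverage percentages.
WEIGHTS = {
    "Creative Writing": 0.40,
    "Surge AI Hemingway-bench": 0.25,
    "EQ-Bench Creative Writing v3": 0.20,
    "Text Overall": 0.15,
}


def model_score(scores: dict[str, Optional[float]]) -> tuple[float, float]:
    """Return (weighted mean over confirmed sources, confirmed coverage).

    A source whose value is None (shown as n/a in the report) is skipped
    rather than counted as zero, and the remaining weights are renormalized.
    """
    confirmed = {name: value for name, value in scores.items() if value is not None}
    if not confirmed:
        raise ValueError("no confirmed source rows")
    coverage = sum(WEIGHTS[name] for name in confirmed)
    weighted = sum(WEIGHTS[name] * value for name, value in confirmed.items())
    return weighted / coverage, coverage


# Full-coverage row (Claude Fable 5) and a two-source row (Claude Opus 4.7).
fable = {
    "Creative Writing": 100,
    "Surge AI Hemingway-bench": 92,
    "EQ-Bench Creative Writing v3": 97,
    "Text Overall": 100,
}
opus_47 = {
    "Creative Writing": 98,
    "Surge AI Hemingway-bench": None,
    "EQ-Bench Creative Writing v3": None,
    "Text Overall": 99,
}

for name, row in [("Claude Fable 5", fable), ("Claude Opus 4.7", opus_47)]:
    score, coverage = model_score(row)
    print(f"{name}: model {score:.1f}, coverage {coverage:.0%}")
```

With the inferred weights, the two rows come out at the model scores and coverage shown in the ranking (97.4 at 100% and 98.3 at 55%). Counting the two missing sources as zero would instead drag the second row down to roughly 54, which is why skipping and renormalizing matters.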

1. Claude Fable 5

Anthropic · Proprietary API

High-end prose, nuanced rewriting, and difficult creative constraints.

Caution: Premium access may be the limiting factor.

Source coverage: 4/4 (100% confirmed coverage · 100% confidence)

Creative Writing: 100 · Surge AI Hemingway-bench: 92 · EQ-Bench Creative Writing v3: 97 · Text Overall: 100

Adjusted score: 97.4 (#1) · Model: 97.4 · Confidence: 100%

2. Gemini 3 Pro

Google · Proprietary API and apps

Research-informed writing, structured drafts, and Google ecosystem workflows.

Caution: Keep human editing for voice, claims, and citations.

Source coverage: 4/4 (100% confirmed coverage · 100% confidence)

Creative Writing: 99 · Surge AI Hemingway-bench: 99 · EQ-Bench Creative Writing v3: 90 · Text Overall: 98

Adjusted score: 97.1 (#2) · Model: 97.1 · Confidence: 100%

3. Claude Opus 4.7 Thinking

Anthropic · Proprietary API

Long-form creative drafting where extended reasoning and voice control matter.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 99 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 99

Adjusted score: 96.7 (#3) · Model: 99 · Confidence: 79%

4. Claude Opus 4.6 Thinking

Anthropic · Proprietary API

Long-form writing, editing, and careful instruction following.

Caution: May be overkill for short social or marketing copy.

Source coverage: 4/4 (100% confirmed coverage · 100% confidence)

Creative Writing: 99 · Surge AI Hemingway-bench: 91 · EQ-Bench Creative Writing v3: 95 · Text Overall: 99

Adjusted score: 96.2 (#4) · Model: 96.2 · Confidence: 100%

5. Claude Opus 4.7

Anthropic · Proprietary API

Polished prose, rewrites, and editorial review with strong preference ranking coverage.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 98 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 99

Adjusted score: 96.1 (#5) · Model: 98.3 · Confidence: 79%

6. Claude Opus 4.6

Anthropic · Proprietary API

Reliable daily writing, rewriting, and tone preservation.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 98 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 99

Adjusted score: 96.1 (#6) · Model: 98.3 · Confidence: 79%

7. Claude Opus 4.8 Thinking

Anthropic · Proprietary API

High-end writing tasks that benefit from slower thinking-mode revisions.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 98 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 95.9 (#7) · Model: 98 · Confidence: 79%

8. Claude Opus 4.5

Anthropic · Proprietary API

Natural voice and human-like editing style.

Caution: Older than the newest leaderboard leaders.

Source coverage: 4/4 (100% confirmed coverage · 100% confidence)

Creative Writing: 96 · Surge AI Hemingway-bench: 97 · EQ-Bench Creative Writing v3: 94 · Text Overall: 95

Adjusted score: 95.7 (#8) · Model: 95.7 · Confidence: 100%

9. Gemini 3.5 Flash

Google · Proprietary API and apps

Fast writing iterations, content operations, and Google ecosystem workflows.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 97 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 95.3 (#9) · Model: 97.3 · Confidence: 79%

10. Muse Spark

Meta · Proprietary API

Experimental creative writing and brand-voice generation comparisons.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 97 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 95.3 (#10) · Model: 97.3 · Confidence: 79%

11. GLM-5.1

Z.ai · MIT

Open-weight oriented writing tests and lower-cost content workflows.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 97 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 95.3 (#11) · Model: 97.3 · Confidence: 79%

12. Grok 4.20 Beta

xAI · Proprietary API

Alternative writing assistant testing with strong broad text Arena coverage.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 97 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 95.3 (#12) · Model: 97.3 · Confidence: 79%

13. Gemini 3 Flash

Google · Proprietary API and apps

Lower-latency drafts, social copy, and high-volume editing loops.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 97 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 95.3 (#13) · Model: 97.3 · Confidence: 79%

14. Claude Opus 4.8

Anthropic · Proprietary API

Premium writing and editing when the latest thinking variant is not needed.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 2/4 (55% confirmed coverage · 79% confidence)

Creative Writing: 97 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 95.3 (#14) · Model: 97.3 · Confidence: 79%

15. Gemini 3.1 Pro Preview

Google · Proprietary API and apps

Writing that needs broad context, outlines, and multimodal references.

Caution: Preview models can change; re-check before standardizing.

Source coverage: 4/4 (100% confirmed coverage · 100% confidence)

Creative Writing: 98 · Surge AI Hemingway-bench: 94 · EQ-Bench Creative Writing v3: 89 · Text Overall: 98

Adjusted score: 95.2 (#15) · Model: 95.2 · Confidence: 100%

16. GPT-5.5 High

OpenAI · Proprietary API

High-effort OpenAI writing workflows with broad text preference coverage.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 1/4 (15% confirmed coverage · 63% confidence)

Creative Writing: n/a · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 94.7 (#16) · Model: 98 · Confidence: 63%

17. GPT-5.4 High

OpenAI · Proprietary API

OpenAI writing and editing workflows where broad text preference is the main signal.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 1/4 (15% confirmed coverage · 63% confidence)

Creative Writing: n/a · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 94.7 (#17) · Model: 98 · Confidence: 63%

18. GPT-5.2

OpenAI · Proprietary API and apps

General writing drafts, outlines, and practical rewrite workflows.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 1/4 (15% confirmed coverage · 63% confidence)

Creative Writing: n/a · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 94.7 (#18) · Model: 98 · Confidence: 63%

19. Qwen3.7 Max Preview

Alibaba · Proprietary API

Qwen writing tests and cost-aware multilingual content workflows.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 1/4 (15% confirmed coverage · 63% confidence)

Creative Writing: n/a · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: 98

Adjusted score: 94.7 (#19) · Model: 98 · Confidence: 63%

20. Claude Opus 4.5 Thinking

Anthropic · Proprietary API

Careful long drafts and editing passes when thinking-mode behavior is preferred.

Caution: Arena writing preference data is confirmed for this row; still test voice, factual claims, and brand fit before adopting it.

Source coverage: 1/4 (40% confirmed coverage · 71% confidence)

Creative Writing: 97 · Surge AI Hemingway-bench: n/a · EQ-Bench Creative Writing v3: n/a · Text Overall: n/a

Adjusted score: 94.1 (#20) · Model: 97 · Confidence: 71%

How to use it

Treat the winner as a shortlist, not a final procurement decision

The top model is the best blended pick for this query snapshot, but model choice should still account for price, latency, privacy, context length, tool access, safety settings, and your own benchmark prompts. Use this page to reduce the search space, then run a small evaluation on your real tasks before standardizing.
