Official prices plus workload math

Text AI API Pricing: 73 Official Model Rows

Text API costs move when prompts get longer, agents retry, cache hits vary, and output tokens grow. Start with your workload assumptions, then compare official model rows against the model-quality pages.

Price rows: 73
Providers: 11
Official sources: 13
Last updated: 2026-07-04T06:36:11.407Z

Last source check: 2026-07-04

What changed in this update

Refreshed 73 official text price rows.

Grouped rows across 11 providers and 13 official source pages.

Kept workload guidance tied to launch checks, real usage units, and official-source verification.

Input and output tokens
Cache hit rate
Batch traffic
Retries or failed calls

What the price row misses

The useful number is the cost of a successful workflow, not the cleanest API row.

AI API pricing pages often look simple because they compare one published row at a time. This page keeps the official row visible, then adds the messy assumptions that show up in products: retries, long outputs, cache misses, batches, and review loops.

Source check

Every listed row should trace back to an official provider pricing, docs, model, or API page before it becomes a comparison row.

Unit check

Rows are kept in their original billing units when conversion would hide an important difference, such as per-second video or per-image generation.

Workload check

The calculator starts from product behavior: retries, cache hits, long prompts, output length, batch jobs, and rejected generations.

Launch check

Before a production rollout, reopen the official source because provider prices, cache rules, model names, and eligibility can change quickly.

Pricing validation playbook

Validate the bill with your product workflow before choosing a provider.

Official rows are the starting point. The production decision comes from measuring the unit your users actually complete, the retries they create, and the quality gates you need before an output is accepted.

Define the unit

Cost per successful user action

Decide whether the product unit is one answer, one resolved ticket, one accepted code fix, one processed row, or one completed agent task.

Instrument tokens

Measure input, output, cache, and retries

Log token use before launch in a small pilot. Separate repeated prefixes, retrieved context, generated output, failed calls, and tool loops.

Compare finalists

Test cheap, strong, and middle options

Run the same workload assumptions through at least three candidate rows instead of choosing the lowest published input price.

Review after launch

Replace estimates with logs

After traffic starts, update the calculator with real token distributions, quality failure rate, latency needs, and cache hit rate.
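One way to turn pilot logs into the unit described above is sketched here. It is only an illustration: the record fields, the sample calls, and the function names are hypothetical, and the prices ($3.00 input, $0.30 cache, $15.00 output per 1M tokens) are borrowed from one row of the price table. The point is that failed and rejected calls still cost money, so the total is divided by accepted actions rather than by calls.

```python
# Hypothetical pilot log: one record per model call, including calls
# that belonged to an action the user or a quality gate later rejected.
CALLS = [
    {"action": "a1", "input_tokens": 1000, "cached_tokens": 0, "output_tokens": 200},
    {"action": "a1", "input_tokens": 1500, "cached_tokens": 1000, "output_tokens": 300},
    {"action": "a2", "input_tokens": 1000, "cached_tokens": 0, "output_tokens": 500},
    {"action": "a3", "input_tokens": 2000, "cached_tokens": 1000, "output_tokens": 100},
]
ACCEPTED = {"a1", "a3"}  # a2 was rejected, but its call was still billed


def call_cost(call, input_price, cache_price, output_price):
    """USD cost of one call; prices are USD per 1M tokens."""
    fresh = call["input_tokens"] - call["cached_tokens"]
    return (
        fresh * input_price
        + call["cached_tokens"] * cache_price
        + call["output_tokens"] * output_price
    ) / 1_000_000


def cost_per_successful_action(calls, accepted, input_price, cache_price, output_price):
    """Total spend across all calls divided by the number of accepted actions."""
    if not accepted:
        raise ValueError("no accepted actions to divide by")
    total = sum(call_cost(c, input_price, cache_price, output_price) for c in calls)
    return total / len(accepted)


per_call = sum(call_cost(c, 3.00, 0.30, 15.00) for c in CALLS) / len(CALLS)
per_success = cost_per_successful_action(CALLS, ACCEPTED, 3.00, 0.30, 15.00)
print(f"per call: ${per_call:.4f}  per successful action: ${per_success:.4f}")
```

With these sample records, four calls produce two accepted actions, so the cost per successful action is twice the average cost per call.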

Workload cheat sheet

Start by naming the unit your product really pays for.

The same API price row means different things for chat, RAG, agents, coding, and batch jobs. Before comparing providers, decide what a successful unit is and measure that unit instead of a single clean API call.

Workload | What moves the bill | Measure first
Chat or customer support | Output length, retries, and conversation history. | Average tokens per resolved conversation.
RAG or document Q&A | Retrieved context, repeated system prompts, and cache hit rate. | Input tokens per answered question.
Agent or tool workflow | Planning calls, tool calls, retries, critique, and final answer generation. | Model calls per successful user action.
Coding assistant | Long prompts, repository context, generated diffs, and review loops. | Tokens per accepted fix or merged task.
Batch enrichment | Volume, batch eligibility, failed rows, and delayed processing tolerance. | Rows processed per month and allowed latency.

Starting assumptions

Start with a believable workload, then replace it with your own logs.

These are not universal benchmarks. They are practical starting points for the calculator when you have not instrumented production yet. After launch, replace them with measured tokens per successful user action.

Support chat

20K monthly runs, 1.2K input tokens, 500 output tokens, 5% retries.

Use this when one customer question usually turns into one answer, with a short history and a modest retry buffer.

RAG question answering

50K monthly runs, 4K input tokens, 350 output tokens, 30% cache hit rate.

Use this when retrieved context is the main cost driver and answers are short, but repeated prompts or wrappers can be cached.

Agent workflow

10K monthly runs, 6K input tokens, 1.5K output tokens, 25% retries.

Use this when one user click can create planning, tool calls, reflection, and a final response rather than one clean call.

Batch extraction

250K monthly runs, 1K input tokens, 150 output tokens, 80% batch traffic.

Use this for offline enrichment, classification, or structured extraction where delayed processing is acceptable.
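As a rough illustration, the four starting workloads above can be turned into a monthly estimate. This sketch is not the page's calculator. It assumes retries repeat the full call, cached input is billed at the cache price, and batch traffic gets a hypothetical 50% discount. The price row ($5.00 input, $0.50 cache, $25.00 output per 1M tokens) is one of the rows in the table; the class, field, and function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    runs: int                     # monthly runs
    input_tokens: int             # input tokens per run
    output_tokens: int            # output tokens per run
    retry_rate: float = 0.0       # share of runs repeated in full (assumption)
    cache_hit_rate: float = 0.0   # share of input tokens billed at the cache price
    batch_share: float = 0.0      # share of traffic sent through batch pricing


def monthly_cost(w, input_price, cache_price, output_price, batch_discount=0.5):
    """Estimated monthly USD cost; prices are USD per 1M tokens."""
    calls = w.runs * (1 + w.retry_rate)
    cached = w.input_tokens * w.cache_hit_rate
    fresh = w.input_tokens - cached
    per_call = (
        fresh * input_price + cached * cache_price + w.output_tokens * output_price
    ) / 1_000_000
    # Batch traffic pays (1 - batch_discount) of the realtime price.
    blend = 1 - w.batch_share * batch_discount
    return calls * per_call * blend


PRESETS = {
    "Support chat": Workload(20_000, 1_200, 500, retry_rate=0.05),
    "RAG question answering": Workload(50_000, 4_000, 350, cache_hit_rate=0.30),
    "Agent workflow": Workload(10_000, 6_000, 1_500, retry_rate=0.25),
    "Batch extraction": Workload(250_000, 1_000, 150, batch_share=0.80),
}

for name, w in PRESETS.items():
    print(f"{name}: ${monthly_cost(w, 5.00, 0.50, 25.00):,.2f}/month")
```

Swapping in another price row, or your own measured token counts, only changes the arguments. The structure of the estimate stays the same.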

One workflow example

A cheap model row can lose once you count the whole user action.

The common mistake is comparing one clean API call when the product actually needs several calls to finish the job. The calculator should be driven by the unit your users experience.

Clean spreadsheet row

1 API call

A quick comparison usually assumes one prompt, one answer, and no retry. That is useful for a first scan, but it is not how most products behave.

Agent action

2-5 model calls

A coding, support, or agent workflow can include planning, tool calls, critique, retries, and a final response before the user sees success.

Cost unit to compare

Successful action

Compare providers by the cost of one completed user action, accepted fix, resolved ticket, or processed row rather than the cheapest single call.
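The comparison above can be sketched with a few lines of arithmetic. Everything specific here is an assumption for illustration: the token counts, the calls per attempt, and the success rates are invented, and failed attempts are assumed to be retried independently until one succeeds, so the expected number of attempts is 1 / success rate. The two price pairs ($1.00/$5.00 and $3.00/$15.00 per 1M input/output tokens) match rows in the table.

```python
def cost_per_call(input_tokens, output_tokens, input_price, output_price):
    """USD cost of one model call; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_success(call_cost, calls_per_attempt, success_rate):
    """Expected USD cost of one successful action.

    Assumes each attempt needs `calls_per_attempt` model calls and that
    failed attempts are repeated independently until one succeeds.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return call_cost * calls_per_attempt / success_rate


# Hypothetical workflow: 3,000 input and 800 output tokens per call.
cheap_call = cost_per_call(3_000, 800, 1.00, 5.00)
strong_call = cost_per_call(3_000, 800, 3.00, 15.00)

# Hypothetical behavior: the cheaper row needs more calls and fails more often.
cheap_action = cost_per_success(cheap_call, calls_per_attempt=5, success_rate=0.5)
strong_action = cost_per_success(strong_call, calls_per_attempt=2, success_rate=0.9)

print(f"single call:       cheap ${cheap_call:.4f}  strong ${strong_call:.4f}")
print(f"successful action: cheap ${cheap_action:.4f}  strong ${strong_action:.4f}")
```

Under these invented numbers the cheaper row costs a third as much per call ($0.007 versus $0.021) but more per successful action ($0.07 versus roughly $0.047). Whether that happens in your product depends on the call counts and success rates you measure.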

Cost audit before launch

Questions to answer before calling a model cheap

  • What is one successful user action in this product?
  • How many model calls does that action usually require?
  • How many input tokens are repeated and cacheable?
  • How long are the outputs after retries and tool traces?
  • Which traffic can use batch pricing without hurting latency?
  • What quality failure rate forces a second model call or human review?

Official API price calculator

Text APIs

Chat, reasoning, coding, agents, RAG, extraction, and long-context text workloads.

Workload assumptions

Set expected usage; each row estimates monthly cost from official unit prices.

Official price table - Text composite score order

73 official USD rows - checked 2026-07-04 - sorted by Coding + Writing + Math score.

Daily source checks
All rows are Official API prices in USD per 1M tokens, checked 2026-07-04. The Estimate column is each row's calculated cost under the workload assumptions above.

Provider | Input | Cache | Output | Estimate | Region | Source | Notes
Anthropic | $10.00 | $1.00 | $50.00 | $58.27 | Global default | Claude API pricing | 5-minute cache writes are $12.50/MTok and 1-hour cache writes are $20/MTok.
Anthropic | $5.00 | $0.50 | $25.00 | $29.14 | Global default | Claude API pricing | Fast mode and data residency are priced separately.
Anthropic | $5.00 | $0.50 | $25.00 | $29.14 | Global default | Claude API pricing | Opus 4.7 and later use a newer tokenizer; token counts can differ from older models.
Anthropic | $5.00 | $0.50 | $25.00 | $29.14 | Global default | Claude API pricing | Fast mode has separate premium pricing; US-only inference geography can add a 1.1x multiplier.
OpenAI | $5.00 | $0.50 | $30.00 | $32.81 | Global | OpenAI API pricing | OpenAI lists separate Standard, Batch, Flex, and Priority tiers; this row uses Standard short-context pricing.
OpenAI | $2.50 | $0.25 | $15.00 | $16.41 | Global | OpenAI API pricing | Regional data residency endpoints may add a 10% uplift for eligible newer models.
Google | $2.00 | $0.20 | $12.00 | $13.13 | Gemini API paid tier | Gemini API pricing | Prompts above 200K tokens have higher input, output, and cache prices.
Google | $1.50 | $0.15 | $9.00 | $9.84 | Gemini API paid tier | Gemini API pricing | Search and Maps grounding have separate charges after free quotas.
Google | $0.50 | $0.05 | $3.00 | $3.28 | Gemini API paid tier | Gemini API pricing | Preview models can change before becoming stable.
Z.AI | $1.40 | $0.26 | $4.40 | $6.31 | Z.AI API | Z.AI pricing | Cached input storage is listed as limited-time free on the source page.
Z.AI | $1.00 | $0.20 | $3.20 | $4.56 | Z.AI API | Z.AI pricing | Cached input storage is listed as limited-time free.
Anthropic | $5.00 | $0.50 | $25.00 | $29.14 | Global default | Claude API pricing | Batch API pricing is 50% lower for input and output tokens.
Moonshot AI | $0.95 | $0.16 | $4.00 | $5.02 | Kimi API | Kimi K2.6 pricing | Kimi pricing page lists cache-hit, input, and output prices.
MiniMax | $0.30 | $0.06 | $1.20 | $1.54 | MiniMax API | MiniMax pay-as-you-go pricing | Official page marks this as permanent 50% off compared with crossed-out list price.
MiniMax | $0.60 | $0.12 | $2.40 | $3.09 | MiniMax API | MiniMax pay-as-you-go pricing | Input tokens above 512K may have availability limits.
Moonshot AI | $0.60 | $0.10 | $3.00 | $3.52 | Kimi API | Kimi K2.5 pricing | Supports text, image, video input, thinking and non-thinking modes.
OpenAI | $30.00 | n/a | $180 | $211 | Global | OpenAI API pricing | Long-context standard pricing is higher; Batch is discounted where available.
OpenAI | $1.75 | $0.175 | $14.00 | $14.06 | Global | OpenAI API pricing | Priority pricing is listed separately on OpenAI pricing.
Anthropic | $3.00 | $0.30 | $15.00 | $17.48 | Global default | Claude API pricing | US-only inference geography adds a 1.1x multiplier for Sonnet 4.6 and later models.
Z.AI | $0.60 | $0.11 | $2.20 | $2.93 | Z.AI API | Z.AI pricing | Cached input storage is listed as limited-time free.
Z.AI | $0.07 | $0.01 | $0.40 | $0.4463 | Z.AI API | Z.AI pricing | Good row for budget-sensitive routing comparisons.
Z.AI | $0.00 | $0.00 | $0.00 | $0.00 | Z.AI API | Z.AI pricing | Displayed as Free in the official pricing table; availability can change.
Z.AI | $1.20 | $0.24 | $4.00 | $5.59 | Z.AI API | Z.AI pricing | Prices are per 1M tokens.
OpenAI | $0.75 | $0.075 | $4.50 | $4.92 | Global | OpenAI API pricing | Batch pricing is 50% lower in the official table.
OpenAI | $0.20 | $0.02 | $1.25 | $1.35 | Global | OpenAI API pricing | Good reference point for lightweight classification, extraction, and chat routing.
OpenAI | $30.00 | n/a | $180 | $211 | Global | OpenAI API pricing | No cached-input price is listed for this pro row in the flagship pricing table.
MiniMax | $0.30 | $0.06 | $1.20 | $1.54 | MiniMax API | MiniMax pay-as-you-go pricing | Prompt cache read and write are separate pricing fields.
MiniMax | $0.60 | $0.06 | $2.40 | $3.06 | MiniMax API | MiniMax pay-as-you-go pricing | High-speed tier doubles input and output price compared with standard.
Alibaba Cloud | $0.172 | n/a | $1.03 | $1.21 | Global deployment | Alibaba Cloud Model Studio pricing | Longer prompt tiers are priced higher in the official table.
Anthropic | $3.00 | $0.30 | $15.00 | $17.48 | Global default | Claude API pricing | Prompt cache writes are listed separately from cache hits.
MiniMax | $0.30 | $0.03 | $1.20 | $1.53 | MiniMax API | MiniMax pay-as-you-go pricing | Legacy model retained for users comparing older production integrations.
MiniMax | $0.60 | $0.03 | $2.40 | $3.04 | MiniMax API | MiniMax pay-as-you-go pricing | High-speed tier doubles input and output price compared with standard.
xAI | $1.25 | $0.20 | $2.50 | $4.57 | xAI API | xAI API pricing | Server-side tools are charged separately from token usage.
Alibaba Cloud | $0.115 | n/a | $0.917 | $0.9759 | Global deployment | Alibaba Cloud Model Studio pricing | Global endpoint and storage are in US Virginia or Germany Frankfurt.
Z.AI | $0.60 | $0.11 | $2.20 | $2.93 | Z.AI API | Z.AI pricing | Cached input storage is listed as limited-time free.
Anthropic | $1.00 | $0.10 | $5.00 | $5.83 | Global default | Claude API pricing | Useful Claude baseline for high-volume support and extraction workloads.
Mistral | $1.50 | n/a | $7.50 | $9.45 | Mistral API | Mistral pricing | Batch processing gets a 50% discount.
Google | $0.25 | $0.025 | $1.50 | $1.64 | Gemini API paid tier | Gemini API pricing | Audio input is priced separately on Google pricing.
Mistral | $0.50 | n/a | $1.50 | $2.42 | Mistral API | Mistral pricing | Mistral pricing is per million input and output tokens.
Google | $1.25 | $0.125 | $10.00 | $10.04 | Gemini API paid tier | Gemini API pricing | Prompts above 200K tokens have higher prices.
OpenAI | $5.00 | $0.50 | $30.00 | $32.81 | Global | OpenAI API pricing | Useful when comparing ChatGPT chat-latest against flagship model APIs.
Anthropic | $10.00 | $1.00 | $50.00 | $58.27 | Limited availability | Claude API pricing | Official table marks Mythos 5 as limited availability.
Google | $0.30 | $0.03 | $2.50 | $2.48 | Gemini API paid tier | Gemini API pricing | Grounding and Maps pricing are separate from token billing.
xAI | $1.00 | $0.20 | $2.00 | $3.68 | xAI API | xAI API pricing | Currently described by xAI as early access.
xAI | $1.25 | $0.20 | $2.50 | $4.57 | xAI API | xAI API pricing | Listed in xAI Chat API pricing with the same token rates as grok-4.3.
xAI | $1.25 | $0.20 | $2.50 | $4.57 | xAI API | xAI API pricing | Reasoning tokens are billed under the model token rates.
xAI | $1.25 | $0.20 | $2.50 | $4.57 | xAI API | xAI API pricing | Batch pricing can vary by model detail page.
Mistral | $0.10 | n/a | $0.30 | $0.483 | Mistral API | Mistral pricing | Open model listed on Mistral pricing.
Mistral | $2.00 | n/a | $5.00 | $8.93 | Mistral API | Mistral pricing | Use for reasoning comparisons against general-purpose models.
Mistral | $0.50 | n/a | $1.50 | $2.42 | Mistral API | Mistral pricing | Batch processing gets a 50% discount.
Mistral | $0.10 | n/a | $0.10 | $0.336 | Mistral API | Mistral pricing | Best for low-cost routing and lightweight agent steps.
Mistral | $0.15 | n/a | $0.15 | $0.504 | Mistral API | Mistral pricing | Low-cost open model in Mistral pricing.
Cohere | $0.50 | n/a | $1.50 | $2.42 | Cohere API | Cohere pricing | Cohere lists Aya Expanse API pricing in the official pricing FAQ.
Cohere | $1.00 | n/a | $2.00 | $4.10 | Existing Cohere customers | Cohere pricing | Cohere marks these as legacy model prices for existing customers.
Cohere | $0.30 | n/a | $0.60 | $1.23 | Existing Cohere customers | Cohere pricing | Listed in Cohere pricing FAQ as legacy pricing.
Cohere | $0.50 | n/a | $1.50 | $2.42 | Existing Cohere customers | Cohere pricing | Listed in Cohere pricing FAQ as legacy pricing.
Cohere | $3.00 | n/a | $15.00 | $18.90 | Existing Cohere customers | Cohere pricing | Listed in Cohere pricing FAQ as legacy pricing.
Cohere | $2.50 | n/a | $10.00 | $13.91 | Existing Cohere customers | Cohere pricing | Listed in Cohere pricing FAQ as legacy pricing.
DeepSeek | $0.27 | $0.07 | $1.10 | $1.41 | DeepSeek API | DeepSeek USD pricing | Automatic context caching uses cache-hit and cache-miss input prices.
DeepSeek | $0.55 | $0.14 | $2.19 | $2.84 | DeepSeek API | DeepSeek USD pricing | Reasoning output and CoT behavior can change real workload cost.
Alibaba Cloud | $0.359 | n/a | $1.43 | $2.00 | Global deployment | Alibaba Cloud Model Studio pricing | Higher prompt-size tiers increase input and output prices.
Alibaba Cloud | $0.30 | n/a | $1.50 | $1.89 | EU deployment | Alibaba Cloud Model Studio pricing | Higher prompt tiers are listed separately up to 256K tokens.
Alibaba Cloud | $0.80 | n/a | $2.40 | $3.86 | International deployment | Alibaba Cloud Model Studio pricing | International inference is dynamically scheduled globally excluding Chinese Mainland.
Alibaba Cloud | $0.287 | n/a | $0.861 | $1.39 | Model Studio | Alibaba Cloud Model Studio pricing | Listed under QwQ open source model pricing.
Alibaba Cloud | $0.072 | n/a | $0.287 | $0.3999 | Chinese Mainland deployment | Alibaba Cloud Model Studio pricing | Chinese Mainland deployment only according to official table.
Alibaba Cloud | $0.574 | n/a | $1.72 | $2.77 | Chinese Mainland deployment | Alibaba Cloud Model Studio pricing | Useful for math-specific API cost comparisons.
Moonshot AI | $0.95 | $0.19 | $4.00 | $5.03 | Kimi API | Kimi K2.7 Code pricing | Limited-time promotion is mentioned on the official page.
Z.AI | $0.60 | $0.11 | $2.20 | $2.93 | Z.AI API | Z.AI pricing | Cached input storage is listed as limited-time free.
Z.AI | $2.20 | $0.45 | $8.90 | $11.40 | Z.AI API | Z.AI pricing | Prices are per 1M tokens.
Z.AI | $0.20 | $0.03 | $1.10 | $1.24 | Z.AI API | Z.AI pricing | Cached input storage is listed as limited-time free.
Z.AI | $1.10 | $0.22 | $4.50 | $5.73 | Z.AI API | Z.AI pricing | Prices are per 1M tokens.
Z.AI | $0.10 | n/a | $0.10 | $0.336 | Z.AI API | Z.AI pricing | No cached-input price is listed for this row.
Z.AI | $0.00 | $0.00 | $0.00 | $0.00 | Z.AI API | Z.AI pricing | Displayed as Free in the official pricing table; availability can change.

Production cost traps

The cheapest model on paper can be the wrong production default.

App flows rarely send one clean prompt and get back one neat answer. They retry, call tools, reuse some context, miss the cache sometimes, and produce outputs that are longer than the demo. Use this section to sanity-check the estimate before trusting the table.

Output tokens quietly decide the bill

A model with cheap input can still become expensive if it writes long answers, verbose tool traces, or multi-step reasoning for every request.

Cache savings need repeatable prompts

Prompt caching helps most when system prompts, retrieval wrappers, or conversation prefixes repeat. It helps much less when every request is unique.

Agents turn one click into many calls

A single user action can become planning, search, tool calls, retries, critique, and final rewrite. Count the whole action, not only the final answer.

Batch discounts are not live traffic

Batch pricing can be great for offline jobs, enrichment, and nightly processing, but it should not be used as the default cost for a realtime user flow.
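Two of these traps are easy to check with arithmetic before trusting an estimate. The sketch below is illustrative only: the function names are made up, the cache hit rates and token counts are hypothetical, and the prices ($2.00 input, $0.20 cache, $12.00 output per 1M tokens) come from one row of the table. It shows how the effective input price moves with the cache hit rate and what share of a call's cost comes from output tokens.

```python
def effective_input_price(input_price, cache_price, cache_hit_rate):
    """Blended USD price per 1M input tokens at a given cache hit rate."""
    if not 0 <= cache_hit_rate <= 1:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return (1 - cache_hit_rate) * input_price + cache_hit_rate * cache_price


def output_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of one call's cost that comes from output tokens (no caching)."""
    output_cost = output_tokens * output_price
    return output_cost / (input_tokens * input_price + output_cost)


# Effective input price if the assumed cache hit rate does not materialize.
for hit_rate in (0.0, 0.3, 0.8):
    price = effective_input_price(2.00, 0.20, hit_rate)
    print(f"cache hit rate {hit_rate:.0%}: ${price:.2f} per 1M input tokens")

# Output share for an input-heavy call and for a short-prompt chat call.
print(f"4,000 in / 350 out: {output_share(4_000, 350, 2.00, 12.00):.0%} from output")
print(f"1,200 in / 500 out: {output_share(1_200, 500, 2.00, 12.00):.0%} from output")
```

In the second example the call sends more than twice as many input tokens as it generates, yet most of its cost still comes from output, because the output price on this row is six times the input price.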

Comparison workflow

Compare scenarios, not just models.

A better pricing decision usually comes from testing the same workload against three rows: the cheapest plausible option, the model you actually want, and one balanced middle choice.

1. Estimate tokens per successful user action, not just per API call.
2. Add retries, tool calls, failed generations, cache misses, and longer-than-expected outputs.
3. Run the same workload assumptions across the cheapest row, the quality leader, and one middle option.
4. Open the official source before launch because price rows, cache rules, and batch eligibility can change.

Text API pricing FAQ

Why is the cheapest text API row not always cheapest in production?

Because production usage includes retries, output length, cache misses, tool calls, failed generations, and sometimes multiple model calls per user action. The row with the lowest input price can lose once the full workflow is counted.

Should I compare input price or output price first?

Compare both, but start with the side your app uses most. RAG and extraction can be input-heavy, while writing, coding, agents, and support replies can become output-heavy.

When does prompt caching matter for API pricing?

Prompt caching matters when a large prefix repeats across many requests, such as a stable system prompt, long policy, shared tool schema, or repeated retrieval wrapper. It matters less for one-off prompts.

Is batch pricing safe to use for a live app estimate?

Usually no. Batch pricing is best for offline or delayed jobs. For live chat, agents, customer support, and interactive product flows, estimate with realtime prices unless the provider's batch tier explicitly meets your latency needs.

How do retries affect text model API cost?

Retries multiply both input and output usage. For agents, coding, extraction, and support flows, estimate cost per successful user action instead of cost per first API call.

Which text model price should I compare first?

Start with the models that pass your quality bar, then compare input, output, cache, batch, and retry assumptions. A cheap row that fails more often is rarely the cheapest production choice.