Official prices plus workload math

Text AI API Pricing: 73 Official Model Rows

Text API costs move when prompts get longer, agents retry, cache hits vary, and output tokens grow. Start with your workload assumptions, then compare official model rows against the model-quality pages.

Price rows

73

Providers

11

Official sources

13

Last updated

2026-07-04T06:36:11.407Z

Last source check: 2026-07-04

What changed in this update

Refreshed 73 official text price rows.

Grouped rows across 11 providers and 13 official source pages.

Kept workload guidance tied to launch checks, real usage units, and official-source verification.

Input and output tokens

Cache hit rate

Batch traffic

Retries or failed calls

What the price row misses

The useful number is the cost of a successful workflow, not the cleanest API row.

AI API pricing pages often look simple because they compare one published row at a time. This page keeps the official row visible, then adds the messy assumptions that show up in products: retries, long outputs, cache misses, batches, and review loops.

Source check

Every listed row should trace back to an official provider pricing, docs, model, or API page before it becomes a comparison row.

Unit check

Rows are kept in their original billing units when conversion would hide an important difference, such as per-second video or per-image generation.

Workload check

The calculator starts from product behavior: retries, cache hits, long prompts, output length, batch jobs, and rejected generations.

Launch check

Before a production rollout, reopen the official source because provider prices, cache rules, model names, and eligibility can change quickly.

Pricing validation playbook

Validate the bill with your product workflow before choosing a provider.

Official rows are the starting point. The production decision comes from measuring the unit your users actually complete, the retries they create, and the quality gates you need before an output is accepted.

Define the unit

Cost per successful user action

Decide whether the product unit is one answer, one resolved ticket, one accepted code fix, one processed row, or one completed agent task.

Instrument tokens

Measure input, output, cache, and retries

Log token use before launch in a small pilot. Separate repeated prefixes, retrieved context, generated output, failed calls, and tool loops.

Compare finalists

Test cheap, strong, and middle options

Run the same workload assumptions through at least three candidate rows instead of choosing the lowest published input price.

Review after launch

Replace estimates with logs

After traffic starts, update the calculator with real token distributions, quality failure rate, latency needs, and cache hit rate.
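The playbook steps above can be sketched as a small log aggregation. This is a minimal illustration, assuming each model call is logged with a hypothetical `action_id` that ties it to one user action; the `CallLog` fields and the sample numbers are invented for the example and are not part of any provider API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallLog:
    action_id: str        # hypothetical ID tying calls to one user action
    input_tokens: int
    cached_tokens: int    # portion of input_tokens served from cache
    output_tokens: int
    ok: bool              # False for failed or retried calls


def per_successful_action(logs, succeeded_actions):
    """Average calls and tokens per successful action.

    Tokens spent on actions that never succeeded are still counted,
    because they are paid for; they are spread over the successes.
    """
    n = len(succeeded_actions)
    if n == 0:
        raise ValueError("no successful actions in the pilot")
    totals = defaultdict(int)
    for log in logs:
        totals["calls"] += 1
        totals["failed_calls"] += 0 if log.ok else 1
        totals["input_tokens"] += log.input_tokens
        totals["cached_tokens"] += log.cached_tokens
        totals["output_tokens"] += log.output_tokens
    return {key: value / n for key, value in totals.items()}


# Invented pilot data: three actions, two of which succeeded.
logs = [
    CallLog("a1", 1200, 800, 300, True),
    CallLog("a2", 1200, 0, 250, False),    # failed call, retried below
    CallLog("a2", 1200, 800, 320, True),
    CallLog("a3", 1500, 800, 400, False),  # action never succeeded
]
stats = per_successful_action(logs, succeeded_actions={"a1", "a2"})
print(stats)
```

Dividing by successful actions rather than by calls is the design choice that matters here: failed calls and abandoned actions raise the per-success figures instead of disappearing from the average.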

Workload cheat sheet

Start by naming the unit your product really pays for.

The same API price row means different things for chat, RAG, agents, coding, and batch jobs. Before comparing providers, decide what a successful unit is and measure that unit instead of a single clean API call.

Workload

Chat or customer support

What moves the bill

Output length, retries, and conversation history.

Measure first

Average tokens per resolved conversation.

Workload

RAG or document Q&A

What moves the bill

Retrieved context, repeated system prompts, and cache hit rate.

Measure first

Input tokens per answered question.

Workload

Agent or tool workflow

What moves the bill

Planning calls, tool calls, retries, critique, and final answer generation.

Measure first

Model calls per successful user action.

Workload

Coding assistant

What moves the bill

Long prompts, repository context, generated diffs, and review loops.

Measure first

Tokens per accepted fix or merged task.

Workload

Batch enrichment

What moves the bill

Volume, batch eligibility, failed rows, and delayed processing tolerance.

Measure first

Rows processed per month and allowed latency.

Starting assumptions

Start with a believable workload, then replace it with your own logs.

These are not universal benchmarks. They are practical starting points for the calculator when you have not instrumented production yet. After launch, replace them with measured tokens per successful user action.

Support chat

20K monthly runs, 1.2K input tokens, 500 output tokens, 5% retries.

Use this when one customer question usually turns into one answer, with a short history and a modest retry buffer.

RAG question answering

50K monthly runs, 4K input tokens, 350 output tokens, 30% cache hit rate.

Use this when retrieved context is the main cost driver and answers are short, but repeated prompts or wrappers can be cached.

Agent workflow

10K monthly runs, 6K input tokens, 1.5K output tokens, 25% retries.

Use this when one user click can create planning, tool calls, reflection, and a final response rather than one clean call.

Batch extraction

250K monthly runs, 1K input tokens, 150 output tokens, 80% batch traffic.

Use this for offline enrichment, classification, or structured extraction where delayed processing is acceptable.
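The four starting points above can be written down as data and turned into monthly token volumes. The sketch below is illustrative only: the `Preset` class and its field names are assumptions, and it treats the retry percentage as extra full calls, which is one simple reading of a retry buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    name: str
    monthly_runs: int
    input_tokens: int
    output_tokens: int
    cache_hit_rate: float = 0.0
    retry_rate: float = 0.0
    batch_share: float = 0.0


# Values copied from the starting assumptions listed above.
PRESETS = [
    Preset("Support chat", 20_000, 1_200, 500, retry_rate=0.05),
    Preset("RAG question answering", 50_000, 4_000, 350, cache_hit_rate=0.30),
    Preset("Agent workflow", 10_000, 6_000, 1_500, retry_rate=0.25),
    Preset("Batch extraction", 250_000, 1_000, 150, batch_share=0.80),
]


def monthly_tokens_millions(p):
    """Return (input, output) volume in millions of tokens, retries included."""
    attempts = p.monthly_runs * (1 + p.retry_rate)
    return attempts * p.input_tokens / 1e6, attempts * p.output_tokens / 1e6


for p in PRESETS:
    tokens_in, tokens_out = monthly_tokens_millions(p)
    print(f"{p.name}: {tokens_in:.1f}M input, {tokens_out:.1f}M output")
```

Multiplying these volumes by a row's per-1M prices gives a first estimate; cache hits and batch share then adjust which price applies to each portion.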

One workflow example

A cheap model row can lose once you count the whole user action.

The common mistake is comparing one clean API call when the product actually needs several calls to finish the job. The calculator should be driven by the unit your users experience.

Clean spreadsheet row

1 API call

A quick comparison usually assumes one prompt, one answer, and no retry. That is useful for a first scan, but it is not how most products behave.

Agent action

2-5 model calls

A coding, support, or agent workflow can include planning, tool calls, critique, retries, and a final response before the user sees success.

Cost unit to compare

Successful action

Compare providers by the cost of one completed user action, accepted fix, resolved ticket, or processed row rather than the cheapest single call.
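To make the comparison above concrete, here is a rough sketch that prices a whole user action instead of a single call. The call traces are invented for illustration, and the prices are example per-1M-token rates, so the result shows the mechanism rather than a verdict on any real model.

```python
def call_cost(input_tokens, output_tokens, input_price, output_price):
    """USD cost of one call; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1e6


def action_cost(calls, input_price, output_price):
    """USD cost of one user action made of several (input, output) calls."""
    return sum(call_cost(i, o, input_price, output_price) for i, o in calls)


# Hypothetical traces: the cheaper row needs five calls, the pricier row two.
cheap_trace = [(3000, 400), (3500, 600), (4000, 600), (4500, 600), (5000, 800)]
strong_trace = [(3000, 500), (3500, 800)]

cheap = action_cost(cheap_trace, input_price=0.60, output_price=2.40)
strong = action_cost(strong_trace, input_price=1.25, output_price=2.50)
print(f"cheap row:  ${cheap:.4f} per action")
print(f"strong row: ${strong:.4f} per action")
```

Under these assumed traces the row with the lower published prices costs more per completed action, because context grows with every extra call.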

Cost audit before launch

Questions to answer before calling a model cheap

What is one successful user action in this product?
How many model calls does that action usually require?
How many input tokens are repeated and cacheable?
How long are the outputs after retries and tool traces?
Which traffic can use batch pricing without hurting latency?
What quality failure rate forces a second model call or human review?

Official API price calculator

Text APIs

Covers chat, reasoning, coding agents, RAG, extraction, and long-context text workloads.

Workload assumptions

Set expected usage; each row estimates monthly cost from official unit prices.

Monthly runs

Input tokens / run

Output tokens / run

Cache hit rate (%)

Retries / failures (%)

Batch traffic (%)
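The six workload inputs can be combined into a monthly estimate. The sketch below is one plausible way to do that and is not necessarily the formula this page's calculator uses: it assumes retries repeat the whole call, cache hits swap the input price for the cached price on a share of input tokens, and batch traffic gets a flat discount (50% here, as an assumed default). The function and parameter names are made up for the example.

```python
def monthly_cost(runs, input_tokens, output_tokens, input_price, output_price,
                 cache_price=None, cache_hit_rate=0.0, retry_rate=0.0,
                 batch_share=0.0, batch_discount=0.5):
    """Estimate monthly USD cost; all prices are USD per 1M tokens.

    Assumptions (not taken from any provider): retries repeat the whole
    call, cache hits replace the input price on a share of input tokens,
    and batch traffic gets a flat discount on input and output.
    """
    if cache_price is None:        # row lists no cached-input price
        cache_price = input_price
    attempts = runs * (1 + retry_rate)
    blended_input = cache_hit_rate * cache_price + (1 - cache_hit_rate) * input_price
    per_call = (input_tokens * blended_input + output_tokens * output_price) / 1e6
    batch_factor = 1 - batch_share * batch_discount
    return attempts * per_call * batch_factor


# Example: 1,000 runs, 2,500 input and 700 output tokens per run,
# 20% cache hits, 5% retries, no batch traffic, on a $5.00 / $0.50 / $30.00 row.
estimate = monthly_cost(1_000, 2_500, 700, 5.00, 30.00,
                        cache_price=0.50, cache_hit_rate=0.20, retry_rate=0.05)
print(f"${estimate:.2f}")  # prints $32.81
```

Cache write charges, long-context tiers, and regional multipliers are left out on purpose; the notes column in the price table flags where those apply.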

Official price table - Text composite score order

73 official USD rows - checked 2026-07-04 - sorted by Coding + Writing + Math score.

Daily source checks

Model	Provider	Input per 1M tokens	Cache per 1M tokens	Output per 1M tokens	Est. monthly cost	Region	Source	Notes
Score #1 - Text composite - 3/3 score sources	Anthropic Official API	$10.00 per 1M	$1.00 per 1M	$50.00 per 1M	$58.27	Global default	Claude API pricing Checked 2026-07-04	5-minute cache writes are $12.50/MTok and 1-hour cache writes are $20/MTok.
Score #2 - Text composite - 3/3 score sources	Anthropic Official API	$5.00 per 1M	$0.50 per 1M	$25.00 per 1M	$29.14	Global default	Claude API pricing Checked 2026-07-04	Fast mode and data residency are priced separately.
Score #3 - Text composite - 3/3 score sources	Anthropic Official API	$5.00 per 1M	$0.50 per 1M	$25.00 per 1M	$29.14	Global default	Claude API pricing Checked 2026-07-04	Opus 4.7 and later use a newer tokenizer; token counts can differ from older models.
Score #4 - Text composite - 3/3 score sources	Anthropic Official API	$5.00 per 1M	$0.50 per 1M	$25.00 per 1M	$29.14	Global default	Claude API pricing Checked 2026-07-04	Fast mode has separate premium pricing; US-only inference geography can add a 1.1x multiplier.
Score #8 - Text composite - 3/3 score sources	OpenAI Official API	$5.00 per 1M	$0.50 per 1M	$30.00 per 1M	$32.81	Global	OpenAI API pricing Checked 2026-07-04	OpenAI lists separate Standard, Batch, Flex, and Priority tiers; this row uses Standard short-context pricing.
Score #9 - Text composite - 3/3 score sources	OpenAI Official API	$2.50 per 1M	$0.25 per 1M	$15.00 per 1M	$16.41	Global	OpenAI API pricing Checked 2026-07-04	Regional data residency endpoints may add a 10% uplift for eligible newer models.
Score #10 - Text composite - 3/3 score sources	Google Official API	$2.00 per 1M	$0.20 per 1M	$12.00 per 1M	$13.13	Gemini API paid tier	Gemini API pricing Checked 2026-07-04	Prompts above 200K tokens have higher input, output, and cache prices.
Score #11 - Text composite - 3/3 score sources	Google Official API	$1.50 per 1M	$0.15 per 1M	$9.00 per 1M	$9.84	Gemini API paid tier	Gemini API pricing Checked 2026-07-04	Search and Maps grounding have separate charges after free quotas.
Score #12 - Text composite - 3/3 score sources	Google Official API	$0.50 per 1M	$0.05 per 1M	$3.00 per 1M	$3.28	Gemini API paid tier	Gemini API pricing Checked 2026-07-04	Preview models can change before becoming stable.
Score #16 - Text composite - 2/3 score sources	Z.AI Official API	$1.40 per 1M	$0.26 per 1M	$4.40 per 1M	$6.31	Z.AI API	Z.AI pricing Checked 2026-07-04	Cached input storage is listed as limited-time free on the source page.
Score #16 - Text composite - 2/3 score sources	Z.AI Official API	$1.00 per 1M	$0.20 per 1M	$3.20 per 1M	$4.56	Z.AI API	Z.AI pricing Checked 2026-07-04	Cached input storage is listed as limited-time free.
Score #18 - Text composite - 2/3 score sources	Anthropic Official API	$5.00 per 1M	$0.50 per 1M	$25.00 per 1M	$29.14	Global default	Claude API pricing Checked 2026-07-04	Batch API pricing is 50% lower for input and output tokens.
Score #20 - Text composite - 2/3 score sources	Moonshot AI Official API	$0.95 per 1M	$0.16 per 1M	$4.00 per 1M	$5.02	Kimi API	Kimi K2.6 pricing Checked 2026-07-04	Kimi pricing page lists cache-hit, input, and output prices.
Score #21 - Text composite - 2/3 score sources	MiniMax Official API	$0.30 per 1M	$0.06 per 1M	$1.20 per 1M	$1.54	MiniMax API	MiniMax pay-as-you-go pricing Checked 2026-07-04	Official page marks this as permanent 50% off compared with crossed-out list price.
Score #21 - Text composite - 2/3 score sources	MiniMax Official API	$0.60 per 1M	$0.12 per 1M	$2.40 per 1M	$3.09	MiniMax API	MiniMax pay-as-you-go pricing Checked 2026-07-04	Input tokens above 512K may have availability limits.
Score #24 - Text composite - 2/3 score sources	Moonshot AI Official API	$0.60 per 1M	$0.10 per 1M	$3.00 per 1M	$3.52	Kimi API	Kimi K2.5 pricing Checked 2026-07-04	Supports text, image, video input, thinking and non-thinking modes.
Score #25 - Text composite - 2/3 score sources	OpenAI Official API	$30.00 per 1M	n/a per 1M	$180 per 1M	$211	Global	OpenAI API pricing Checked 2026-07-04	Long-context standard pricing is higher; Batch is discounted where available.
Score #26 - Text composite - 2/3 score sources	OpenAI Official API	$1.75 per 1M	$0.175 per 1M	$14.00 per 1M	$14.06	Global	OpenAI API pricing Checked 2026-07-04	Priority pricing is listed separately on OpenAI pricing.
Score #29 - Text composite - 1/3 score sources	Anthropic Official API	$3.00 per 1M	$0.30 per 1M	$15.00 per 1M	$17.48	Global default	Claude API pricing Checked 2026-07-04	US-only inference geography adds a 1.1x multiplier for Sonnet 4.6 and later models.
Score #33 - Text composite - 1/3 score sources	Z.AI Official API	$0.60 per 1M	$0.11 per 1M	$2.20 per 1M	$2.93	Z.AI API	Z.AI pricing Checked 2026-07-04	Cached input storage is listed as limited-time free.
Score #33 - Text composite - 1/3 score sources	Z.AI Official API	$0.07 per 1M	$0.01 per 1M	$0.40 per 1M	$0.4463	Z.AI API	Z.AI pricing Checked 2026-07-04	Good row for budget-sensitive routing comparisons.
Score #33 - Text composite - 1/3 score sources	Z.AI Official API	$0.00 per 1M	$0.00 per 1M	$0.00 per 1M	$0.00	Z.AI API	Z.AI pricing Checked 2026-07-04	Displayed as Free in the official pricing table; availability can change.
Score #34 - Text composite - 1/3 score sources	Z.AI Official API	$1.20 per 1M	$0.24 per 1M	$4.00 per 1M	$5.59	Z.AI API	Z.AI pricing Checked 2026-07-04	Prices are per 1M tokens.
Score #41 - Text composite - 1/3 score sources	OpenAI Official API	$0.75 per 1M	$0.075 per 1M	$4.50 per 1M	$4.92	Global	OpenAI API pricing Checked 2026-07-04	Batch pricing is 50% lower in the official table.
Score #41 - Text composite - 1/3 score sources	OpenAI Official API	$0.20 per 1M	$0.02 per 1M	$1.25 per 1M	$1.35	Global	OpenAI API pricing Checked 2026-07-04	Good reference point for lightweight classification, extraction, and chat routing.
Score #41 - Text composite - 1/3 score sources	OpenAI Official API	$30.00 per 1M	n/a per 1M	$180 per 1M	$211	Global	OpenAI API pricing Checked 2026-07-04	No cached-input price is listed for this pro row in the flagship pricing table.
Score #44 - Text composite - 1/3 score sources	MiniMax Official API	$0.30 per 1M	$0.06 per 1M	$1.20 per 1M	$1.54	MiniMax API	MiniMax pay-as-you-go pricing Checked 2026-07-04	Prompt cache read and write are separate pricing fields.
Score #44 - Text composite - 1/3 score sources	MiniMax Official API	$0.60 per 1M	$0.06 per 1M	$2.40 per 1M	$3.06	MiniMax API	MiniMax pay-as-you-go pricing Checked 2026-07-04	High-speed tier doubles input and output price compared with standard.
Score #45 - Text composite - 1/3 score sources	Alibaba Cloud Official API	$0.172 per 1M	n/a per 1M	$1.03 per 1M	$1.21	Global deployment	Alibaba Cloud Model Studio pricing Checked 2026-07-04	Longer prompt tiers are priced higher in the official table.
Score #47 - Text composite - 1/3 score sources	Anthropic Official API	$3.00 per 1M	$0.30 per 1M	$15.00 per 1M	$17.48	Global default	Claude API pricing Checked 2026-07-04	Prompt cache writes are listed separately from cache hits.
Score #52 - Text composite - 1/3 score sources	MiniMax Official API	$0.30 per 1M	$0.03 per 1M	$1.20 per 1M	$1.53	MiniMax API	MiniMax pay-as-you-go pricing Checked 2026-07-04	Legacy model retained for users comparing older production integrations.
Score #52 - Text composite - 1/3 score sources	MiniMax Official API	$0.60 per 1M	$0.03 per 1M	$2.40 per 1M	$3.04	MiniMax API	MiniMax pay-as-you-go pricing Checked 2026-07-04	High-speed tier doubles input and output price compared with standard.
Score #55 - Text composite - 1/3 score sources	xAI Official API	$1.25 per 1M	$0.20 per 1M	$2.50 per 1M	$4.57	xAI API	xAI API pricing Checked 2026-07-04	Server-side tools are charged separately from token usage.
Score #57 - Text composite - 1/3 score sources	Alibaba Cloud Official API	$0.115 per 1M	n/a per 1M	$0.917 per 1M	$0.9759	Global deployment	Alibaba Cloud Model Studio pricing Checked 2026-07-04	Global endpoint and storage are in US Virginia or Germany Frankfurt.
Score #59 - Text composite - 1/3 score sources	Z.AI Official API	$0.60 per 1M	$0.11 per 1M	$2.20 per 1M	$2.93	Z.AI API	Z.AI pricing Checked 2026-07-04	Cached input storage is listed as limited-time free.
Score #60 - Text composite - 1/3 score sources	Anthropic Official API	$1.00 per 1M	$0.10 per 1M	$5.00 per 1M	$5.83	Global default	Claude API pricing Checked 2026-07-04	Useful Claude baseline for high-volume support and extraction workloads.
Score #72 - Text composite - 1/3 score sources	Mistral Official API	$1.50 per 1M	n/a per 1M	$7.50 per 1M	$9.45	Mistral API	Mistral pricing Checked 2026-07-04	Batch processing gets a 50% discount.
Score #73 - Text composite - 1/3 score sources	Google Official API	$0.25 per 1M	$0.025 per 1M	$1.50 per 1M	$1.64	Gemini API paid tier	Gemini API pricing Checked 2026-07-04	Audio input is priced separately on Google pricing.
Score #80 - Text composite - 1/3 score sources	Mistral Official API	$0.50 per 1M	n/a per 1M	$1.50 per 1M	$2.42	Mistral API	Mistral pricing Checked 2026-07-04	Mistral pricing is per million input and output tokens.
Score #82 - Text composite - 1/3 score sources	Google Official API	$1.25 per 1M	$0.125 per 1M	$10.00 per 1M	$10.04	Gemini API paid tier	Gemini API pricing Checked 2026-07-04	Prompts above 200K tokens have higher prices.
Official price row	OpenAI Official API	$5.00 per 1M	$0.50 per 1M	$30.00 per 1M	$32.81	Global	OpenAI API pricing Checked 2026-07-04	Useful when comparing ChatGPT chat-latest against flagship model APIs.
Official price row	Anthropic Official API	$10.00 per 1M	$1.00 per 1M	$50.00 per 1M	$58.27	Limited availability	Claude API pricing Checked 2026-07-04	Official table marks Mythos 5 as limited availability.
Official price row	Google Official API	$0.30 per 1M	$0.03 per 1M	$2.50 per 1M	$2.48	Gemini API paid tier	Gemini API pricing Checked 2026-07-04	Grounding and Maps pricing are separate from token billing.
Official price row	xAI Official API	$1.00 per 1M	$0.20 per 1M	$2.00 per 1M	$3.68	xAI API	xAI API pricing Checked 2026-07-04	Currently described by xAI as early access.
Official price row	xAI Official API	$1.25 per 1M	$0.20 per 1M	$2.50 per 1M	$4.57	xAI API	xAI API pricing Checked 2026-07-04	Listed in xAI Chat API pricing with the same token rates as grok-4.3.
Official price row	xAI Official API	$1.25 per 1M	$0.20 per 1M	$2.50 per 1M	$4.57	xAI API	xAI API pricing Checked 2026-07-04	Reasoning tokens are billed under the model token rates.
Official price row	xAI Official API	$1.25 per 1M	$0.20 per 1M	$2.50 per 1M	$4.57	xAI API	xAI API pricing Checked 2026-07-04	Batch pricing can vary by model detail page.
Official price row	Mistral Official API	$0.10 per 1M	n/a per 1M	$0.30 per 1M	$0.483	Mistral API	Mistral pricing Checked 2026-07-04	Open model listed on Mistral pricing.
Official price row	Mistral Official API	$2.00 per 1M	n/a per 1M	$5.00 per 1M	$8.93	Mistral API	Mistral pricing Checked 2026-07-04	Use for reasoning comparisons against general-purpose models.
Official price row	Mistral Official API	$0.50 per 1M	n/a per 1M	$1.50 per 1M	$2.42	Mistral API	Mistral pricing Checked 2026-07-04	Batch processing gets a 50% discount.
Official price row	Mistral Official API	$0.10 per 1M	n/a per 1M	$0.10 per 1M	$0.336	Mistral API	Mistral pricing Checked 2026-07-04	Best for low-cost routing and lightweight agent steps.
Official price row	Mistral Official API	$0.15 per 1M	n/a per 1M	$0.15 per 1M	$0.504	Mistral API	Mistral pricing Checked 2026-07-04	Low-cost open model in Mistral pricing.
Official price row	Cohere Official API	$0.50 per 1M	n/a per 1M	$1.50 per 1M	$2.42	Cohere API	Cohere pricing Checked 2026-07-04	Cohere lists Aya Expanse API pricing in the official pricing FAQ.
Official price row	Cohere Official API	$1.00 per 1M	n/a per 1M	$2.00 per 1M	$4.10	Existing Cohere customers	Cohere pricing Checked 2026-07-04	Cohere marks these as legacy model prices for existing customers.
Official price row	Cohere Official API	$0.30 per 1M	n/a per 1M	$0.60 per 1M	$1.23	Existing Cohere customers	Cohere pricing Checked 2026-07-04	Listed in Cohere pricing FAQ as legacy pricing.
Official price row	Cohere Official API	$0.50 per 1M	n/a per 1M	$1.50 per 1M	$2.42	Existing Cohere customers	Cohere pricing Checked 2026-07-04	Listed in Cohere pricing FAQ as legacy pricing.
Official price row	Cohere Official API	$3.00 per 1M	n/a per 1M	$15.00 per 1M	$18.90	Existing Cohere customers	Cohere pricing Checked 2026-07-04	Listed in Cohere pricing FAQ as legacy pricing.
Official price row	Cohere Official API	$2.50 per 1M	n/a per 1M	$10.00 per 1M	$13.91	Existing Cohere customers	Cohere pricing Checked 2026-07-04	Listed in Cohere pricing FAQ as legacy pricing.
Official price row	DeepSeek Official API	$0.27 per 1M	$0.07 per 1M	$1.10 per 1M	$1.41	DeepSeek API	DeepSeek USD pricing Checked 2026-07-04	Automatic context caching uses cache-hit and cache-miss input prices.
Official price row	DeepSeek Official API	$0.55 per 1M	$0.14 per 1M	$2.19 per 1M	$2.84	DeepSeek API	DeepSeek USD pricing Checked 2026-07-04	Reasoning output and CoT behavior can change real workload cost.
Official price row	Alibaba Cloud Official API	$0.359 per 1M	n/a per 1M	$1.43 per 1M	$2.00	Global deployment	Alibaba Cloud Model Studio pricing Checked 2026-07-04	Higher prompt-size tiers increase input and output prices.
Official price row	Alibaba Cloud Official API	$0.30 per 1M	n/a per 1M	$1.50 per 1M	$1.89	EU deployment	Alibaba Cloud Model Studio pricing Checked 2026-07-04	Higher prompt tiers are listed separately up to 256K tokens.
Official price row	Alibaba Cloud Official API	$0.80 per 1M	n/a per 1M	$2.40 per 1M	$3.86	International deployment	Alibaba Cloud Model Studio pricing Checked 2026-07-04	International inference is dynamically scheduled globally excluding Chinese Mainland.
Official price row	Alibaba Cloud Official API	$0.287 per 1M	n/a per 1M	$0.861 per 1M	$1.39	Model Studio	Alibaba Cloud Model Studio pricing Checked 2026-07-04	Listed under QwQ open source model pricing.
Official price row	Alibaba Cloud Official API	$0.072 per 1M	n/a per 1M	$0.287 per 1M	$0.3999	Chinese Mainland deployment	Alibaba Cloud Model Studio pricing Checked 2026-07-04	Chinese Mainland deployment only according to official table.
Official price row	Alibaba Cloud Official API	$0.574 per 1M	n/a per 1M	$1.72 per 1M	$2.77	Chinese Mainland deployment	Alibaba Cloud Model Studio pricing Checked 2026-07-04	Useful for math-specific API cost comparisons.
Official price row	Moonshot AI Official API	$0.95 per 1M	$0.19 per 1M	$4.00 per 1M	$5.03	Kimi API	Kimi K2.7 Code pricing Checked 2026-07-04	Limited-time promotion is mentioned on the official page.
Official price row	Z.AI Official API	$0.60 per 1M	$0.11 per 1M	$2.20 per 1M	$2.93	Z.AI API	Z.AI pricing Checked 2026-07-04	Cached input storage is listed as limited-time free.
Official price row	Z.AI Official API	$2.20 per 1M	$0.45 per 1M	$8.90 per 1M	$11.40	Z.AI API	Z.AI pricing Checked 2026-07-04	Prices are per 1M tokens.
Official price row	Z.AI Official API	$0.20 per 1M	$0.03 per 1M	$1.10 per 1M	$1.24	Z.AI API	Z.AI pricing Checked 2026-07-04	Cached input storage is listed as limited-time free.
Official price row	Z.AI Official API	$1.10 per 1M	$0.22 per 1M	$4.50 per 1M	$5.73	Z.AI API	Z.AI pricing Checked 2026-07-04	Prices are per 1M tokens.
Official price row	Z.AI Official API	$0.10 per 1M	n/a per 1M	$0.10 per 1M	$0.336	Z.AI API	Z.AI pricing Checked 2026-07-04	No cached-input price is listed for this row.
Official price row	Z.AI Official API	$0.00 per 1M	$0.00 per 1M	$0.00 per 1M	$0.00	Z.AI API	Z.AI pricing Checked 2026-07-04	Displayed as Free in the official pricing table; availability can change.

Production cost traps

The cheapest model on paper can be the wrong production default.

App flows rarely send one clean prompt and get back one neat answer. They retry, call tools, reuse some context, miss the cache sometimes, and produce outputs that are longer than the demo. Use this section to sanity-check the estimate before trusting the table.

Output tokens quietly decide the bill

A model with cheap input can still become expensive if it writes long answers, verbose tool traces, or multi-step reasoning for every request.

Cache savings need repeatable prompts

Prompt caching helps most when system prompts, retrieval wrappers, or conversation prefixes repeat. It helps much less when every request is unique.

Agents turn one click into many calls

A single user action can become planning, search, tool calls, retries, critique, and final rewrite. Count the whole action, not only the final answer.

Batch discounts are not live traffic

Batch pricing can be great for offline jobs, enrichment, and nightly processing, but it should not be used as the default cost for a realtime user flow.
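The caching trap above comes down to a blended input price. As a rough sketch, assuming cache hits are billed at the cached-input rate and ignoring cache write or storage charges (which some providers list separately), the effective price per 1M input tokens moves with the hit rate like this. The function name and the example prices are illustrative.

```python
def effective_input_price(input_price, cache_price, hit_rate):
    """Blended USD price per 1M input tokens at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_price + (1 - hit_rate) * input_price


# Illustrative row: $2.00 input and $0.20 cached input per 1M tokens.
for hit_rate in (0.0, 0.3, 0.6, 0.9):
    price = effective_input_price(2.00, 0.20, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${price:.2f} per 1M input tokens")
```

A workload with unique prompts sits near the 0% line and pays the full input price, which is why the cached-input column only matters once the measured hit rate is known.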

Comparison workflow

Compare scenarios, not just models.

A better pricing decision usually comes from testing the same workload against three rows: the cheapest plausible option, the model you actually want, and one balanced middle choice.

1. Estimate tokens per successful user action, not just per API call.

2. Add retries, tool calls, failed generations, cache misses, and longer-than-expected outputs.

3. Run the same workload assumptions across the cheapest row, the quality leader, and one middle option.

4. Open the official source before launch because price rows, cache rules, and batch eligibility can change.
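The steps above can be sketched as one workload run against three rows. The workload is the agent starting assumption from this page (10K runs, 6K input tokens, 1.5K output tokens, 25% retries); the row labels are placeholders and the prices are example per-1M-token rates, with caching and batch left out to keep the comparison short.

```python
WORKLOAD = {"runs": 10_000, "input_tokens": 6_000, "output_tokens": 1_500,
            "retry_rate": 0.25}

# (label, input price, output price) in USD per 1M tokens; labels are placeholders.
ROWS = [
    ("cheapest plausible", 0.30, 1.20),
    ("balanced middle", 2.00, 12.00),
    ("quality leader", 5.00, 25.00),
]


def monthly_estimate(workload, input_price, output_price):
    """Monthly USD cost, treating each retry as one extra full call."""
    attempts = workload["runs"] * (1 + workload["retry_rate"])
    per_call = (workload["input_tokens"] * input_price
                + workload["output_tokens"] * output_price) / 1e6
    return attempts * per_call


results = {label: monthly_estimate(WORKLOAD, p_in, p_out)
           for label, p_in, p_out in ROWS}
for label, cost in sorted(results.items(), key=lambda item: item[1]):
    print(f"{label:<20} ${cost:,.2f} per month")
```

This version applies the same retry rate to every row. In practice the retry and failure rates should be measured per model, since a weaker row may need more attempts per successful action.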

Text API pricing FAQ

Why is the cheapest text API row not always cheapest in production?

Because production usage includes retries, output length, cache misses, tool calls, failed generations, and sometimes multiple model calls per user action. The row with the lowest input price can lose once the full workflow is counted.

Should I compare input price or output price first?

Compare both, but start with the side your app uses most. RAG and extraction can be input-heavy, while writing, coding, agents, and support replies can become output-heavy.
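One way to see which side dominates is to compute the share of a call's cost that comes from output tokens. The sketch below uses an illustrative row and two call shapes borrowed from the starting assumptions on this page; the function name is made up for the example.

```python
def output_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of a call's cost that comes from output tokens."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    total = input_cost + output_cost
    if total == 0:
        raise ValueError("free row or empty call: no cost to split")
    return output_cost / total


# Same illustrative row ($5.00 input, $25.00 output per 1M tokens), two shapes.
rag_like = output_share(4_000, 350, 5.00, 25.00)
chat_like = output_share(1_200, 500, 5.00, 25.00)
print(f"RAG-like call:  {rag_like:.0%} of cost is output")
print(f"chat-like call: {chat_like:.0%} of cost is output")
```

With these numbers the RAG-shaped call is mostly input cost while the chat-shaped call is mostly output cost, even though the price row is identical.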

When does prompt caching matter for API pricing?

Prompt caching matters when a large prefix repeats across many requests, such as a stable system prompt, long policy, shared tool schema, or repeated retrieval wrapper. It matters less for one-off prompts.

Is batch pricing safe to use for a live app estimate?

Usually no. Batch pricing is best for offline or delayed jobs. For live chat, agents, customer support, and interactive product flows, estimate with realtime prices unless the provider explicitly supports your latency needs.

How do retries affect text model API cost?

Retries multiply both input and output usage. For agents, coding, extraction, and support flows, estimate cost per successful user action instead of cost per first API call.
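A simple model makes the retry effect explicit. The sketch below assumes every attempt costs the same and succeeds independently with the same probability, and that the workflow retries up to a fixed cap; real retries are often correlated, so treat this as a first approximation. All names and numbers are illustrative.

```python
def expected_attempts(success_rate, max_attempts):
    """Expected calls per action when retrying until success or a cap.

    Assumes each attempt succeeds independently with the same probability.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    fail = 1 - success_rate
    # Attempt k runs only if the previous k-1 attempts all failed.
    return sum(fail ** k for k in range(max_attempts))


def cost_per_success(cost_per_attempt, success_rate, max_attempts):
    """Expected spend divided by the probability the action ends in success."""
    p_success = 1 - (1 - success_rate) ** max_attempts
    return cost_per_attempt * expected_attempts(success_rate, max_attempts) / p_success


print(cost_per_success(0.01, success_rate=0.95, max_attempts=3))
print(cost_per_success(0.01, success_rate=0.75, max_attempts=3))
```

Under these assumptions the result simplifies to the cost per attempt divided by the success rate, so a row that succeeds 75% of the time costs about a third more per successful action than its single-call price suggests.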

Which text model price should I compare first?

Start with the models that pass your quality bar, then compare input, output, cache, batch, and retry assumptions. A cheap row that fails more often is rarely the cheapest production choice.