Text API costs move when prompts get longer, agents retry, cache hits vary, and output tokens grow. Start with your workload assumptions, then compare official model rows against the model-quality pages.
Price rows: 73
Providers: 11
Official sources: 13
Last updated: 2026-07-04T06:36:11.407Z
Last source check: 2026-07-04
What changed in this update
Refreshed 73 official text price rows.
Grouped rows across 11 providers and 13 official source pages.
Kept workload guidance tied to launch checks, real usage units, and official-source verification.
What the price row misses
AI API pricing pages often look simple because they compare one published row at a time. This page keeps the official row visible, then adds the messy assumptions that show up in products: retries, long outputs, cache misses, batches, and review loops.
Source check
Every listed row should trace back to an official provider pricing, docs, model, or API page before it becomes a comparison row.
Unit check
Rows are kept in their original billing units when conversion would hide an important difference, such as per-second video or per-image generation.
Workload check
The calculator starts from product behavior: retries, cache hits, long prompts, output length, batch jobs, and rejected generations.
Launch check
Before a production rollout, reopen the official source because provider prices, cache rules, model names, and eligibility can change quickly.
Pricing validation playbook
Official rows are the starting point. The production decision comes from measuring the unit your users actually complete, the retries they create, and the quality gates you need before an output is accepted.
Define the unit
Decide whether the product unit is one answer, one resolved ticket, one accepted code fix, one processed row, or one completed agent task.
Instrument tokens
Log token use before launch in a small pilot. Separate repeated prefixes, retrieved context, generated output, failed calls, and tool loops.
Compare finalists
Run the same workload assumptions through at least three candidate rows instead of choosing the lowest published input price.
Review after launch
After traffic starts, update the calculator with real token distributions, quality failure rate, latency needs, and cache hit rate.
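The "Instrument tokens" step can be sketched as a small aggregation over pilot logs. This is a minimal sketch, not a prescribed schema: `CallRecord`, its field names, and the sample numbers are all hypothetical, and a real pilot would read these values from whatever usage fields your provider returns.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One logged model call. All field names here are hypothetical."""
    unit_id: str              # product unit this call served (ticket, fix, row)
    prefix_tokens: int = 0    # repeated system prompt or shared prefix
    context_tokens: int = 0   # retrieved context
    output_tokens: int = 0    # generated output
    failed: bool = False      # call errored or its output was rejected
    tool_loop: bool = False   # call happened inside a tool loop


def per_unit_averages(records):
    """Average each token category and call count per distinct product unit."""
    units = {r.unit_id for r in records}
    if not units:
        raise ValueError("no records to summarize")
    totals = {
        "calls": len(records),
        "prefix_tokens": sum(r.prefix_tokens for r in records),
        "context_tokens": sum(r.context_tokens for r in records),
        "output_tokens": sum(r.output_tokens for r in records),
        "failed_calls": sum(r.failed for r in records),
        "tool_loop_calls": sum(r.tool_loop for r in records),
    }
    return {name: total / len(units) for name, total in totals.items()}


# Made-up pilot data: two tickets, one of which needed a second call.
pilot = [
    CallRecord("ticket-1", prefix_tokens=500, context_tokens=1000, output_tokens=200),
    CallRecord("ticket-1", prefix_tokens=500, context_tokens=1200, output_tokens=300, failed=True),
    CallRecord("ticket-2", prefix_tokens=500, context_tokens=800, output_tokens=100),
]
averages = per_unit_averages(pilot)
```

Averaging per product unit rather than per call keeps the numbers aligned with the unit defined in the first step, so a ticket that needed two calls counts as one ticket with twice the tokens.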
Workload cheat sheet
The same API price row means different things for chat, RAG, agents, coding, and batch jobs. Before comparing providers, decide what a successful unit is and measure that unit instead of a single clean API call.
| Workload | What moves the bill | Measure first |
|---|---|---|
| Chat | Output length, retries, and conversation history. | Average tokens per resolved conversation. |
| RAG | Retrieved context, repeated system prompts, and cache hit rate. | Input tokens per answered question. |
| Agents | Planning calls, tool calls, retries, critique, and final answer generation. | Model calls per successful user action. |
| Coding | Long prompts, repository context, generated diffs, and review loops. | Tokens per accepted fix or merged task. |
| Batch jobs | Volume, batch eligibility, failed rows, and delayed processing tolerance. | Rows processed per month and allowed latency. |
Starting assumptions
These are not universal benchmarks. They are practical starting points for the calculator when you have not instrumented production yet. After launch, replace them with measured tokens per successful user action.
Support chat
Use this when one customer question usually turns into one answer, with a short history and a modest retry buffer.
RAG question answering
Use this when retrieved context is the main cost driver and answers are short, but repeated prompts or wrappers can be cached.
Agent workflow
Use this when one user click can create planning, tool calls, reflection, and a final response rather than one clean call.
Batch extraction
Use this for offline enrichment, classification, or structured extraction where delayed processing is acceptable.
One workflow example
The common mistake is comparing one clean API call when the product actually needs several calls to finish the job. The calculator should be driven by the unit your users experience.
Clean spreadsheet row
A quick comparison usually assumes one prompt, one answer, and no retry. That is useful for a first scan, but it is not how most products behave.
Agent action
A coding, support, or agent workflow can include planning, tool calls, critique, retries, and a final response before the user sees success.
Cost unit to compare
Compare providers by the cost of one completed user action, accepted fix, resolved ticket, or processed row rather than the cheapest single call.
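The gap between a clean spreadsheet row and an agent action can be made concrete with a short calculation. The sketch below uses the $3.00 input / $15.00 output prices that appear in the price table; the step names and token counts are hypothetical and only illustrate the shape of the comparison.

```python
def call_cost(input_tokens, output_tokens, input_price, output_price):
    """USD cost of one call. Prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Prices taken from a $3.00 input / $15.00 output row in the price table.
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00

# The "clean spreadsheet row": one prompt, one answer, no retry.
clean_call = call_cost(2_000, 500, INPUT_PRICE, OUTPUT_PRICE)

# A hypothetical agent action: (step, input tokens, output tokens).
agent_steps = [
    ("plan", 2_000, 300),
    ("tool_call", 3_000, 200),
    ("tool_call_retry", 3_000, 200),
    ("critique", 3_500, 400),
    ("final_response", 4_000, 600),
]
agent_action = sum(
    call_cost(inp, out, INPUT_PRICE, OUTPUT_PRICE) for _, inp, out in agent_steps
)

print(f"clean call:   ${clean_call:.4f}")
print(f"agent action: ${agent_action:.4f}")
print(f"ratio:        {agent_action / clean_call:.1f}x")
```

Under these made-up numbers the completed action costs several times the single call, which is why the comparison unit should be the action and not the call.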
Cost audit before launch
Set expected usage; each row estimates monthly cost from official unit prices.
73 official USD rows - checked 2026-07-04 - sorted by Coding + Writing + Math score.
| Provider | Input (per 1M tokens) | Cache (per 1M tokens) | Output (per 1M tokens) | Est. monthly cost | Region | Source | Notes |
|---|---|---|---|---|---|---|---|
| Anthropic Official API | $10.00 | $1.00 | $50.00 | $58.27 | Global default | Claude API pricing (checked 2026-07-04) | 5-minute cache writes are $12.50/MTok and 1-hour cache writes are $20/MTok. |
| Anthropic Official API | $5.00 | $0.50 | $25.00 | $29.14 | Global default | Claude API pricing (checked 2026-07-04) | Fast mode and data residency are priced separately. |
| Anthropic Official API | $5.00 | $0.50 | $25.00 | $29.14 | Global default | Claude API pricing (checked 2026-07-04) | Opus 4.7 and later use a newer tokenizer; token counts can differ from older models. |
| Anthropic Official API | $5.00 | $0.50 | $25.00 | $29.14 | Global default | Claude API pricing (checked 2026-07-04) | Fast mode has separate premium pricing; US-only inference geography can add a 1.1x multiplier. |
| OpenAI Official API | $5.00 | $0.50 | $30.00 | $32.81 | Global | OpenAI API pricing (checked 2026-07-04) | OpenAI lists separate Standard, Batch, Flex, and Priority tiers; this row uses Standard short-context pricing. |
| OpenAI Official API | $2.50 | $0.25 | $15.00 | $16.41 | Global | OpenAI API pricing (checked 2026-07-04) | Regional data residency endpoints may add a 10% uplift for eligible newer models. |
| Google Official API | $2.00 | $0.20 | $12.00 | $13.13 | Gemini API paid tier | Gemini API pricing (checked 2026-07-04) | Prompts above 200K tokens have higher input, output, and cache prices. |
| Google Official API | $1.50 | $0.15 | $9.00 | $9.84 | Gemini API paid tier | Gemini API pricing (checked 2026-07-04) | Search and Maps grounding have separate charges after free quotas. |
| Google Official API | $0.50 | $0.05 | $3.00 | $3.28 | Gemini API paid tier | Gemini API pricing (checked 2026-07-04) | Preview models can change before becoming stable. |
| Z.AI Official API | $1.40 | $0.26 | $4.40 | $6.31 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Cached input storage is listed as limited-time free on the source page. |
| Z.AI Official API | $1.00 | $0.20 | $3.20 | $4.56 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Cached input storage is listed as limited-time free. |
| Anthropic Official API | $5.00 | $0.50 | $25.00 | $29.14 | Global default | Claude API pricing (checked 2026-07-04) | Batch API pricing is 50% lower for input and output tokens. |
| Moonshot AI Official API | $0.95 | $0.16 | $4.00 | $5.02 | Kimi API | Kimi K2.6 pricing (checked 2026-07-04) | Kimi pricing page lists cache-hit, input, and output prices. |
| MiniMax Official API | $0.30 | $0.06 | $1.20 | $1.54 | MiniMax API | MiniMax pay-as-you-go pricing (checked 2026-07-04) | Official page marks this as permanent 50% off compared with crossed-out list price. |
| MiniMax Official API | $0.60 | $0.12 | $2.40 | $3.09 | MiniMax API | MiniMax pay-as-you-go pricing (checked 2026-07-04) | Input tokens above 512K may have availability limits. |
| Moonshot AI Official API | $0.60 | $0.10 | $3.00 | $3.52 | Kimi API | Kimi K2.5 pricing (checked 2026-07-04) | Supports text, image, video input, thinking and non-thinking modes. |
| OpenAI Official API | $30.00 | n/a | $180 | $211 | Global | OpenAI API pricing (checked 2026-07-04) | Long-context standard pricing is higher; Batch is discounted where available. |
| OpenAI Official API | $1.75 | $0.175 | $14.00 | $14.06 | Global | OpenAI API pricing (checked 2026-07-04) | Priority pricing is listed separately on OpenAI pricing. |
| Anthropic Official API | $3.00 | $0.30 | $15.00 | $17.48 | Global default | Claude API pricing (checked 2026-07-04) | US-only inference geography adds a 1.1x multiplier for Sonnet 4.6 and later models. |
| Z.AI Official API | $0.60 | $0.11 | $2.20 | $2.93 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Cached input storage is listed as limited-time free. |
| Z.AI Official API | $0.07 | $0.01 | $0.40 | $0.4463 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Good row for budget-sensitive routing comparisons. |
| Z.AI Official API | $0.00 | $0.00 | $0.00 | $0.00 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Displayed as Free in the official pricing table; availability can change. |
| Z.AI Official API | $1.20 | $0.24 | $4.00 | $5.59 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Prices are per 1M tokens. |
| OpenAI Official API | $0.75 | $0.075 | $4.50 | $4.92 | Global | OpenAI API pricing (checked 2026-07-04) | Batch pricing is 50% lower in the official table. |
| OpenAI Official API | $0.20 | $0.02 | $1.25 | $1.35 | Global | OpenAI API pricing (checked 2026-07-04) | Good reference point for lightweight classification, extraction, and chat routing. |
| OpenAI Official API | $30.00 | n/a | $180 | $211 | Global | OpenAI API pricing (checked 2026-07-04) | No cached-input price is listed for this pro row in the flagship pricing table. |
| MiniMax Official API | $0.30 | $0.06 | $1.20 | $1.54 | MiniMax API | MiniMax pay-as-you-go pricing (checked 2026-07-04) | Prompt cache read and write are separate pricing fields. |
| MiniMax Official API | $0.60 | $0.06 | $2.40 | $3.06 | MiniMax API | MiniMax pay-as-you-go pricing (checked 2026-07-04) | High-speed tier doubles input and output price compared with standard. |
| Alibaba Cloud Official API | $0.172 | n/a | $1.03 | $1.21 | Global deployment | Alibaba Cloud Model Studio pricing (checked 2026-07-04) | Longer prompt tiers are priced higher in the official table. |
| Anthropic Official API | $3.00 | $0.30 | $15.00 | $17.48 | Global default | Claude API pricing (checked 2026-07-04) | Prompt cache writes are listed separately from cache hits. |
| MiniMax Official API | $0.30 | $0.03 | $1.20 | $1.53 | MiniMax API | MiniMax pay-as-you-go pricing (checked 2026-07-04) | Legacy model retained for users comparing older production integrations. |
| MiniMax Official API | $0.60 | $0.03 | $2.40 | $3.04 | MiniMax API | MiniMax pay-as-you-go pricing (checked 2026-07-04) | High-speed tier doubles input and output price compared with standard. |
| xAI Official API | $1.25 | $0.20 | $2.50 | $4.57 | xAI API | xAI API pricing (checked 2026-07-04) | Server-side tools are charged separately from token usage. |
| Alibaba Cloud Official API | $0.115 | n/a | $0.917 | $0.9759 | Global deployment | Alibaba Cloud Model Studio pricing (checked 2026-07-04) | Global endpoint and storage are in US Virginia or Germany Frankfurt. |
| Z.AI Official API | $0.60 | $0.11 | $2.20 | $2.93 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Cached input storage is listed as limited-time free. |
| Anthropic Official API | $1.00 | $0.10 | $5.00 | $5.83 | Global default | Claude API pricing (checked 2026-07-04) | Useful Claude baseline for high-volume support and extraction workloads. |
| Mistral Official API | $1.50 | n/a | $7.50 | $9.45 | Mistral API | Mistral pricing (checked 2026-07-04) | Batch processing gets a 50% discount. |
| Google Official API | $0.25 | $0.025 | $1.50 | $1.64 | Gemini API paid tier | Gemini API pricing (checked 2026-07-04) | Audio input is priced separately on Google pricing. |
| Mistral Official API | $0.50 | n/a | $1.50 | $2.42 | Mistral API | Mistral pricing (checked 2026-07-04) | Mistral pricing is per million input and output tokens. |
| Google Official API | $1.25 | $0.125 | $10.00 | $10.04 | Gemini API paid tier | Gemini API pricing (checked 2026-07-04) | Prompts above 200K tokens have higher prices. |
| OpenAI Official API | $5.00 | $0.50 | $30.00 | $32.81 | Global | OpenAI API pricing (checked 2026-07-04) | Useful when comparing ChatGPT chat-latest against flagship model APIs. |
| Anthropic Official API | $10.00 | $1.00 | $50.00 | $58.27 | Limited availability | Claude API pricing (checked 2026-07-04) | Official table marks Mythos 5 as limited availability. |
| Google Official API | $0.30 | $0.03 | $2.50 | $2.48 | Gemini API paid tier | Gemini API pricing (checked 2026-07-04) | Grounding and Maps pricing are separate from token billing. |
| xAI Official API | $1.00 | $0.20 | $2.00 | $3.68 | xAI API | xAI API pricing (checked 2026-07-04) | Currently described by xAI as early access. |
| xAI Official API | $1.25 | $0.20 | $2.50 | $4.57 | xAI API | xAI API pricing (checked 2026-07-04) | Listed in xAI Chat API pricing with the same token rates as grok-4.3. |
| xAI Official API | $1.25 | $0.20 | $2.50 | $4.57 | xAI API | xAI API pricing (checked 2026-07-04) | Reasoning tokens are billed under the model token rates. |
| xAI Official API | $1.25 | $0.20 | $2.50 | $4.57 | xAI API | xAI API pricing (checked 2026-07-04) | Batch pricing can vary by model detail page. |
| Mistral Official API | $0.10 | n/a | $0.30 | $0.483 | Mistral API | Mistral pricing (checked 2026-07-04) | Open model listed on Mistral pricing. |
| Mistral Official API | $2.00 | n/a | $5.00 | $8.93 | Mistral API | Mistral pricing (checked 2026-07-04) | Use for reasoning comparisons against general-purpose models. |
| Mistral Official API | $0.50 | n/a | $1.50 | $2.42 | Mistral API | Mistral pricing (checked 2026-07-04) | Batch processing gets a 50% discount. |
| Mistral Official API | $0.10 | n/a | $0.10 | $0.336 | Mistral API | Mistral pricing (checked 2026-07-04) | Best for low-cost routing and lightweight agent steps. |
| Mistral Official API | $0.15 | n/a | $0.15 | $0.504 | Mistral API | Mistral pricing (checked 2026-07-04) | Low-cost open model in Mistral pricing. |
| Cohere Official API | $0.50 | n/a | $1.50 | $2.42 | Cohere API | Cohere pricing (checked 2026-07-04) | Cohere lists Aya Expanse API pricing in the official pricing FAQ. |
| Cohere Official API | $1.00 | n/a | $2.00 | $4.10 | Existing Cohere customers | Cohere pricing (checked 2026-07-04) | Cohere marks these as legacy model prices for existing customers. |
| Cohere Official API | $0.30 | n/a | $0.60 | $1.23 | Existing Cohere customers | Cohere pricing (checked 2026-07-04) | Listed in Cohere pricing FAQ as legacy pricing. |
| Cohere Official API | $0.50 | n/a | $1.50 | $2.42 | Existing Cohere customers | Cohere pricing (checked 2026-07-04) | Listed in Cohere pricing FAQ as legacy pricing. |
| Cohere Official API | $3.00 | n/a | $15.00 | $18.90 | Existing Cohere customers | Cohere pricing (checked 2026-07-04) | Listed in Cohere pricing FAQ as legacy pricing. |
| Cohere Official API | $2.50 | n/a | $10.00 | $13.91 | Existing Cohere customers | Cohere pricing (checked 2026-07-04) | Listed in Cohere pricing FAQ as legacy pricing. |
| DeepSeek Official API | $0.27 | $0.07 | $1.10 | $1.41 | DeepSeek API | DeepSeek USD pricing (checked 2026-07-04) | Automatic context caching uses cache-hit and cache-miss input prices. |
| DeepSeek Official API | $0.55 | $0.14 | $2.19 | $2.84 | DeepSeek API | DeepSeek USD pricing (checked 2026-07-04) | Reasoning output and CoT behavior can change real workload cost. |
| Alibaba Cloud Official API | $0.359 | n/a | $1.43 | $2.00 | Global deployment | Alibaba Cloud Model Studio pricing (checked 2026-07-04) | Higher prompt-size tiers increase input and output prices. |
| Alibaba Cloud Official API | $0.30 | n/a | $1.50 | $1.89 | EU deployment | Alibaba Cloud Model Studio pricing (checked 2026-07-04) | Higher prompt tiers are listed separately up to 256K tokens. |
| Alibaba Cloud Official API | $0.80 | n/a | $2.40 | $3.86 | International deployment | Alibaba Cloud Model Studio pricing (checked 2026-07-04) | International inference is dynamically scheduled globally excluding Chinese Mainland. |
| Alibaba Cloud Official API | $0.287 | n/a | $0.861 | $1.39 | Model Studio | Alibaba Cloud Model Studio pricing (checked 2026-07-04) | Listed under QwQ open source model pricing. |
| Alibaba Cloud Official API | $0.072 | n/a | $0.287 | $0.3999 | Chinese Mainland deployment | Alibaba Cloud Model Studio pricing (checked 2026-07-04) | Chinese Mainland deployment only according to official table. |
| Alibaba Cloud Official API | $0.574 | n/a | $1.72 | $2.77 | Chinese Mainland deployment | Alibaba Cloud Model Studio pricing (checked 2026-07-04) | Useful for math-specific API cost comparisons. |
| Moonshot AI Official API | $0.95 | $0.19 | $4.00 | $5.03 | Kimi API | Kimi K2.7 Code pricing (checked 2026-07-04) | Limited-time promotion is mentioned on the official page. |
| Z.AI Official API | $0.60 | $0.11 | $2.20 | $2.93 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Cached input storage is listed as limited-time free. |
| Z.AI Official API | $2.20 | $0.45 | $8.90 | $11.40 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Prices are per 1M tokens. |
| Z.AI Official API | $0.20 | $0.03 | $1.10 | $1.24 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Cached input storage is listed as limited-time free. |
| Z.AI Official API | $1.10 | $0.22 | $4.50 | $5.73 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Prices are per 1M tokens. |
| Z.AI Official API | $0.10 | n/a | $0.10 | $0.336 | Z.AI API | Z.AI pricing (checked 2026-07-04) | No cached-input price is listed for this row. |
| Z.AI Official API | $0.00 | $0.00 | $0.00 | $0.00 | Z.AI API | Z.AI pricing (checked 2026-07-04) | Displayed as Free in the official pricing table; availability can change. |
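
The monthly estimate shown in each row (the dollar figure after the output price) can be approximated with a simple formula. The page does not state the usage behind that figure, so the defaults below are an assumption: they are back-solved from the rows and appear consistent with about 2.625M input tokens per month, a 20% cache hit rate, and 0.735M output tokens. The function name `monthly_cost` and its parameters are hypothetical.

```python
def monthly_cost(input_price, output_price, cache_price=None,
                 input_mtok=2.625, output_mtok=0.735, cache_hit_rate=0.20):
    """Estimate monthly USD cost from per-1M-token prices.

    The usage defaults are inferred from the table, not published values.
    Rows with no cache price ("n/a") bill all input at the input price.
    """
    if cache_price is None:
        cache_price = 0.0
        cache_hit_rate = 0.0
    cached_mtok = input_mtok * cache_hit_rate
    uncached_mtok = input_mtok - cached_mtok
    return (uncached_mtok * input_price
            + cached_mtok * cache_price
            + output_mtok * output_price)


# A $5.00 / $0.50 / $25.00 row, listed with a $29.14 estimate.
print(round(monthly_cost(5.00, 25.00, cache_price=0.50), 2))

# A $0.10 / n/a / $0.10 row, listed with a $0.336 estimate.
print(round(monthly_cost(0.10, 0.10), 3))
```

Replacing the three usage defaults with measured values from your own pilot is the point of the exercise; the inferred defaults only explain how the published column was likely produced.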
Production cost traps
App flows rarely send one clean prompt and get back one neat answer. They retry, call tools, reuse some context, sometimes miss the cache, and produce outputs that are longer than the demo. Use this section to sanity-check the estimate before trusting the table.
A model with cheap input can still become expensive if it writes long answers, verbose tool traces, or multi-step reasoning for every request.
Prompt caching helps most when system prompts, retrieval wrappers, or conversation prefixes repeat. It helps much less when every request is unique.
A single user action can become planning, search, tool calls, retries, critique, and final rewrite. Count the whole action, not only the final answer.
Batch pricing can be great for offline jobs, enrichment, and nightly processing, but it should not be used as the default cost for a realtime user flow.
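The first trap, cheap input paired with long output, is easy to check with arithmetic. The sketch below compares two price pairs that appear in the table ($1.25 input / $10.00 output and $2.00 input / $5.00 output) under two hypothetical request shapes; the token counts are made up for illustration.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """USD cost of one request. Prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# (input price, output price) pairs taken from rows in the table.
CHEAPER_INPUT_ROW = (1.25, 10.00)
CHEAPER_OUTPUT_ROW = (2.00, 5.00)

# Hypothetical request shapes: (input tokens, output tokens).
output_heavy = (1_000, 2_000)   # e.g. long written answers
input_heavy = (8_000, 300)      # e.g. RAG with short answers

for label, shape in [("output-heavy", output_heavy), ("input-heavy", input_heavy)]:
    a = request_cost(*shape, *CHEAPER_INPUT_ROW)
    b = request_cost(*shape, *CHEAPER_OUTPUT_ROW)
    winner = "cheaper-input row" if a < b else "cheaper-output row"
    print(f"{label}: ${a:.5f} vs ${b:.5f} -> {winner} wins")
```

The ranking flips between the two shapes, so the input price alone does not decide which row is cheaper for a given product.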
Comparison workflow
A better pricing decision usually comes from testing the same workload against three rows: the cheapest plausible option, the model you actually want, and one balanced middle choice.
Text API pricing FAQ
The lowest published price is not always the cheapest choice, because production usage includes retries, output length, cache misses, tool calls, failed generations, and sometimes multiple model calls per user action. The row with the lowest input price can lose once the full workflow is counted.
Compare both input and output prices, but start with the side your app uses most. RAG and extraction can be input-heavy, while writing, coding, agents, and support replies can become output-heavy.
Prompt caching matters when a large prefix repeats across many requests, such as a stable system prompt, long policy, shared tool schema, or repeated retrieval wrapper. It matters less for one-off prompts.
Batch pricing is usually the wrong basis for a realtime estimate. It is best for offline or delayed jobs. For live chat, agents, customer support, and interactive product flows, estimate with realtime prices unless the provider explicitly supports your latency needs.
Retries multiply both input and output usage. For agents, coding, extraction, and support flows, estimate cost per successful user action instead of cost per first API call.
Start with the models that pass your quality bar, then compare input, output, cache, batch, and retry assumptions. A cheap row that fails more often is rarely the cheapest production choice.
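The point about cheap rows that fail more often can be sketched as an expected-cost calculation. This assumes each attempt costs the same, attempts are independent, and the product retries until an output passes the quality bar, in which case the expected number of attempts is 1 / success rate. The candidate labels, per-attempt costs, and success rates are hypothetical.

```python
def cost_per_success(cost_per_attempt, success_rate):
    """Expected USD spent per accepted output when retrying until success."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical candidates: (USD per attempt, share of attempts that pass).
candidates = {
    "cheapest plausible": (0.004, 0.50),
    "balanced": (0.006, 0.90),
    "preferred": (0.020, 0.97),
}

ranked = sorted(candidates, key=lambda name: cost_per_success(*candidates[name]))
for name in ranked:
    print(f"{name}: ${cost_per_success(*candidates[name]):.4f} per accepted output")
```

With these made-up numbers the balanced candidate comes out ahead of the cheapest one, because half of the cheapest candidate's attempts are thrown away.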