LLM API Token Cost Calculator — OpenAI, Anthropic, Google (2026 Pricing)
Pinpoint your monthly LLM API spend across OpenAI, Anthropic, and Google. It models prompt caching and Batch API discounts, and surfaces the cheapest model at your volume.
- Instant result
- Private — nothing saved
- Works on any device
- AI insight included
API Token Cost Calculator (LLM)
Provider × model cost matrix
Adjust your volume and discount toggles. The cheapest cell gets a trophy badge — useful when shopping for the lowest-cost model that'll handle your task.
Anthropic
| Model | Monthly cost | vs cheapest | Status |
|---|---|---|---|
| Claude Opus 4.7 | $33,750 | +$33,645 (321.4×) | 321.4× pricier |
| Claude Sonnet 4.6 | $6,750 | +$6,645 (64.3×) | 64.3× pricier |
| Claude Haiku 4.5 | $1,800 | +$1,695 (17.1×) | 17.1× pricier |
OpenAI
| Model | Monthly cost | vs cheapest | Status |
|---|---|---|---|
| GPT-5 | $3,375 | +$3,270 (32.1×) | 32.1× pricier |
| GPT-4o | $5,250 | +$5,145 (50.0×) | 50.0× pricier |
| GPT-4o-mini | $315 | +$210 (3.0×) | 3.0× pricier |
| o1 (reasoning) | $31,500 | +$31,395 (300.0×) | 300.0× pricier |
Google
| Model | Monthly cost | vs cheapest | Status |
|---|---|---|---|
| Gemini 2.5 Pro | $3,375 | +$3,270 (32.1×) | 32.1× pricier |
| Gemini 2.5 Flash | $210 | +$105 (2.0×) | 2.0× pricier |
| Gemini 2.5 Flash-Lite | $105 | — | Cheapest |
The cheapest model gets the trophy. “Competitive” means within 2× of cheapest — usually worth considering if quality is closer to frontier-tier. Migration only makes sense after a quality A/B on your specific task — list price ranking ≠ quality ranking.
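The matrix above can be reproduced from the list rates given later on this page. A minimal Python sketch, assuming the example scenario is 1,500M input and 150M output tokens per month (a volume inferred from the figures shown, not stated explicitly):

```python
# List rates in USD per 1M tokens (2026 snapshot from this page).
RATES = {
    "Claude Opus 4.7":       (15.00, 75.00),
    "Claude Sonnet 4.6":     (3.00, 15.00),
    "Claude Haiku 4.5":      (0.80, 4.00),
    "GPT-5":                 (1.25, 10.00),
    "GPT-4o":                (2.50, 10.00),
    "GPT-4o-mini":           (0.15, 0.60),
    "o1 (reasoning)":        (15.00, 60.00),
    "Gemini 2.5 Pro":        (1.25, 10.00),
    "Gemini 2.5 Flash":      (0.10, 0.40),
    "Gemini 2.5 Flash-Lite": (0.05, 0.20),
}

def monthly_cost(model, input_mtok, output_mtok):
    """Monthly cost at list rates; no caching or batch discount applied."""
    in_rate, out_rate = RATES[model]
    return in_rate * input_mtok + out_rate * output_mtok

def cost_matrix(input_mtok, output_mtok):
    """Per model: (monthly cost, dollar gap vs cheapest, multiple of cheapest)."""
    costs = {m: monthly_cost(m, input_mtok, output_mtok) for m in RATES}
    cheapest = min(costs.values())
    return {m: (c, c - cheapest, c / cheapest) for m, c in costs.items()}

matrix = cost_matrix(input_mtok=1_500, output_mtok=150)
# Gemini 2.5 Flash-Lite comes out cheapest at $105; Opus is 321.4x that.
```

The same two-line formula (input rate × input volume + output rate × output volume) underlies every cell; the discount toggles only change the effective input rate.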
You might also need
What This Calculator Does
The API Token Cost Calculator computes monthly LLM API spend across OpenAI, Anthropic, and Google using public 2026 list pricing. It models the two most-impactful cost levers — prompt caching (60-90% off input tokens for repeated prefixes) and the Batch API (50% off both input and output for async workloads) — and surfaces the cheapest model in the matrix at your specific volume.
It's built for developers planning capacity, picking a provider, or stress-testing a budget before scaling. The output is an honest cost ranking: the cheapest model isn't always the right one (quality varies), but knowing the cost spread helps you decide whether to A/B test a tier down.
The Math
2026 Pricing Snapshot
List rates per 1M tokens. Verify on each provider's pricing page before basing a production budget on these numbers — providers cut rates 2-3 times per year for older models.
- Claude Opus 4.7 — $15 input / $75 output · cached input $1.50 (90% off)
- Claude Sonnet 4.6 — $3 input / $15 output · cached $0.30
- Claude Haiku 4.5 — $0.80 input / $4 output · cached $0.08
- GPT-5 — $1.25 input / $10 output · cached $0.625
- GPT-4o — $2.50 input / $10 output · cached $1.25
- GPT-4o-mini — $0.15 input / $0.60 output · cached $0.075
- o1 (reasoning) — $15 input / $60 output · no caching tier
- Gemini 2.5 Pro — $1.25 input / $10 output · cached $0.31
- Gemini 2.5 Flash — $0.10 input / $0.40 output · cached $0.025
- Gemini 2.5 Flash-Lite — $0.05 input / $0.20 output
The Two Most-Impactful Levers
Prompt caching
For workloads with stable prefixes (system prompt, RAG context, few-shot examples), prompt caching is essentially free engineering. Anthropic charges 10% of input rate for cached tokens (90% off); OpenAI charges 50%; Google charges 25%. Enable it for any system prompt > 1,000 tokens called more than 2-3 times. Typical real-world cache-hit rate is 65-85%.
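A minimal sketch of the blended-rate arithmetic, using this page's Sonnet 4.6 figures and the calculator's default 70% hit rate:

```python
def effective_input_rate(full_rate, cached_rate, hit_rate):
    """Blended per-1M-token input rate given a cache hit rate."""
    return hit_rate * cached_rate + (1 - hit_rate) * full_rate

# Claude Sonnet 4.6: $3.00 full, $0.30 cached, 70% hit rate.
rate = effective_input_rate(3.00, 0.30, 0.70)  # 0.7*0.30 + 0.3*3.00 = $1.11/M
savings = 1 - rate / 3.00                      # ~63% off input tokens
```

Note that the headline "90% off" only applies to the cached fraction; your realized input savings is the discount scaled by your hit rate.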
Batch API
Any workload that can wait 24 hours: backfills, periodic evaluations, async data labeling, dataset preprocessing. Batch is 50% off both input AND output across all major providers. Real-time chatbots and customer-facing apps can't use it. For internal tooling and async pipelines, batch saves half the bill with zero quality difference.
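A sketch of how the two levers compose, assuming the 50% batch discount applies on top of the cached input rate (how the discounts stack is an assumption here; verify against your provider's billing):

```python
def scenario_cost(input_mtok, output_mtok, in_rate, out_rate,
                  cached_rate=None, hit_rate=0.0, batch=False):
    """Monthly cost with optional prompt-caching and batch toggles."""
    eff_in = in_rate
    if cached_rate is not None and hit_rate > 0:
        # Blend full and cached input rates by hit rate.
        eff_in = hit_rate * cached_rate + (1 - hit_rate) * in_rate
    cost = eff_in * input_mtok + out_rate * output_mtok
    return cost * 0.5 if batch else cost  # batch: 50% off both directions

# Sonnet 4.6 at 100M input / 10M output tokens per month:
base = scenario_cost(100, 10, 3.00, 15.00)                          # $450.00
opt  = scenario_cost(100, 10, 3.00, 15.00, 0.30, 0.70, batch=True)  # $130.50
```

In this illustrative scenario the two toggles together cut the bill by roughly 71% with no model change.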
Reading the Provider × Model Matrix
The scenario panel shows monthly cost for every model from every provider at YOUR volume. The trophy badge marks the cheapest cell. Three columns to watch:
- Monthly cost. Computed at your input/output token volumes, requests/day, and discount toggles.
- vs cheapest. Dollar gap from the trophy. Useful for sizing migration upside.
- Status. “Competitive” means within 2× of cheapest — usually worth considering if quality is closer to frontier-tier. “3× pricier” usually means you're overpaying for capability you don't need.
Common Cost-Optimization Mistakes
- Leaving caching off when prefix is stable. Highest-ROI optimization with effectively zero engineering cost. If your system prompt or context exceeds 1K tokens and you call it >2-3 times, you're leaving 60-80% input cost on the table.
- Migrating models without quality A/B. The calculator surfaces the cost option, not the quality option. Always run a 100-request A/B at the cheaper model before migrating; quality varies dramatically by task type.
- Ignoring output cost. Output is 4-5× more expensive than input across providers. Caching can't reduce it. Better prompting (request shorter responses, structured output) is the only lever for output cost.
- Underestimating reasoning-model output. Reasoning models (o1) bill internal “thinking tokens” as output, even though you don't see them. For reasoning tasks, multiply visible-output by 5-10× to estimate real output volume.
- Not asking about enterprise pricing at scale. All three providers offer custom enterprise rates above ~$50K/month spend, typically 10-25% off list. Prepay-commit deals can reduce another 5-15%. If you're hitting enterprise scale, the actual numbers will be 10-30% better than this calculator's list-price estimate.
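The reasoning-model point above can be made concrete. A sketch that treats the 5-10× multiplier as an assumed input, not a measured one:

```python
def reasoning_output_cost(visible_mtok, out_rate, thinking_multiplier=7.0):
    """Estimated output bill for a reasoning model.
    thinking_multiplier: assumed billed/visible ratio (5-10x per this page)."""
    return visible_mtok * thinking_multiplier * out_rate

# o1 at $60/M output: 10M visible output tokens could really bill ~70M,
# i.e. ~$4,200/month on output alone (vs $600 if you only counted visible).
cost = reasoning_output_cost(10, 60)
```

Measure your actual billed-to-visible ratio from the provider's usage dashboard before locking in a budget; 7× is only a midpoint of the page's range.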
How This Calculator Differs From Provider Pricing Pages
Each provider's pricing page shows their own rates only. This calculator runs all three side-by-side at YOUR volume, with discount toggles applied uniformly — making the cross-provider comparison apples-to-apples. The provider ranking shifts dramatically by volume and caching setup, so a single-provider page can't answer “am I on the right provider?”
Related Tools
- Freelance Rate Calculator — if you're billing clients for AI-powered work, factor token cost into your hourly rate.
- True Hourly Rate Calculator — add API costs as a work expense to compute your real after-cost hourly rate.
- Currency Converter — for non-USD billing scenarios.
How to Read the Verdict
Two numbers matter: the cheapest-model pick in the matrix and the monthly cost spread between top and bottom. The cheapest model isn’t always the right one — quality varies — but knowing the spread tells you whether an A/B test down a tier is worth running.
- Spread > 5× across providers. A/B test the cheapest model on real prompts. If quality holds within ±5% on your eval set, switch — the savings compound at scale.
- Repeated prefixes > 80% of input. Always enable prompt caching. 60-90% savings on input tokens with zero quality impact — the most under-utilized cost lever.
- Workload is async (eval, batch eval, classification). Use the Batch API. 50% off both directions, 24-hr SLA — only unsuitable when latency genuinely matters.
- Monthly spend > $5,000. Compare against the GPU rental and self-host calcs — at this volume, alternatives become competitive.
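The rules of thumb above can be sketched as a checklist (thresholds taken from this page; the function is hypothetical, not part of the calculator):

```python
def verdict(cost_spread, cached_prefix_share, is_async, monthly_spend_usd):
    """Map the four rules of thumb to suggested actions."""
    actions = []
    if cost_spread > 5:                 # spread > 5x across providers
        actions.append("A/B test the cheapest model on real prompts")
    if cached_prefix_share > 0.80:      # repeated prefixes > 80% of input
        actions.append("enable prompt caching")
    if is_async:                        # eval / batch / classification work
        actions.append("use the Batch API")
    if monthly_spend_usd > 5_000:       # alternatives become competitive
        actions.append("compare against GPU rental and self-hosting")
    return actions

verdict(321.4, 0.85, False, 33_750)
# → three actions: A/B test, enable caching, compare self-hosting
```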
Frequently Asked Questions
The most common questions we get about this calculator — each answer is kept under 60 words so you can scan.
How accurate is this calculator's pricing?
It's a snapshot of public list prices from each provider's pricing page as of late 2025 / early 2026. Prices change — providers cut rates 2-3 times per year for older models. Always verify on the provider's pricing page (anthropic.com/pricing, openai.com/api/pricing, ai.google.dev/pricing) before basing a production budget on this. The matrix gives you the relative ranking, which changes more slowly.
What is prompt caching and when should I enable it?
Prompt caching lets the provider cache stable parts of your prompt (system prompt, few-shot examples, large RAG context) and charge a much lower rate for cached tokens on subsequent calls. Anthropic charges 10% of the input rate for cached tokens (90% off); OpenAI and Google offer similar discounts. Enable it for ANY workload with a repeated prefix > ~1,000 tokens called more than 2-3 times — it pays back immediately with effectively zero engineering cost.
What's a realistic cache-hit rate?
70% is the calculator's default and a reasonable baseline. Workloads with stable system prompts + dynamic user messages typically hit 75-90%. RAG systems with rotating retrieved chunks hit 50-70%. Single-turn API calls with no prefix hit 0% — caching can't help. Measure your actual hit rate with the provider's metrics; the calc lets you plug it in.
When should I use the Batch API?
Any workload that can wait 24 hours: backfills, periodic evaluations, async data labeling, dataset preprocessing. The Batch API is 50% off both input and output across all major providers. Real-time chatbots, customer-facing apps, and anything with a latency budget can't use it. For internal tooling and async pipelines, batch saves half the bill with zero quality difference.
How do I estimate input/output tokens accurately?
Rule of thumb: 1 token ≈ 0.75 English words or 4 characters, so a 1,000-word system prompt ≈ 1,300 tokens. For precise counts, use the provider's tokenizer or token-counting endpoint (e.g. OpenAI's tiktoken). For mixed-language content, multiply by 1.3-1.5 (non-English text uses more tokens per character). For a safe worst case, over-provision your input estimate by 20%.
Why is output 4-5× more expensive than input?
Output requires a full forward pass for every generated token; input tokens are processed in parallel at much higher throughput, so per token, generation is roughly 4-5× more compute-intensive. This is also why caching only discounts input — cached tokens are input you'd be sending anyway, while output cost can only be cut by generating less.
Should I switch to a cheaper model to save money?
Test quality first at small volume. The calculator's 'cheapest alternative' surface is purely cost-driven; the cheapest model at your volume might not produce acceptable quality for your task. Run a 100-request A/B at the cheaper model, compare quality manually, and only migrate if quality holds. Frontier-tier models (Opus, GPT-5, Gemini 2.5 Pro) cost 5-15× more than fast-tier (Haiku, mini, Flash) — but on hard tasks (long-form reasoning, creative work, complex tool use), the quality gap is real.
What if my volume is unpredictable?
Run the calculator at three volumes — your typical day, your worst case, and 10× your worst case. Monthly cost scales linearly with volume, so the answer is intuitive. If you're at 10K req/day and 100K req/day produces a budget you can't sustain, build rate limiting BEFORE you scale, not after the bill arrives.
Does this include free-tier credits?
No — the calculator computes raw list-rate cost. Most providers offer free credits for evaluation ($5-100 typically). Subtract those manually for your first month's budget. Once volume scales, the free tier is rounding error.
What about volume discounts?
Anthropic, OpenAI, and Google all offer custom enterprise pricing above ~$50K/month spend, typically 10-25% off list rates. Enterprise prepay-commit deals can cut another 5-15%. The calculator uses list rates; if you're hitting enterprise scale, the actual numbers will be 10-30% better.
How do reasoning models (o1, etc.) bill differently?
Reasoning models bill 'thinking tokens' (the model's internal reasoning trace) as output tokens, even though the user doesn't see them. Output tokens for reasoning tasks are often 5-10× higher than the visible response. The calculator uses standard rates; for reasoning-model usage, multiply your 'visible output' by 5-10× to estimate true output token volume.
Can I save scenarios for cost comparisons?
Yes — click Save to store named scenarios. Recommended: save 'baseline' (your current model + no caching), 'optimized' (same model + caching + batch), 'cheaper alternative' (cheapest model from the matrix at same caching). Compare the three saved scenarios to make a defensible cost-optimization plan.