LLM API Token Cost Calculator — OpenAI, Anthropic, Google (2026 Pricing)
Pinpoint your monthly LLM API spend across OpenAI, Anthropic, and Google. It models prompt caching and Batch API discounts, and surfaces the cheapest model at your volume.
- Instant result
- Private — nothing saved
- Works on any device
- AI insight included
API Token Cost Calculator (LLM)
Provider × model cost matrix
Adjust your volume and discount toggles. The cheapest cell gets a trophy badge — useful when shopping for the lowest-cost model that'll handle your task.
Anthropic
| Model | Monthly cost | vs cheapest | Status |
|---|---|---|---|
| Claude Opus 4.7 | $33,750 | +$33,645 (321.4×) | 321.4× pricier |
| Claude Sonnet 4.6 | $6,750 | +$6,645 (64.3×) | 64.3× pricier |
| Claude Haiku 4.5 | $1,800 | +$1,695 (17.1×) | 17.1× pricier |
OpenAI
| Model | Monthly cost | vs cheapest | Status |
|---|---|---|---|
| GPT-5 | $3,375 | +$3,270 (32.1×) | 32.1× pricier |
| GPT-4o | $5,250 | +$5,145 (50.0×) | 50.0× pricier |
| GPT-4o-mini | $315 | +$210 (3.0×) | 3.0× pricier |
| o1 (reasoning) | $31,500 | +$31,395 (300.0×) | 300.0× pricier |
Google
| Model | Monthly cost | vs cheapest | Status |
|---|---|---|---|
| Gemini 2.5 Pro | $3,375 | +$3,270 (32.1×) | 32.1× pricier |
| Gemini 2.5 Flash | $210 | +$105 (2.0×) | 2.0× pricier |
| Gemini 2.5 Flash-Lite | $105 | — | Cheapest |
The cheapest model gets the trophy. “Competitive” means within 2× of cheapest — usually worth considering if quality is closer to frontier-tier. Migration only makes sense after a quality A/B on your specific task — list price ranking ≠ quality ranking.
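The matrix above can be reproduced from the list rates given later on this page. A minimal Python sketch, assuming the example scenario is 1,500M input and 150M output tokens per month (a volume inferred from the figures shown, not stated explicitly):

```python
# List rates in USD per 1M tokens (2026 snapshot from this page).
RATES = {
    "Claude Opus 4.7":       (15.00, 75.00),
    "Claude Sonnet 4.6":     (3.00, 15.00),
    "Claude Haiku 4.5":      (0.80, 4.00),
    "GPT-5":                 (1.25, 10.00),
    "GPT-4o":                (2.50, 10.00),
    "GPT-4o-mini":           (0.15, 0.60),
    "o1 (reasoning)":        (15.00, 60.00),
    "Gemini 2.5 Pro":        (1.25, 10.00),
    "Gemini 2.5 Flash":      (0.10, 0.40),
    "Gemini 2.5 Flash-Lite": (0.05, 0.20),
}

def monthly_cost(model, input_mtok, output_mtok):
    """Monthly cost at list rates; no caching or batch discount applied."""
    in_rate, out_rate = RATES[model]
    return in_rate * input_mtok + out_rate * output_mtok

def cost_matrix(input_mtok, output_mtok):
    """Per model: (monthly cost, dollar gap vs cheapest, multiple of cheapest)."""
    costs = {m: monthly_cost(m, input_mtok, output_mtok) for m in RATES}
    cheapest = min(costs.values())
    return {m: (c, c - cheapest, c / cheapest) for m, c in costs.items()}

matrix = cost_matrix(input_mtok=1_500, output_mtok=150)
# Gemini 2.5 Flash-Lite comes out cheapest at $105; Opus is 321.4x that.
```

The same two-line formula (input rate × input volume + output rate × output volume) underlies every cell; the discount toggles only change the effective input rate.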
You might also need
What This Calculator Does
The API Token Cost Calculator computes monthly LLM API spend across OpenAI, Anthropic, and Google using public 2026 list pricing. It models the two most-impactful cost levers — prompt caching (60-90% off input tokens for repeated prefixes) and the Batch API (50% off both input and output for async workloads) — and surfaces the cheapest model in the matrix at your specific volume.
It's built for developers planning capacity, picking a provider, or stress-testing a budget before scaling. The output is an honest cost ranking: the cheapest model isn't always the right one (quality varies), but knowing the cost spread helps you decide whether to A/B test a tier down.
The Math
2026 Pricing Snapshot
List rates per 1M tokens. Verify on each provider's pricing page before basing a production budget on these numbers — providers cut rates 2-3 times per year for older models.
- Claude Opus 4.7 — $15 input / $75 output · cached input $1.50 (90% off)
- Claude Sonnet 4.6 — $3 input / $15 output · cached $0.30
- Claude Haiku 4.5 — $0.80 input / $4 output · cached $0.08
- GPT-5 — $1.25 input / $10 output · cached $0.625
- GPT-4o — $2.50 input / $10 output · cached $1.25
- GPT-4o-mini — $0.15 input / $0.60 output · cached $0.075
- o1 (reasoning) — $15 input / $60 output · no caching tier
- Gemini 2.5 Pro — $1.25 input / $10 output · cached $0.31
- Gemini 2.5 Flash — $0.10 input / $0.40 output · cached $0.025
- Gemini 2.5 Flash-Lite — $0.05 input / $0.20 output
The Two Most-Impactful Levers
Prompt caching
For workloads with stable prefixes (system prompt, RAG context, few-shot examples), prompt caching is essentially free engineering. Anthropic charges 10% of input rate for cached tokens (90% off); OpenAI charges 50%; Google charges 25%. Enable it for any system prompt > 1,000 tokens called more than 2-3 times. Typical real-world cache-hit rate is 65-85%.
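A minimal sketch of the blended-rate arithmetic, using this page's Sonnet 4.6 figures and the calculator's default 70% hit rate:

```python
def effective_input_rate(full_rate, cached_rate, hit_rate):
    """Blended per-1M-token input rate given a cache hit rate."""
    return hit_rate * cached_rate + (1 - hit_rate) * full_rate

# Claude Sonnet 4.6: $3.00 full, $0.30 cached, 70% hit rate.
rate = effective_input_rate(3.00, 0.30, 0.70)  # 0.7*0.30 + 0.3*3.00 = $1.11/M
savings = 1 - rate / 3.00                      # ~63% off input tokens
```

Note that the headline "90% off" only applies to the cached fraction; your realized input savings is the discount scaled by your hit rate.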
Batch API
Any workload that can wait 24 hours: backfills, periodic evaluations, async data labeling, dataset preprocessing. Batch is 50% off both input AND output across all major providers. Real-time chatbots and customer-facing apps can't use it. For internal tooling and async pipelines, batch saves half the bill with zero quality difference.
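A sketch of how the two levers compose, assuming the 50% batch discount applies on top of the cached input rate (how the discounts stack is an assumption here; verify against your provider's billing):

```python
def scenario_cost(input_mtok, output_mtok, in_rate, out_rate,
                  cached_rate=None, hit_rate=0.0, batch=False):
    """Monthly cost with optional prompt-caching and batch toggles."""
    eff_in = in_rate
    if cached_rate is not None and hit_rate > 0:
        # Blend full and cached input rates by hit rate.
        eff_in = hit_rate * cached_rate + (1 - hit_rate) * in_rate
    cost = eff_in * input_mtok + out_rate * output_mtok
    return cost * 0.5 if batch else cost  # batch: 50% off both directions

# Sonnet 4.6 at 100M input / 10M output tokens per month:
base = scenario_cost(100, 10, 3.00, 15.00)                          # $450.00
opt  = scenario_cost(100, 10, 3.00, 15.00, 0.30, 0.70, batch=True)  # $130.50
```

In this illustrative scenario the two toggles together cut the bill by roughly 71% with no model change.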
Reading the Provider × Model Matrix
The scenario panel shows monthly cost for every model from every provider at YOUR volume. The trophy badge marks the cheapest cell. Three columns to watch:
- Monthly cost. Computed at your input/output token volumes, requests/day, and discount toggles.
- vs cheapest. Dollar gap from the trophy. Useful for sizing migration upside.
- Status. “Competitive” means within 2× of cheapest — usually worth considering if quality is closer to frontier-tier. “3× pricier” usually means you're overpaying for capability you don't need.
Common Cost-Optimization Mistakes
- Leaving caching off when prefix is stable. Highest-ROI optimization with effectively zero engineering cost. If your system prompt or context exceeds 1K tokens and you call it >2-3 times, you're leaving 60-80% input cost on the table.
- Migrating models without quality A/B. The calculator surfaces the cost option, not the quality option. Always run a 100-request A/B at the cheaper model before migrating; quality varies dramatically by task type.
- Ignoring output cost. Output is 4-5× more expensive than input across providers. Caching can't reduce it. Better prompting (request shorter responses, structured output) is the only lever for output cost.
- Underestimating reasoning-model output. Reasoning models (o1) bill internal “thinking tokens” as output, even though you don't see them. For reasoning tasks, multiply visible-output by 5-10× to estimate real output volume.
- Not asking about enterprise pricing at scale. All three providers offer custom enterprise rates above ~$50K/month spend, typically 10-25% off list. Prepay-commit deals can reduce another 5-15%. If you're hitting enterprise scale, the actual numbers will be 10-30% better than this calculator's list-price estimate.
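The reasoning-model point above can be made concrete. A sketch that treats the 5-10× multiplier as an assumed input, not a measured one:

```python
def reasoning_output_cost(visible_mtok, out_rate, thinking_multiplier=7.0):
    """Estimated output bill for a reasoning model.
    thinking_multiplier: assumed billed/visible ratio (5-10x per this page)."""
    return visible_mtok * thinking_multiplier * out_rate

# o1 at $60/M output: 10M visible output tokens could really bill ~70M,
# i.e. ~$4,200/month on output alone (vs $600 if you only counted visible).
cost = reasoning_output_cost(10, 60)
```

Measure your actual billed-to-visible ratio from the provider's usage dashboard before locking in a budget; 7× is only a midpoint of the page's range.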
How This Calculator Differs From Provider Pricing Pages
Each provider's pricing page shows their own rates only. This calculator runs all three side-by-side at YOUR volume, with discount toggles applied uniformly — making the cross-provider comparison apples-to-apples. The provider ranking shifts dramatically by volume and caching setup, so a single-provider page can't answer “am I on the right provider?”
Related Tools
- Freelance Rate Calculator — if you're billing clients for AI-powered work, factor token cost into your hourly rate.
- True Hourly Rate Calculator — add API costs as a work expense to compute your real after-cost hourly rate.
- Currency Converter — for non-USD billing scenarios.
How to Read the Verdict
Two numbers matter: the cheapest-model pick in the matrix and the monthly cost spread between top and bottom. The cheapest model isn’t always the right one — quality varies — but knowing the spread tells you whether an A/B test down a tier is worth running.
- Spread > 5× across providers. A/B test the cheapest model on real prompts. If quality holds within ±5% on your eval set, switch — the savings compound at scale.
- Repeated prefixes > 80% of input. Always enable prompt caching. 60-90% savings on input tokens with zero quality impact — the most under-utilized cost lever.
- Workload is async (eval, batch eval, classification). Use the Batch API. 50% off both directions, 24-hr SLA — only unsuitable when latency genuinely matters.
- Monthly spend > $5,000. Compare against the GPU rental and self-host calcs — at this volume, alternatives become competitive.
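The rules of thumb above can be sketched as a checklist (thresholds taken from this page; the function is hypothetical, not part of the calculator):

```python
def verdict(cost_spread, cached_prefix_share, is_async, monthly_spend_usd):
    """Map the four rules of thumb to suggested actions."""
    actions = []
    if cost_spread > 5:                 # spread > 5x across providers
        actions.append("A/B test the cheapest model on real prompts")
    if cached_prefix_share > 0.80:      # repeated prefixes > 80% of input
        actions.append("enable prompt caching")
    if is_async:                        # eval / batch / classification work
        actions.append("use the Batch API")
    if monthly_spend_usd > 5_000:       # alternatives become competitive
        actions.append("compare against GPU rental and self-hosting")
    return actions

verdict(321.4, 0.85, False, 33_750)
# → three actions: A/B test, enable caching, compare self-hosting
```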
Frequently Asked Questions
The most common questions we get about this calculator — each answer is kept under 60 words so you can scan.
How accurate is this calculator's pricing?
It's a snapshot of public list prices from each provider's pricing page as of late 2025 / early 2026. Prices change — providers cut rates 2-3 times per year for older models. Always verify on the provider's pricing page (anthropic.com/pricing, openai.com/api/pricing, ai.google.dev/pricing) before basing a production budget on this. The matrix gives you the relative ranking, which changes more slowly.
What is prompt caching and when should I enable it?
Prompt caching lets the provider cache stable parts of your prompt (system prompt, few-shot examples, large RAG context) and charge a much lower rate for cached tokens on subsequent calls. Anthropic charges 10% of the input rate for cached tokens (90% off); OpenAI and Google offer similar discounts. Enable it for ANY workload with a repeated prefix > ~1,000 tokens called more than 2-3 times — it pays back immediately with effectively zero engineering cost.
What's a realistic cache-hit rate?
70% is the calculator's default and a reasonable baseline. Workloads with stable system prompts + dynamic user messages typically hit 75-90%. RAG systems with rotating retrieved chunks hit 50-70%. Single-turn API calls with no prefix hit 0% — caching can't help. Measure your actual hit rate with the provider's metrics; the calc lets you plug it in.
When should I use the Batch API?
Any workload that can wait 24 hours: backfills, periodic evaluations, async data labeling, dataset preprocessing. The Batch API is 50% off both input and output across all major providers. Real-time chatbots, customer-facing apps, and anything with a latency budget can't use it. For internal tooling and async pipelines, batch saves half the bill with zero quality difference.
How do I estimate input/output tokens accurately?
Rule of thumb: 1 token ≈ 0.75 English words or 4 characters, so a 1,000-word system prompt ≈ 1,300 tokens. For precise counts, use the provider's tokenizer or token-counting endpoint (e.g. OpenAI's tiktoken). For mixed-language content, multiply by 1.3-1.5 (non-English text uses more tokens per character). For a safe worst case, over-provision your input estimate by 20%.
Why is output 4-5× more expensive than input?
Output requires a full forward pass for every generated token; input tokens are processed in parallel at much higher throughput, so per token, generation is roughly 4-5× more compute-intensive. This is also why caching only discounts input — cached tokens are input you'd be sending anyway, while output cost can only be cut by generating less.
Should I switch to a cheaper model to save money?
Test quality first at small volume. The calculator's 'cheapest alternative' surface is purely cost-driven; the cheapest model at your volume might not produce acceptable quality for your task. Run a 100-request A/B at the cheaper model, compare quality manually, and only migrate if quality holds. Frontier-tier models (Opus, GPT-5, Gemini 2.5 Pro) cost 5-15× more than fast-tier (Haiku, mini, Flash) — but on hard tasks (long-form reasoning, creative work, complex tool use), the quality gap is real.
What if my volume is unpredictable?
Run the calculator at three volumes — your typical day, your worst case, and 10× your worst case. Monthly cost scales linearly with volume, so the answer is intuitive. If you're at 10K req/day and 100K req/day produces a budget you can't sustain, build rate limiting BEFORE you scale, not after the bill arrives.
Does this include free-tier credits?
No — the calculator computes raw list-rate cost. Most providers offer free credits for evaluation ($5-100 typically). Subtract those manually for your first month's budget. Once volume scales, the free tier is rounding error.
What about volume discounts?
Anthropic, OpenAI, and Google all offer custom enterprise pricing above ~$50K/month spend, typically 10-25% off list rates. Enterprise prepay-commit deals can cut another 5-15%. The calculator uses list rates; if you're hitting enterprise scale, the actual numbers will be 10-30% better.
How do reasoning models (o1, etc.) bill differently?
Reasoning models bill 'thinking tokens' (the model's internal reasoning trace) as output tokens, even though the user doesn't see them. Output tokens for reasoning tasks are often 5-10× higher than the visible response. The calculator uses standard rates; for reasoning-model usage, multiply your 'visible output' by 5-10× to estimate true output token volume.
Can I save scenarios for cost comparisons?
Yes — click Save to store named scenarios. Recommended: save 'baseline' (your current model + no caching), 'optimized' (same model + caching + batch), 'cheaper alternative' (cheapest model from the matrix at same caching). Compare the three saved scenarios to make a defensible cost-optimization plan.