AI model pricing table
All 277 tracked LLMs sortable by input price, output price, context window, tier, and use case.
AI pricing calculator
Tokenscost tracks 277 models across 36 providers, with input, output, context window, and best-fit use case data exposed in crawler-readable HTML and JSON.
The table below is a crawler-readable homepage sample of low-cost models, sorted by input price. The complete 277-row table is on /pricing-table.
| Model | Provider | Input $/M | Output $/M | Context | Tier | Best for |
|---|---|---|---|---|---|---|
| Hunyuan-Lite | Tencent | $0 | $0 | 256,000 | Budget | Free Tencent Cloud tier for high-volume chat and classification |
| Gemma 2 9B | Google | $0.030 | $0.060 | 8,192 | Open Source | Smaller open-weight Gemma 2 for edge and on-device |
| Nova Micro | Amazon | $0.035 | $0.140 | 128,000 | Budget | Cheapest text-only Bedrock model for high-volume workloads |
| Command R7B | Cohere | $0.037 | $0.150 | 128,000 | Budget | Ultra-cheap, low-latency high-volume tasks |
| Ministral 3B | Mistral | $0.040 | $0.040 | 128,000 | Budget | Ultra-cheap 3B model for classification and routing |
| Phi-4 Mini | Microsoft | $0.040 | $0.100 | 128,000 | Budget | Compact 3.8B Phi-4 sibling for low-latency reasoning |
| Llama 3 8B | Meta | $0.050 | $0.080 | 8,192 | Open Source | Previous-gen 8B open-weight for chat and edge |
| Llama 3.1 8B | Meta | $0.050 | $0.080 | 128,000 | Open Source | Open-weight 8B for low-latency chat and edge |
| Llama 3.1 8B Instant (Groq) | Groq | $0.050 | $0.080 | 128,000 | Budget | Ultra-fast LPU-hosted 8B for sub-second responses |
| Gemma 3 12B | Google | $0.050 | $0.100 | 128,000 | Open Source | Mid-size open-weight Gemma 3 for self-hosted chat and reasoning |
| Qwen3 Turbo | Qwen | $0.050 | $0.200 | 131,072 | Budget | High-throughput cost-efficient inference |
| GLM-4.5 Flash | Zhipu | $0.050 | $0.200 | 128,000 | Budget | High-volume classification and retrieval |
| GPT-5.4 Nano | OpenAI | $0.050 | $0.400 | 128,000 | Budget | Classification, routing, simple tasks |
| Qwen3 Flash | Qwen | $0.050 | $0.400 | 128,000 | Budget | Ultra-cheap classification, chat, and high-volume tasks |
| GPT-5 Nano | OpenAI | $0.050 | $0.400 | 400,000 | Budget | Fast, low-cost OpenAI model for simple or high-volume tasks |
| Nova Canvas | Amazon | $0.060 | $0.060 | 0 | Mid-tier | Image generation and editing on Amazon Bedrock |
| Hunyuan-Standard | Tencent | $0.060 | $0.130 | 32,000 | Budget | Default Tencent Cloud model for everyday chat and RAG |
| Hy3 Lite | Tencent | $0.060 | $0.180 | 128,000 | Budget | Ultra-fast MoE for chat, classification, and high-volume apps |
| MiMo Lite | Xiaomi | $0.060 | $0.180 | 128,000 | Budget | Ultra-affordable entry model for high-volume apps |
| Nova Lite | Amazon | $0.060 | $0.240 | 300,000 | Budget | Affordable multimodal Bedrock model — text, vision, video understanding |
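Because the table quotes prices per million tokens, the cost of a single request is the token count times the price, divided by one million, summed over input and output. The sketch below illustrates that arithmetic using three rows copied from the sample table. The `PRICES` dictionary and the `request_cost` helper are illustrative names, not part of any Tokenscost API, and the token counts are assumed to come from whatever tokenizer your provider uses.

```python
from decimal import Decimal

# (input $/M, output $/M) copied from the sample table above.
PRICES = {
    "Nova Micro": (Decimal("0.035"), Decimal("0.140")),
    "Ministral 3B": (Decimal("0.040"), Decimal("0.040")),
    "GPT-5 Nano": (Decimal("0.050"), Decimal("0.400")),
}

MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


if __name__ == "__main__":
    # A 2,000-token prompt with a 500-token reply on GPT-5 Nano.
    print(request_cost("GPT-5 Nano", 2_000, 500))
```

`Decimal` is used here so that very small per-request amounts do not pick up binary floating-point noise; plain floats would also work for rough estimates.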
Count tokens in your prompts and estimate API costs before sending them to OpenAI, Anthropic, Google, and others.
Estimate multi-step agent loop costs including tool calls, retries, and model routing.
Estimate savings from asynchronous batch endpoints and prompt-caching discounts.
GPT-4.1, GPT-5, o3, o4-mini pricing per million input and output tokens with context windows.
Claude Sonnet, Opus, and Haiku pricing per million tokens including extended context and caching.
Gemini 2.5 Pro, Flash, and Flash-Lite pricing across input, output, and long-context tiers.
Profiles for every tracked AI API provider with pricing, strengths, and best-fit workloads.
Compare GPU cloud instances across providers by GPU, VRAM, count, price, tier, and region.
Compare frameworks, SDKs, and orchestrators by per-task cost, scaffold tokens, and license.
Compare metered API costs against ChatGPT, Claude, and Gemini subscription plans.
Side-by-side model comparison for real application workloads.
Fetch a static CC-BY pricing dataset without executing the React app.
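The multi-step agent loop estimate mentioned above can be approximated with a few lines of arithmetic. This is a minimal sketch under simplifying assumptions, not the site's calculator: every step is assumed to send the same number of input tokens and receive the same number of output tokens, and `retry_rate` is a hypothetical parameter for the average fraction of steps that are re-issued once. Real agent runs usually grow their context with each step, so treat the result as a lower bound.

```python
def agent_loop_cost(
    steps: int,
    input_tokens_per_step: int,
    output_tokens_per_step: int,
    input_price_per_m: float,
    output_price_per_m: float,
    retry_rate: float = 0.0,
) -> float:
    """Expected USD cost of one agent run at per-million-token prices.

    retry_rate is an assumed average fraction of steps that are sent twice.
    """
    expected_calls = steps * (1 + retry_rate)
    input_cost = expected_calls * input_tokens_per_step * input_price_per_m / 1_000_000
    output_cost = expected_calls * output_tokens_per_step * output_price_per_m / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # 10 steps, 4,000 input and 500 output tokens each, at $0.050 / $0.400
    # per million tokens, with 10% of steps retried.
    print(f"${agent_loop_cost(10, 4_000, 500, 0.050, 0.400, retry_rate=0.1):.4f}")
```

Tool-call payloads can be folded into `input_tokens_per_step`, since tool results are typically sent back to the model as input on the next step.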