
Model Fortress

Compare pricing, context windows, and capabilities across every major LLM provider

Claude Opus 4.7
claude-opus-4-7
Anthropic's most capable generally available model. Step-change improvement in agentic coding over Opus 4.6, with a new tokenizer and 1M context window.
Anthropic | 1.0M ctx | 128K max out | $5.00/M in | $25.00/M out
Claude Opus 4.6
claude-opus-4-6
Anthropic's previous Opus model. A leader on agentic coding evaluations, Terminal Bench 2.0, and Humanity's Last Exam.
Anthropic | 200K ctx | 128K max out | $5.00/M in | $25.00/M out
Claude Opus 4.5
claude-opus-4-5
Anthropic's former flagship model combining maximum intelligence with practical performance.
Anthropic | 200K ctx | 64K max out | $5.00/M in | $25.00/M out
Claude Sonnet 4.6
claude-sonnet-4-6
Anthropic's latest Sonnet model with hybrid reasoning, matching near-flagship performance at a fraction of the cost.
Anthropic | 200K ctx | 64K max out | $3.00/M in | $15.00/M out
Claude Sonnet 4.5
claude-sonnet-4-5
Anthropic's previous Sonnet model, featuring exceptional performance on coding, analysis, and instruction following.
Anthropic | 200K ctx | 64K max out | $3.00/M in | $15.00/M out
Claude Haiku 4.5
claude-haiku-4-5
Anthropic's fastest model with near-frontier intelligence, ideal for high-throughput applications.
Anthropic | 200K ctx | 64K max out | $1.00/M in | $5.00/M out
Claude Opus 4.1
claude-opus-4-1
Anthropic's previous flagship model with maximum intelligence. Legacy model - consider using Opus 4.5 for new projects.
Anthropic | 200K ctx | 32K max out | $15.00/M in | $75.00/M out
Claude Opus 4
claude-opus-4
Anthropic's original Opus 4 model. Legacy model - consider using Opus 4.5 for new projects.
Anthropic | 200K ctx | 32K max out | $15.00/M in | $75.00/M out
Claude Sonnet 4
claude-sonnet-4
Anthropic's balanced model offering excellent performance across coding, analysis, and complex tasks. A great balance of speed, intelligence, and cost.
Anthropic | 200K ctx | 64K max out | $3.00/M in | $15.00/M out
GPT 5.5
gpt-5.5
OpenAI's GPT-5.5 is a frontier model with a 1M+ context window, reasoning, and broad tool support including web search, file search, image generation, code interpreter, hosted shell, computer use, and MCP. Knowledge cutoff December 1, 2025.
OpenAI | 1.1M ctx | 128K max out | $5.00/M in | $30.00/M out
GPT 5.4
gpt-5.4
OpenAI's frontier model with a 1M+ context window, improved reasoning with xhigh effort support, and enhanced capabilities for coding, agentic tasks, and computer use.
OpenAI | 1.1M ctx | 128K max out | $2.50/M in | $15.00/M out
GPT 5.4 Mini
gpt-5.4-mini
OpenAI's efficient, cost-optimized variant of GPT-5.4 with strong reasoning and multimodal capabilities at a fraction of the cost.
OpenAI | 128K ctx | 128K max out | $0.75/M in | $4.50/M out
GPT 5.4 Nano
gpt-5.4-nano
OpenAI's most affordable and fastest variant of GPT-5.4, optimized for high-volume, latency-sensitive applications with minimal cost.
OpenAI | 128K ctx | 128K max out | $0.20/M in | $1.25/M out
GPT 5.4 Pro
gpt-5.4-pro
OpenAI's most powerful GPT-5.4 variant with a 1M+ context window, designed for the most demanding reasoning, coding, and agentic tasks with maximum capability.
OpenAI | 1.1M ctx | 128K max out | $30.00/M in | $180.00/M out
GPT 5.2
gpt-5.2
OpenAI's flagship model for coding and agentic tasks across industries.
OpenAI | 400K ctx | 128K max out | $1.75/M in | $14.00/M out
GPT 5.1
gpt-5.1
OpenAI's previous flagship model for coding and agentic tasks with configurable reasoning and non-reasoning effort.
OpenAI | 400K ctx | 128K max out | $1.25/M in | $10.00/M out
GPT 5
gpt-5
OpenAI's former advanced reasoning model with enhanced problem-solving capabilities, deeper understanding, and improved accuracy across complex tasks.
OpenAI | 400K ctx | 128K max out | $1.25/M in | $10.00/M out
GPT 5 Mini
gpt-5-mini
OpenAI's smaller and faster variant of GPT-5, a more cost-efficient alternative. Great for well-defined tasks and precise prompts.
OpenAI | 400K ctx | 128K max out | $0.25/M in | $2.00/M out
GPT 5 Nano
gpt-5-nano
OpenAI's most cost-effective and efficient model in the GPT-5 series, optimized for speed and affordability. Ideal for straightforward tasks and high-volume applications where latency and cost are critical.
OpenAI | 400K ctx | 128K max out | $0.05/M in | $0.40/M out
OpenAI o1
o1
OpenAI's reasoning model designed to think before responding. Uses chain-of-thought reasoning for complex tasks in science, coding, and math.
OpenAI | 200K ctx | 100K max out | $15.00/M in | $60.00/M out
GPT 4.1
gpt-4.1
OpenAI's improved version of GPT-4, featuring enhanced reasoning capabilities, better contextual understanding, and increased accuracy across a wide range of tasks.
OpenAI | 1.0M ctx | 33K max out | $2.00/M in | $8.00/M out
GPT 4.1 Mini
gpt-4.1-mini
OpenAI's balanced small model with a massive 1M token context window. Offers great performance at low cost, beating GPT-4o in many benchmarks.
OpenAI | 1.0M ctx | 33K max out | $0.40/M in | $1.60/M out
GPT 4o
gpt-4o
OpenAI's multimodal model optimized for speed and efficiency. Capable of processing text, images, and audio with high accuracy at reduced latency and cost compared to GPT-4 Turbo.
OpenAI | 128K ctx | 16K max out | $2.50/M in | $10.00/M out
GPT 4o Mini
gpt-4o-mini
OpenAI's cost-efficient small model. Great for lightweight tasks with fast responses and low cost while maintaining strong capabilities.
OpenAI | 128K ctx | 16K max out | $0.15/M in | $0.60/M out
OpenAI gpt-oss 120B
gpt-oss-120b
OpenAI's GPT-OSS open-weight model designed for powerful reasoning, agentic tasks, and versatile developer use cases.
OpenAI | 131K ctx | 128K max out | $0.00/M in | $0.00/M out
OpenAI gpt-oss 20B
gpt-oss-20b
OpenAI's smaller open-weight model optimized for lower latency and specialized use-cases. Great for edge deployment and faster responses.
OpenAI | 131K ctx | 128K max out | $0.00/M in | $0.00/M out
GPT 5.3 Codex
gpt-5.3-codex
OpenAI's GPT-5.3-Codex is the most capable agentic coding model, combining frontier coding performance with reasoning capabilities. Features mid-task steering and 25% faster inference than GPT-5.2-Codex.
OpenAI | 400K ctx | 128K max out | $1.75/M in | $14.00/M out
GPT 5.2 Codex
gpt-5.2-codex
OpenAI's GPT-5.2-Codex is an upgraded version of GPT-5.2 optimized for agentic coding tasks in Codex or similar environments.
OpenAI | 400K ctx | 128K max out | $1.75/M in | $14.00/M out
GPT 5.1 Codex Max
gpt-5.1-codex-max
A version of GPT-5.1-Codex optimized for long-running tasks.
OpenAI | 400K ctx | 128K max out | $1.25/M in | $10.00/M out
GPT 5.1 Codex Mini
gpt-5.1-codex-mini
Smaller, more cost-effective, less-capable version of GPT-5.1-Codex.
OpenAI | 400K ctx | 128K max out | $0.25/M in | $2.00/M out
OpenAI gpt-oss Safeguard 120B
gpt-oss-safeguard-120b
OpenAI's safety-focused open-weight model with 120B parameters. Designed for content moderation and safe AI applications.
OpenAI | 128K ctx | 128K max out | $0.15/M in | $0.60/M out
OpenAI gpt-oss Safeguard 20B
gpt-oss-safeguard-20b
OpenAI's efficient safety-focused open-weight model with 20B parameters. Lightweight content moderation and safety.
OpenAI | 128K ctx | 128K max out | $0.07/M in | $0.20/M out
Gemini 3.1 Flash Lite Preview
gemini-3.1-flash-lite-preview
Google's lightweight and efficient model in the 3.1 family, optimized for low latency and cost. Available in preview.
Google | 1.0M ctx | 66K max out | $0.25/M in | $1.50/M out
Gemini 3.1 Pro Preview
gemini-3.1-pro-preview
Google's latest Pro model. Available in preview.
Google | 1.0M ctx | 66K max out | $2.00/M in | $12.00/M out
Gemini 3 Flash Preview
gemini-3-flash-preview
Google's latest Flash model, continuing the line's focus on latency, efficiency, and cost. Available in preview.
Google | 1.0M ctx | 66K max out | $0.50/M in | $3.00/M out
Gemini 2.5 Pro
gemini-2.5-pro
Google's most capable model for complex reasoning tasks. Features a 1M token context window and strong performance across benchmarks.
Google | 1.0M ctx | 66K max out | $1.25/M in | $10.00/M out
Gemini 2.5 Flash
gemini-2.5-flash
Google's fast and cost-effective model with a 1M token context window. Best for high-volume, low-latency tasks and agentic use cases.
Google | 1.0M ctx | 66K max out | $0.30/M in | $2.50/M out
Gemini 2.0 Flash
gemini-2.0-flash
Google's fast and efficient model with a 1M token context window. Optimized for speed and cost-effectiveness with multimodal support.
Google | 1.0M ctx | 8K max out | $0.15/M in | $0.60/M out
Gemma 3 12B
gemma-3-12b
Google's lightweight open model well-suited for text generation and image understanding. Supports LoRA fine-tuning with 128K context.
Google | 128K ctx | 8K max out | $0.09/M in | $0.29/M out
Gemma 3 4B IT
gemma-3-4b
Google's lightweight 4B parameter Gemma 3 model. Efficient for simple tasks with vision capabilities.
Google | 128K ctx | 8K max out | $0.04/M in | $0.08/M out
Gemma 3 27B
gemma-3-27b
Google's largest Gemma 3 model with 27B parameters. Strong performance across reasoning, coding, and multilingual tasks with vision support.
Google | 128K ctx | 8K max out | $0.23/M in | $0.38/M out
Gemma 4 26B A4B
gemma-4-26b-a4b
Efficient MoE variant of Gemma 4 from Google DeepMind. Multimodal (text + image input), text output.
Google | 262K ctx | 16K max out | $0.07/M in | $0.34/M out
Grok Code Fast
grok-code-fast-1
xAI's specialized coding model optimized for fast code generation, completion, and explanation. 256K context window.
xAI | 256K ctx | 131K max out | $0.20/M in | $1.50/M out
Grok 4.20 Multi-Agent
grok-4.20-multi-agent-0309
xAI's multi-agent model with a 2M token context window, optimized for multi-agent workflows with reasoning capabilities. Client-side tools (function calling) and custom tools are not currently supported by this variant.
xAI | 2.0M ctx | 131K max out | $2.00/M in | $6.00/M out
Grok 4.20 Reasoning
grok-4.20-0309-reasoning
xAI's Grok 4.20 reasoning model with 2M token context window. Supports reasoning, function calling, structured outputs, and vision.
xAI | 2.0M ctx | 131K max out | $2.00/M in | $6.00/M out
Grok 4.20 Non Reasoning
grok-4.20-0309-non-reasoning
xAI's Grok 4.20 non-reasoning model with 2M token context window. Optimized for speed without reasoning overhead, supports function calling, structured outputs, and vision.
xAI | 2.0M ctx | 131K max out | $2.00/M in | $6.00/M out
Grok 4.1 Fast Reasoning
grok-4-1-fast-reasoning
xAI's latest fast reasoning model with massive 2M token context window. Optimized for speed with reasoning capabilities enabled.
xAI | 2.0M ctx | 131K max out | $0.20/M in | $0.50/M out
Grok 4.1 Fast
grok-4-1-fast-non-reasoning
xAI's latest fast model with massive 2M token context window. Optimized for speed without reasoning overhead for straightforward tasks.
xAI | 2.0M ctx | 131K max out | $0.20/M in | $0.50/M out
Grok 4 (0709)
grok-4-0709
xAI's Grok 4 model snapshot from July 2025. Premium pricing for maximum capability with 256K context.
xAI | 256K ctx | 131K max out | $3.00/M in | $15.00/M out
Grok 4
grok-4
xAI's flagship model, offering strong performance across natural language, math, and reasoning.
xAI | 256K ctx | 256K max out | $3.00/M in | $15.00/M out
Grok 4 Fast Reasoning
grok-4-fast-reasoning
xAI's fast Grok 4 model with reasoning capabilities and massive 2M token context window. Great balance of speed and intelligence.
xAI | 2.0M ctx | 131K max out | $0.20/M in | $0.50/M out
Grok 4 Fast
grok-4-fast-non-reasoning
xAI's fast Grok 4 model without reasoning overhead. Massive 2M context window for processing large documents quickly.
xAI | 2.0M ctx | 131K max out | $0.20/M in | $0.50/M out
Grok 3
grok-3
xAI's former flagship reasoning model with strong performance in natural language, math, and coding. Features a 131K context window.
xAI | 131K ctx | 131K max out | $3.00/M in | $15.00/M out
Grok 3 Mini
grok-3-mini
xAI's cost-efficient model with reasoning capabilities. Surprisingly outperforms Grok 3 in many benchmarks while costing 90% less.
xAI | 131K ctx | 131K max out | $0.30/M in | $0.50/M out
DeepSeek R1
deepseek-r1
DeepSeek's reasoning model trained with reinforcement learning. Excels at math, code, and complex reasoning tasks.
DeepSeek | 128K ctx | 33K max out | $1.35/M in | $5.40/M out
DeepSeek R1 Distill 32B
deepseek-r1-distill-32b
DeepSeek's reasoning model distilled from R1 based on Qwen2.5. Outperforms OpenAI o1-mini across various benchmarks with state-of-the-art dense model results.
DeepSeek | 80K ctx | 16K max out | $0.50/M in | $4.88/M out
DeepSeek V3.2
deepseek-v3-2
DeepSeek's latest general-purpose model with improved reasoning, coding, and instruction-following capabilities.
DeepSeek | 164K ctx | 66K max out | $0.26/M in | $0.38/M out
DeepSeek V4 Flash
deepseek-v4-flash
DeepSeek's fast, cost-efficient general-purpose model with 1M context window. Supports tool calling and thinking mode.
DeepSeek | 1.0M ctx | 384K max out | $0.14/M in | $0.28/M out
DeepSeek V4 Pro
deepseek-v4-pro
DeepSeek's flagship reasoning model with 1M context window. Supports tool calling, extended thinking, and high-quality generation.
DeepSeek | 1.0M ctx | 384K max out | $1.74/M in | $3.48/M out
Qwen3 30B
qwen3-30b
Alibaba's Qwen3 model with groundbreaking advancements in reasoning and instruction-following. Excellent for complex tasks and coding.
Alibaba Cloud | 33K ctx | 8K max out | $0.05/M in | $0.34/M out
QwQ 32B
qwq-32b
Alibaba's QwQ reasoning model optimized for analytical tasks. Strong performance on math, coding, and logical reasoning benchmarks.
Alibaba Cloud | 24K ctx | 16K max out | $0.66/M in | $1.00/M out
Qwen3 32B
qwen3-32b
Alibaba's Qwen3 dense 32B model. Strong performance across reasoning, math, and coding tasks.
Alibaba Cloud | 32K ctx | 8K max out | $0.15/M in | $0.60/M out
Qwen3 Coder 30B A3B
qwen3-coder-30b-a3b
Alibaba's Qwen3 coding-focused model with 30B total / 3B active parameters using mixture-of-experts architecture.
Alibaba Cloud | 256K ctx | 66K max out | $0.15/M in | $0.60/M out
Qwen3 Coder Next
qwen3-coder-next
Alibaba's latest Qwen3 coding model. Cutting-edge code generation and understanding capabilities.
Alibaba Cloud | 256K ctx | 66K max out | $0.50/M in | $1.20/M out
Qwen3 Next 80B A3B
qwen3-next-80b-a3b
Alibaba's latest Qwen3 model with 80B total / 3B active parameters using mixture-of-experts. Strong general-purpose performance.
Alibaba Cloud | 128K ctx | 8K max out | $0.15/M in | $1.20/M out
Qwen3 VL 235B A22B
qwen3-vl-235b-a22b
Alibaba's Qwen3 vision-language model with 235B total / 22B active parameters. Supports text and image inputs.
Alibaba Cloud | 128K ctx | 8K max out | $0.53/M in | $2.66/M out
Kimi K2 Thinking
kimi-k2-thinking
Moonshot AI's trillion-parameter MoE reasoning model (32B activated). Excels at multi-step reasoning with 200+ sequential tool calls. Supports function calling and extended thinking.
Moonshot AI | 256K ctx | 66K max out | $0.60/M in | $2.50/M out
Kimi K2.5
kimi-k2-5
Moonshot AI's native multimodal agentic model built on K2. Excels at visual coding, reasoning, and self-directed agent swarms of up to 100 sub-agents.
Moonshot AI | 256K ctx | 33K max out | $0.60/M in | $3.00/M out
Kimi K2.6
kimi-k2-6
Moonshot AI's next-gen agentic model built on K2. Long-horizon coding, proactive autonomous execution, and swarm-based task orchestration.
Moonshot AI | 262K ctx | 16K max out | $0.75/M in | $3.50/M out
MiniMax M2
minimax-m2
MiniMax's flagship agentic language model with 200K context window. Supports function calling and reasoning.
MiniMax | 205K ctx | 8K max out | $0.30/M in | $1.20/M out
MiniMax M2.1
minimax-m2-1
MiniMax's lightweight MoE model optimized for coding, agentic workflows, and modern application development. 10B activated parameters with strong multilingual code generation.
MiniMax | 205K ctx | 8K max out | $0.30/M in | $1.20/M out
MiniMax M2.1 Highspeed
minimax-m2-1-highspeed
MiniMax M2.1 optimized for speed at ~100 tokens/second with 200K context window.
MiniMax | 205K ctx | 8K max out | $0.30/M in | $1.20/M out
MiniMax M2.5
minimax-m2-5
MiniMax's flagship model with 200K context window, ~60 tokens/second. Supports function calling and thinking.
MiniMax | 205K ctx | 8K max out | $0.30/M in | $1.20/M out
MiniMax M2.5 Highspeed
minimax-m2-5-highspeed
MiniMax M2.5 optimized for speed at ~100 tokens/second with 200K context window.
MiniMax | 205K ctx | 8K max out | $0.60/M in | $2.40/M out
MiniMax M2.7
minimax-m2-7
MiniMax's latest flagship model with 200K context window and 131K max output. Advanced agentic capabilities with multi-agent collaboration, strong coding, and tool calling.
MiniMax | 205K ctx | 131K max out | $0.30/M in | $1.20/M out
MiniMax M2.7 Highspeed
minimax-m2-7-highspeed
MiniMax M2.7 optimized for speed with 200K context window and 131K max output.
MiniMax | 205K ctx | 131K max out | $0.60/M in | $2.40/M out
GLM-5
glm-5
ZhipuAI's flagship reasoning model with 200K context window. Supports tool calling, web search, and thinking.
z.ai | 200K ctx | 128K max out | $1.00/M in | $3.20/M out
GLM-5.1
glm-5.1
ZhipuAI's next-generation flagship model for agentic engineering. Stronger coding capabilities and state-of-the-art performance on SWE-Bench Pro.
z.ai | 203K ctx | 16K max out | $1.05/M in | $3.50/M out
GLM-4.7
glm-4.7
ZhipuAI's advanced reasoning model with 200K context window. Supports tool calling, web search, and thinking.
z.ai | 200K ctx | 128K max out | $0.60/M in | $2.20/M out
GLM-4.6
glm-4.6
ZhipuAI's reasoning model with 200K context window. Supports tool calling, web search, and thinking.
z.ai | 200K ctx | 128K max out | $0.60/M in | $2.20/M out
GLM-4.5
glm-4.5
ZhipuAI's efficient reasoning model with 128K context window. Supports tool calling, web search, and thinking.
z.ai | 128K ctx | 96K max out | $0.60/M in | $2.20/M out
GLM-4.6v
glm-4.6v
ZhipuAI's vision-language model with 128K context window. Supports image inputs and thinking.
z.ai | 131K ctx | 33K max out | $0.30/M in | $0.90/M out
GLM-4.5v
glm-4.5v
ZhipuAI's efficient vision-language model with 128K context window. Supports image inputs and thinking.
z.ai | 131K ctx | 16K max out | $0.60/M in | $1.80/M out
GLM-4.7 Flash
glm-4.7-flash
ZhipuAI's fast and efficient model variant. Optimized for speed while maintaining strong language capabilities.
z.ai | 200K ctx | 128K max out | $0.07/M in | $0.40/M out
Llama 4 Scout
llama-4-scout
Meta's latest Llama 4 model with 17B active parameters using mixture-of-experts architecture.
Meta | 131K ctx | 8K max out | $0.17/M in | $0.66/M out
Llama 3.3 70B Instruct
llama-3.3-70b-instruct
Meta's powerful 70B parameter model quantized to FP8 for fast inference. Excellent for complex reasoning and multilingual tasks.
Meta | 128K ctx | 8K max out | $0.29/M in | $0.72/M out
Llama 3.2 3B Instruct
llama-3.2-3b-instruct
Meta's compact 3B parameter model optimized for edge deployment and multilingual dialogue. Great balance of speed and capability.
Meta | 128K ctx | 8K max out | $0.05/M in | $0.15/M out
Llama 3.2 1B Instruct
llama-3.2-1b-instruct
Meta's compact and efficient 1 billion parameter model designed for on-device and edge deployment. Optimized for instruction following with strong performance despite its small size, ideal for resource-constrained environments.
Meta | 128K ctx | 8K max out | $0.03/M in | $0.10/M out
Llama 3.1 8B Instruct
llama-3.1-8b-instruct
Meta's efficient 8B parameter model optimized for multilingual dialogue. Fast inference with great performance for everyday tasks.
Meta | 128K ctx | 8K max out | $0.04/M in | $0.22/M out
Llama 4 Maverick
llama-4-maverick
Meta's Llama 4 Maverick with 17B active parameters using mixture-of-experts. Excels at coding, reasoning, and multilingual tasks.
Meta | 1.0M ctx | 8K max out | $0.24/M in | $0.97/M out
Llama 3.2 11B Instruct
llama-3.2-11b-instruct
Meta's 11B multimodal model supporting text and image inputs. Optimized for visual reasoning and image understanding tasks.
Meta | 128K ctx | 8K max out | $0.16/M in | $0.16/M out
Llama 3.2 90B Instruct
llama-3.2-90b-instruct
Meta's largest multimodal model with 90B parameters. Supports text and image inputs with strong reasoning capabilities.
Meta | 128K ctx | 8K max out | $0.72/M in | $0.72/M out
Llama 3.1 70B Instruct
llama-3.1-70b-instruct
Meta's 70B parameter model with 128K context. Strong performance on reasoning, coding, and multilingual tasks.
Meta | 128K ctx | 8K max out | $0.72/M in | $0.72/M out
Llama 3 70B Instruct
llama-3-70b-instruct
Meta's original Llama 3 70B model optimized for dialogue. Strong general-purpose performance across a wide range of tasks.
Meta | 8K ctx | 2K max out | $2.65/M in | $3.50/M out
Llama 3 8B Instruct
llama-3-8b-instruct
Meta's efficient Llama 3 8B model optimized for dialogue. Fast inference suitable for lightweight tasks.
Meta | 8K ctx | 2K max out | $0.30/M in | $0.60/M out
Codestral
codestral
Mistral's specialized model for code generation. Optimized for coding tasks including code completion, generation, and explanation.
Mistral | 128K ctx | 32K max out | $0.30/M in | $0.90/M out
Devstral 2
devstral-2
Mistral's frontier code-agent model for solving software engineering tasks. Excels at using tools to explore codebases, edit multiple files, and power software engineering agents. 256K context.
Mistral | 256K ctx | 32K max out | $0.40/M in | $2.00/M out
Magistral Medium 1.2
magistral-medium-1.2
Mistral's frontier-class multimodal reasoning model.
Mistral | 128K ctx | 128K max out | $2.00/M in | $5.00/M out
Magistral Small 1.2
magistral-small-1.2
Mistral's reasoning-focused small model with vision capabilities. Optimized for step-by-step reasoning tasks.
Mistral | 128K ctx | 128K max out | $0.50/M in | $1.50/M out
Mistral Large 3
mistral-large-3
Mistral's flagship 675B parameter model with state-of-the-art reasoning, coding, and multilingual capabilities with vision support.
Mistral | 256K ctx | 32K max out | $0.50/M in | $1.50/M out
Mistral Medium 3
mistral-medium-3
Mistral's frontier-class multimodal model released May 2025.
Mistral | 128K ctx | 32K max out | $0.40/M in | $2.00/M out
Mistral Medium 3.1
mistral-medium-3.1
Multimodal model from Mistral, released August 2025. Improved tone and performance.
Mistral | 128K ctx | 32K max out | $0.40/M in | $2.00/M out
Mistral Nemo
mistral-nemo
Mistral's best multilingual open source model released July 2024.
Mistral | 128K ctx | 32K max out | $0.15/M in | $0.15/M out
Mistral Small 3.1
mistral-small-3.1
Mistral's efficient 24B model optimized for simple tasks with low latency and 128K context. Great for classification, customer support, text generation, and multimodal tasks.
Mistral | 128K ctx | 8K max out | $0.35/M in | $0.56/M out
Mistral Small
mistral-small-3.2
Mistral's efficient 24B model optimized for simple tasks with low latency and 128K context. Great for classification, customer support, text generation, and multimodal tasks.
Mistral | 128K ctx | 32K max out | $0.10/M in | $0.30/M out
Ministral 3 3B
ministral-3-3b
Mistral's ultra-efficient 3B parameter model with vision support. Designed for edge and low-latency applications.
Mistral | 256K ctx | 32K max out | $0.10/M in | $0.10/M out
Ministral 3 8B
ministral-3-8b
Mistral's efficient 8B parameter model with vision support. Good balance of capability and speed for moderate tasks.
Mistral | 256K ctx | 32K max out | $0.15/M in | $0.15/M out
Ministral 3 14B
ministral-3-14b
Mistral's mid-range 14B parameter model with vision support. Enhanced reasoning over smaller Ministrals.
Mistral | 256K ctx | 32K max out | $0.20/M in | $0.20/M out
Palmyra X4
palmyra-x4
Writer's enterprise-grade model optimized for business content generation, analysis, and transformation.
Writer | 128K ctx | 8K max out | $2.50/M in | $10.00/M out
Palmyra X5
palmyra-x5
Writer's most capable enterprise model with enhanced reasoning, analysis, and content generation capabilities.
Writer | 128K ctx | 8K max out | $0.60/M in | $6.00/M out
Jamba 1.5 Large
jamba-1-5-large
AI21's Jamba 1.5 Large model with hybrid SSM-Transformer architecture. Excels at long-context understanding and generation.
AI21 Labs | 256K ctx | 4K max out | $2.00/M in | $8.00/M out
Jamba 1.5 Mini
jamba-1-5-mini
AI21's efficient Jamba 1.5 Mini model with hybrid SSM-Transformer architecture. Fast and cost-effective for everyday tasks.
AI21 Labs | 256K ctx | 4K max out | $0.20/M in | $0.40/M out
Command A
command-a
Command A is Cohere's most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases.
Cohere | 256K ctx | 8K max out | $2.50/M in | $10.00/M out
Command A Vision
command-a-vision
Cohere's first multimodal model capable of understanding and interpreting visual data alongside text.
Cohere | 128K ctx | 8K max out | $2.50/M in | $10.00/M out
IBM Granite Micro
ibm-granite-micro
IBM's ultra-efficient micro model. Small but mighty - perfect for simple tasks requiring minimal latency and cost.
IBM | 131K ctx | 4K max out | $0.02/M in | $0.11/M out
Amazon Nova Lite
nova-lite
Amazon's multimodal model for processing images, video, and text. Can analyze multiple images with 300K context.
Amazon | 300K ctx | 5K max out | $0.06/M in | $0.24/M out
Amazon Nova Micro
nova-micro
Amazon's fastest text-only model optimized for speed and cost. Ideal for text-based tasks requiring low latency with 128K context window.
Amazon | 128K ctx | 5K max out | $0.04/M in | $0.14/M out
Amazon Nova Pro
nova-pro
Amazon's highly capable multimodal model balancing accuracy, speed, and cost. Processes text and images with 300K context.
Amazon | 300K ctx | 5K max out | $0.80/M in | $3.20/M out
Amazon Nova Premier
nova-premier
Amazon's most capable multimodal model for complex reasoning tasks. Processes text and images with up to 1M context window.
Amazon | 1.0M ctx | 20K max out | $2.50/M in | $12.50/M out
Amazon Nova 2 Lite
nova-2-lite
Amazon's second-generation lightweight multimodal model. Processes text and images with improved performance over Nova Lite.
Amazon | 256K ctx | 5K max out | $0.30/M in | $2.50/M out

Access all models through one API

Stop juggling multiple provider SDKs. Concentrate gives you a single endpoint for every model listed above, with built-in guardrails, analytics, and spend management.
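In practice, the single-endpoint idea looks like the sketch below. This is a hypothetical example: the base URL, header names, and payload shape are assumptions modeled on common OpenAI-compatible gateways, not Concentrate's documented API.

```python
# Hypothetical sketch of one request shape routed to many models.
# API_BASE is a placeholder, not Concentrate's real endpoint.
import json

API_BASE = "https://api.concentrate.example/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for a chat completion."""
    return {
        "url": f"{API_BASE}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

# The same call shape reaches any provider; only the model id changes.
for model in ("claude-sonnet-4-6", "gpt-5-mini", "gemini-2.5-flash"):
    req = build_chat_request(model, "Summarize this ticket.", "sk-demo")
```

Because only the `model` field varies, switching providers becomes a one-line change instead of a new SDK integration.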


Frequently Asked Questions

What is the Concentrate.ai Model Fortress?

The Concentrate.ai Model Fortress is a live catalog of every LLM accessible through the Concentrate.ai API. It shows provider-specific pricing (input and output cost per million tokens), context window sizes, and capabilities like function calling, vision, streaming, and JSON mode for each model across every available provider.
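Since rates are quoted per million tokens, the cost of a single request follows from simple arithmetic: tokens divided by one million, times the rate, summed over input and output. A quick sketch (generic math, not a Concentrate feature):

```python
# Translate per-million-token rates into the cost of one request.
def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float, out_per_m: float) -> float:
    """Dollar cost of a request given $/M input and $/M output rates."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# e.g. a model priced at $1.00/M in and $5.00/M out,
# on a 12K-token prompt with a 2K-token reply:
cost = request_cost(12_000, 2_000, 1.00, 5.00)  # 0.012 + 0.010 = $0.022
```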

How often is the Model Fortress updated?

The Model Fortress pulls live data from the Concentrate.ai model catalog API and revalidates every hour. Pricing and availability reflect the current state of each provider.

Which AI providers are included?

The Model Fortress includes models from OpenAI, Anthropic, Google, Meta (Llama), DeepSeek, Mistral, Cohere, xAI, and many more. Each model shows all providers that offer it, so you can compare the same model across different providers and pick the cheapest or best-fit option.

Contact

130 E 59th St
17th floor
New York, NY 10022
1201 N. Market Street
Suite 200
Wilmington, DE 19801


© 2026 Concentrate AI. All rights reserved.
