CONCENTRATE
Pricing
ModelsDocsRequest a Demo

Request Routing

Route requests by cost, latency, and use case

Pick the right model for every workload — support, agents, chat, and batch jobs do not need the same route. Sort by price or live latency, use a fallback or built-in retry when a provider errors or hits a rate limit, and change routes in Concentrate without redeploying every app.

View modelsRead routing docs
Diagram showing support, code, chat, and agent workloads routed by cost, latency, and use case to model providers with fallbacks
Routing plan

Pick a route, add allowed models, and select which providers a key or team can use.

Sort by

Cost / latency

Fallbacks

Model + provider

Failover

Automatic

Sort

Cost

Order providers by price and send the request to the cheapest healthy route.

Primary

claude-haiku-4-5

Send a model slug to pin the model for this workload.

Fallbacks

gpt-5.5

Try the next model in the list when the primary route fails.

Failover

On error / rate limit

Retry on the next provider when one errors or is over its token limit.

New capabilities

What your team gains with Concentrate

01

Sort by cost

Order providers and models by price so routine work runs on the cheapest route that meets the request's needs.

02

Sort by latency

Pick the provider path from live latency metrics, including p50 and p95, measured over a recent window you set.

03

Match the model to the job

Use Model Fortress to compare capabilities and pricing, then route each workload to the model that fits — summaries, agents, chat, and extraction do not need the same slug.

04

Automatic failover

When a provider returns an error or hits a rate limit, the request retries on the next provider or model in the fallback chain.

05

Model and provider fallbacks

Set an ordered list of backup models, or limit a request to the providers you approve.

06

Feature-aware routing

Route only to providers that support what the request uses: streaming, tools, JSON schema, reasoning, and images.

Who Concentrate is designed for

For teams that want to use the right model for every task

Support bots, coding agents, internal tools, and chat do not need the same model or the same provider. Routing lets you select the model that suits the job.

Customer-facing apps

Keep a fallback model ready so an error or rate limit on one provider does not reach your users. See reliability for the full failover plan.

Internal tools

Send routine summaries and classification to cheaper models, and keep stronger models for the harder tasks. Track the savings in spend management.

Latency-sensitive features

Sort by p50 or p95 latency over a recent window so chat and agents use the fastest healthy provider for the model.

Rate-limit pressure

When a model or provider is over its token rate limit, routing skips it and tries the next route in the chain.

Request Routing basics

Frequently asked questions

How does Concentrate route by cost or latency?
You choose how to route in the request: pick a model slug, limit the request to a group of allowed models, or sort providers by price or live latency. Browse Model Fortress to see which models support tools, vision, streaming, and other capabilities, then sort further by latency, cost, or context window for the workload you are running.
What happens when a provider fails or is rate limited?
Most teams send a model slug and rely on built-in retry and failover across the providers that offer that model — you do not have to configure a separate fallback chain in every case. When you want an explicit backup, set model or provider fallbacks in the request. Concentrate skips providers that are over a rate limit or missing a feature the call needs, such as tools or streaming.
Can I pin a specific model or provider path?
Yes. Send a model slug like claude-haiku-4-5 to pin the model, or provider/model like anthropic/claude-haiku-4-5 to pin the exact provider behind it.
Do I set routing in the dashboard or in the request?
Routing is set per request in the API body: the model, sort, fallbacks, and the providers a request may use. That keeps route changes out of your app's provider code, and request logs show the model, provider, duration, and cost for each call.
CONCENTRATE

One API for every major LLM provider — routing, spend, logs, and controls in one place.

New York

130 E 59th St, 17th floor

New York, NY 10022

Wilmington

1201 N. Market Street, Suite 200

Wilmington, DE 19801

LLM Gateway
  • LLM Gateway
  • Request Routing
  • Usage Monitoring
  • Spend Management
  • Data Security
  • Access Controls
Teams
  • AI Engineering
  • Engineering Leadership
  • Finance & Operations
  • Security & Compliance
Integrations
  • All Integrations
  • Migration Guides
Platform
  • Pricing
  • Model Fortress
  • Enterprise
  • Documentation
  • Status
Legal
  • Privacy Policy
  • Terms of Service
  • Data Processing Addendum
  • Acceptable Use Policy
Features
  • Universal API Keys
  • Spend Tracking
  • Token Allocation
  • Usage Analytics
  • Request Logs
  • Alerts
  • Data Redaction
  • Zero Data Retention
  • Audit Logs

LLM Gateway

  • LLM Gateway
  • Request Routing
  • Usage Monitoring
  • Spend Management
  • Data Security
  • Access Controls

Teams

  • AI Engineering
  • Engineering Leadership
  • Finance & Operations
  • Security & Compliance

Integrations

  • All Integrations
  • Migration Guides

Platform

  • Pricing
  • Model Fortress
  • Enterprise
  • Documentation
  • Status

Legal

  • Privacy Policy
  • Terms of Service
  • Data Processing Addendum
  • Acceptable Use Policy

Features

  • Universal API Keys
  • Spend Tracking
  • Token Allocation
  • Usage Analytics
  • Request Logs
  • Alerts
  • Data Redaction
  • Zero Data Retention
  • Audit Logs

Offices

New York

130 E 59th St, 17th floor

New York, NY 10022

Wilmington

1201 N. Market Street, Suite 200

Wilmington, DE 19801

© 2026 Concentrate AI. All rights reserved.

CONCENTRATE
Log In
Log In