Workloads mapped to model slugs, provider paths, and backup routes.
Route: Workload
Signal: Cost + latency
Change: No app deploy
New capabilities
Send cheap summaries to a small model and hard reasoning to a strong one, so you're not paying frontier prices for routine work or shipping weak output on the work that matters.
Summary: Low-cost route. Routine work moves to cheaper models.
Agent: Stronger model. Harder work keeps a higher-quality route.
Fallback: Backup path. Approved route is ready if primary fails.
Swap the provider or model behind a workload from config, so route logic lives in the gateway instead of being hard-coded into every app and CI pipeline.
Compare cost, latency, error rate, and output behavior per route using your real traffic, then move a workload when the data says so.
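As a rough illustration of routing from config, the sketch below maps workload names to a provider and model, then swaps one route by changing only the config. The JSON layout, workload names, provider names, and model slugs are all hypothetical placeholders, not Concentrate's actual schema.

```python
import json

# Hypothetical route config. Keys and values are illustrative placeholders,
# not a real Concentrate schema or real model slugs.
ROUTES_JSON = """
{
  "summary": {"provider": "provider-a", "model": "small-model"},
  "agent":   {"provider": "provider-b", "model": "strong-model"}
}
"""


def load_routes(raw):
    """Parse the route table from a JSON string."""
    return json.loads(raw)


def resolve(routes, workload):
    """Return the (provider, model) pair configured for a workload."""
    route = routes[workload]
    return route["provider"], route["model"]


routes = load_routes(ROUTES_JSON)
before = resolve(routes, "summary")

# Moving a workload is a config edit: the calling code stays the same.
updated = dict(routes)
updated["summary"] = {"provider": "provider-c", "model": "other-small-model"}
after = resolve(updated, "summary")

print(before, "->", after)
```

The application only ever asks for a workload name, so the provider and model behind it can change without touching the caller.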
Who Concentrate is designed for
LLM routing is the practice of sending each request to the model and provider path chosen for that specific workload, instead of pointing every call at one default provider. A summary, a coding agent, a chat reply, and a data-extraction job have different cost, latency, and quality needs. Routing lets each one use the route that fits, and lets you change that route from config when prices or models shift.
Routine, high-volume calls go to cheaper models while harder work keeps a stronger route, so spend tracks the value of each task instead of being set by a single default model.
Apps send one request shape to Concentrate. The model and provider behind it change through config, not through edits to every service.
Pair routing with fallbacks so a workload has a backup path ready when its primary provider degrades or fails.
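The fallback idea can be sketched as an ordered list of routes tried in turn. This is a minimal, generic pattern under stated assumptions: `ProviderError`, `primary`, and `backup` are hypothetical stand-ins for real provider calls, and a gateway would apply the same logic on the server side rather than in application code.

```python
class ProviderError(Exception):
    """Raised by a (hypothetical) provider call that fails or degrades."""


def call_with_fallback(prompt, routes):
    """Try each route in order and return the first successful response."""
    last_error = None
    for call in routes:
        try:
            return call(prompt)
        except ProviderError as err:
            # Remember the failure and move on to the next approved route.
            last_error = err
    raise RuntimeError("all routes failed") from last_error


def primary(prompt):
    # Stand-in for a primary provider that is currently unavailable.
    raise ProviderError("primary unavailable")


def backup(prompt):
    # Stand-in for the approved backup route.
    return f"backup handled: {prompt}"


result = call_with_fallback("summarize this ticket", [primary, backup])
print(result)
```

Only `ProviderError` triggers the next route here. Other exceptions propagate, so a bug in the calling code is not silently retried on another provider.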
Feature basics
Usage analytics and request logs show cost, latency, and errors per route, so route changes are based on your traffic rather than a vendor benchmark.
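A per-route comparison of this kind can be approximated by grouping request logs by route. The log fields (`route`, `cost_usd`, `latency_ms`, `error`) and the sample rows below are made up for illustration and are not an actual Concentrate export format.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows. Field names are assumptions, not a real log schema.
LOGS = [
    {"route": "summary", "cost_usd": 0.002, "latency_ms": 400, "error": False},
    {"route": "summary", "cost_usd": 0.004, "latency_ms": 600, "error": True},
    {"route": "agent", "cost_usd": 0.050, "latency_ms": 2000, "error": False},
]


def summarize_by_route(logs):
    """Aggregate request count, cost, mean latency, and error rate per route."""
    grouped = defaultdict(list)
    for row in logs:
        grouped[row["route"]].append(row)
    return {
        route: {
            "requests": len(rows),
            "total_cost_usd": round(sum(r["cost_usd"] for r in rows), 6),
            "mean_latency_ms": mean(r["latency_ms"] for r in rows),
            "error_rate": sum(r["error"] for r in rows) / len(rows),
        }
        for route, rows in grouped.items()
    }


report = summarize_by_route(LOGS)
for route, stats in report.items():
    print(route, stats)
```

Comparing these figures before and after a route change is one way to check whether the move actually lowered cost without raising errors or latency.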