Limit: Input + output
Scope: Global + model
Policy: Inherited
Input and output ceilings at org, team, and key scope.
Global (Default limit): Baseline rate limit for all models.
Model (Override): Specific limit for a model route.
Key (Capped): Child key follows parent ceiling.
New capabilities
Set global and per-model limits on input and output, so a single workload can't flood a provider or run up cost with runaway request volume.
Cap child scopes from organization and team settings, so limits are inherited by new keys instead of configured one at a time.
Keep one key from consuming more traffic than intended, protecting both your spend and your shared provider rate limits.
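The inheritance described above can be sketched as a small resolution function. This is an illustrative model only, not Concentrate's actual API: the `Scope` and `Limit` classes, the field names, the model names, and the unit of the numbers (for example tokens per minute) are all assumptions. It shows a per-model override replacing a scope's global default, and a child key being capped by every ceiling above it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Limit:
    """Input and output ceilings; the unit (e.g. tokens per minute) is assumed."""
    input_limit: int
    output_limit: int


@dataclass
class Scope:
    """One level of the hierarchy: org, team, or key (hypothetical model)."""
    name: str
    default: Optional[Limit] = None                            # global limit, all models
    overrides: Dict[str, Limit] = field(default_factory=dict)  # per-model limits
    parent: Optional["Scope"] = None

    def own_limit(self, model: str) -> Optional[Limit]:
        # A per-model override replaces this scope's global default.
        return self.overrides.get(model, self.default)


def effective_limit(scope: Scope, model: str) -> Optional[Limit]:
    """Walk up the parent chain; a child never exceeds any ancestor's ceiling."""
    result: Optional[Limit] = None
    node: Optional[Scope] = scope
    while node is not None:
        own = node.own_limit(model)
        if own is not None:
            result = own if result is None else Limit(
                min(result.input_limit, own.input_limit),
                min(result.output_limit, own.output_limit),
            )
        node = node.parent
    return result


# Hypothetical hierarchy: the org sets policy, the team adds nothing,
# and the key asks for more input than the org allows.
org = Scope("org", default=Limit(100_000, 50_000),
            overrides={"large-model": Limit(20_000, 10_000)})
team = Scope("team", parent=org)
key = Scope("key", default=Limit(200_000, 5_000), parent=team)

print(effective_limit(key, "small-model"))
print(effective_limit(key, "large-model"))
```

In this sketch a new key created under `team` with no settings of its own resolves to the org ceilings, which is the "inherited instead of configured one at a time" behavior. Taking the minimum at each level is one plausible reading of "child key follows parent ceiling"; the real resolution rules may differ.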
Who Concentrate is designed for
Rate limits cap how much input and output traffic a scope can send, globally and per model. When a key hits a provider rate limit during routing, Concentrate can skip that path and try the next healthy route, but org-level limits stop one workload from consuming the whole quota in the first place.
Set ceilings at org or team scope so new keys inherit policy automatically.
Cap a single key before it exhausts shared provider quotas or your prepaid balance.
Tighten limits on expensive or scarce models without throttling every workload equally.
Pair limits with request routing so over-limit providers are skipped in the failover chain.
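The pairing of limits with routing can be pictured as a walk down a priority-ordered failover chain. The sketch below is a simplified illustration under stated assumptions: the `Route` class, its `healthy` and `rate_limited` flags, and the provider names are hypothetical and do not reflect Concentrate's internal routing logic.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Route:
    provider: str               # hypothetical provider label
    healthy: bool = True        # assumed health signal
    rate_limited: bool = False  # assumed "over limit" signal


def pick_route(chain: Iterable[Route]) -> Optional[Route]:
    """Return the first route, in priority order, that can take traffic."""
    for route in chain:
        if route.healthy and not route.rate_limited:
            return route
    return None  # every path is exhausted; the caller decides how to fail


chain = [
    Route("provider-a", rate_limited=True),  # over its limit, skipped
    Route("provider-b", healthy=False),      # unhealthy, skipped
    Route("provider-c"),                     # first usable path
]

chosen = pick_route(chain)
print(chosen.provider if chosen else "no route available")
```

Skipping only helps while a usable route remains. When every route in the chain is over its limit, `pick_route` returns `None`, which is why the ceilings set at org or team scope matter as the first line of defense.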
Feature basics