Per-request duration plus p50 and p95 signals for routing decisions.
Signal: Duration
Route: Provider
Debug: Logs
Request: 1.8s. Duration for a model call.
Provider: Azure. Route behind the latency signal.
Status: 200. Separate slow success from failed requests.
New capabilities
See response time per model call in the logs, so a slow AI feature has a number behind it instead of a vague 'it feels laggy.'
Compare latency across providers and models to spot the route adding seconds to a workload, then move it or prepare a fallback.
Pair latency with status, provider, model, and fallback use, so a slow success and a failed retry don't get debugged as the same problem.
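As a rough illustration of pairing latency with status, the sketch below labels log records as a failure, a slow success, or a normal call. The `LogRecord` fields, the `classify` helper, and the 1.5-second cutoff are all assumptions made for this example; they are not Concentrate's actual log schema or defaults.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    # Hypothetical fields for illustration; not the real log schema.
    provider: str
    model: str
    status: int
    duration_s: float
    used_fallback: bool = False


# Assumed cutoff for "slow"; pick a value that fits your own feature.
SLOW_THRESHOLD_S = 1.5


def classify(record: LogRecord) -> str:
    """Label a call so slow successes and failures are triaged separately."""
    if record.status >= 400:
        return "failed"
    if record.duration_s >= SLOW_THRESHOLD_S:
        return "slow_success"
    return "ok"


records = [
    LogRecord("Azure", "example-model", 200, 1.8),
    LogRecord("Azure", "example-model", 500, 0.4, used_fallback=True),
    LogRecord("other-provider", "example-model", 200, 0.6),
]
labels = [classify(r) for r in records]
```

Checking status before duration keeps a fast failure from being read as a healthy call, and keeps a slow 200 out of the error bucket.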
Who Concentrate is designed for
Latency tracking records how long each model call took and rolls it up by provider and model. In request routing, you can sort by live p50 or p95 latency so chat and agent workloads prefer the fastest healthy path for the model you request.
Find which provider path adds seconds to a user-facing feature.
Compare duration across routes before promoting a backup in fallbacks.
Separate slow successes from failed retries using status and duration together in request logs.
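The roll-up described above can be approximated offline from exported durations. This is a minimal sketch that groups calls by provider and model and computes p50 and p95 with the nearest-rank method; the tuple layout, the `rollup` and `percentile` helpers, and the sample values are assumptions for illustration, and the product may compute its percentiles differently.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rollup(calls):
    """Group (provider, model, duration_s) tuples and summarize each group."""
    groups = defaultdict(list)
    for provider, model, duration_s in calls:
        groups[(provider, model)].append(duration_s)
    return {
        key: {
            "count": len(durations),
            "p50": percentile(durations, 50),
            "p95": percentile(durations, 95),
        }
        for key, durations in groups.items()
    }


# Made-up sample data; the provider and model names are placeholders.
calls = [
    ("Azure", "example-model", 1.8),
    ("Azure", "example-model", 0.9),
    ("Azure", "example-model", 1.1),
    ("Azure", "example-model", 1.0),
    ("other-provider", "example-model", 0.6),
    ("other-provider", "example-model", 0.7),
]
summary = rollup(calls)
```

Reading p50 and p95 side by side shows whether a route is slow in general or only slow in its tail, which matters when deciding whether to move a workload or just prepare a fallback.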
Feature basics
Use measured latency windows in API sorts instead of guessing which provider is fastest.
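To make the idea of sorting by measured latency concrete, here is a small client-side sketch that ranks healthy routes by p50 or p95. It does not call the Concentrate API; `RouteStats`, `rank_routes`, and the numbers are hypothetical stand-ins for whatever a measured latency window reports.

```python
from dataclasses import dataclass


@dataclass
class RouteStats:
    # Hypothetical summary of one provider route over a latency window.
    provider: str
    p50_s: float
    p95_s: float
    healthy: bool


def rank_routes(routes, key="p95"):
    """Return healthy routes ordered from fastest to slowest by p50 or p95."""
    if key not in ("p50", "p95"):
        raise ValueError("key must be 'p50' or 'p95'")
    attr = f"{key}_s"
    candidates = [r for r in routes if r.healthy]
    return sorted(candidates, key=lambda r: getattr(r, attr))


# Made-up figures; provider-b and provider-c are placeholder names.
routes = [
    RouteStats("Azure", p50_s=1.0, p95_s=1.8, healthy=True),
    RouteStats("provider-b", p50_s=0.6, p95_s=2.4, healthy=True),
    RouteStats("provider-c", p50_s=0.3, p95_s=0.5, healthy=False),
]
by_p95 = [r.provider for r in rank_routes(routes, "p95")]
by_p50 = [r.provider for r in rank_routes(routes, "p50")]
```

In this sample the two sort keys disagree: one route has the better median and the other the better tail, so the choice of p50 or p95 depends on whether typical or worst-case response time matters more for the workload. The unhealthy route is excluded even though it is the fastest on paper.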