- OpenAI-compatible
- Native client (/inference)
Prerequisites
A Concentrate AI account with an active API key
sk-cn-v1-.Quick Start for Claude Code users
If you use Claude Code, you can install a skill that walks through this migration interactively. It collapses the base URL, stripstensorzero:: model namespacing and body params, decomposes your tensorzero.toml functions and variants, maps model slugs, and generates a verification script. Drop the skill into your ~/.claude/skills/ directory:
/migrate-tensorzero. Claude will load the skill and run the steps.
Step 1: Update Your Environment Variables
TensorZero is self-hosted: provider credentials live in yourtensorzero.toml, and gateway auth (if enabled) uses TENSORZERO_API_KEY. All of that collapses to a single Concentrate key:
api_key sent by the client; it uses credentials from its own config. Concentrate requires a real sk-cn-v1-... key on every request, so make sure your client now sends one (not a placeholder).Step 2: Update Your Client
Collapse the gateway base URL (both the/openai/v1 and /inference paths) onto https://api.concentrate.ai/v1, and drop the tensorzero:: namespacing and body params. If you used the native tensorzero client, swap to the OpenAI SDK. Concentrate ships no dedicated SDK because the OpenAI-compatible shape covers every endpoint.
Step 3: Remove TensorZero-Specific Headers and Body Params
TensorZero keeps almost all of its distinctiveness intensorzero::-prefixed body params, not headers. None of them, nor TensorZero’s custom headers, carry over to Concentrate, so they should come out; they’re dead weight and mislead future readers. Expand the tables below for the mapping.
Request header mapping
Request header mapping
| TensorZero header | Concentrate replacement |
|---|---|
Authorization: Bearer sk-t0-... | Standard Authorization: Bearer sk-cn-v1-... (ignored on TensorZero’s OpenAI endpoint; required on Concentrate) |
tensorzero-otlp-traces-extra-header-* / -attribute-* / -resource-* (inject headers, span, and resource attributes into OTLP trace exports) | No equivalent. Concentrate records its own per-request telemetry; handle bespoke OTLP export in your tracing layer |
Request body param mapping (`tensorzero::*`)
Request body param mapping (`tensorzero::*`)
tensorzero::-prefixed body fields on the OpenAI endpoint (via extra_body), or as top-level fields (episode_id, variant_name, …) on /inference.| TensorZero body param | Concentrate replacement |
|---|---|
tensorzero::episode_id | Drop. Link calls with previous_response_id (see Step 4) |
tensorzero::variant_name (pin a variant) | Drop. Pin model: "provider/model-id"; express A/B and fallback as routing.* body params (see Step 4) |
tensorzero::cache_options (enabled, max_age_s) | Provider-native prompt caching (Anthropic + AWS Bedrock). Seeded per key by default; set the seed with the prompt_cache_key body param. No gateway-stored cache and no max_age_s control |
tensorzero::credentials (dynamic per-request provider keys) | Drop. Concentrate owns provider credentials at no per-token BYOK fee |
tensorzero::tags (key/value metadata) | Drop. Attribution comes from the key/team/org hierarchy; analytics roll up automatically. No per-request tag dimension |
tensorzero::namespace (experimentation config selector) | Drop. No per-request experimentation namespace |
tensorzero::params (override inference parameters) | Pass standard parameters (temperature, max_tokens, etc.) directly in the request body |
tensorzero::extra_body (JSON-Pointer edits to the provider request) | Pass the underlying provider fields directly in the request body |
tensorzero::extra_headers (inject provider request headers) | Not applicable. Concentrate manages the upstream request |
tensorzero::provider_tools (provider built-in tools) | Use the standard tools field; web search is normalized across providers on the Responses API |
tensorzero::dryrun (run without storing) | No equivalent. Disable request logging at the key level via Zero Data Retention |
tensorzero::include_raw_response, tensorzero::include_raw_usage | Drop. Concentrate returns a normalized response; per-request detail is in the dashboard |
tensorzero::deny_unknown_fields | Drop. No equivalent toggle |
Response field mapping
Response field mapping
| TensorZero response field | Concentrate equivalent |
|---|---|
episode_id | No body field. Use previous_response_id / the response id on the Responses API to link related requests |
tensorzero_cost | No body field. Per-request cost is recorded in the dashboard |
tensorzero_raw_response, tensorzero_raw_usage, tensorzero_extra_content | No body fields. Concentrate returns a normalized OpenAI-shaped response; reasoning and usage detail surface through the standard schema and the dashboard |
| Request / trace id | X-Request-Id (Concentrate returns its own per-request id, surfaced in dashboard logs) |
| Rate-limit headers | Standard X-RateLimit-* headers. See Errors for 429 semantics |
Step 4: Decompose Functions, Variants, and Episodes
TensorZero’s defining concept is the function / variant / episode model intensorzero.toml: you call a named function, the gateway samples a variant (built-in A/B testing and fallbacks), and related inferences group into an episode. Concentrate has none of these primitives. Each behavior is either on by default or a body param. Expand the table for the mapping.
Function / variant / episode mapping
Function / variant / episode mapping
| TensorZero concept | Concentrate equivalent |
|---|---|
Function (tensorzero::function_name::my_fn, named entry in config) | Call the model directly. Move the function’s prompt template into your application code or system prompt |
| Variant (a concrete model + params behind a function) | Pin a model with model: "provider/model-id", and pass params in the request body |
| Variant sampling for A/B testing | No built-in experiment splitter. Choose the variant in application code, or issue separate keys per arm for clean attribution |
Variant fallbacks (a configured model with fallbacks, tensorzero::model_name) | routing.model.fallbacks / routing.provider.fallbacks (ordered) body params, or model: "auto" |
Episode (episode_id grouping multi-step workflows) | previous_response_id on the Responses API links related requests into a server-managed conversation tree |
Feedback API (/feedback, metrics attributed to an episode) | No equivalent today |
cache_options | Provider-native prompt caching; prompt_cache_key sets the seed |
| Provider fallbacks inside a configured model | routing.model.fallbacks / routing.provider.fallbacks (ordered). Failover is automatic on any provider error |
Step 5: Update Model Identifiers
Concentrate accepts model strings in two forms:- Bare slug, e.g.
gpt-4o,claude-haiku-4-5,auto. Routing picks a provider. provider/model-id, e.g.bedrock/claude-haiku-4-5,openai/gpt-4o. Pins the request to a specific provider.
tensorzero:: namespacing. Strip the tensorzero:: and model_name:: prefixes, and convert the provider::model double-colon to a provider/model slash. A function_name reference has no model string, so use the model its winning variant resolved to (see Step 4).
| TensorZero | Concentrate AI |
|---|---|
tensorzero::model_name::openai::gpt-4o | openai/gpt-4o |
tensorzero::model_name::anthropic::claude-... | anthropic/claude-... |
openai::gpt-4o (native model_name) | openai/gpt-4o |
tensorzero::function_name::my_fn | The model its variant resolved to (e.g. openai/gpt-4o), or model: "auto" |
gcp_vertex_anthropic) already encodes the serving provider, so you just rename the prefix. For most popular names author and provider match (openai, anthropic, mistral), but they diverge whenever a model is hosted by something other than its author:
| Author | Provider serving the request | Concentrate slug |
|---|---|---|
Anthropic (claude-haiku-4-5) | Anthropic | anthropic/claude-haiku-4-5 |
| Anthropic (same model, different host) | AWS Bedrock | bedrock/claude-haiku-4-5 |
Google (gemini-3.5-flash) | Google AI Studio | ai-studio/gemini-3.5-flash |
Meta (llama-3-8b-instruct) | AWS Bedrock | bedrock/llama-3-8b-instruct |
provider/ prefix when you specifically want to pin to one host (for ZDR compliance, contractual reasons, or latency in a specific region). To replace a configured model with provider fallbacks, use routing.model.fallbacks / routing.provider.fallbacks or model: "auto" (see Step 4).
For the authoritative list of supported provider/model-id pairs, call GET /v1/models or browse the Model Fortress.
Step 6: Reconnect Observability
TensorZero bundles observability, optimization, evaluations, and experimentation alongside the gateway (backed by ClickHouse). Concentrate’s dashboard covers the gateway-side surfaces; the experimentation and optimization tooling has no direct equivalent.| TensorZero surface | Concentrate equivalent |
|---|---|
| Inference observability (per-request logs, cost, latency) | Per-request logs at concentrate.ai |
tensorzero_cost / usage tracking | Org / team / developer / key spend rollups |
tags-based filtering | Per-key / team / org rollups. No per-request tag dimension |
| Episodes (multi-step grouping) | previous_response_id on the Responses API links related requests into a stateful tree |
OTLP trace export (tensorzero-otlp-traces-extra-*) | No OTLP passthrough. Concentrate records its own per-request telemetry; handle bespoke export in your tracing layer |
| Feedback API / evaluations / experiments | No equivalent today. Keep your evaluation and experimentation tooling separate from the gateway |
Rate limits (api_key_public_id scope) | Per-key rate and spend limits in the dashboard |
Exporting your TensorZero history
Because TensorZero is self-hosted, your history lives in your own ClickHouse database and does not import into Concentrate. If your migration is compliance- or audit-driven, snapshot that data before tearing down the gateway.Step 7 (Optional): Adopt the Responses API
If you used TensorZero’s native/inference endpoint or its episodes, Concentrate’s native Responses API is the closest successor: streaming, tool calling, structured output, multi-modal input, and web search through one normalized shape across every provider, with previous_response_id replacing episode_id-grouped sessions.
Why migrate to Concentrate
How it works
How it works
tensorzero.toml, exposing several surfaces on one host. Migrating retires the process and config file and collapses every surface onto Concentrate’s single hosted base URL https://api.concentrate.ai/v1.| TensorZero surface | TensorZero path (default host http://localhost:3000) |
|---|---|
| Native inference | /inference |
| OpenAI-compatible | /openai/v1/chat/completions |
| Batch inference | /batch_inference |
| Feedback | /feedback |
/inference endpoint is Responses-style (typed input, multi-step episodes). Its closest one-to-one target is Concentrate’s Responses API, reachable from the standard OpenAI SDK or any HTTP client.No gateway to run
No gateway to run
tensorzero.toml and provider credentials to maintain. Concentrate is hosted, with no gateway to operate, no config to version, and credentials managed for you.Team-scale spend management
Team-scale spend management
tags tagging to maintain.Feature-aware resiliency
Feature-aware resiliency
routing.model.fallbacks, routing.provider.fallbacks), Concentrate’s routing layer ships:- Uptime gate. Providers whose per-feature success rate drops below 90% are skipped.
- Feature degradation. If no provider supports the full requested feature set (e.g.
json_schema), the request is downgraded tojson_objector text instead of failing. - Cache-affinity routing. When multiple providers can serve a request, the one where your actor already has cached tokens is preferred.
tensorzero.toml to author, name, or version.Strategy-driven auto routing
Strategy-driven auto routing
model: "auto" accepts an explicit optimization target via routing.model.sort: cost, latency, or performance (default). See Auto Routing. This replaces hand-authoring a configured model with provider fallbacks in tensorzero.toml.Native Responses and Messages APIs
Native Responses and Messages APIs
/inference endpoint, Concentrate exposes a first-class Responses API. Alongside it you also get OpenAI Chat Completions compatibility and an Anthropic-compatible Messages API.Managed provider credentials by default
Managed provider credentials by default
tensorzero.toml (and TensorZero’s per-request credentials param) can be retired after migrating, with no per-token BYOK fee.Troubleshooting
Model not found
Model not found
tensorzero:: and model_name:: prefixes and convert the provider::model double-colon to provider/model (e.g. tensorzero::model_name::openai::gpt-4o → openai/gpt-4o). Bare slugs (gpt-4o, claude-haiku-4-5) work too. If you’re using a provider/ prefix and getting a miss, double-check the prefix is a provider (e.g. bedrock, azure, ai-studio) and not just the author (e.g. meta, google). Call GET /v1/models for the authoritative list.Invalid API key error
Invalid API key error
sk-cn-v1-. TensorZero’s OpenAI-compatible endpoint ignored the client key, so your code may have been sending a placeholder (or a sk-t0-... gateway key). Concentrate requires a real sk-cn-v1-... key on every request. Verify the value in your dashboard and confirm there are no extra spaces or quotes.Requests succeed but nothing shows up in the Concentrate dashboard
Requests succeed but nothing shows up in the Concentrate dashboard
https://api.concentrate.ai/v1, not localhost:3000. If the client is still pointed at your self-hosted gateway it is logging against your own ClickHouse store, not Concentrate.My functions / variants stopped working
My functions / variants stopped working
tensorzero::function_name::... and tensorzero::variant_name are no-ops on Concentrate; there is no function/variant config. Pin a model with model: "provider/model-id", express fallbacks via routing.model.fallbacks / routing.provider.fallbacks, or use model: "auto". Move function prompt templates into application code. Variant A/B sampling and the /feedback loop have no per-request equivalent, so handle the split in application code.My episodes / feedback stopped working
My episodes / feedback stopped working
episode_id body field or /feedback endpoint. Link related requests with previous_response_id on the Responses API. Evaluation feedback that fed TensorZero’s optimization stack should stay in your own tooling.Cache hit rate dropped after migrating
Cache hit rate dropped after migrating
max_age_s control, so TensorZero’s cache_options does not carry over directly. Caches are seeded per API key by default; pass prompt_cache_key in the request body if you want to set the seed explicitly.Connection errors
Connection errors
https://api.concentrate.ai/v1 (no /api segment, no /openai/v1 or /inference suffix, no localhost:3000). Test the connection manually:Next Steps
API Reference
Available Models
Auto Routing
Get Support
Feedback
If you hit anything that didn’t translate cleanly (especially around functions, variants, episodes, the feedback/experimentation loop,cache_options, or OTLP trace export), email support@concentrate.ai. The capability gaps called out above are tracked, and migration friction reports directly shape what we ship next.