# Before
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="not-used",  # ignored: the gateway uses creds from tensorzero.toml
)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)

# After
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.concentrate.ai/v1",
    api_key=os.environ["CONCENTRATE_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)

Prerequisites

1. A Concentrate AI account with an active API key

Sign up or log in at concentrate.ai and create an API key. Your key should start with sk-cn-v1-.

2. An existing TensorZero integration

This guide assumes you are calling a self-hosted TensorZero gateway from the native tensorzero client (POST /inference), the OpenAI SDK pointed at /openai/v1, fetch, requests, or another HTTP client.

Quick Start for Claude Code users

If you use Claude Code, you can install a skill that walks through this migration interactively. It collapses the base URL, strips tensorzero:: model namespacing and body params, decomposes your tensorzero.toml functions and variants, maps model slugs, and generates a verification script. Drop the skill into your ~/.claude/skills/ directory:
mkdir -p ~/.claude/skills/migrate-tensorzero && \
  curl -fsSL https://concentrate.ai/scripts/migrate-tensorzero.md \
  -o ~/.claude/skills/migrate-tensorzero/SKILL.md
Then start a Claude Code session in your project and ask it to “migrate from TensorZero to Concentrate” or run /migrate-tensorzero. Claude will load the skill and run the steps.

Step 1: Update Your Environment Variables

TensorZero is self-hosted: provider credentials live in your tensorzero.toml, and gateway auth (if enabled) uses TENSORZERO_API_KEY. All of that collapses to a single Concentrate key:
# Before
export TENSORZERO_API_KEY="sk-t0-..."   # gateway auth, if enabled
export OPENAI_API_KEY="sk-..."          # provider creds read by tensorzero.toml
export ANTHROPIC_API_KEY="sk-ant-..."
export BASE_URL="http://localhost:3000/openai/v1"

# After
export CONCENTRATE_API_KEY="sk-cn-v1-..."
export BASE_URL="https://api.concentrate.ai/v1"
Concentrate is hosted with managed credentials, so there’s no gateway process or tensorzero.toml to maintain, and no OPENAI_API_KEY, ANTHROPIC_API_KEY, or AWS credentials to keep. Comment them out (don’t delete) until you’ve verified the migration end-to-end, then remove them.
TensorZero’s OpenAI-compatible endpoint ignores any api_key sent by the client; it uses credentials from its own config. Concentrate requires a real sk-cn-v1-... key on every request, so make sure your client now sends one (not a placeholder).
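Because a placeholder key used to work against the gateway, a startup check can catch a stale value before the first request fails. The following is a minimal sketch, assuming the key is read from CONCENTRATE_API_KEY as in the examples above; `load_concentrate_key` is a hypothetical helper name, not part of any SDK, and the only fact it relies on is the sk-cn-v1- prefix stated in the prerequisites.

```python
import os

KEY_PREFIX = "sk-cn-v1-"  # prefix stated in the prerequisites


def load_concentrate_key(env=None):
    """Return the Concentrate key from the environment, or raise if it looks wrong.

    Hypothetical helper for illustration; it does not contact the API.
    """
    env = os.environ if env is None else env
    raw = env.get("CONCENTRATE_API_KEY", "")
    # Tolerate stray spaces or quotes copied in from a .env file.
    key = raw.strip().strip("\"'")
    if not key.startswith(KEY_PREFIX) or len(key) == len(KEY_PREFIX):
        raise ValueError(
            "CONCENTRATE_API_KEY is missing or is not a Concentrate key "
            f"(expected prefix {KEY_PREFIX!r}); placeholders and sk-t0- keys will not work"
        )
    return key
```

Calling the helper once at process start, and passing its return value as `api_key`, keeps the prefix check in one place.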

Step 2: Update Your Client

Collapse the gateway base URL (both the /openai/v1 and /inference paths) onto https://api.concentrate.ai/v1, and drop the tensorzero:: namespacing and body params. If you used the native tensorzero client, swap to the OpenAI SDK. Concentrate ships no dedicated SDK because the OpenAI-compatible shape covers every endpoint.
# Before
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    response = client.inference(
        model_name="openai::gpt-4o",
        input={"messages": [{"role": "user", "content": "Say hello in one word"}]},
    )

# After
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.concentrate.ai/v1",
    api_key=os.environ["CONCENTRATE_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)
print(response.choices[0].message.content)

Step 3: Remove TensorZero-Specific Headers and Body Params

Most of TensorZero’s gateway-specific behavior is controlled through tensorzero::-prefixed body params rather than headers. Neither those params nor TensorZero’s custom headers carry over to Concentrate, so remove them: left in place, they are dead weight and mislead future readers. The tables below give the mapping.
| TensorZero header | Concentrate replacement |
| --- | --- |
| Authorization: Bearer sk-t0-... | Standard Authorization: Bearer sk-cn-v1-... (ignored on TensorZero’s OpenAI endpoint; required on Concentrate) |
| tensorzero-otlp-traces-extra-header-* / -attribute-* / -resource-* (inject headers, span, and resource attributes into OTLP trace exports) | No equivalent. Concentrate records its own per-request telemetry; handle bespoke OTLP export in your tracing layer |
The body params below are sent as tensorzero::-prefixed fields on the OpenAI endpoint (via extra_body), or as top-level fields (episode_id, variant_name, …) on /inference.

| TensorZero body param | Concentrate replacement |
| --- | --- |
| tensorzero::episode_id | Drop. Link calls with previous_response_id (see Step 4) |
| tensorzero::variant_name (pin a variant) | Drop. Pin model: "provider/model-id"; express fallbacks as routing.* body params and handle A/B splits in application code (see Step 4) |
| tensorzero::cache_options (enabled, max_age_s) | Provider-native prompt caching (Anthropic + AWS Bedrock). Seeded per key by default; set the seed with the prompt_cache_key body param. No gateway-stored cache and no max_age_s control |
| tensorzero::credentials (dynamic per-request provider keys) | Drop. Concentrate owns provider credentials at no per-token BYOK fee |
| tensorzero::tags (key/value metadata) | Drop. Attribution comes from the key/team/org hierarchy; analytics roll up automatically. No per-request tag dimension |
| tensorzero::namespace (experimentation config selector) | Drop. No per-request experimentation namespace |
| tensorzero::params (override inference parameters) | Pass standard parameters (temperature, max_tokens, etc.) directly in the request body |
| tensorzero::extra_body (JSON-Pointer edits to the provider request) | Pass the underlying provider fields directly in the request body |
| tensorzero::extra_headers (inject provider request headers) | Not applicable. Concentrate manages the upstream request |
| tensorzero::provider_tools (provider built-in tools) | Use the standard tools field; web search is normalized across providers on the Responses API |
| tensorzero::dryrun (run without storing) | No equivalent. Disable request logging at the key level via Zero Data Retention |
| tensorzero::include_raw_response, tensorzero::include_raw_usage | Drop. Concentrate returns a normalized response; per-request detail is in the dashboard |
| tensorzero::deny_unknown_fields | Drop. No equivalent toggle |
| TensorZero response field | Concentrate equivalent |
| --- | --- |
| episode_id | No body field. Use previous_response_id / the response id on the Responses API to link related requests |
| tensorzero_cost | No body field. Per-request cost is recorded in the dashboard |
| tensorzero_raw_response, tensorzero_raw_usage, tensorzero_extra_content | No body fields. Concentrate returns a normalized OpenAI-shaped response; reasoning and usage detail surface through the standard schema and the dashboard |
| Request / trace id | X-Request-Id (Concentrate returns its own per-request id, surfaced in dashboard logs) |
| Rate-limit headers | Standard X-RateLimit-* headers. See Errors for 429 semantics |
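If request bodies are assembled in one place, the body-param cleanup can be applied mechanically. The sketch below is a hypothetical helper (`strip_tensorzero_params` is not part of any SDK) that removes every key starting with tensorzero:: plus the two top-level /inference fields named above. It only drops keys; parameters that have a replacement, such as tensorzero::params, still need their values moved into the request body by hand.

```python
TENSORZERO_PREFIX = "tensorzero::"

# Top-level /inference fields named in this guide; extend for your own payloads.
NATIVE_ONLY_FIELDS = {"episode_id", "variant_name"}


def strip_tensorzero_params(body):
    """Return (clean_body, removed_keys) without mutating the input dict.

    Only keys are inspected, so a model *value* such as
    "tensorzero::model_name::openai::gpt-4o" is left for Step 5 to convert.
    """
    clean, removed = {}, []
    for key, value in body.items():
        if key.startswith(TENSORZERO_PREFIX) or key in NATIVE_ONLY_FIELDS:
            removed.append(key)
        else:
            clean[key] = value
    return clean, sorted(removed)
```

Logging the returned `removed_keys` during the migration gives a quick inventory of which TensorZero features a call site was relying on.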

Step 4: Decompose Functions, Variants, and Episodes

TensorZero’s defining concept is the function / variant / episode model in tensorzero.toml: you call a named function, the gateway samples a variant (built-in A/B testing and fallbacks), and related inferences group into an episode. Concentrate has none of these primitives. Where an equivalent exists, it is either on by default or a body param. The table below gives the mapping.
| TensorZero concept | Concentrate equivalent |
| --- | --- |
| Function (tensorzero::function_name::my_fn, named entry in config) | Call the model directly. Move the function’s prompt template into your application code or system prompt |
| Variant (a concrete model + params behind a function) | Pin a model with model: "provider/model-id", and pass params in the request body |
| Variant sampling for A/B testing | No built-in experiment splitter. Choose the variant in application code, or issue separate keys per arm for clean attribution |
| Variant fallbacks (a configured model with fallbacks, tensorzero::model_name) | routing.model.fallbacks / routing.provider.fallbacks (ordered) body params, or model: "auto" |
| Episode (episode_id grouping multi-step workflows) | previous_response_id on the Responses API links related requests into a server-managed conversation tree |
| Feedback API (/feedback, metrics attributed to an episode) | No equivalent today |
| cache_options | Provider-native prompt caching; prompt_cache_key sets the seed |
| Provider fallbacks inside a configured model | routing.model.fallbacks / routing.provider.fallbacks (ordered). Failover is automatic on any provider error |
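Since variant sampling moves into application code, one common approach is deterministic hash-based assignment, so the same user or session always lands on the same arm. The sketch below is one way to do that, not a Concentrate feature: `choose_arm`, the unit id, and the weights are all illustrative assumptions.

```python
import hashlib


def choose_arm(unit_id, arms):
    """Deterministically map a unit (user, session, ...) to one weighted arm.

    `arms` is a list of (model, weight) pairs; weights need not sum to 1.
    """
    total = sum(weight for _, weight in arms)
    if total <= 0:
        raise ValueError("at least one arm needs a positive weight")
    digest = hashlib.sha256(unit_id.encode("utf-8")).digest()
    # Use 53 bits of the hash so the fraction is an exact float in [0, 1).
    fraction = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    point = fraction * total
    cumulative = 0.0
    for model, weight in arms:
        cumulative += weight
        if point < cumulative:
            return model
    return arms[-1][0]  # guard against floating-point rounding at the upper edge


arms = [("openai/gpt-4o", 0.5), ("anthropic/claude-haiku-4-5", 0.5)]
model = choose_arm("user-1234", arms)  # pass `model` to chat.completions.create
```

Pairing each arm with its own API key, as the table suggests, keeps spend and logs separated per arm without any per-request tagging.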

Step 5: Update Model Identifiers

Concentrate accepts model strings in two forms:
  • Bare slug, e.g. gpt-4o, claude-haiku-4-5, auto. Routing picks a provider.
  • provider/model-id, e.g. bedrock/claude-haiku-4-5, openai/gpt-4o. Pins the request to a specific provider.
TensorZero wraps model strings in tensorzero:: namespacing. Strip the tensorzero:: and model_name:: prefixes, and convert the provider::model double-colon to a provider/model slash. A function_name reference has no model string, so use the model its winning variant resolved to (see Step 4).
| TensorZero | Concentrate AI |
| --- | --- |
| tensorzero::model_name::openai::gpt-4o | openai/gpt-4o |
| tensorzero::model_name::anthropic::claude-... | anthropic/claude-... |
| openai::gpt-4o (native model_name) | openai/gpt-4o |
| tensorzero::function_name::my_fn | The model its variant resolved to (e.g. openai/gpt-4o), or model: "auto" |
One thing to know about the slashed form: the prefix is the provider that serves the request, not the model’s author. TensorZero’s provider shorthand (e.g. gcp_vertex_anthropic) already encodes the serving provider, so you just rename the prefix. For most popular names author and provider match (openai, anthropic, mistral), but they diverge whenever a model is hosted by something other than its author:
| Author | Provider serving the request | Concentrate slug |
| --- | --- | --- |
| Anthropic (claude-haiku-4-5) | Anthropic | anthropic/claude-haiku-4-5 |
| Anthropic (same model, different host) | AWS Bedrock | bedrock/claude-haiku-4-5 |
| Google (gemini-3.5-flash) | Google AI Studio | ai-studio/gemini-3.5-flash |
| Meta (llama-3-8b-instruct) | AWS Bedrock | bedrock/llama-3-8b-instruct |
Bare slugs work in all of these cases. Use them when you don’t care which provider serves the request. Use the provider/ prefix when you specifically want to pin to one host (for ZDR compliance, contractual reasons, or latency in a specific region). To replace a configured model with provider fallbacks, use routing.model.fallbacks / routing.provider.fallbacks or model: "auto" (see Step 4). For the authoritative list of supported provider/model-id pairs, call GET /v1/models or browse the Model Fortress.
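The prefix-stripping rule described in this step is mechanical enough to script. The sketch below uses a hypothetical helper, `to_concentrate_model`, and covers only the string rewrite: it does not rename provider shorthands (for example gcp_vertex_anthropic) to Concentrate’s provider names, and it cannot resolve custom model names defined in tensorzero.toml, so check the result against GET /v1/models.

```python
def to_concentrate_model(tz_model):
    """Convert a TensorZero model string to Concentrate's provider/model-id form."""
    name = tz_model
    if name.startswith("tensorzero::"):
        name = name[len("tensorzero::"):]
    if name.startswith("function_name::"):
        # A function reference carries no model string (see Step 4).
        raise ValueError(
            f"{tz_model!r} names a function, not a model; "
            "use the model its variant resolved to"
        )
    if name.startswith("model_name::"):
        name = name[len("model_name::"):]
    provider, sep, model_id = name.partition("::")
    if not sep:
        return name  # already a bare slug such as "gpt-4o"
    return f"{provider}/{model_id}"


print(to_concentrate_model("tensorzero::model_name::openai::gpt-4o"))  # openai/gpt-4o
```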

Step 6: Reconnect Observability

TensorZero bundles observability, optimization, evaluations, and experimentation alongside the gateway (backed by ClickHouse). Concentrate’s dashboard covers the gateway-side surfaces; the experimentation and optimization tooling has no direct equivalent.
| TensorZero surface | Concentrate equivalent |
| --- | --- |
| Inference observability (per-request logs, cost, latency) | Per-request logs at concentrate.ai |
| tensorzero_cost / usage tracking | Org / team / developer / key spend rollups |
| tags-based filtering | Per-key / team / org rollups. No per-request tag dimension |
| Episodes (multi-step grouping) | previous_response_id on the Responses API links related requests into a stateful tree |
| OTLP trace export (tensorzero-otlp-traces-extra-*) | No OTLP passthrough. Concentrate records its own per-request telemetry; handle bespoke export in your tracing layer |
| Feedback API / evaluations / experiments | No equivalent today. Keep your evaluation and experimentation tooling separate from the gateway |
| Rate limits (api_key_public_id scope) | Per-key rate and spend limits in the dashboard |

Exporting your TensorZero history

Because TensorZero is self-hosted, your history lives in your own ClickHouse database and does not import into Concentrate. If your migration is compliance- or audit-driven, snapshot that data before tearing down the gateway.

Step 7 (Optional): Adopt the Responses API

If you used TensorZero’s native /inference endpoint or its episodes, Concentrate’s native Responses API is the closest successor: streaming, tool calling, structured output, multi-modal input, and web search through one normalized shape across every provider, with previous_response_id replacing episode_id-grouped sessions.
curl https://api.concentrate.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CONCENTRATE_API_KEY" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "input": "What is the capital of France?"
  }'
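To show how previous_response_id takes over from episode_id, here is a minimal Python sketch of two linked turns using only the standard library. It assumes the response JSON carries a top-level "id" field, as in the OpenAI Responses schema; `build_turn`, `post_response`, and `run_two_turns` are hypothetical helper names.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.concentrate.ai/v1"


def build_turn(model, text, previous_response_id=None):
    """Build a Responses API body; later turns link back via previous_response_id."""
    body = {"model": model, "input": text}
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    return body


def post_response(body):
    """POST the body to /responses and return the decoded JSON (network call)."""
    request = urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['CONCENTRATE_API_KEY']}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def run_two_turns(model="anthropic/claude-opus-4-6"):
    """Send a question, then a follow-up linked to the first response."""
    first = post_response(build_turn(model, "What is the capital of France?"))
    # ASSUMPTION: the response exposes its id as a top-level "id" field.
    return post_response(build_turn(model, "And its population?", first["id"]))
```

Where an episode_id was generated up front and attached to every call, this pattern threads the id returned by each response into the next request instead.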

Why migrate to Concentrate

TensorZero is a self-hosted gateway configured by tensorzero.toml, exposing several surfaces on one host. Migrating retires the process and config file and collapses every surface onto Concentrate’s single hosted base URL https://api.concentrate.ai/v1.
| TensorZero surface | TensorZero path (default host http://localhost:3000) |
| --- | --- |
| Native inference | /inference |
| OpenAI-compatible | /openai/v1/chat/completions |
| Batch inference | /batch_inference |
| Feedback | /feedback |
The native /inference endpoint is Responses-style (typed input, multi-step episodes). Its closest one-to-one target is Concentrate’s Responses API, reachable from the standard OpenAI SDK or any HTTP client.
TensorZero is a process you deploy, scale, and keep alive, with tensorzero.toml and provider credentials to maintain. Concentrate is hosted, with no gateway to operate, no config to version, and credentials managed for you.
Concentrate organizes billing around an organization → team → developer → key hierarchy. Set budgets at any level and roll spend up into a single dashboard. Per-team budgets and per-developer attribution come from the key itself, so there are no per-request tags to maintain.
Beyond ordered model and provider fallbacks (routing.model.fallbacks, routing.provider.fallbacks), Concentrate’s routing layer ships:
  • Uptime gate. Providers whose per-feature success rate drops below 90% are skipped.
  • Feature degradation. If no provider supports the full requested feature set (e.g. json_schema), the request is downgraded to json_object or text instead of failing.
  • Cache-affinity routing. When multiple providers can serve a request, the one where your actor already has cached tokens is preferred.
All on by default. No variants or tensorzero.toml to author, name, or version.
model: "auto" accepts an explicit optimization target via routing.model.sort: cost, latency, or performance (default). See Auto Routing. This replaces hand-authoring a configured model with provider fallbacks in tensorzero.toml.
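As an illustration of how the routing params might be assembled, the sketch below builds them as a dictionary. It rests on a loud assumption: that the dotted names (routing.model.sort, routing.model.fallbacks, routing.provider.fallbacks) correspond to nested JSON objects under a top-level "routing" key. Confirm the exact shape in the API reference before relying on it; `routing_params` is a hypothetical helper.

```python
VALID_SORTS = {"cost", "latency", "performance"}  # targets named in this guide


def routing_params(sort=None, model_fallbacks=None, provider_fallbacks=None):
    """Build extra request-body fields for routing.

    ASSUMPTION: dotted param names map to nested objects under "routing".
    """
    if sort is not None and sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {sorted(VALID_SORTS)}, got {sort!r}")
    model, provider = {}, {}
    if sort is not None:
        model["sort"] = sort
    if model_fallbacks:
        model["fallbacks"] = list(model_fallbacks)  # list order is failover order
    if provider_fallbacks:
        provider["fallbacks"] = list(provider_fallbacks)
    routing = {}
    if model:
        routing["model"] = model
    if provider:
        routing["provider"] = provider
    return {"routing": routing} if routing else {}
```

With the OpenAI SDK, fields outside the standard schema go through the `extra_body` argument, for example `client.chat.completions.create(model="auto", messages=messages, extra_body=routing_params(sort="cost"))`.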
Like TensorZero’s native /inference endpoint, Concentrate exposes a first-class Responses API. Alongside it you also get OpenAI Chat Completions compatibility and an Anthropic-compatible Messages API.
Concentrate is BYOK-free. Point at a model and Concentrate owns the upstream credentials. The provider keys you wired into tensorzero.toml (and TensorZero’s per-request credentials param) can be retired after migrating, with no per-token BYOK fee.

Troubleshooting

Strip the tensorzero:: and model_name:: prefixes and convert the provider::model double-colon to provider/model (e.g. tensorzero::model_name::openai::gpt-4o → openai/gpt-4o). Bare slugs (gpt-4o, claude-haiku-4-5) work too. If you’re using a provider/ prefix and getting a miss, double-check the prefix is a provider (e.g. bedrock, azure, ai-studio) and not just the author (e.g. meta, google). Call GET /v1/models for the authoritative list.
Concentrate keys start with sk-cn-v1-. TensorZero’s OpenAI-compatible endpoint ignored the client key, so your code may have been sending a placeholder (or a sk-t0-... gateway key). Concentrate requires a real sk-cn-v1-... key on every request. Verify the value in your dashboard and confirm there are no extra spaces or quotes.
Confirm the base URL is https://api.concentrate.ai/v1, not localhost:3000. If the client is still pointed at your self-hosted gateway it is logging against your own ClickHouse store, not Concentrate.
tensorzero::function_name::... and tensorzero::variant_name are no-ops on Concentrate; there is no function/variant config. Pin a model with model: "provider/model-id", express fallbacks via routing.model.fallbacks / routing.provider.fallbacks, or use model: "auto". Move function prompt templates into application code. Variant A/B sampling and the /feedback loop have no per-request equivalent, so handle the split in application code.
Concentrate has no episode_id body field or /feedback endpoint. Link related requests with previous_response_id on the Responses API. Evaluation feedback that fed TensorZero’s optimization stack should stay in your own tooling.
Concentrate uses provider-native prompt caching, currently supported on Anthropic and AWS Bedrock. There is no gateway-stored cache and no max_age_s control, so TensorZero’s cache_options does not carry over directly. Caches are seeded per API key by default; pass prompt_cache_key in the request body if you want to set the seed explicitly.
Confirm the base URL is https://api.concentrate.ai/v1 (no /api segment, no /openai/v1 or /inference suffix, no localhost:3000). Test the connection manually:
curl https://api.concentrate.ai/v1/responses/health
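The base-URL mistakes listed above can also be checked in code before any request is sent. This is a small sketch using only the standard library; `base_url_problems` is a hypothetical helper that inspects the string and performs no network call.

```python
from urllib.parse import urlsplit

EXPECTED_BASE_URL = "https://api.concentrate.ai/v1"


def base_url_problems(base_url):
    """Return a list of problems; an empty list means the URL looks right."""
    problems = []
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    if parts.hostname in {"localhost", "127.0.0.1"}:
        problems.append("still pointed at a local TensorZero gateway")
    if "/openai/v1" in path:
        problems.append("contains TensorZero's /openai/v1 path")
    if path.endswith("/inference"):
        problems.append("ends with TensorZero's /inference path")
    if "api" in path.split("/"):
        problems.append("contains an extra /api segment")
    if not problems and base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append(f"does not match {EXPECTED_BASE_URL}")
    return problems


print(base_url_problems("http://localhost:3000/openai/v1"))
```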

Next Steps

  • API Reference: Explore the full API capabilities
  • Available Models: Browse all supported models
  • Auto Routing: Optimize model selection automatically
  • Get Support: Contact our support team

Feedback

If you hit anything that didn’t translate cleanly (especially around functions, variants, episodes, the feedback/experimentation loop, cache_options, or OTLP trace export), email support@concentrate.ai. The capability gaps called out above are tracked, and migration friction reports directly shape what we ship next.