TensorZero

OpenAI-compatible
Native client (/inference)

# Before
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="not-used",  # ignored: the gateway uses creds from tensorzero.toml
)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)

# After
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.concentrate.ai/v1",
    api_key=os.environ["CONCENTRATE_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)

// Before
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
  apiKey: "not-used", // ignored: the gateway uses creds from tensorzero.toml
});

const response = await client.chat.completions.create({
  model: "tensorzero::model_name::openai::gpt-4o",
  messages: [{ role: "user", content: "Say hello in one word" }],
});

// After
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.concentrate.ai/v1",
  apiKey: process.env.CONCENTRATE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Say hello in one word" }],
});

# Before
curl http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "tensorzero::model_name::openai::gpt-4o", "messages": [...] }'

# After
curl https://api.concentrate.ai/v1/chat/completions \
  -H "Authorization: Bearer $CONCENTRATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "openai/gpt-4o", "messages": [...] }'

# Before
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    response = client.inference(
        model_name="openai::gpt-4o",
        input={"messages": [{"role": "user", "content": "Say hello in one word"}]},
        tags={"feature": "summarizer"},
    )

# After
# (Concentrate also exposes a native Responses API at /v1/responses; see the last step.)
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.concentrate.ai/v1",
    api_key=os.environ["CONCENTRATE_API_KEY"],
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)

# Before
curl http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o",
    "input": { "messages": [{"role": "user", "content": "Say hello in one word"}] }
  }'

# After
curl https://api.concentrate.ai/v1/chat/completions \
  -H "Authorization: Bearer $CONCENTRATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "openai/gpt-4o", "messages": [...] }'

Prerequisites

A Concentrate AI account with an active API key

An existing TensorZero integration

This guide assumes you are calling a self-hosted TensorZero gateway from the native tensorzero client (POST /inference), the OpenAI SDK pointed at /openai/v1, fetch, requests, or another HTTP client.

Quick Start for Claude Code users

If you use Claude Code, you can install a skill that walks through this migration interactively. It collapses the base URL, strips tensorzero:: model namespacing and body params, decomposes your tensorzero.toml functions and variants, maps model slugs, and generates a verification script. Drop the skill into your ~/.claude/skills/ directory:

mkdir -p ~/.claude/skills/migrate-tensorzero && \
  curl -fsSL https://concentrate.ai/scripts/migrate-tensorzero.md \
  -o ~/.claude/skills/migrate-tensorzero/SKILL.md

Then start a Claude Code session in your project and ask it to “migrate from TensorZero to Concentrate” or run /migrate-tensorzero. Claude will load the skill and run the steps.

Step 1: Update Your Environment Variables

TensorZero is self-hosted: provider credentials live in your tensorzero.toml, and gateway auth (if enabled) uses TENSORZERO_API_KEY. All of that collapses to a single Concentrate key:

# Before
export TENSORZERO_API_KEY="sk-t0-..."   # gateway auth, if enabled
export OPENAI_API_KEY="sk-..."          # provider creds read by tensorzero.toml
export ANTHROPIC_API_KEY="sk-ant-..."
export BASE_URL="http://localhost:3000/openai/v1"

# After
export CONCENTRATE_API_KEY="sk-cn-v1-..."
export BASE_URL="https://api.concentrate.ai/v1"

Concentrate is hosted with managed credentials, so there’s no gateway process or tensorzero.toml to maintain, and no OPENAI_API_KEY, ANTHROPIC_API_KEY, or AWS credentials to keep. Comment the old variables out (don’t delete them) until you’ve verified the migration end-to-end, then remove them.

TensorZero’s OpenAI-compatible endpoint ignores any api_key sent by the client; it uses credentials from its own config. Concentrate requires a real sk-cn-v1-... key on every request, so make sure your client now sends one (not a placeholder).
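
A small pre-flight check can catch a leftover placeholder before the first request goes out. The sketch below relies only on what this guide states, namely that Concentrate keys start with sk-cn-v1-. The helper name check_concentrate_key is hypothetical and not part of any SDK, and it checks the shape of the value, not whether the key is active.

```python
from typing import Optional

CONCENTRATE_KEY_PREFIX = "sk-cn-v1-"  # prefix stated in this guide


def check_concentrate_key(raw: Optional[str]) -> str:
    """Return the key if it is shaped like a Concentrate key, else raise ValueError.

    Hypothetical helper: validates the format only, never contacts the API.
    """
    if not raw:
        raise ValueError("CONCENTRATE_API_KEY is not set")
    # Catch values pasted with surrounding spaces or quotes.
    if raw != raw.strip() or raw[0] in "\"'" or raw[-1] in "\"'":
        raise ValueError("key has stray whitespace or quotes")
    # Catches placeholders such as "not-used" and old sk-t0-... gateway keys.
    if not raw.startswith(CONCENTRATE_KEY_PREFIX):
        raise ValueError("key does not start with " + CONCENTRATE_KEY_PREFIX)
    return raw
```

At startup you might call check_concentrate_key(os.environ.get("CONCENTRATE_API_KEY")) and pass the result to the client constructor, so a misconfigured environment fails early instead of on the first request.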

Step 2: Update Your Client

Collapse the gateway base URL (both the /openai/v1 and /inference paths) onto https://api.concentrate.ai/v1, and drop the tensorzero:: namespacing and body params. If you used the native tensorzero client, swap to the OpenAI SDK. Concentrate ships no dedicated SDK because the OpenAI-compatible shape covers every endpoint.

# Before
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    response = client.inference(
        model_name="openai::gpt-4o",
        input={"messages": [{"role": "user", "content": "Say hello in one word"}]},
    )

# After
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.concentrate.ai/v1",
    api_key=os.environ["CONCENTRATE_API_KEY"],
)

response = client.chat.completions.create(
    model="anthropic/claude-opus-4-6",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)
print(response.choices[0].message.content)

// Before
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3000/openai/v1",
  apiKey: "not-used",
});

// After
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.concentrate.ai/v1",
  apiKey: process.env.CONCENTRATE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-opus-4-6",
  messages: [{ role: "user", content: "Say hello in one word" }],
});
console.log(response.choices[0].message.content);

curl https://api.concentrate.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CONCENTRATE_API_KEY" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "messages": [{"role": "user", "content": "Say hello in one word"}]
  }'

Step 3: Remove TensorZero-Specific Headers and Body Params

TensorZero keeps almost all of its distinctive behavior in tensorzero::-prefixed body params, not headers. Neither those params nor TensorZero’s custom headers carry over to Concentrate, so remove them: left in place, they are dead weight and mislead future readers. The tables below give the mapping.

Request header mapping

TensorZero header	Concentrate replacement
`Authorization: Bearer sk-t0-...`	Standard `Authorization: Bearer sk-cn-v1-...` (ignored on TensorZero’s OpenAI endpoint; required on Concentrate)
`tensorzero-otlp-traces-extra-header-` / `-attribute-` / `-resource-*` (inject headers, span, and resource attributes into OTLP trace exports)	No equivalent. Concentrate records its own per-request telemetry; handle bespoke OTLP export in your tracing layer

Request body param mapping (`tensorzero::*`)

Sent as tensorzero::-prefixed body fields on the OpenAI endpoint (via extra_body), or as top-level fields (episode_id, variant_name, …) on /inference.

TensorZero body param	Concentrate replacement
`tensorzero::episode_id`	Drop. Link calls with `previous_response_id` (see Step 4)
`tensorzero::variant_name` (pin a variant)	Drop. Pin `model: "provider/model-id"`; express A/B and fallback as `routing.*` body params (see Step 4)
`tensorzero::cache_options` (`enabled`, `max_age_s`)	Provider-native prompt caching (Anthropic + AWS Bedrock). Seeded per key by default; set the seed with the `prompt_cache_key` body param. No gateway-stored cache and no `max_age_s` control
`tensorzero::credentials` (dynamic per-request provider keys)	Dashboard-level BYOK, not a request param. Store the key once and routing uses it automatically
`tensorzero::tags` (key/value metadata)	Drop. Attribution comes from the key/team/org hierarchy; analytics roll up automatically. No per-request tag dimension
`tensorzero::namespace` (experimentation config selector)	Drop. No per-request experimentation namespace
`tensorzero::params` (override inference parameters)	Pass standard parameters (`temperature`, `max_tokens`, etc.) directly in the request body
`tensorzero::extra_body` (JSON-Pointer edits to the provider request)	Pass the underlying provider fields directly in the request body
`tensorzero::extra_headers` (inject provider request headers)	Not applicable. Concentrate manages the upstream request
`tensorzero::provider_tools` (provider built-in tools)	Use the standard `tools` field; web search is normalized across providers on the Responses API
`tensorzero::dryrun` (run without storing)	No equivalent. Disable request logging at the key level via Zero Data Retention
`tensorzero::include_raw_response`, `tensorzero::include_raw_usage`	Drop. Concentrate returns a normalized response; per-request detail is in the dashboard
`tensorzero::deny_unknown_fields`	Drop. No equivalent toggle
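
If your request bodies are built in one place, the removal can be mechanical. The following is a minimal sketch, assuming the params sit as top-level tensorzero::-prefixed keys in the body (or extra_body) dict as described above. The function name strip_tensorzero_params is hypothetical. It returns the removed keys so you can review which ones need a replacement from the table rather than a plain drop.

```python
from typing import Any, Dict, List, Tuple

TZ_PREFIX = "tensorzero::"


def strip_tensorzero_params(body: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of `body` without tensorzero::-prefixed keys, plus the removed keys.

    Only top-level keys are inspected; the input dict is left unchanged.
    The `model` value is not rewritten here (see Step 5 for that).
    """
    cleaned = {k: v for k, v in body.items() if not k.startswith(TZ_PREFIX)}
    removed = sorted(k for k in body if k.startswith(TZ_PREFIX))
    return cleaned, removed
```

Logging the removed list once per call site is a cheap way to find code that still depends on tags, cache_options, or credentials before you delete it.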

Response field mapping

TensorZero response field	Concentrate equivalent
`episode_id`	No body field. Use `previous_response_id` / the response `id` on the Responses API to link related requests
`tensorzero_cost`	No body field. Per-request cost is recorded in the dashboard
`tensorzero_raw_response`, `tensorzero_raw_usage`, `tensorzero_extra_content`	No body fields. Concentrate returns a normalized OpenAI-shaped response; reasoning and usage detail surface through the standard schema and the dashboard
Request / trace id	`X-Request-Id` (Concentrate returns its own per-request id, surfaced in dashboard logs)
Rate-limit headers	Standard `X-RateLimit-*` headers. See Errors for 429 semantics
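
To keep a request id in your own logs, you can read it from the response headers. This sketch assumes only the header names given in the table (X-Request-Id and the X-RateLimit- prefix). The specific rate-limit header names are not listed in this guide, so the helper, whose name is hypothetical, collects anything under that prefix instead of guessing at individual names.

```python
from typing import Dict, Mapping, Optional, Tuple


def extract_gateway_headers(headers: Mapping[str, str]) -> Tuple[Optional[str], Dict[str, str]]:
    """Pull the request id and any X-RateLimit-* headers out of a header mapping.

    Header names are matched case-insensitively; rate-limit names are
    returned lowercased.
    """
    request_id: Optional[str] = None
    rate_limits: Dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "x-request-id":
            request_id = value
        elif lowered.startswith("x-ratelimit-"):
            rate_limits[lowered] = value
    return request_id, rate_limits
```

With requests, the mapping to pass in would be response.headers.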

Step 4: Decompose Functions, Variants, and Episodes

TensorZero’s defining concept is the function / variant / episode model in tensorzero.toml: you call a named function, the gateway samples a variant (built-in A/B testing and fallbacks), and related inferences group into an episode. Concentrate has none of these primitives. Each behavior is either on by default, expressed as a body param, or moved into application code. The table below gives the mapping.

Function / variant / episode mapping

TensorZero concept	Concentrate equivalent
Function (`tensorzero::function_name::my_fn`, named entry in config)	Call the model directly. Move the function’s prompt template into your application code or system prompt
Variant (a concrete model + params behind a function)	Pin a model with `model: "provider/model-id"`, and pass params in the request body
Variant sampling for A/B testing	No built-in experiment splitter. Choose the variant in application code, or issue separate keys per arm for clean attribution
Variant fallbacks (a configured model with fallbacks, `tensorzero::model_name`)	`routing.model.fallbacks` / `routing.provider.fallbacks` (ordered) body params, or `model: "auto"`
Episode (`episode_id` grouping multi-step workflows)	`previous_response_id` on the Responses API links related requests into a server-managed conversation tree
Feedback API (`/feedback`, metrics attributed to an episode)	No equivalent today
`cache_options`	Provider-native prompt caching; `prompt_cache_key` sets the seed
Provider fallbacks inside a configured model	`routing.model.fallbacks` / `routing.provider.fallbacks` (ordered). Failover is automatic on any provider error
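
The two rows that move work into your code, variant sampling and fallbacks, can be sketched together. This is an illustration under stated assumptions, not Concentrate’s implementation: the arm list, weights, and function names are hypothetical, and the request shape assumes that routing.model.fallbacks nests as routing → model → fallbacks in the JSON body. Sampling hashes a stable id so the same user always lands in the same arm.

```python
import hashlib
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical experiment arms: (Concentrate model slug, weight).
ARMS: Sequence[Tuple[str, float]] = (
    ("openai/gpt-4o", 0.5),
    ("anthropic/claude-opus-4-6", 0.5),
)


def pick_arm(unit_id: str, arms: Sequence[Tuple[str, float]] = ARMS) -> str:
    """Deterministically map a user or session id to one arm, proportional to weight."""
    total = sum(weight for _, weight in arms)
    digest = hashlib.sha256(unit_id.encode("utf-8")).digest()
    # First 8 bytes of the hash as a fraction in [0, 1), scaled to the total weight.
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    for model, weight in arms:
        cumulative += weight
        if point < cumulative:
            return model
    return arms[-1][0]  # guard against floating-point rounding at the upper edge


def build_body(model: str, messages: List[Dict[str, Any]], fallbacks: List[str]) -> Dict[str, Any]:
    """Chat Completions body with ordered model fallbacks.

    Assumption: `routing.model.fallbacks` is sent as a nested `routing` object.
    """
    body: Dict[str, Any] = {"model": model, "messages": messages}
    if fallbacks:
        body["routing"] = {"model": {"fallbacks": list(fallbacks)}}
    return body
```

With the OpenAI SDK, non-standard fields such as routing are passed through the extra_body argument instead of as keyword arguments. Record the chosen arm in your own logs if you need per-arm analysis, since there is no per-request tag dimension.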

Step 5: Update Model Identifiers

Concentrate accepts model strings in two forms:

Bare slug, e.g. gpt-4o, claude-haiku-4-5, auto. Routing picks a provider.
provider/model-id, e.g. bedrock/claude-haiku-4-5, openai/gpt-4o. Pins the request to a specific provider.

TensorZero wraps model strings in tensorzero:: namespacing. Strip the tensorzero:: and model_name:: prefixes, and convert the provider::model double-colon to a provider/model slash. A function_name reference has no model string, so use the model its winning variant resolved to (see Step 4).

TensorZero	Concentrate AI
`tensorzero::model_name::openai::gpt-4o`	`openai/gpt-4o`
`tensorzero::model_name::anthropic::claude-...`	`anthropic/claude-...`
`openai::gpt-4o` (native `model_name`)	`openai/gpt-4o`
`tensorzero::function_name::my_fn`	The model its variant resolved to (e.g. `openai/gpt-4o`), or `model: "auto"`
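
The prefix rules in the table are simple enough to apply in code. Below is a minimal sketch of that conversion; the function name to_concentrate_model is hypothetical. It passes the provider name through unchanged, so a TensorZero provider shorthand that differs from the Concentrate prefix still needs a manual rename, and a configured model name with no provider part comes back as-is and must be replaced by hand.

```python
def to_concentrate_model(tz_model: str) -> str:
    """Convert a TensorZero model string to the provider/model-id form.

    Strips `tensorzero::` and `model_name::`, then turns the first
    `provider::model` double-colon into a slash. Bare slugs are returned as-is.
    """
    if tz_model.startswith("tensorzero::function_name::"):
        # A function reference carries no model string (see Step 4).
        raise ValueError("use the model the function's variant resolved to")
    name = tz_model
    for prefix in ("tensorzero::", "model_name::"):
        if name.startswith(prefix):
            name = name[len(prefix):]
    provider, sep, model_id = name.partition("::")
    if not sep:
        return name  # no provider part: bare slug or configured model name
    return f"{provider}/{model_id}"
```

For example, to_concentrate_model("tensorzero::model_name::openai::gpt-4o") returns "openai/gpt-4o".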

One thing to know about the slashed form: the prefix is the provider that serves the request, not the model’s author. TensorZero’s provider shorthand (e.g. gcp_vertex_anthropic) already encodes the serving provider, so you just rename the prefix. For most popular names author and provider match (openai, anthropic, mistral), but they diverge whenever a model is hosted by something other than its author:

Author	Provider serving the request	Concentrate slug
Anthropic (`claude-haiku-4-5`)	Anthropic	`anthropic/claude-haiku-4-5`
Anthropic (same model, different host)	AWS Bedrock	`bedrock/claude-haiku-4-5`
Google (`gemini-3.5-flash`)	Google AI Studio	`ai-studio/gemini-3.5-flash`
Meta (`llama-3-8b-instruct`)	AWS Bedrock	`bedrock/llama-3-8b-instruct`

Bare slugs work in all of these cases. Use them when you don’t care which provider serves the request. Use the provider/ prefix when you specifically want to pin to one host (for ZDR compliance, contractual reasons, or latency in a specific region). To replace a configured model with provider fallbacks, use routing.model.fallbacks / routing.provider.fallbacks or model: "auto" (see Step 4). For the authoritative list of supported provider/model-id pairs, call GET /v1/models or browse the Model Fortress.

Step 6: Reconnect Observability

TensorZero bundles observability, optimization, evaluations, and experimentation alongside the gateway (backed by ClickHouse). Concentrate’s dashboard covers the gateway-side surfaces; the experimentation and optimization tooling has no direct equivalent.

TensorZero surface	Concentrate equivalent
Inference observability (per-request logs, cost, latency)	Per-request logs at concentrate.ai
`tensorzero_cost` / usage tracking	Org / team / developer / key spend rollups
`tags`-based filtering	Per-key / team / org rollups. No per-request tag dimension
Episodes (multi-step grouping)	`previous_response_id` on the Responses API links related requests into a stateful tree
OTLP trace export (`tensorzero-otlp-traces-extra-*`)	No OTLP passthrough. Concentrate records its own per-request telemetry; handle bespoke export in your tracing layer
Feedback API / evaluations / experiments	No equivalent today. Keep your evaluation and experimentation tooling separate from the gateway
Rate limits (`api_key_public_id` scope)	Per-key rate and spend limits in the dashboard

Exporting your TensorZero history

Because TensorZero is self-hosted, your history lives in your own ClickHouse database and does not import into Concentrate. If your migration is compliance- or audit-driven, snapshot that data before tearing down the gateway.

Step 7 (Optional): Adopt the Responses API

If you used TensorZero’s native /inference endpoint or its episodes, Concentrate’s native Responses API is the closest successor: streaming, tool calling, structured output, multi-modal input, and web search through one normalized shape across every provider, with previous_response_id replacing episode_id-grouped sessions.

curl https://api.concentrate.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CONCENTRATE_API_KEY" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "input": "What is the capital of France?"
  }'

import os
import requests

response = requests.post(
    "https://api.concentrate.ai/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ['CONCENTRATE_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-opus-4-6",
        "input": "What is the capital of France?",
    },
)
print(response.json())

const response = await fetch("https://api.concentrate.ai/v1/responses", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CONCENTRATE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "anthropic/claude-opus-4-6",
    input: "What is the capital of France?",
  }),
});
console.log(await response.json());
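
To replace an episode, each follow-up request carries the id of the response before it. The sketch below only builds request bodies, so it runs without network access. It assumes the Responses API returns a top-level id field in the response JSON, as the response field mapping in Step 3 suggests; the helper name and the example id are hypothetical.

```python
from typing import Any, Dict, Optional


def build_responses_body(
    model: str,
    user_input: str,
    previous: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a /v1/responses body, linking it to an earlier response if given.

    `previous` is the parsed JSON of the prior response. Assumption: it has a
    top-level `id` field.
    """
    body: Dict[str, Any] = {"model": model, "input": user_input}
    if previous is not None:
        body["previous_response_id"] = previous["id"]
    return body


# First turn: no link. Second turn: linked to the first response.
first = build_responses_body("anthropic/claude-opus-4-6", "What is the capital of France?")
fake_first_response = {"id": "resp_hypothetical_123"}  # stand-in for response.json()
second = build_responses_body(
    "anthropic/claude-opus-4-6", "And its population?", previous=fake_first_response
)
```

Each body can be posted with the requests call shown above by passing it as the json argument.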

Why migrate to Concentrate

How it works

TensorZero is a self-hosted gateway configured by tensorzero.toml, exposing several surfaces on one host. Migrating retires the process and config file and collapses every surface onto Concentrate’s single hosted base URL https://api.concentrate.ai/v1.

TensorZero surface	TensorZero path (default host `http://localhost:3000`)
Native inference	`/inference`
OpenAI-compatible	`/openai/v1/chat/completions`
Batch inference	`/batch_inference`
Feedback	`/feedback`

The native /inference endpoint is Responses-style (typed input, multi-step episodes). Its closest one-to-one target is Concentrate’s Responses API, reachable from the standard OpenAI SDK or any HTTP client.

No gateway to run

TensorZero is a process you deploy, scale, and keep alive, with tensorzero.toml and provider credentials to maintain. Concentrate is hosted, with no gateway to operate, no config to version, and credentials managed for you.

Team-scale spend management

Concentrate organizes billing around an organization → team → developer → key hierarchy. Set budgets at any level and roll spend up into a single dashboard. Per-team budgets and per-developer attribution come from the key itself, so there are no per-request tags to maintain.

Feature-aware resiliency

Beyond ordered model and provider fallbacks (routing.model.fallbacks, routing.provider.fallbacks), Concentrate’s routing layer ships:

Uptime gate. Providers whose per-feature success rate drops below 90% are skipped.
Feature degradation. If no provider supports the full requested feature set (e.g. json_schema), the request is downgraded to json_object or text instead of failing.
Cache-affinity routing. When multiple providers can serve a request, the one where your actor already has cached tokens is preferred.

All on by default. No variants or tensorzero.toml to author, name, or version.

Strategy-driven auto routing

model: "auto" accepts an explicit optimization target via routing.model.sort: cost, latency, or performance (default). See Auto Routing. This replaces hand-authoring a configured model with provider fallbacks in tensorzero.toml.
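
As a rough illustration, a request that asks auto routing to optimize for cost might be built like this. The three sort values come from the paragraph above; the nesting of routing.model.sort as a routing → model → sort object is an assumption, and build_auto_body is a hypothetical helper name. See Auto Routing for the authoritative shape.

```python
from typing import Any, Dict, List

ALLOWED_SORTS = ("cost", "latency", "performance")  # values named in this guide


def build_auto_body(messages: List[Dict[str, Any]], sort: str = "performance") -> Dict[str, Any]:
    """Chat Completions body for model "auto" with an explicit optimization target.

    Assumption: `routing.model.sort` is sent as a nested `routing` object.
    """
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {ALLOWED_SORTS}, got {sort!r}")
    return {
        "model": "auto",
        "messages": messages,
        "routing": {"model": {"sort": sort}},
    }
```

Validating the value client-side keeps a typo from silently falling back to whatever default applies upstream.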

Native Responses and Messages APIs

Like TensorZero’s native /inference endpoint, Concentrate exposes a first-class Responses API. Alongside it you also get OpenAI Chat Completions compatibility and an Anthropic-compatible Messages API.

Managed provider credentials by default

Concentrate manages provider credentials by default: point at a model and Concentrate owns the upstream credentials. If you want to keep using the provider keys you wired into tensorzero.toml (or TensorZero’s per-request credentials param), store them once in the dashboard with free BYOK. There is no gateway config to maintain.

Troubleshooting

Model not found

Strip the tensorzero:: and model_name:: prefixes and convert the provider::model double-colon to provider/model (e.g. tensorzero::model_name::openai::gpt-4o → openai/gpt-4o). Bare slugs (gpt-4o, claude-haiku-4-5) work too. If you’re using a provider/ prefix and getting a miss, double-check the prefix is a provider (e.g. bedrock, azure, ai-studio) and not just the author (e.g. meta, google). Call GET /v1/models for the authoritative list.

Invalid API key error

Concentrate keys start with sk-cn-v1-. TensorZero’s OpenAI-compatible endpoint ignored the client key, so your code may have been sending a placeholder (or a sk-t0-... gateway key). Concentrate requires a real sk-cn-v1-... key on every request. Verify the value in your dashboard and confirm there are no extra spaces or quotes.

Requests succeed but nothing shows up in the Concentrate dashboard

Confirm the base URL is https://api.concentrate.ai/v1, not localhost:3000. If the client is still pointed at your self-hosted gateway it is logging against your own ClickHouse store, not Concentrate.

My functions / variants stopped working

tensorzero::function_name::... and tensorzero::variant_name are no-ops on Concentrate; there is no function/variant config. Pin a model with model: "provider/model-id", express fallbacks via routing.model.fallbacks / routing.provider.fallbacks, or use model: "auto". Move function prompt templates into application code. Variant A/B sampling and the /feedback loop have no per-request equivalent, so handle the split in application code.

My episodes / feedback stopped working

Concentrate has no episode_id body field or /feedback endpoint. Link related requests with previous_response_id on the Responses API. Evaluation feedback that fed TensorZero’s optimization stack should stay in your own tooling.

Cache hit rate dropped after migrating

Concentrate uses provider-native prompt caching, currently supported on Anthropic and AWS Bedrock. There is no gateway-stored cache and no max_age_s control, so TensorZero’s cache_options does not carry over directly. Caches are seeded per API key by default; pass prompt_cache_key in the request body if you want to set the seed explicitly.

Connection errors

Confirm the base URL is https://api.concentrate.ai/v1 (no /api segment, no /openai/v1 or /inference suffix, no localhost:3000). Test the connection manually:

curl https://api.concentrate.ai/v1/responses/health
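
The base URL mistakes listed above can also be caught in code before any request is sent. This is a minimal sketch using only the rules from this section; base_url_problems is a hypothetical helper, and it inspects the string only, without opening a connection.

```python
from typing import List
from urllib.parse import urlsplit

EXPECTED_BASE_URL = "https://api.concentrate.ai/v1"


def base_url_problems(base_url: str) -> List[str]:
    """Return a list of human-readable problems with a configured base URL."""
    problems: List[str] = []
    parts = urlsplit(base_url)
    host = parts.hostname or ""
    path = parts.path.rstrip("/")
    if host in ("localhost", "127.0.0.1"):
        problems.append("still pointed at a local TensorZero gateway")
    if "/openai/v1" in path:
        problems.append("contains the TensorZero /openai/v1 path")
    if path.endswith("/inference"):
        problems.append("ends with the TensorZero /inference path")
    if path == "/api" or path.startswith("/api/"):
        problems.append("contains an /api segment")
    # Anything else that differs from the expected value is reported generically.
    if not problems and base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append("does not match " + EXPECTED_BASE_URL)
    return problems
```

An empty list means the value matches the expected base URL; otherwise each entry names one thing to fix.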

Next Steps

API Reference

Explore the full API capabilities

Available Models

Browse all supported models

Auto Routing

Optimize model selection automatically

Get Support

Contact our support team

Feedback

If you hit anything that didn’t translate cleanly (especially around functions, variants, episodes, the feedback/experimentation loop, cache_options, or OTLP trace export), email support@concentrate.ai. The capability gaps called out above are tracked, and migration friction reports directly shape what we ship next.
