LiteLLM

OpenAI SDK
Anthropic SDK

# Before: OpenAI SDK pointed at your LiteLLM Proxy
from openai import OpenAI
import os

client = OpenAI(
    base_url="http://localhost:4000",
    api_key=os.environ["LITELLM_API_KEY"],  # sk-... virtual key
)

# After
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.concentrate.ai/v1",
    api_key=os.environ["CONCENTRATE_API_KEY"],
)

// Before: OpenAI SDK pointed at your LiteLLM Proxy
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: process.env.LITELLM_API_KEY,
});

// After
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.concentrate.ai/v1",
  apiKey: process.env.CONCENTRATE_API_KEY,
});

# Before
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "my-chat-model", "messages": [...] }'

# After
curl https://api.concentrate.ai/v1/chat/completions \
  -H "Authorization: Bearer $CONCENTRATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "anthropic/claude-opus-4-6", "messages": [...] }'

# Before: Anthropic SDK pointed at LiteLLM's /v1/messages surface
from anthropic import Anthropic
import os

client = Anthropic(
    base_url="http://localhost:4000",
    api_key=os.environ["LITELLM_API_KEY"],  # sent as x-api-key
)

# After: OpenAI SDK pointed at Concentrate
# (Concentrate also exposes an Anthropic-compatible Messages API at /v1/messages.)
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.concentrate.ai/v1",
    api_key=os.environ["CONCENTRATE_API_KEY"],
)

// Before: Anthropic SDK pointed at LiteLLM's /v1/messages surface
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://localhost:4000",
  apiKey: process.env.LITELLM_API_KEY,
});

// After: OpenAI SDK pointed at Concentrate
// (Concentrate also exposes an Anthropic-compatible Messages API at /v1/messages.)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.concentrate.ai/v1",
  apiKey: process.env.CONCENTRATE_API_KEY,
});

# Before
curl http://localhost:4000/v1/messages \
  -H "x-api-key: $LITELLM_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{ "model": "my-chat-model", "messages": [...] }'

# After (Anthropic-compatible Messages API)
curl https://api.concentrate.ai/v1/messages \
  -H "Authorization: Bearer $CONCENTRATE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "anthropic/claude-opus-4-6", "messages": [...] }'

Prerequisites

A Concentrate AI account with an active API key

An existing LiteLLM integration

This guide assumes you are calling a LiteLLM Proxy (the self-hostable AI Gateway) from the OpenAI SDK, the Anthropic SDK, fetch, requests, or another HTTP client. If you import the LiteLLM Python SDK directly (litellm.completion(...)), see the SDK note below.

Quick Start for Claude Code users

If you use Claude Code, you can install a skill that walks through this migration interactively. It searches your project for LiteLLM usage, strips x-litellm-* headers, decomposes config.yaml model groups and routing, maps model slugs, and generates a verification script. Drop the skill into your ~/.claude/skills/ directory:

mkdir -p ~/.claude/skills/migrate-litellm && \
  curl -fsSL https://concentrate.ai/scripts/migrate-litellm.md \
  -o ~/.claude/skills/migrate-litellm/SKILL.md

Then start a Claude Code session in your project and ask it to “migrate from LiteLLM to Concentrate” or run /migrate-litellm. Claude will load the skill and run the steps.

Step 1: Update Your Environment Variables

Replace your LiteLLM virtual key (and any upstream provider keys your proxy held) with a single Concentrate key:

# Before
export LITELLM_API_KEY="sk-..."
export BASE_URL="http://localhost:4000"

# After
export CONCENTRATE_API_KEY="sk-cn-v1-..."
export BASE_URL="https://api.concentrate.ai/v1"

Concentrate is fully managed, so the proxy and everything behind it can be retired: LITELLM_MASTER_KEY, the DATABASE_URL (Postgres) for virtual keys and spend, and every upstream provider key (OPENAI_API_KEY, ANTHROPIC_API_KEY, AWS credentials, etc.). Comment them out (don’t delete) until the migration is verified end-to-end, then remove them.
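
Before you remove anything, it can help to audit what is still set. The following is a minimal sketch, assuming the variable names used in this guide; `audit_env` and `RETIRED_VARS` are hypothetical names for illustration, not part of any Concentrate tooling.

```python
import os

# Variables this guide says can be retired once the proxy is gone.
# DATABASE_URL is listed because the proxy used it; if your own
# application also reads DATABASE_URL, leave that one alone.
RETIRED_VARS = (
    "LITELLM_API_KEY",
    "LITELLM_MASTER_KEY",
    "DATABASE_URL",
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
)


def audit_env(env):
    """Return a list of human-readable findings for a mapping of env vars."""
    findings = []
    key = env.get("CONCENTRATE_API_KEY", "")
    if not key:
        findings.append("CONCENTRATE_API_KEY is not set")
    elif not key.startswith("sk-cn-v1-"):
        findings.append("CONCENTRATE_API_KEY does not start with sk-cn-v1-")
    for name in RETIRED_VARS:
        if env.get(name):
            findings.append(f"{name} is still set; remove it after verification")
    return findings


if __name__ == "__main__":
    for line in audit_env(os.environ):
        print(line)
```

An empty result means the new key looks right and nothing from the old proxy setup is left behind.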

Step 2: Update Your Client

Point your existing OpenAI or Anthropic SDK at Concentrate’s base URL and strip every x-litellm-* header. There’s no dedicated SDK, since the OpenAI-compatible shape covers every endpoint.

Watch the base-URL path. The LiteLLM Proxy accepts the OpenAI surface with or without the /v1 prefix, so your client may have no /v1 in its base URL. Concentrate always requires it.

# Before (OpenAI SDK + LiteLLM Proxy)
from openai import OpenAI
import os

client = OpenAI(
    base_url="http://localhost:4000",
    api_key=os.environ["LITELLM_API_KEY"],
    default_headers={
        "x-litellm-tags": "team-a,prod",
        "x-litellm-spend-logs-metadata": '{"feature": "summarizer"}',
    },
)

# After (Concentrate)
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.concentrate.ai/v1",
    api_key=os.environ["CONCENTRATE_API_KEY"],
)

response = client.chat.completions.create(
    model="anthropic/claude-opus-4-6",
    messages=[{"role": "user", "content": "Say hello in one word"}],
)
print(response.choices[0].message.content)

// Before (OpenAI SDK + LiteLLM Proxy)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: process.env.LITELLM_API_KEY,
  defaultHeaders: {
    "x-litellm-tags": "team-a,prod",
    "x-litellm-spend-logs-metadata": JSON.stringify({ feature: "summarizer" }),
  },
});

// After (Concentrate)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.concentrate.ai/v1",
  apiKey: process.env.CONCENTRATE_API_KEY,
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-opus-4-6",
  messages: [{ role: "user", content: "Say hello in one word" }],
});
console.log(response.choices[0].message.content);

curl https://api.concentrate.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CONCENTRATE_API_KEY" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "messages": [{"role": "user", "content": "Say hello in one word"}]
  }'

If you authenticated with a custom key header (LiteLLM’s litellm_key_header_name, e.g. X-Litellm-Key: Bearer sk-...), drop it and use the standard Authorization: Bearer sk-cn-v1-.... On the Anthropic-compatible /v1/messages surface, replace the native x-api-key header with Authorization: Bearer as well.

Step 3: Remove `x-litellm-*` Headers

None of LiteLLM’s custom headers carry over to Concentrate, so remove them. Left in place, they are dead weight and mislead future readers. Use the tables below to find the replacement for any that appear in your code.

Request header mapping

LiteLLM request header	Concentrate replacement
`Authorization: Bearer sk-...`	Standard `Authorization: Bearer sk-cn-v1-...`
`X-Litellm-Key` (custom key header)	Drop. Use standard `Authorization: Bearer`
`x-litellm-timeout`, `x-litellm-stream-timeout`	Set via your HTTP client’s standard timeout (e.g. OpenAI SDK `timeout`)
`x-litellm-num-retries`	Drop. Concentrate fails over automatically across your fallback chain; retries aren’t a per-request header
`x-litellm-tags`	Drop. Attribution comes from the key/team/org hierarchy, not per-request tags. See Step 4
`x-litellm-spend-logs-metadata`	Drop. Use per-user / team / org keys; analytics roll up by key automatically
`x-litellm-enable-message-redaction`	Zero Data Retention is a key-level setting (disable request logging), not a per-request header
`x-litellm-customer-id`, `x-litellm-end-user-id`	Drop. Per-user attribution comes from issuing a key per user/customer
`anthropic-version`, `anthropic-beta`	Handled by Concentrate. On the Messages API the version is managed for you
`openai-organization` (forwarded via `forward_openai_org_id`)	Drop. Concentrate owns the upstream provider account
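
If your headers are built in one place, the request-side cleanup can be done mechanically. The sketch below is one possible approach, assuming headers are held in a plain dict; `strip_litellm_headers` is a hypothetical helper. It drops every `x-litellm-*` header (which also covers a custom `X-Litellm-Key`), plus `x-api-key` and `openai-organization`, and sets the standard bearer header.

```python
# Non-LiteLLM-prefixed headers that the mapping above also says to drop or replace.
ALSO_DROPPED = {"x-api-key", "openai-organization"}


def strip_litellm_headers(headers, concentrate_key):
    """Return (clean_headers, dropped_names) for a dict of request headers."""
    clean = {}
    dropped = []
    for name, value in headers.items():
        lowered = name.lower()  # header names are case-insensitive
        if lowered == "authorization":
            continue  # replaced below with the Concentrate key
        if lowered.startswith("x-litellm-") or lowered in ALSO_DROPPED:
            dropped.append(name)
            continue
        clean[name] = value
    clean["Authorization"] = f"Bearer {concentrate_key}"
    return clean, dropped
```

Logging the `dropped` list during the cut-over is a cheap way to find call sites that still set LiteLLM headers.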

Response header mapping

LiteLLM echoes a family of x-litellm-* headers back on every response. If your code reads or logs them, here’s the mapping. Concentrate does not emit x-litellm-* response headers.

LiteLLM response header	Concentrate equivalent
`x-litellm-call-id`	`X-Request-Id` (Concentrate returns its own per-request id, surfaced in dashboard logs)
`x-litellm-response-cost`, `x-litellm-key-spend`	No header. Cost and spend are recorded per request and rolled up in the dashboard
`x-litellm-model-id`, `x-litellm-model-group`, `x-litellm-model-api-base`	No header. The resolved model and provider are recorded per request in the dashboard
`x-litellm-attempted-retries`, `x-litellm-attempted-fallbacks`, `x-litellm-max-fallbacks`	No header. Failover is automatic; the resolved provider is recorded per request
`x-litellm-response-duration-ms`, `x-litellm-overhead-duration-ms`	No header. Latency is recorded per request in the dashboard
`x-litellm-version`	Not applicable. Concentrate is a managed service
`x-ratelimit-*`	Standard `X-RateLimit-*` headers. See Errors for 429 semantics
`llm_provider-*` (passed-through provider headers)	Not surfaced. Provider-side metadata is recorded per request in the dashboard
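
If your logging currently records `x-litellm-call-id`, a small shim can keep log lines populated while traffic moves over. This is a sketch under the assumption that response headers are available as a mapping with `.items()`; `request_id_from` is a hypothetical name.

```python
def request_id_from(headers):
    """Return the per-request id from response headers, or None if absent."""
    # Header names are case-insensitive, so compare on lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Prefer Concentrate's X-Request-Id; fall back to LiteLLM's call id
    # for responses that still come from the old proxy during cut-over.
    return lowered.get("x-request-id") or lowered.get("x-litellm-call-id")
```

Once no traffic goes through the proxy, the fallback branch can be deleted.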

Step 4: Decompose Your `config.yaml` and Routing

LiteLLM centralizes models, routing, retries, and budgets in config.yaml and the proxy’s Postgres database. Concentrate has no config file. Each behavior is either on by default, expressed as a request-body parameter, set per key in the dashboard, or handled by your HTTP client.

config.yaml mapping

LiteLLM `config.yaml` concept	Concentrate equivalent
`model_list[].model_name` (public alias) → `litellm_params.model` (upstream)	Call the model slug directly (`provider/model-id` or bare). No alias indirection. See Step 5
`router_settings.routing_strategy: latency-based-routing`	`model: "auto"` with `routing.model.sort: "latency"`
`router_settings.routing_strategy: cost-based-routing`	`model: "auto"` with `routing.model.sort: "cost"`
`router_settings.routing_strategy: simple-shuffle` / `least-busy` / `usage-based-routing-v2`	`model: "auto"` (default `routing.model.sort: "performance"`)
`fallbacks` / `context_window_fallbacks` (ordered)	`routing.model.fallbacks` / `routing.provider.fallbacks` (ordered) body params
`num_retries`, `retry_policy`, `cooldown_time`	Automatic failover on any provider error across your fallback chain, plus a 90% per-feature uptime gate
`timeout` / `stream_timeout`	Set via your HTTP client
`max_budget` / `budget_duration` (per key / user / team)	Per-key spend limits in the dashboard, across the org → team → developer → key hierarchy
`rpm` / `tpm` / `max_parallel_requests`	Per-key rate limits in the dashboard
`litellm_params.api_key` / `api_base` (upstream credentials)	Drop. Concentrate owns upstream credentials
`general_settings.master_key`	Not applicable. No proxy to administer
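
The routing rows above can be translated mechanically. The following is a rough sketch, assuming that dotted names such as `routing.model.sort` correspond to nested JSON objects in the request body and that fallbacks are an ordered list of model slugs; confirm both against the API reference. `auto_routing_body` and `fallback_body` are hypothetical helper names.

```python
# LiteLLM routing_strategy -> Concentrate routing.model.sort, per the table above.
SORT_BY_STRATEGY = {
    "latency-based-routing": "latency",
    "cost-based-routing": "cost",
    "simple-shuffle": "performance",
    "least-busy": "performance",
    "usage-based-routing-v2": "performance",
}


def auto_routing_body(router_settings):
    """Build the routing part of a request body from LiteLLM router_settings."""
    # simple-shuffle is LiteLLM's default when no strategy is configured.
    strategy = router_settings.get("routing_strategy", "simple-shuffle")
    if strategy not in SORT_BY_STRATEGY:
        raise ValueError(f"no mapping for routing_strategy {strategy!r}")
    return {
        "model": "auto",
        # Assumed nesting for the dotted name routing.model.sort.
        "routing": {"model": {"sort": SORT_BY_STRATEGY[strategy]}},
    }


def fallback_body(primary, fallbacks):
    """Build a request-body fragment with an ordered model fallback chain."""
    # Assumed nesting for routing.model.fallbacks; order is preserved.
    return {"model": primary, "routing": {"model": {"fallbacks": list(fallbacks)}}}
```

Merge the returned dict with your `messages` (or `input`) before sending the request.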

LiteLLM’s tag-based routing (x-litellm-tags steering a request to a tagged deployment) has no direct equivalent. Concentrate’s conditional routing conditions on request capabilities (feature support, ZDR flag, per-feature uptime, cache affinity), not on request attributes like tags or customer IDs. If you route by tag, the migration path is to issue separate keys per condition and pick the key in application code.
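
Picking the key in application code can be as small as a lookup table. The sketch below assumes one Concentrate key per former tag, each stored in its own environment variable; the variable names (`CONCENTRATE_KEY_TEAM_A`, `CONCENTRATE_KEY_PROD`) and the `key_for` helper are hypothetical.

```python
import os

# Hypothetical mapping from a former x-litellm-tags value to the
# environment variable that holds the Concentrate key for that condition.
KEY_ENV_BY_TAG = {
    "team-a": "CONCENTRATE_KEY_TEAM_A",
    "prod": "CONCENTRATE_KEY_PROD",
}


def key_for(tags, env=os.environ, default_var="CONCENTRATE_API_KEY"):
    """Return the API key for the first tag that has a dedicated key."""
    for tag in tags:
        var = KEY_ENV_BY_TAG.get(tag)
        if var and env.get(var):
            return env[var]
    # No tag matched: fall back to the general-purpose key.
    return env[default_var]
```

Because the first matching tag wins, pass the tags in priority order if a request used to carry more than one.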

Step 5: Update Model Identifiers

Concentrate accepts model strings in two forms:

Bare slug, e.g. gpt-4o, claude-haiku-4-5, auto. Routing picks a provider.
provider/model-id, e.g. bedrock/claude-haiku-4-5, openai/gpt-4o. Pins the request to a specific provider.

LiteLLM clients call a model_name alias (e.g. my-chat-model) that the proxy maps to an upstream litellm_params.model (e.g. bedrock/anthropic.claude-instant-v1). Concentrate has no alias layer, so replace each alias with the slug it resolved to. That upstream model is already provider-prefixed (bedrock/, azure/, vertex_ai/, openai/), so read the intended provider off it.

One thing to know about the slashed form: the prefix is the provider that serves the request, not the model’s author. For most popular names the two are the same string (openai, anthropic, mistral, cohere). They diverge whenever a model is hosted by something other than its author:

Author	Provider serving the request	Concentrate slug
Anthropic (`claude-haiku-4-5`)	Anthropic	`anthropic/claude-haiku-4-5`
Anthropic (same model, different host)	AWS Bedrock	`bedrock/claude-haiku-4-5`
Anthropic (same model, different host)	Azure	`azure/claude-haiku-4-5`
Google (`gemini-3.5-flash`)	Google AI Studio	`ai-studio/gemini-3.5-flash`
Meta (`llama-3-8b-instruct`)	AWS Bedrock	`bedrock/llama-3-8b-instruct`

Bare slugs work in all of these cases. Use them when you don’t care which provider serves the request. Use the provider/ prefix when you specifically want to pin to one host (for ZDR compliance, contractual reasons, or latency in a specific region). LiteLLM’s vertex_ai/ prefix maps to Concentrate’s ai-studio/ (or the appropriate Vertex-backed slug). The only universally required slug change is auto-routing:

LiteLLM	Concentrate AI
`routing_strategy` in `router_settings`	`model: "auto"` with explicit `routing.model.sort` (`cost` / `latency` / `performance`)

For the authoritative list of supported provider/model-id pairs, call GET /v1/models or browse the Model Fortress.
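
To see which aliases need replacing, you can inventory the `model_list` from your config.yaml. This is a minimal sketch that assumes the YAML has already been parsed into Python objects (for example with a YAML library of your choice); `alias_report` is a hypothetical helper. It only splits the provider prefix and applies the `vertex_ai/` to `ai-studio/` rename mentioned above. The upstream model id it reports is LiteLLM's, so look up the matching Concentrate slug with GET /v1/models rather than assuming the two are identical.

```python
# Provider prefixes that are spelled differently on Concentrate, per this guide.
PROVIDER_RENAMES = {"vertex_ai": "ai-studio"}


def alias_report(model_list):
    """Map each LiteLLM model_name alias to a list of (provider, upstream_id)."""
    report = {}
    for entry in model_list:
        upstream = entry["litellm_params"]["model"]
        provider, sep, model_id = upstream.partition("/")
        if not sep:
            # No slash: a bare model id with no provider prefix.
            provider, model_id = None, upstream
        provider = PROVIDER_RENAMES.get(provider, provider)
        # One alias can front several deployments, so collect a list.
        report.setdefault(entry["model_name"], []).append((provider, model_id))
    return report
```

An alias with more than one entry was a load-balanced group on the proxy; on Concentrate that usually becomes a bare slug or an explicit fallback chain.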

Step 6: Reconnect Observability

LiteLLM’s spend logs and dashboard live in the proxy’s Postgres database (or whichever logging callback you wired up). Concentrate’s dashboard covers the major surfaces without anything to self-host.

LiteLLM surface	Concentrate equivalent
Spend logs (per request/response, in Postgres)	Per-request logs at concentrate.ai
`x-litellm-spend-logs-metadata` breakdowns	Per-key / team / org rollups. No per-request metadata dimension
Tag-based spend (`x-litellm-tags`)	Per-key / team / org rollups. No per-request tag dimension
Customer / end-user spend (`x-litellm-customer-id`)	Issue a key per customer; spend rolls up by key
Per-key / user / team budgets	Per-key and per-team spend limits in the dashboard
`success_callback` / `failure_callback` (Langfuse, Datadog, etc.)	Built-in dashboards; no external logging integration to wire up
Virtual key management (`/key/generate`)	Keys created in the dashboard across the org / team / developer hierarchy

Exporting your LiteLLM history

Spend logs don’t transfer in either direction, so they stay where they were created. If your migration is compliance-driven, export them from your proxy’s Postgres database (or your logging callback’s destination) before deprovisioning the proxy.
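
How you read the rows depends on your Postgres driver and on your proxy's schema, so that part is left out here. The sketch below covers only the archival half, assuming each spend-log row has already been fetched as a dict; `iter_jsonl` is a hypothetical helper that serializes rows to JSON Lines, including the timestamp and decimal values database drivers commonly return.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def _jsonable(value):
    """Convert values the json module cannot serialize on its own."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)  # keep exact spend amounts, avoid float rounding
    raise TypeError(f"cannot serialize {type(value).__name__}")


def iter_jsonl(rows):
    """Yield one JSON Lines record per row (an iterable of dicts)."""
    for row in rows:
        yield json.dumps(row, default=_jsonable, sort_keys=True) + "\n"
```

Write the result with `handle.writelines(iter_jsonl(rows))` on a file opened in text mode; because it is a generator, large tables are streamed rather than held in memory.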

Step 7 (Optional): Adopt the Responses API

For new code, we recommend the native Responses API. It supports streaming, tool calling, structured output, multi-modal input, and web search through a single normalized shape across every provider, and previous_response_id links related requests into a server-managed conversation tree.

curl https://api.concentrate.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CONCENTRATE_API_KEY" \
  -d '{
    "model": "anthropic/claude-opus-4-6",
    "input": "What is the capital of France?"
  }'

import os
import requests

response = requests.post(
    "https://api.concentrate.ai/v1/responses",
    headers={
        "Authorization": f"Bearer {os.environ['CONCENTRATE_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "anthropic/claude-opus-4-6",
        "input": "What is the capital of France?",
    },
)
print(response.json())

const response = await fetch("https://api.concentrate.ai/v1/responses", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CONCENTRATE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "anthropic/claude-opus-4-6",
    input: "What is the capital of France?",
  }),
});
console.log(await response.json());
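
The examples above send a single turn. To link a follow-up to an earlier response, include previous_response_id in the body. The sketch below uses only the Python standard library; it assumes the response JSON carries its identifier in an `id` field, as in the OpenAI Responses shape, so check that against the API reference. `build_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://api.concentrate.ai/v1"


def build_request(api_key, model, text, previous_response_id=None):
    """Build a POST /responses request, optionally linked to an earlier one."""
    body = {"model": model, "input": text}
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    return urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A two-turn exchange would call `send(build_request(key, model, "first question"))`, read the identifier from the result, and pass it as `previous_response_id` on the second call.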

Why migrate to Concentrate

How it works

LiteLLM ships as two things, and only the proxy migrates:

LiteLLM Proxy (AI Gateway). A self-hostable, OpenAI-compatible proxy backed by Postgres for virtual keys and spend, exposing an OpenAI surface at /v1/chat/completions, an Anthropic surface at /v1/messages, plus embeddings, images, audio, and passthrough routes. Point your existing client at https://api.concentrate.ai/v1 and swap the key.
LiteLLM Python SDK. The litellm.completion() library you import directly. See the SDK note below.

Migrating off the proxy means no more server, Postgres database, master key, or config.yaml to operate.

Team-scale spend management

Concentrate organizes billing around an organization → team → developer → key hierarchy. Set budgets at any level and roll spend up into a single dashboard. Attribution comes from the key itself, with no per-request tags or metadata to maintain, and no Postgres to run.

Feature-aware resiliency

Beyond ordered model and provider fallbacks (routing.model.fallbacks, routing.provider.fallbacks), Concentrate’s routing layer ships:

Uptime gate. Providers whose per-feature success rate drops below 90% are skipped.
Feature degradation. If no provider supports the full requested feature set (e.g. json_schema), the request is downgraded to json_object or text instead of failing.
Cache-affinity routing. When multiple providers can serve a request, the one where your actor already has cached tokens is preferred.

All of these are on by default. There is no config.yaml to author or version, and no per-request x-litellm-num-retries to set.
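
To make the interaction between the uptime gate and feature degradation concrete, here is a toy model. It is not Concentrate's implementation: the data layout, the `choose` function, and the rule that a gated-out provider counts the same as an unsupported one are all simplifying assumptions. Only the 90% threshold and the json_schema, json_object, text downgrade order come from the description above.

```python
# Downgrade order for structured output, most capable first.
DEGRADATION_ORDER = ["json_schema", "json_object", "text"]
UPTIME_THRESHOLD = 0.90


def choose(providers, requested):
    """Return (provider_name, served_format), or None if nothing qualifies.

    Each provider is a dict with a "name" and an "uptime" mapping of
    format -> recent success rate; a missing format means unsupported.
    """
    start = DEGRADATION_ORDER.index(requested)
    # Try the requested format first, then each weaker format in turn.
    for fmt in DEGRADATION_ORDER[start:]:
        for provider in providers:
            rate = provider["uptime"].get(fmt)
            # Skip providers that lack the feature or fall below the gate.
            if rate is not None and rate >= UPTIME_THRESHOLD:
                return provider["name"], fmt
    return None
```

The practical consequence for client code is that a request for json_schema output may come back as json_object or text, so validate the shape of the response rather than assuming the strictest format was honored.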

Strategy-driven auto routing

model: "auto" accepts an explicit optimization target via routing.model.sort: cost, latency, or performance (default). See Auto Routing. This maps cleanly onto LiteLLM’s cost-based-routing and latency-based-routing strategies.

Native Responses and Messages APIs

Alongside OpenAI Chat Completions, Concentrate exposes a first-class Responses API and an Anthropic-compatible Messages API, the same /v1/messages shape you may have used on the proxy, now fully managed.

Managed provider credentials by default

Concentrate manages provider credentials by default: point at a model and Concentrate owns the upstream credentials. If you’d rather keep using your own provider accounts, store those keys once in the dashboard with BYOK, which is completely free, and routing uses them automatically.

What about the LiteLLM Python SDK?

If your code imports the LiteLLM Python SDK directly (import litellm; litellm.completion(...)) instead of calling a proxy, there’s no base URL to swap. Replace litellm.completion(...) with the OpenAI SDK pointed at Concentrate (client.chat.completions.create(...)), since both speak the OpenAI Chat Completions shape. The Step 5 slug guidance still applies.
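
If litellm.completion(...) is called from many places, a thin wrapper with the same call shape can keep the diff small. The following is a sketch, assuming the `openai` package is installed and CONCENTRATE_API_KEY is set; the `completion` wrapper and its `client` parameter are hypothetical, added here so the function can be exercised with a stand-in client in tests.

```python
import os

_client = None  # created on first use and reused afterwards


def _default_client():
    """Create (once) an OpenAI SDK client pointed at Concentrate."""
    global _client
    if _client is None:
        # Imported lazily so this module loads even without the SDK installed.
        from openai import OpenAI

        _client = OpenAI(
            base_url="https://api.concentrate.ai/v1",
            api_key=os.environ["CONCENTRATE_API_KEY"],
        )
    return _client


def completion(model, messages, client=None, **kwargs):
    """Stand-in for litellm.completion(model=..., messages=...)."""
    client = client or _default_client()
    # Extra keyword arguments are passed straight to the OpenAI SDK.
    return client.chat.completions.create(model=model, messages=messages, **kwargs)
```

LiteLLM-only keyword arguments (for example `api_base`, `num_retries`, or `fallbacks`) are not OpenAI SDK parameters, so remove them at the call sites rather than forwarding them through `**kwargs`.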

Troubleshooting

Model not found

Bare slugs (gpt-4o, claude-haiku-4-5) and provider/model-id slugs both work. LiteLLM model_name aliases (e.g. my-chat-model) do not. Replace each alias with the underlying Concentrate slug. If you’re using a provider/ prefix and getting a miss, double-check the prefix is a provider (e.g. bedrock, azure, ai-studio) and not just the author (e.g. meta, google). Call GET /v1/models for the authoritative list.

Invalid API key error

Concentrate keys start with sk-cn-v1-. If you are still sending a LiteLLM sk-... virtual key (or your LITELLM_MASTER_KEY) as the Authorization bearer, you will see a 401. Verify the value in your dashboard and confirm there are no extra spaces or quotes. If you authenticated via a custom X-Litellm-Key header, switch to standard Authorization: Bearer.

Requests succeed but nothing shows up in the Concentrate dashboard

Confirm the base URL is https://api.concentrate.ai/v1, not your LiteLLM proxy host (e.g. localhost:4000). If the SDK is still pointed at the proxy, requests are being logged to your proxy’s Postgres spend logs, not to Concentrate.

My config.yaml routing stopped applying

config.yaml and router_settings are not read by Concentrate. Express fallbacks via routing.model.fallbacks / routing.provider.fallbacks body params, or use model: "auto" with a routing strategy. Tag-based routing has no direct equivalent, so handle it with per-condition keys in application code.

My spend tags / customer IDs stopped working

Concentrate does not accept per-request x-litellm-tags, x-litellm-spend-logs-metadata, or x-litellm-customer-id. Attribution comes from the key/team/org hierarchy: issue a separate key per customer, team, or environment and analytics roll up automatically. Custom per-request metadata has no equivalent today.

Connection errors

Confirm the base URL is https://api.concentrate.ai/v1 (note the required /v1 segment, no /api segment, and no proxy host). LiteLLM tolerated a base URL without /v1; Concentrate does not. Test the connection manually:

curl https://api.concentrate.ai/v1/responses/health
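
The base-URL rules above can also be checked in code before any request is sent. This is a small sketch using only the standard library; `base_url_problems` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def base_url_problems(base_url):
    """Return a list of problems with a configured base URL (empty if fine)."""
    parts = urlsplit(base_url)
    problems = []
    if parts.scheme != "https":
        problems.append("scheme should be https")
    if parts.hostname != "api.concentrate.ai":
        problems.append(f"host is {parts.hostname!r}, expected 'api.concentrate.ai'")
    # A trailing slash is tolerated; anything else (missing /v1, /api/v1) is not.
    if parts.path.rstrip("/") != "/v1":
        problems.append(f"path is {parts.path!r}, expected '/v1'")
    return problems
```

Running it against a leftover proxy URL such as http://localhost:4000 reports all three problems at once, which is usually quicker than decoding a connection error.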

Next Steps

API Reference

Explore the full API capabilities

Available Models

Browse all supported models

Auto Routing

Optimize model selection automatically

Get Support

Contact our support team

Feedback

If you hit anything that didn’t translate cleanly (especially around config.yaml model groups, router strategies, tag-based routing, spend-logs metadata, or self-hosting), email support@concentrate.ai. The capability gaps called out above are tracked, and migration friction reports directly shape what we ship next.
