POST /v1/responses/
cURL
curl --request POST \
  --url https://api.concentrate.ai/v1/responses/ \
  --header 'Content-Type: application/json' \
  --data '
{
  "input": "<string>",
  "model": "<string>",
  "include": [],
  "instructions": "<string>",
  "max_output_tokens": 4503599627370495,
  "metadata": {},
  "reasoning": {},
  "stream": true,
  "temperature": 1,
  "text": {
    "format": "<unknown>",
    "verbosity": "medium"
  },
  "tools": [
    "<unknown>"
  ],
  "tool_choice": "<unknown>",
  "top_p": 0.5,
  "parallel_tool_calls": true,
  "previous_response_id": "<string>",
  "prompt_cache_key": "<string>",
  "top_logprobs": 10,
  "background": true,
  "context_management": [
    {
      "type": "compaction",
      "compact_threshold": 4503599627370996
    }
  ],
  "conversation": "<string>",
  "max_tool_calls": 4503599627370495,
  "prompt": {
    "id": "<string>",
    "version": "<string>",
    "variables": {}
  },
  "safety_identifier": "<string>",
  "store": true,
  "stream_options": {
    "include_obfuscation": true
  },
  "user": "<string>",
  "routing": {
    "model": {
      "fallbacks": [
        "<string>"
      ],
      "sort": "performance"
    },
    "provider": {
      "fallbacks": [
        "<string>"
      ],
      "sort": "performance",
      "interval": "<string>"
    }
  },
  "cache_control": "<unknown>"
}
'
{
  "id": "<string>",
  "model": "<string>",
  "object": "response",
  "output": [
    "<unknown>"
  ],
  "usage": {
    "input_tokens": 1,
    "input_tokens_details": {
      "cached_tokens": 1,
      "cached_tokens_created": 1
    },
    "output_tokens": 1,
    "output_tokens_details": {
      "reasoning_tokens": 1
    },
    "total_tokens": 1,
    "tool_calls": {
      "web_search": 1
    }
  },
  "error": {
    "code": "<string>",
    "message": "<string>"
  },
  "frequency_penalty": 123,
  "incomplete_details": {},
  "instructions": "<string>",
  "metadata": {},
  "temperature": 1,
  "tool_choice": "<unknown>",
  "tools": [
    "<unknown>"
  ],
  "top_p": 0.5,
  "background": true,
  "completed_at": 0,
  "created_at": 0,
  "conversation": {
    "id": "<string>"
  },
  "max_output_tokens": 0,
  "max_tool_calls": 0,
  "parallel_tool_calls": true,
  "presence_penalty": 123,
  "previous_response_id": "<string>",
  "prompt": {
    "id": "<string>",
    "version": "<string>",
    "variables": {}
  },
  "prompt_cache_key": "<string>",
  "reasoning": {},
  "safety_identifier": "<string>",
  "store": true,
  "text": {
    "format": "<unknown>",
    "verbosity": "medium"
  },
  "top_logprobs": 10,
  "user": "<string>",
  "cost": {
    "total": 123
  },
  "redact": {
    "entities_found": 123,
    "entity_types": [
      "<string>"
    ],
    "models_used": [
      "<string>"
    ],
    "execution_time_ms": 123,
    "redaction_coverage": 123,
    "redacted_input": [
      "<unknown>"
    ],
    "redacted_tools": [
      "<unknown>"
    ]
  }
}

Overview

The main endpoint for generating AI responses. Supports both streaming and non-streaming modes, with automatic normalization across all providers.
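As a rough illustration of the smallest possible call, the sketch below builds (but does not send) a request carrying only the two required fields, input and model. The URL and Content-Type header are taken from the cURL example; authentication headers are left out because the example does not show them, and the helper name build_request is purely illustrative.

```python
import json
import urllib.request

# URL copied from the cURL example on this page.
API_URL = "https://api.concentrate.ai/v1/responses/"


def build_request(model: str, input_text: str, stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request with the required fields."""
    body = {"model": model, "input": input_text, "stream": stream}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("auto", "Say hello in one sentence.")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Sending is intentionally omitted. With credentials configured it might look like:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["output"])
```

Setting stream=True in the same body switches the call to streaming mode; everything else about the request stays the same.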

Guardrails

Redaction guardrails are configured on your API key (not in this endpoint body). When enabled, they are applied automatically for requests made with that key. See Guardrails & Redaction.

Body

application/json
input
required

Text, image, or file inputs to the model, used to generate a response.

Minimum string length: 1
model
string
required

Model identifier. Use /v1/models to list all available models. Supports canonical names (e.g. gpt-5.2, claude-opus-4-6), aliases, and provider-prefixed formats (e.g. openai/gpt-5.2). Use "auto" for automatic model selection.
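The identifier formats above can be told apart on the client side with a small helper. This is only a sketch: it assumes a provider prefix is everything before the first "/", and split_model_id is a hypothetical name, not part of any SDK.

```python
from typing import Optional, Tuple


def split_model_id(model: str) -> Tuple[Optional[str], str]:
    """Split 'provider/name' into (provider, name).

    Plain names, aliases, and "auto" come back with provider None.
    """
    provider, sep, name = model.partition("/")
    if sep and provider and name:
        return provider, name
    return None, model


print(split_model_id("openai/gpt-5.2"))
print(split_model_id("claude-opus-4-6"))
print(split_model_id("auto"))
```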

include
enum<string>[] | null

Specify additional output data to include in the model response.

Maximum array length: 8
Available options:
web_search_call.results,
web_search_call.action.sources,
message.output_text.logprobs,
message.input_image.image_url,
reasoning.encrypted_content,
file_search_call.results,
computer_call_output.output.image_url,
code_interpreter_call.outputs
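A client can check an include list against the options and length limit listed above before sending it. The following is a minimal sketch; validate_include is an illustrative name, and how the server reports invalid values is not described here.

```python
from typing import List, Optional

# Options and limit copied from the field description above.
ALLOWED_INCLUDE = {
    "web_search_call.results",
    "web_search_call.action.sources",
    "message.output_text.logprobs",
    "message.input_image.image_url",
    "reasoning.encrypted_content",
    "file_search_call.results",
    "computer_call_output.output.image_url",
    "code_interpreter_call.outputs",
}
MAX_INCLUDE_LENGTH = 8


def validate_include(values: Optional[List[str]]) -> List[str]:
    """Return the list unchanged if valid; raise ValueError otherwise."""
    if values is None:
        return []
    if len(values) > MAX_INCLUDE_LENGTH:
        raise ValueError(f"include accepts at most {MAX_INCLUDE_LENGTH} entries")
    unknown = [v for v in values if v not in ALLOWED_INCLUDE]
    if unknown:
        raise ValueError(f"unknown include options: {unknown}")
    return list(values)


print(validate_include(["reasoning.encrypted_content", "web_search_call.results"]))
```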
instructions
string | null

A system (or developer) message inserted into the model's context. When used together with previous_response_id, the instructions from the previous response are not carried over to the next response, which makes it simple to swap out system (or developer) messages between responses.

max_output_tokens
integer | null

An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.

Required range: 0 < x <= 9007199254740991
metadata
object

A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
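The metadata limits can be enforced locally before a request is sent. This sketch reads the pair count as an upper bound of 16 and uses a hypothetical helper name, validate_metadata.

```python
MAX_PAIRS = 16
MAX_KEY_LENGTH = 64
MAX_VALUE_LENGTH = 512


def validate_metadata(metadata: dict) -> dict:
    """Raise ValueError if metadata breaks any documented limit."""
    if len(metadata) > MAX_PAIRS:
        raise ValueError(f"metadata allows at most {MAX_PAIRS} pairs")
    for key, value in metadata.items():
        if not isinstance(key, str) or len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"invalid metadata key: {key!r}")
        if not isinstance(value, str) or len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"invalid metadata value for key {key!r}")
    return metadata


print(validate_metadata({"ticket": "A-1042", "team": "support"}))
```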

reasoning
object

Configuration options for reasoning models.

stream
boolean | null

If set to true, the model response data will be streamed to the client as it is generated using server-sent events.
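For readers unfamiliar with server-sent events, the sketch below parses the generic SSE wire format (event: and data: fields, with a blank line ending each event). The event names and payloads in the sample are invented for illustration; this page does not list the actual events the endpoint emits.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream contents, for illustration only.
sample = [
    "event: example.delta\n",
    'data: {"text": "Hel"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"text": "lo"}\n',
    "\n",
]
events = list(iter_sse_events(sample))
text = "".join(json.loads(payload)["text"] for _, payload in events)
print(events)
print(text)
```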

temperature
number | null

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

Required range: 0 <= x <= 2
text
object

Configuration options for a text response from the model. Can be plain text or structured JSON data.

tools
any[] | null

An array of tools the model may call while generating a response. You can specify which tool to use by setting the tool_choice parameter.

tool_choice
any
top_p
number | null

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

Required range: 0 <= x <= 1
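The recommendation to alter temperature or top_p, but not both, can be turned into a simple client-side lint. This is only a sketch with a hypothetical function name; it produces a warning and does not reject the request.

```python
def sampling_warnings(body: dict) -> list:
    """Return a warning when both sampling controls are set explicitly."""
    if body.get("temperature") is not None and body.get("top_p") is not None:
        return ["temperature and top_p are both set; consider altering only one"]
    return []


print(sampling_warnings({"temperature": 0.2}))
print(sampling_warnings({"temperature": 0.2, "top_p": 0.9}))
```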
parallel_tool_calls
boolean | null

Whether to allow the model to run tool calls in parallel.

previous_response_id
string | null

The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used together with conversation. Concentrate enables this for all models, but request logging must be enabled for it to work. Learn more.
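A follow-up turn might be assembled as in the sketch below. It reflects two points from the field descriptions: instructions are not carried over from the previous response, so they are passed again when needed, and previous_response_id cannot be combined with conversation. The function name and the response ID "resp_123" are placeholders.

```python
from typing import Optional


def follow_up_body(previous_response_id: str, model: str, input_text: str,
                   instructions: Optional[str] = None,
                   conversation: Optional[str] = None) -> dict:
    """Build the body for the next turn of a multi-turn exchange."""
    if conversation is not None:
        raise ValueError("previous_response_id cannot be used with conversation")
    body = {
        "model": model,
        "input": input_text,
        "previous_response_id": previous_response_id,
    }
    if instructions is not None:
        # Not inherited from the previous response, so resend if still wanted.
        body["instructions"] = instructions
    return body


turn_two = follow_up_body("resp_123", "auto", "And in French?",
                          instructions="Answer in one short sentence.")
print(turn_two)
```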

prompt_cache_key
string | null

Used to cache responses for similar requests to optimize your cache hit rates. Replaces the user field. If prompt_cache_key or user is not set, Concentrate will automatically add a prompt cache key based on your API key.
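The precedence described above can be pictured as follows. This is a client-side illustration only: the real selection happens on Concentrate's side, and how the default key is derived from the API key is not documented here, so api_key_default is a stand-in value.

```python
def effective_cache_key(body: dict, api_key_default: str) -> str:
    """Mirror the documented precedence: prompt_cache_key, then user, then a default."""
    return body.get("prompt_cache_key") or body.get("user") or api_key_default


print(effective_cache_key({"prompt_cache_key": "checkout-flow"}, "derived-from-key"))
print(effective_cache_key({"user": "user-42"}, "derived-from-key"))
print(effective_cache_key({}, "derived-from-key"))
```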

prompt_cache_retention
enum<string> | null

The retention policy for the prompt cache. Set to 24h to enable extended prompt caching, which keeps cached prefixes active for longer, up to a maximum of 24 hours. Has no effect on explicit caching, which must be set through cache_control.

Available options:
in-memory,
in_memory,
24h
top_logprobs
integer | null

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.

Required range: 0 <= x <= 20
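The "Required range" lines for the numeric fields documented so far can be collected into one table-driven check. The sketch below is an assumption about how a client might pre-validate a body; range_errors is not an existing helper, and null or absent values are skipped.

```python
MAX_SAFE_INT = 9007199254740991

# field -> (low, high, low_is_inclusive), copied from the "Required range" lines.
RANGES = {
    "max_output_tokens": (0, MAX_SAFE_INT, False),
    "temperature": (0, 2, True),
    "top_p": (0, 1, True),
    "top_logprobs": (0, 20, True),
}


def range_errors(body: dict) -> list:
    """Return the names of fields whose values fall outside their range."""
    errors = []
    for field, (low, high, inclusive) in RANGES.items():
        value = body.get(field)
        if value is None:
            continue  # null or absent: nothing to check
        above_low = value >= low if inclusive else value > low
        if not (above_low and value <= high):
            errors.append(field)
    return errors


print(range_errors({"temperature": 0.7, "top_logprobs": 5}))
print(range_errors({"temperature": 2.5, "max_output_tokens": 0}))
```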
background
boolean | null

Whether to run the model response in the background. Unsupported, but included for compatibility.

context_management
object[] | null

Configuration for how the model's context window is managed during a response, such as automatic compaction of older turns. Currently unsupported.

conversation

The conversation that this response belongs to. Items from this conversation are prepended to input_items for this response request. Cannot be used in conjunction with previous_response_id. Currently unsupported, but included for compatibility. Use previous_response_id instead.

max_tool_calls
integer | null

The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.

Required range: 0 < x <= 9007199254740991
prompt
object

Reference to a prompt template and its variables. Currently unsupported, but included for compatibility.

safety_identifier
string | null

A stable identifier used to help detect users of your application that may be violating usage policies. Each ID should be a string that uniquely identifies a user. We recommend hashing the username or email address to avoid sending any identifying information. Unsupported, as Concentrate reserves this field.

service_tier
enum<string> | null

Specifies the processing type used for serving the request. Determines the pricing and performance tier used to process the request. When not set, the default behavior is auto. Currently unsupported, but included for compatibility.

Available options:
auto,
default,
flex,
scale,
priority
store
boolean | null

Whether to store the generated model response for later retrieval via API.

stream_options
object

Options for streaming responses. Only set this when you set stream: true.
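A small guard can catch stream_options being sent without streaming enabled. This is a sketch under the assumption that a client wants to fail early; the server's behaviour in that case is not described on this page.

```python
def check_stream_options(body: dict) -> dict:
    """Raise ValueError if stream_options is present without stream: true."""
    if body.get("stream_options") is not None and body.get("stream") is not True:
        raise ValueError("stream_options should only be set when stream is true")
    return body


ok = check_stream_options({"stream": True, "stream_options": {"include_obfuscation": True}})
print(ok)
```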

truncation
enum<string> | null

The truncation strategy to use for the model response. auto: if the input exceeds the model's context window size, the model truncates the response by dropping items from the beginning of the conversation. disabled (default): if the input size exceeds the context window size for a model, the request fails with a 400 error. Currently unsupported, but included for compatibility.

Available options:
auto,
disabled
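Several body fields above are described as unsupported and accepted only for compatibility. When porting an existing client, it may help to flag them before sending. The list in this sketch is gathered from those descriptions; the helper name is hypothetical, and the page does not say whether such fields are ignored or rejected.

```python
# Fields whose descriptions on this page say "unsupported".
UNSUPPORTED_FIELDS = (
    "background",
    "context_management",
    "conversation",
    "prompt",
    "safety_identifier",
    "service_tier",
    "truncation",
)


def unsupported_fields_in(body: dict) -> list:
    """Return the unsupported fields that are set to a non-null value."""
    return [name for name in UNSUPPORTED_FIELDS if body.get(name) is not None]


ported = {"model": "auto", "input": "hi", "truncation": "auto", "background": True}
print(unsupported_fields_in(ported))
```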
user
string | null

A stable identifier for your end-users, used to boost cache hit rates by better bucketing similar requests and to help detect and prevent abuse. This field is being replaced by safety_identifier and prompt_cache_key; we recommend using prompt_cache_key instead to maintain caching optimizations. Using this field as a safety identifier has no effect. If provided, however, its value is used as the prompt_cache_key instead of a key based on your API key.

routing
object

Concentrate routing configuration controlling how requests are routed across models and providers. Learn more about routing.
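Based only on the shape shown in the cURL example (routing.model with fallbacks and sort), a body with model fallbacks might be assembled as sketched below. It assumes fallbacks holds model identifiers and uses "performance" because it is the only sort value shown on this page; with_model_fallbacks is an illustrative helper.

```python
import copy
import json


def with_model_fallbacks(body: dict, fallbacks: list, sort: str = "performance") -> dict:
    """Return a copy of body with routing.model populated."""
    updated = copy.deepcopy(body)
    routing = updated.setdefault("routing", {})
    routing["model"] = {"fallbacks": list(fallbacks), "sort": sort}
    return updated


base = {"model": "gpt-5.2", "input": "Summarize this ticket."}
routed = with_model_fallbacks(base, ["claude-opus-4-6"])
print(json.dumps(routed, indent=2))
```

The original body is left untouched, so the same base request can be reused with different routing settings.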

cache_control
any

Response

Default Response

id
string
required
model
string
required
object
enum<string>
default: response
required
Available options:
response
output
any[]
required
usage
object
required
error
object
frequency_penalty
number | null
incomplete_details
object
instructions
metadata
object

A set of up to 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and for querying objects via the API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

temperature
number | null

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

Required range: 0 <= x <= 2
tool_choice
any
tools
any[] | null
top_p
number | null

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

Required range: 0 <= x <= 1
background
boolean | null
completed_at
integer | null
Required range: -9007199254740991 <= x <= 9007199254740991
created_at
integer | null
Required range: -9007199254740991 <= x <= 9007199254740991
conversation
object
max_output_tokens
integer | null
Required range: -9007199254740991 <= x <= 9007199254740991
max_tool_calls
integer | null
Required range: -9007199254740991 <= x <= 9007199254740991
parallel_tool_calls
boolean | null
presence_penalty
number | null
previous_response_id
string | null
prompt
object

Reference to a prompt template and its variables. Currently unsupported, but included for compatibility.

prompt_cache_key
string | null
prompt_cache_retention
enum<string> | null

The retention policy for the prompt cache. Set to 24h to enable extended prompt caching, which keeps cached prefixes active for longer, up to a maximum of 24 hours. Has no effect on explicit caching, which must be set through cache_control.

Available options:
in-memory,
in_memory,
24h
reasoning
object

Configuration options for reasoning models.

safety_identifier
string | null
service_tier
enum<string> | null

Specifies the processing type used for serving the request. Determines the pricing and performance tier used to process the request. When not set, the default behavior is auto. Currently unsupported, but included for compatibility.

Available options:
auto,
default,
flex,
scale,
priority
status

The status of the item. One of in_progress, completed, or incomplete.

Available options:
completed,
in_progress,
incomplete
store
boolean | null
text
object

Configuration options for a text response from the model. Can be plain text or structured JSON data.

top_logprobs
integer | null

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.

Required range: 0 <= x <= 20
truncation
enum<string> | null

The truncation strategy to use for the model response. auto: if the input exceeds the model's context window size, the model truncates the response by dropping items from the beginning of the conversation. disabled (default): if the input size exceeds the context window size for a model, the request fails with a 400 error. Currently unsupported, but included for compatibility.

Available options:
auto,
disabled
user
string | null
cost
object
redact
object
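To show how the response fields fit together, the sketch below pulls status, token usage, and cost out of a parsed response. The field names follow the example response on this page; the sample values are invented, and treating a non-empty error object as a failure is an assumption, since the page does not spell out its semantics.

```python
def summarize_response(resp: dict) -> dict:
    """Extract status, token counts, and cost from a response object."""
    error = resp.get("error")
    if error:  # assumption: a non-empty error object signals failure
        raise RuntimeError(f"{error.get('code')}: {error.get('message')}")
    usage = resp["usage"]
    details = usage.get("input_tokens_details") or {}
    return {
        "status": resp.get("status"),
        "input_tokens": usage["input_tokens"],
        "cached_tokens": details.get("cached_tokens", 0),
        "output_tokens": usage["output_tokens"],
        "total_tokens": usage["total_tokens"],
        "cost_total": (resp.get("cost") or {}).get("total"),
    }


# Invented sample values, shaped like the example response on this page.
sample_response = {
    "id": "resp_123",
    "model": "gpt-5.2",
    "object": "response",
    "output": [],
    "status": "completed",
    "usage": {
        "input_tokens": 12,
        "input_tokens_details": {"cached_tokens": 4},
        "output_tokens": 30,
        "total_tokens": 42,
    },
    "cost": {"total": 0.0015},
}
summary = summarize_response(sample_response)
print(summary)
```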