
Overview

This page provides a comprehensive reference for all parameters you can use when creating responses. Parameters are organized by category for easy navigation.

Required Parameters

model
string, required

The AI model to use for generating the response.
Format Options:
  • Model name only: "gpt-5.2" - Automatic provider routing
  • Provider-prefixed: "openai/gpt-5.2" - Specific provider
  • Auto routing: "auto" - Intelligent selection based on criteria
Examples:
{
  "model": "gpt-5.2" // Automatic provider routing
}
{
  "model": "anthropic/claude-opus-4-5" // Specific provider
}
{
  "model": "auto", // Auto routing
  "routing": {
    "strategy": "min",
    "metric": "cost"
  }
}
See the Model Fortress in the app for the full list of available models.
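
For orientation, the sketch below shows one way to post these parameters from Python. The /v1/responses path comes from this reference, but the base URL, Bearer auth header, and environment variable name are illustrative assumptions; adjust them to match your account.

# Minimal create-response sketch. Base URL, auth scheme, and env var name are
# assumptions for illustration; the request body fields match this reference.
import os
import requests

API_URL = "https://api.concentrate.ai/v1/responses"  # hypothetical base URL
HEADERS = {
    "Authorization": f"Bearer {os.environ['CONCENTRATE_API_KEY']}",  # assumed auth scheme
    "Content-Type": "application/json",
}

payload = {
    "model": "gpt-5.2",                        # automatic provider routing
    "input": "What is the capital of France?",
    "max_output_tokens": 200,                  # cap output to avoid surprise costs
}

resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # full response object; see the Create Response page for its shape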

input
string | array, required

The input to send to the model. Can be either a simple string or an array of message/tool objects for conversations.
String Format:
{
  "input": "What is the capital of France?"
}
Conversation Format:
{
  "input": [
    {
      "role": "system",
      "content": "You are a helpful assistant specialized in geography."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}
Array Item Types:
The input array can contain the following types of objects:

1. Message Objects
Standard conversation messages:
  • type (optional): “message” (default)
  • role (required): “user”, “assistant”, “system”, or “developer”
  • content (required): String or array of content blocks (e.g., [{ "type": "input_text", "text": "..." }] or [{ "type": "input_image", "image_url": "..." }]). See Multi-Modal Inputs for image support.
  • cache_control (optional): Cache control settings (see Prompt Caching)
2. Function Call Objects
Used when the model calls a tool and you need to continue the conversation:
{
  "type": "function_call",
  "call_id": "call_abc123",
  "name": "get_weather",
  "arguments": "{\"location\": \"San Francisco, CA\"}",
  "status": "completed"
}
Properties:
  • type (required): “function_call”
  • call_id (required): Unique identifier for this function call
  • name (required): Function name that was called
  • arguments (required): JSON string of the function arguments
  • status (optional): “completed”, “in_progress”, or “incomplete”
  • cache_control (optional): Cache control settings
3. Function Call Output Objects
Used to send the result of a function call back to the model:
{
  "type": "function_call_output",
  "call_id": "call_abc123",
  "output": "{\"temperature\": 72, \"conditions\": \"sunny\"}",
  "is_error": false
}
Properties:
  • type (required): “function_call_output”
  • call_id (required): Must match the call_id from the function_call
  • output (required): String or array containing the function result
  • is_error (optional): Boolean indicating if the function execution failed
Multi-Turn Tool Calling Example:
{
  "model": "gpt-5.2",
  "input": [
    {
      "role": "user",
      "content": "What's the weather in San Francisco?"
    },
    {
      "type": "function_call",
      "call_id": "call_abc123",
      "name": "get_weather",
      "arguments": "{\"location\": \"San Francisco, CA\"}"
    },
    {
      "type": "function_call_output",
      "call_id": "call_abc123",
      "output": "{\"temperature\": 72, \"conditions\": \"sunny\"}"
    }
  ],
  "tools": [...]
}
See Tool Calling Guide for complete workflow examples.
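
The request above is the second turn of a tool-calling exchange. Sketched end to end in Python (same assumed base URL and auth header as earlier; the response is assumed to expose tool calls as output items of type "function_call", so confirm the exact shape in the Tool Calling Guide):

import json
import os
import requests

API_URL = "https://api.concentrate.ai/v1/responses"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['CONCENTRATE_API_KEY']}"}  # assumed auth scheme

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

conversation = [{"role": "user", "content": "What's the weather in San Francisco?"}]
first = requests.post(API_URL, headers=HEADERS,
                      json={"model": "gpt-5.2", "input": conversation, "tools": tools}).json()

# Assumed response shape: tool calls surface as output items of type "function_call".
for item in first.get("output", []):
    if item.get("type") == "function_call":
        args = json.loads(item["arguments"])
        result = {"temperature": 72, "conditions": "sunny"}   # in practice: get_weather(**args)
        # Echo the call, then append its result, exactly as documented above.
        conversation.append({"type": "function_call", "call_id": item["call_id"],
                             "name": item["name"], "arguments": item["arguments"]})
        conversation.append({"type": "function_call_output", "call_id": item["call_id"],
                             "output": json.dumps(result)})

final = requests.post(API_URL, headers=HEADERS,
                      json={"model": "gpt-5.2", "input": conversation, "tools": tools}).json()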

Output Control Parameters

text
object

Configure the format of the model's text output, including structured output.
Properties:
  • format (required): Object controlling the output format
    • type (required): "text" | "json_schema" | "json_object"
    • name (required for json_schema): Schema name
    • schema (required for json_schema): JSON Schema object
    • description (optional): Description of the expected output
    • strict (optional): Enable strict schema enforcement
Example:
{
  "model": "gpt-5.2",
  "input": "Extract the person's name and age",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "person",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "integer" }
        },
        "required": ["name", "age"],
        "additionalProperties": false
      }
    }
  }
}
See Structured Output for complete documentation and examples.
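
Built from Python, the same request looks like the sketch below (assumed base URL and auth as before). The model's text output is a JSON string that matches the schema; where that string lives in the response object is covered on the Create Response page, so the parsing step is left as a comment.

import json
import os
import requests

API_URL = "https://api.concentrate.ai/v1/responses"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['CONCENTRATE_API_KEY']}"}  # assumed auth scheme

person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
    "additionalProperties": False,
}

payload = {
    "model": "gpt-5.2",
    "input": "Extract the person's name and age: Ada Lovelace, age 36.",
    "text": {
        "format": {
            "type": "json_schema",
            "name": "person",
            "schema": person_schema,
            "strict": True,          # reject outputs that drift from the schema
        }
    },
}

resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60).json()
# Locate the output text in resp (see Create Response), then parse it with json.loads().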

max_output_tokens
integer

Maximum number of tokens to generate in the response.
Important Notes:
  • If not specified, uses the model’s default limit or your credit limit (whichever is lower)
  • Highly recommended to set this to avoid unexpectedly long and expensive responses
  • Different models have different maximum output token limits
Examples:
{
  "model": "gpt-5.2",
  "input": "Write a short story",
  "max_output_tokens": 500
}
Model Limits:

Model                Max Output Tokens
GPT-5.2              16,384
Claude Opus 4.5      16,384
Claude Sonnet 4.5    16,384
Gemini 2.5 Pro       65,536
o1                   100,000
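
Since these ceilings vary widely, one defensive pattern is to clamp max_output_tokens client-side against the table above. The dict below simply restates that table; the model slugs are illustrative, so check the Model Fortress for the exact names.

# Output ceilings copied from the table above; slugs are illustrative.
MAX_OUTPUT_TOKENS = {
    "gpt-5.2": 16_384,
    "claude-opus-4-5": 16_384,
    "claude-sonnet-4-5": 16_384,
    "gemini-2.5-pro": 65_536,
    "o1": 100_000,
}

def clamp_output_tokens(model: str, requested: int, fallback: int = 16_384) -> int:
    """Never ask a model for more output tokens than it can produce."""
    return min(requested, MAX_OUTPUT_TOKENS.get(model, fallback))

payload = {
    "model": "gpt-5.2",
    "input": "Write a short story",
    "max_output_tokens": clamp_output_tokens("gpt-5.2", 500),  # stays at 500 here
}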

Sampling Parameters

These parameters control the randomness and creativity of model outputs.

temperature
number

Controls randomness in the output. Range: 0.0 to 2.0
Values:
  • 0.0 - 0.3: Very focused and deterministic
    • Use for: Code generation, factual tasks, data extraction
  • 0.4 - 0.7: Balanced creativity and coherence
    • Use for: General conversation, Q&A, explanations
  • 0.8 - 1.2: Creative and varied
    • Use for: Creative writing, brainstorming, storytelling
  • 1.3 - 2.0: Highly random and experimental
    • Use for: Highly creative tasks, unconventional ideas
Examples:
// Factual, deterministic output
{
  "model": "gpt-5.2",
  "input": "Write a function to sort an array",
  "temperature": 0.2
}
// Creative writing
{
  "model": "claude-opus-4-5",
  "input": "Write a short story about a robot",
  "temperature": 0.9
}
Temperatures above 1.5 can produce incoherent or nonsensical outputs. Use with caution.

top_p
number

Nucleus sampling parameter. Range: 0.0 to 1.0
How it works:
  • Controls diversity by limiting token selection to the top probability mass
  • Alternative to temperature for controlling randomness
  • Lower values = more focused, higher values = more diverse
Recommended Usage:
  • 0.1 - 0.3: Very focused outputs
  • 0.4 - 0.7: Balanced outputs
  • 0.8 - 1.0: Diverse outputs
Example:
{
  "model": "gpt-5.2",
  "input": "Suggest product names",
  "top_p": 0.9 // More diverse suggestions
}
Generally, adjust only one of temperature or top_p at a time, not both. If you specify both, behavior varies by model; temperature usually takes precedence.

Streaming

stream
boolean, default: false

Enable real-time streaming of the response using Server-Sent Events (SSE).
When to use:
  • ✅ Chat interfaces
  • ✅ Long-form content generation
  • ✅ When user experience matters
  • ✅ Progressive display of results
When not to use:
  • ❌ Batch processing
  • ❌ API integrations where full response is needed
  • ❌ Simple programmatic tasks
Example:
{
  "model": "gpt-5.2",
  "input": "Write a long essay on AI",
  "stream": true
}
See Streaming Documentation for complete implementation details.
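
A rough Python sketch of consuming the stream (same assumed base URL and auth as earlier; the exact event payloads and end-of-stream sentinel are defined in the Streaming documentation, so treat the parsing below as a placeholder):

import json
import os
import requests

API_URL = "https://api.concentrate.ai/v1/responses"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['CONCENTRATE_API_KEY']}"}  # assumed auth scheme

payload = {"model": "gpt-5.2", "input": "Write a long essay on AI", "stream": True}

with requests.post(API_URL, headers=HEADERS, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE frames arrive as "data: {...}" lines separated by blank lines.
        if not line or not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":          # assumed end-of-stream sentinel; confirm in the Streaming docs
            break
        event = json.loads(data)
        print(event)                  # handle delta events as described in the Streaming docs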

Advanced Features

reasoning
object

Enable and configure reasoning mode for models that support it (e.g., o1, command-a-reasoning).
Properties:
  • effort (required): “low” | “medium” | “high” - Amount of reasoning effort to apply
Example:
{
  "model": "openai/o1",
  "input": "Solve this complex math problem: ...",
  "reasoning": {
    "effort": "high"
  }
}
Effort Levels:
  • low: Basic reasoning, faster, lower cost
  • medium: Balanced reasoning and speed
  • high: Deep reasoning, slower, higher cost (more reasoning tokens)
Reasoning tokens are counted separately in usage statistics and may be priced differently than regular output tokens.

routing
object

Routing configuration for provider selection, fallback models, and optimization.
Properties:
routing.metric
string, default: "performance"

How providers are sorted and selected:
Static Metrics:
  • "cost" — Sort by provider pricing (cheapest first)
  • "performance" — Sort by quality/revenue-share tier (best first, default)
Live Metrics (from Redis over the configured interval):
Latency:
  • "avg_latency" — Average response time
  • "min_latency", "max_latency" — Min/max response time
  • "p50_latency", "p90_latency", "p99_latency" — Percentile latencies
  • "avg_e2e_latency", "min_e2e_latency", "max_e2e_latency" — End-to-end latency including overhead
You can use any percentile from p0 to p100 for both latency and e2e_latency:
  • Format: "p75_latency", "p85_latency", "p50_e2e_latency", "p99_e2e_latency", etc.
Reliability & Volume:
  • "uptime" — Provider availability percentage
  • "throughput" — Requests per second
  • "total_requests" — Total request volume
Token Metrics:
  • "input_tokens", "output_tokens", "total_tokens" — Average token counts
routing.interval
string, default: "15 minutes"

Time window for live metric calculation. Ignored for static metrics (cost, performance).
Format: "number unit" (e.g., "15 minutes") or shorthand "number + unit letter" (e.g., "15m")
Valid Units:
  • minutes or m — “15 minutes”, “30 minutes”, “15m”, “30m”
  • hours or h — “1 hour”, “6 hours”, “24 hours”, “1h”, “6h”, “24h”
  • days or d — “7 days”, “30 days”, “7d”, “30d”
  • weeks or w — “1 week”, “4 weeks”, “1w”, “4w”
  • years or y — “1 year”, “1y”
Minimum interval is 15 minutes.
routing.models
array
Fallback models tried after the primary model’s providers are exhausted. Accepts model slugs, provider/model format, and "auto".
{ "routing": { "models": ["claude-sonnet-4-20250514", "auto"] } }
routing.providers
array
Whitelist of providers. When set, only these providers are considered. Omit to allow all.
{ "routing": { "providers": ["openai", "azure"] } }
Complete Examples:
// Optimize for cost with fallback models
{
  "model": "gpt-4o",
  "input": "Summarize this text",
  "routing": {
    "metric": "cost",
    "models": ["gemini-2.0-flash"]
  }
}
// Performance-optimized (default behavior)
{
  "model": "auto",
  "input": "Complex analysis task",
  "routing": {
    "metric": "performance",
    "interval": "1 hour"
  }
}
// Low-latency with provider restriction
{
  "model": "auto",
  "input": "Quick question",
  "routing": {
    "metric": "p99_latency",
    "interval": "15 minutes",
    "providers": ["openai", "anthropic"]
  }
}
See Routing Documentation for the full guide.

Guardrails (API Key Policy)

Guardrails are configured at the API key level, not in the /v1/responses request body.
Configure them on your API key from the Guardrails page in the dashboard UI; no additional request parameter is needed.
See Guardrails & Redaction for setup and behavior.

Tool Calling

tools
array

Array of tools the model can call. Each tool is a function definition with a JSON Schema.
Tool Definition:
  • type (required): “function” - Type of tool
  • name (required): string - Function name (alphanumeric, underscores, dots, hyphens)
  • description (optional): string - What the function does
  • parameters (required): object - JSON Schema for function parameters
  • strict (optional): boolean - Enable strict schema validation (default: true)
  • cache_control (optional): object - Cache this tool definition (ephemeral, 5m or 1h TTL)
Example:
{
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City and state, e.g. San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"]
          }
        },
        "required": ["location"]
      }
    }
  ]
}
See Tool Calling Guide for complete examples.

tool_choice
string | object

Control which tools the model uses.
Modes:
  • "none" - Don’t use any tools
  • "auto" - Let model decide (default)
  • "required" - Force model to use at least one tool
  • { "type": "function", "name": "tool_name" } - Force specific tool
  • { "type": "allowed_tools", "mode": "auto", "tools": [...] } - Limit to specific tools
Examples:
// Auto mode (default)
{ "tool_choice": "auto" }

// Force specific tool
{
  "tool_choice": {
    "type": "function",
    "name": "get_weather"
  }
}

// Allowed tools
{
  "tool_choice": {
    "type": "allowed_tools",
    "mode": "required",
    "tools": [
      { "type": "function", "name": "get_weather" },
      { "type": "function", "name": "get_forecast" }
    ]
  }
}

parallel_tool_calls
boolean

Enable the model to call multiple tools in parallel in a single response.
  • true - Model can call multiple tools simultaneously (faster for independent operations)
  • false - Model calls one tool at a time (default for some providers)
When to enable:
  • Multiple independent tool calls (e.g., get weather for multiple cities)
  • No dependencies between tool calls
When to disable:
  • Sequential operations where order matters
  • Tool calls depend on each other’s results
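
Because every function_call item carries its own call_id, handling a parallel batch is just a matter of answering each call individually. A sketch, reusing the assumed response shape from the tool-calling example above:

import json

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"}},
                   "required": ["location"]},
}]

payload = {
    "model": "gpt-5.2",
    "input": [{"role": "user", "content": "Compare the weather in Paris, Tokyo, and NYC"}],
    "tools": tools,
    "parallel_tool_calls": True,    # allow one response to request all three lookups at once
}

def answer_calls(calls, run_tool):
    """Turn a batch of function_call items into matching function_call_output items."""
    items = []
    for call in calls:              # each call: {"call_id": ..., "name": ..., "arguments": ...}
        result = run_tool(call["name"], json.loads(call["arguments"]))
        items.append({"type": "function_call_output",
                      "call_id": call["call_id"],
                      "output": json.dumps(result)})
    return items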

Prompt Caching

cache_control
object

Enable prompt caching for specific messages to reduce costs on repeated prefixes.
Currently supported by:
  • Anthropic provider (Claude models via Anthropic API)
  • AWS Bedrock provider (Claude models via AWS Bedrock)
All other providers will ignore cache_control settings.
Properties:
  • type (required): “ephemeral” - Type of cache
  • ttl (required): “5m” | “1h” - Time-to-live for the cache
How it works:
  • Mark messages that should be cached
  • Subsequent requests with the same prefix will use cached tokens
  • Cached tokens are significantly cheaper than regular input tokens
  • Cache expires after the specified TTL
Example:
{
  "model": "anthropic/claude-opus-4-5",
  "input": [
    {
      "role": "system",
      "content": "Very long system prompt with documentation...",
      "cache_control": {
        "type": "ephemeral",
        "ttl": "5m"
      }
    },
    {
      "role": "user",
      "content": "Question based on the documentation"
    }
  ]
}
Cost Savings:
  • Regular input tokens: Full price
  • Cache write: ~25% more than input tokens (one-time cost)
  • Cache read: ~90% cheaper than input tokens
Best Practices:
  • Only cache substantial prefixes (e.g., over 1000 tokens)
  • Use for repeated system prompts or context
  • Choose TTL based on your usage pattern:
    • "5m" for rapid successive requests
    • "1h" for regular usage over longer periods

Related Pages

  • Create Response: Main endpoint documentation
  • Auto Routing: Intelligent model selection
  • Streaming: Real-time response streaming
  • Multi-Modal: Send images to vision models
  • Structured Output: Force JSON responses matching a schema
  • Prompt Caching: Reduce costs with caching
  • Guardrails & Redaction: API-key-level redaction controls