Overview
This page provides a comprehensive reference for all parameters you can use when creating responses. Parameters are organized by category for easy navigation.
Required Parameters
model
The AI model to use for generating the response. See the Model Fortress in the app for a full list of available models.
Format Options:
- Model name only: "gpt-5.2" (automatic provider routing)
- Provider-prefixed: "openai/gpt-5.2" (specific provider)
- Auto routing: "auto" (intelligent selection based on criteria)
input
The input to send to the model. It can be either a simple string or an array of message/tool objects for conversations.
String Format: pass a single prompt string.
Conversation Format: pass an array of message and tool objects, as shown below.
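A sketch of the two input shapes, written as Python dicts for the request body; the prompt text is illustrative.

```python
# String format: a single prompt string.
simple_payload = {
    "model": "gpt-5.2",
    "input": "Summarize the plot of Hamlet in two sentences.",
}

# Conversation format: an array of message objects.
conversation_payload = {
    "model": "gpt-5.2",
    "input": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
    ],
}
```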
Array Item Types
The input array can contain the following types of objects:
1. Message Objects
Standard conversation messages. Properties:
- type (optional): "message" (default)
- role (required): "user", "assistant", "system", or "developer"
- content (required): String or array of content blocks (e.g., [{ "type": "input_text", "text": "..." }] or [{ "type": "input_image", "image_url": "..." }]). See Multi-Modal Inputs for image support.
- cache_control (optional): Cache control settings (see Prompt Caching)
2. Function Call Objects
Tool calls made by the model. Properties:
- type (required): "function_call"
- call_id (required): Unique identifier for this function call
- name (required): Function name that was called
- arguments (required): JSON string of the function arguments
- status (optional): "completed", "in_progress", or "incomplete"
- cache_control (optional): Cache control settings
3. Function Call Output Objects
Results returned from tool execution. Properties:
- type (required): "function_call_output"
- call_id (required): Must match the call_id from the function_call
- output (required): String or array containing the function result
- is_error (optional): Boolean indicating if the function execution failed
See Tool Calling Guide for complete workflow examples.
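A sketch of a follow-up request that carries a prior tool call and its result back to the model, using only the fields listed above; the tool name, call ID, and values are illustrative.

```python
# Second-turn request: the model previously emitted a function_call,
# and we now return the matching function_call_output so it can answer.
payload = {
    "model": "gpt-5.2",
    "input": [
        {"role": "user", "content": "What's the weather in Paris?"},
        {
            "type": "function_call",
            "call_id": "call_123",              # illustrative ID
            "name": "get_weather",              # illustrative tool name
            "arguments": '{"city": "Paris"}',
            "status": "completed",
        },
        {
            "type": "function_call_output",
            "call_id": "call_123",              # must match the call above
            "output": '{"temp_c": 18, "conditions": "cloudy"}',
            "is_error": False,
        },
    ],
}
```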
Output Control Parameters
text
Configure the format of the model’s text output, including structured output. See Structured Output for complete documentation and examples.
Properties:
- format (required): Object controlling the output format
  - type (required): "text" | "json_schema" | "json_object"
  - name (required for json_schema): Schema name
  - schema (required for json_schema): JSON Schema object
  - description (optional): Description of the expected output
  - strict (optional): Enable strict schema enforcement
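A sketch of structured output using the properties above; the schema name and schema body are illustrative.

```python
# Force the response to match a JSON Schema.
payload = {
    "model": "gpt-5.2",
    "input": "Extract the name and age from: 'Ada Lovelace, 36.'",
    "text": {
        "format": {
            "type": "json_schema",
            "name": "person",          # illustrative schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
        }
    },
}
```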
max_output_tokens
Maximum number of tokens to generate in the response.
Important Notes:
- If not specified, uses the model’s default limit or your credit limit (whichever is lower)
- Highly recommended to set this to avoid unexpectedly long and expensive responses
- Different models have different maximum output token limits
Model Limits:
| Model | Max Output Tokens |
|---|---|
| GPT-5.2 | 16,384 |
| Claude Opus 4.5 | 16,384 |
| Claude Sonnet 4.5 | 16,384 |
| Gemini 2.5 Pro | 65,536 |
| o1 | 100,000 |
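For example, capping a response well below the limits in the table above:

```python
# Cap the response length to control cost and latency.
payload = {
    "model": "gpt-5.2",
    "input": "List ten ideas for a team offsite.",
    "max_output_tokens": 1024,   # must not exceed the model's limit in the table above
}
```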
Sampling Parameters
These parameters control the randomness and creativity of model outputs.
temperature
Controls randomness in the output. Range: 0.0 to 2.0. A usage sketch follows the value ranges below.
Values:
- 0.0 - 0.3: Very focused and deterministic
  - Use for: Code generation, factual tasks, data extraction
- 0.4 - 0.7: Balanced creativity and coherence
  - Use for: General conversation, Q&A, explanations
- 0.8 - 1.2: Creative and varied
  - Use for: Creative writing, brainstorming, storytelling
- 1.3 - 2.0: Highly random and experimental
  - Use for: Highly creative tasks, unconventional ideas
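For example, a low temperature for a data-extraction task:

```python
# Near-deterministic setting for extraction and factual tasks.
payload = {
    "model": "gpt-5.2",
    "input": "Extract all email addresses from the following text: ...",
    "temperature": 0.1,
}
```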
top_p
Nucleus sampling parameter. Range: 0.0 to 1.0
How it works:
- Controls diversity by limiting token selection to the top probability mass
- Alternative to temperature for controlling randomness
- Lower values = more focused, higher values = more diverse
Values:
- 0.1 - 0.3: Very focused outputs
- 0.4 - 0.7: Balanced outputs
- 0.8 - 1.0: Diverse outputs
Generally, use only one of temperature or top_p at a time, not both. If you specify both, which one takes effect depends on the model; temperature typically takes precedence.
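For example, using top_p on its own and leaving temperature unset:

```python
# Nucleus sampling only; temperature stays at its default.
payload = {
    "model": "gpt-5.2",
    "input": "Brainstorm names for a hiking club.",
    "top_p": 0.9,
}
```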
Streaming
stream
Enable real-time streaming of the response using Server-Sent Events (SSE). See Streaming Documentation for complete implementation details. A usage sketch follows the lists below.
When to use:
- ✅ Chat interfaces
- ✅ Long-form content generation
- ✅ When user experience matters
- ✅ Progressive display of results
- ❌ Batch processing
- ❌ API integrations where full response is needed
- ❌ Simple programmatic tasks
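A minimal streaming sketch: the request body simply adds stream: true and the response arrives as SSE. The base URL and auth header are assumptions, and the raw SSE lines are printed without parsing (see Streaming Documentation for the event format).

```python
import requests

# Assumed base URL and auth scheme; only the /v1/responses path is documented here.
URL = "https://api.concentrate.ai/v1/responses"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

payload = {
    "model": "gpt-5.2",
    "input": "Write a short story about a lighthouse keeper.",
    "stream": True,   # response arrives as Server-Sent Events
}

with requests.post(URL, headers=HEADERS, json=payload, stream=True) as r:
    for line in r.iter_lines():
        if line:
            print(line.decode("utf-8"))  # raw SSE lines; parse per the Streaming docs
```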
Advanced Features
reasoning
Enable and configure reasoning mode for models that support it (e.g., o1, command-a-reasoning).
Properties:
- effort (required): "low" | "medium" | "high" - Amount of reasoning effort to apply
Effort Levels:
- low: Basic reasoning, faster, lower cost
- medium: Balanced reasoning and speed
- high: Deep reasoning, slower, higher cost (more reasoning tokens)
Reasoning tokens are counted separately in usage statistics and may be priced differently than regular output tokens.
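For example, enabling high reasoning effort on a supported model; the prompt is illustrative.

```python
# Reasoning mode; reasoning tokens are reported separately in usage statistics.
payload = {
    "model": "o1",
    "input": "Plan a 4-step migration from REST to gRPC for a payments service.",
    "reasoning": {"effort": "high"},
    "max_output_tokens": 4096,
}
```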
routing
Routing configuration for provider selection, fallback models, and optimization. See Routing Documentation for the full guide and complete examples.
Properties:
Provider Sorting
How providers are sorted and selected:
Static Metrics:
"cost"— Sort by provider pricing (cheapest first)"performance"— Sort by quality/revenue-share tier (best first, default)
"avg_latency"— Average response time"min_latency","max_latency"— Min/max response time"p50_latency","p90_latency","p99_latency"— Percentile latencies"avg_e2e_latency","min_e2e_latency","max_e2e_latency"— End-to-end latency including overhead
- Format:
"p75_latency","p85_latency","p50_e2e_latency","p99_e2e_latency", etc.
"uptime"— Provider availability percentage"throughput"— Requests per second"total_requests"— Total request volume
"input_tokens","output_tokens","total_tokens"— Average token counts
Metric Window
Time window for live metric calculation. Ignored for static metrics (cost, performance).
Format: "number unit" or "number<shorthand>"
Valid Units:
- minutes or m - "15 minutes", "30 minutes", "15m", "30m"
- hours or h - "1 hour", "6 hours", "24 hours", "1h", "6h", "24h"
- days or d - "7 days", "30 days", "7d", "30d"
- weeks or w - "1 week", "4 weeks", "1w", "4w"
- years or y - "1 year", "1y"
Minimum interval is 15 minutes.
Fallback Models
Fallback models tried after the primary model’s providers are exhausted. Accepts model slugs, provider/model format, and "auto".
Provider Whitelist
Whitelist of providers. When set, only these providers are considered. Omit to allow all.
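The exact property names inside routing are not shown on this page, so the keys in this sketch ("sort", "window", "fallbacks", "providers") are hypothetical placeholders; only the metric names, window format, and fallback/whitelist behavior come from the descriptions above. Check Routing Documentation for the actual schema.

```python
# NOTE: the routing keys below ("sort", "window", "fallbacks", "providers")
# are hypothetical placeholders; consult Routing Documentation for the real names.
payload = {
    "model": "gpt-5.2",
    "input": "Hello!",
    "routing": {
        "sort": "p90_latency",             # live metric; "cost" / "performance" are static
        "window": "24h",                    # ignored for static metrics; minimum 15 minutes
        "fallbacks": ["openai/gpt-5.2", "auto"],  # tried after the primary model's providers are exhausted
        "providers": ["openai", "azure"],   # whitelist; omit to allow all providers
    },
}
```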
Guardrails (API Key Policy)
Guardrails are configured at the API key level, not in the /v1/responses request body.
Configure guardrails in the dashboard UI (Guardrails page) for your API key. No additional request parameter is required in /v1/responses.
Tool Calling
tools
Array of tools the model can call. Each tool is a function definition with a JSON Schema. See Tool Calling Guide for complete examples.
Tool Definition:
- type (required): "function" - Type of tool
- name (required): string - Function name (alphanumeric, underscores, dots, hyphens)
- description (optional): string - What the function does
- parameters (required): object - JSON Schema for function parameters
- strict (optional): boolean - Enable strict schema validation (default: true)
- cache_control (optional): object - Cache this tool definition (ephemeral, 5m or 1h TTL)
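A sketch of a tool definition using the fields above; the weather function itself is illustrative.

```python
# One function tool the model may call.
payload = {
    "model": "gpt-5.2",
    "input": "What's the weather in Paris?",
    "tools": [
        {
            "type": "function",
            "name": "get_weather",                   # illustrative tool
            "description": "Get current weather for a city",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,
            },
        }
    ],
}
```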
tool_choice
Control which tools the model uses.
Modes:
"none"- Don’t use any tools"auto"- Let model decide (default)"required"- Force model to use at least one tool{ "type": "function", "name": "tool_name" }- Force specific tool{ "type": "allowed_tools", "mode": "auto", "tools": [...] }- Limit to specific tools
parallel_tool_calls
Enable the model to call multiple tools in parallel in a single response.
- true - Model can call multiple tools simultaneously (faster for independent operations)
- false - Model calls one tool at a time (default for some providers)
Use parallel tool calls (true) when:
- Multiple independent tool calls (e.g., get weather for multiple cities)
- No dependencies between tool calls
Use sequential tool calls (false) when:
- Sequential operations where order matters
- Tool calls depend on each other’s results
Prompt Caching
cache_control
Enable prompt caching for specific messages to reduce costs on repeated prefixes. A usage sketch follows the notes below.
Properties:
- type (required): "ephemeral" - Type of cache
- ttl (required): "5m" | "1h" - Time-to-live for the cache
How It Works:
- Mark messages that should be cached
- Subsequent requests with the same prefix will use cached tokens
- Cached tokens are significantly cheaper than regular input tokens
- Cache expires after the specified TTL
Cost Savings:
- Regular input tokens: Full price
- Cache write: ~25% more than input tokens (one-time cost)
- Cache read: ~90% cheaper than input tokens
Best Practices:
- Only cache substantial prefixes (e.g., over 1,000 tokens)
- Use for repeated system prompts or context
- Choose TTL based on your usage pattern:
"5m"for rapid successive requests"1h"for regular usage over longer periods
Related Documentation
- Create Response: Main endpoint documentation
- Auto Routing: Intelligent model selection
- Streaming: Real-time response streaming
- Multi-Modal: Send images to vision models
- Structured Output: Force JSON responses matching a schema
- Prompt Caching: Reduce costs with caching
- Guardrails & Redaction: API-key-level redaction controls