

Overview

Streaming enables real-time response generation using Server-Sent Events (SSE). Instead of waiting for the complete response, you receive content incrementally as it’s generated, providing a better user experience.

How It Works

When you set stream: true, the API:
  1. Returns an SSE stream instead of a single JSON response
  2. Sends events incrementally as content is generated
  3. Includes a JSON payload with the event type and data in each event
  4. Closes the connection when the response is complete

Enable Streaming

Add "stream": true to your request:
curl https://api.concentrate.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-5.2",
    "input": "Write a short story about a robot",
    "stream": true
  }'
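
The body of this response is a standard SSE stream. The TypeScript sketch below shows one way to consume it; it assumes a runtime with fetch and ReadableStream support (Node 18+, Deno, or a browser) and only handles the text delta events documented on this page.

// Sketch: request a streamed response and accumulate text deltas as they arrive.
async function streamStory(apiKey: string): Promise<string> {
  const response = await fetch('https://api.concentrate.ai/v1/responses', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-5.2',
      input: 'Write a short story about a robot',
      stream: true
    })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let fullText = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop()!; // keep any partial line for the next chunk

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const event = JSON.parse(line.slice(6));
      if (event.type === 'response.output_text.delta') {
        fullText += event.delta; // incremental text from the model
      }
    }
  }

  return fullText;
}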

Event Types

The stream emits different event types as content is generated:

response.created

Sent at the beginning of the response stream.
{
  "type": "response.created",
  "sequence_number": 0,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "in_progress",
    "model": "openai/gpt-5.2",
    "output": [],
    "usage": null
  }
}

response.in_progress

Sent periodically during response generation to indicate progress.
{
  "type": "response.in_progress",
  "sequence_number": 3,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "in_progress",
    "model": "openai/gpt-5.2",
    "output": [...],
    "usage": null
  }
}

response.output_item.added

Signals the start of a new output item (message or reasoning block).
{
  "type": "response.output_item.added",
  "sequence_number": 1,
  "output_index": 0,
  "item": {
    "type": "message",
    "id": "msg_xyz789",
    "status": "in_progress",
    "role": "assistant",
    "content": []
  }
}

response.content_part.added

Signals the start of a new content part within an output item.
{
  "type": "response.content_part.added",
  "sequence_number": 2,
  "output_index": 0,
  "content_index": 0,
  "item_id": "msg_xyz789",
  "part": {
    "type": "output_text",
    "text": ""
  }
}

response.output_text.delta

Contains incremental text content. This is where the actual text appears.
{
  "type": "response.output_text.delta",
  "sequence_number": 3,
  "output_index": 0,
  "content_index": 0,
  "item_id": "msg_xyz789",
  "delta": "Once upon a time"
}

response.output_text.done

Sent when a text content part is complete.
{
  "type": "response.output_text.done",
  "sequence_number": 4,
  "output_index": 0,
  "content_index": 0,
  "item_id": "msg_xyz789",
  "text": "Once upon a time there was a robot..."
}

response.content_part.done

Signals the completion of a content part.
{
  "type": "response.content_part.done",
  "sequence_number": 5,
  "output_index": 0,
  "content_index": 0,
  "item_id": "msg_xyz789",
  "part": {
    "type": "output_text",
    "text": "Once upon a time there was a robot..."
  }
}

response.output_item.done

Signals the completion of an output item.
{
  "type": "response.output_item.done",
  "sequence_number": 6,
  "output_index": 0,
  "item": {
    "type": "message",
    "id": "msg_xyz789",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "output_text",
        "text": "Once upon a time there was a robot..."
      }
    ]
  }
}

response.completed

Sent when the entire response is complete; it includes final usage information.
{
  "type": "response.completed",
  "sequence_number": 7,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "completed",
    "model": "openai/gpt-5.2",
    "output": [...],
    "usage": {
      "input_tokens": 12,
      "input_tokens_details": {
        "cached_tokens": 0
      },
      "output_tokens": 156,
      "output_tokens_details": {
        "reasoning_tokens": 0
      },
      "total_tokens": 168
    }
  }
}

response.failed

Sent when the response generation fails due to an error.
{
  "type": "response.failed",
  "sequence_number": 5,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "failed",
    "model": "openai/gpt-5.2",
    "output": [...],
    "error": {
      "code": "provider_error",
      "message": "Provider openai/gpt-5.2 became unavailable"
    },
    "usage": {
      "input_tokens": 12,
      "input_tokens_details": {
        "cached_tokens": 0
      },
      "output_tokens": 45,
      "output_tokens_details": {
        "reasoning_tokens": 0
      },
      "total_tokens": 57
    }
  }
}

response.incomplete

Sent when the response generation stops before completion (e.g., due to token limits or content filters).
{
  "type": "response.incomplete",
  "sequence_number": 8,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "incomplete",
    "model": "openai/gpt-5.2",
    "output": [...],
    "incomplete_details": {
      "reason": "max_output_tokens"
    },
    "usage": {
      "input_tokens": 12,
      "input_tokens_details": {
        "cached_tokens": 0
      },
      "output_tokens": 1000,
      "output_tokens_details": {
        "reasoning_tokens": 0
      },
      "total_tokens": 1012
    }
  }
}

error

Sent when an error occurs during streaming.
{
  "type": "error",
  "sequence_number": 5,
  "code": "provider_error",
  "message": "Provider openai/gpt-5.2 became unavailable",
  "param": null
}
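
Taken together, a client usually dispatches on the type field. The sketch below is illustrative rather than an official SDK type; only the type strings come from the events documented above, and the console output is arbitrary.

// Illustrative dispatcher over the documented event types (Node.js console output).
type StreamEvent = { type: string; [key: string]: any };

function handleEvent(event: StreamEvent): void {
  switch (event.type) {
    case 'response.created':
      console.log('stream started:', event.response.id);
      break;
    case 'response.output_text.delta':
      process.stdout.write(event.delta); // incremental text
      break;
    case 'response.completed':
      console.log('\nusage:', event.response.usage);
      break;
    case 'response.failed':
      console.error('generation failed:', event.response.error);
      break;
    case 'response.incomplete':
      console.warn('stopped early:', event.response.incomplete_details);
      break;
    case 'error':
      console.error(`stream error ${event.code}: ${event.message}`);
      break;
    default:
      // progress and bookkeeping events (in_progress, output_item.*, content_part.*)
      break;
  }
}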

Reasoning Model Events

For models with reasoning capabilities (e.g., o1, command-a-reasoning), additional event types are emitted:
  • response.reasoning_summary_part.added - Signals the start of a new reasoning summary part
  • response.reasoning_summary_part.done - Signals the completion of a reasoning summary part
  • response.reasoning_summary_text.delta - Incremental reasoning summary text
  • response.reasoning_summary_text.done - Completed reasoning summary
  • response.reasoning_text.delta - Incremental detailed reasoning text
  • response.reasoning_text.done - Completed detailed reasoning
Example reasoning summary part event:
{
  "type": "response.reasoning_summary_part.added",
  "sequence_number": 4,
  "output_index": 0,
  "summary_index": 0,
  "item_id": "reason_xyz123",
  "part": {
    "type": "summary_text",
    "text": ""
  }
}
Example reasoning summary text delta:
{
  "type": "response.reasoning_summary_text.delta",
  "sequence_number": 5,
  "output_index": 0,
  "summary_index": 0,
  "item_id": "reason_xyz123",
  "delta": "Analyzing the problem"
}
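
Reasoning deltas stream alongside the normal text deltas, so clients typically accumulate them into a separate buffer (for example, to show reasoning in a collapsible panel). A minimal sketch; the variable names are illustrative:

// Accumulate reasoning summary text separately from the final answer text.
let reasoningSummary = '';
let answerText = '';

function onEvent(event: { type: string; delta?: string }): void {
  if (event.type === 'response.reasoning_summary_text.delta') {
    reasoningSummary += event.delta ?? ''; // the model's reasoning summary
  } else if (event.type === 'response.output_text.delta') {
    answerText += event.delta ?? '';       // the user-facing answer
  }
}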

Tool Calling Events

When the model calls a tool, you’ll receive these events:
  • response.function_call_arguments.delta - Incremental tool arguments (streaming)
  • response.function_call_arguments.done - Complete tool call with final arguments
Example tool calling event:
{
  "type": "response.function_call_arguments.delta",
  "sequence_number": 5,
  "item_id": "func_xyz789",
  "output_index": 0,
  "call_id": "call_abc123",
  "delta": "{\"location\": \"San"
}
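
Because the arguments arrive as partial JSON fragments, accumulate the deltas per call_id and parse only when the done event arrives. The sketch below assumes the done event carries the same call_id; see the Tool Calling page for the authoritative event shapes.

// Accumulate streamed tool-call argument fragments, keyed by call_id.
const pendingArgs = new Map<string, string>();

function onToolEvent(event: any): void {
  if (event.type === 'response.function_call_arguments.delta') {
    const current = pendingArgs.get(event.call_id) ?? '';
    pendingArgs.set(event.call_id, current + event.delta);
  } else if (event.type === 'response.function_call_arguments.done') {
    const raw = pendingArgs.get(event.call_id) ?? '';
    const args = JSON.parse(raw); // the accumulated string is valid JSON once done
    console.log('tool call ready:', event.call_id, args);
    pendingArgs.delete(event.call_id);
  }
}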
See Tool Calling for complete streaming examples.

Complete Stream Example

Here’s a full example showing all events in order:
event: response.created
data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_abc123","status":"in_progress"}}

event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":1,"output_index":0,"item":{"type":"message","id":"msg_xyz789","status":"in_progress","role":"assistant","content":[]}}

event: response.content_part.added
data: {"type":"response.content_part.added","sequence_number":2,"output_index":0,"content_index":0,"item_id":"msg_xyz789","part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":3,"output_index":0,"content_index":0,"item_id":"msg_xyz789","delta":"Once"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":4,"output_index":0,"content_index":0,"item_id":"msg_xyz789","delta":" upon"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":5,"output_index":0,"content_index":0,"item_id":"msg_xyz789","delta":" a"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":6,"output_index":0,"content_index":0,"item_id":"msg_xyz789","delta":" time"}

event: response.output_text.done
data: {"type":"response.output_text.done","sequence_number":7,"output_index":0,"content_index":0,"item_id":"msg_xyz789","text":"Once upon a time"}

event: response.content_part.done
data: {"type":"response.content_part.done","sequence_number":8,"output_index":0,"content_index":0,"item_id":"msg_xyz789"}

event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":9,"output_index":0,"item":{"type":"message","id":"msg_xyz789","status":"completed"}}

event: response.completed
data: {"type":"response.completed","sequence_number":10,"response":{"id":"resp_abc123","status":"completed","usage":{"input_tokens":8,"output_tokens":4,"total_tokens":12}}}

Client Disconnect Handling

The API automatically detects when clients disconnect and aborts the request:
  • No additional charges for tokens after disconnect
  • Resources are immediately freed
  • Generation stops as soon as disconnect is detected
Client disconnects are handled gracefully. You will only be charged for tokens generated before the disconnect.
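
On the client side, the usual way to disconnect deliberately is to abort the underlying request, for example with an AbortController. This is a sketch; when and how you trigger the abort is up to your application:

// Abort a streaming request from the client; generation stops and you are
// only charged for tokens produced before the disconnect.
const controller = new AbortController();

const responsePromise = fetch('https://api.concentrate.ai/v1/responses', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ model: 'gpt-5.2', input: 'Write a short story', stream: true }),
  signal: controller.signal // ties the stream to the controller
});

// Later, e.g. when the user clicks "Stop"; pending reads reject with an AbortError.
controller.abort();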

Error Handling in Streams

Errors during streaming are sent as special error events:
{
  "type": "error",
  "sequence_number": 5,
  "code": "provider_error",
  "message": "Provider openai/gpt-5.2 became unavailable",
  "param": null
}
Common error codes:
  • provider_error: The upstream provider failed
  • rate_limit_error: Rate limit exceeded
  • insufficient_credits: Not enough credits to complete
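
How to react to these codes is up to you; the sketch below shows one possible policy and is not a requirement of the API:

// Example policy for reacting to an in-stream error event.
function onStreamError(event: { code: string; message: string }): 'retry' | 'fail' {
  switch (event.code) {
    case 'rate_limit_error':
      return 'retry'; // back off before retrying (see Best Practices below)
    case 'provider_error':
      return 'retry'; // the upstream provider failed; a later retry may succeed
    case 'insufficient_credits':
      return 'fail';  // retrying will not help until credits are added
    default:
      return 'fail';
  }
}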

Best Practices

SSE events may arrive in partial chunks. Buffer incomplete JSON before parsing:
let buffer = '';

// `chunks` is the sequence of decoded text chunks read from the response stream
for (const chunk of chunks) {
  buffer += chunk;
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the incomplete trailing line for the next chunk

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      // Process event
    }
  }
}
Implement exponential backoff for reconnection on network failures:
import time

max_retries = 3
retry_delay = 1

for attempt in range(max_retries):
    try:
        # Attempt streaming request
        break
    except Exception as e:
        if attempt < max_retries - 1:
            time.sleep(retry_delay * (2 ** attempt))
        else:
            raise
For the best user experience, render content as it arrives:
let fullText = '';

for await (const event of streamEvents) {
  if (event.type === 'response.output_text.delta') {
    fullText += event.delta;
    updateUI(fullText); // Update display in real-time
  }
}
Configure appropriate timeouts for long-running streams:
import requests

response = requests.post(
    url,
    json=payload,
    stream=True,
    timeout=(5, 60)  # (connect timeout, read timeout) in seconds
)
Monitor token usage to enforce custom limits (final counts arrive in the response.completed event):
let tokenCount = 0;
const MAX_TOKENS = 1000;

for await (const event of streamEvents) {
  if (event.type === 'response.completed' && event.response.usage) {
    tokenCount = event.response.usage.total_tokens;
    if (tokenCount > MAX_TOKENS) {
      // Warn user or take action
    }
  }
}

Framework Examples

React (client-side streaming UI):

import { useState } from 'react';

function StreamingChat() {
  const [content, setContent] = useState('');
  const [loading, setLoading] = useState(false);

  const streamMessage = async (input: string) => {
    setLoading(true);
    setContent('');

    const response = await fetch('https://api.concentrate.ai/v1/responses', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'gpt-5.2',
        input,
        stream: true
      })
    });

    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop()!; // keep any incomplete line for the next read

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const event = JSON.parse(line.slice(6));

          if (event.type === 'response.output_text.delta') {
            setContent(prev => prev + event.delta);
          }
        }
      }
    }

    setLoading(false);
  };

  return (
    <div>
      <div>{content}</div>
      {loading && <div>Generating...</div>}
    </div>
  );
}
Express (Node.js streaming proxy):

const express = require('express');
const fetch = require('node-fetch');

const app = express();

app.get('/stream', async (req, res) => {
  const response = await fetch('https://api.concentrate.ai/v1/responses', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-5.2',
      input: req.query.input,
      stream: true
    })
  });

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  response.body.pipe(res);
});

app.listen(3000); // choose any available port
FastAPI (Python streaming proxy):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import httpx

app = FastAPI()

@app.get("/stream")
async def stream_response(input: str):
    async def generate():
        async with httpx.AsyncClient() as client:
            async with client.stream(
                "POST",
                "https://api.concentrate.ai/v1/responses",
                headers={
                    "Authorization": "Bearer YOUR_API_KEY",
                    "Content-Type": "application/json"
                },
                json={
                    "model": "gpt-5.2",
                    "input": input,
                    "stream": True
                }
            ) as response:
                async for chunk in response.aiter_bytes():
                    yield chunk

    return StreamingResponse(
        generate(),
        media_type="text/event-stream"
    )

Create Response

Main API endpoint documentation

Error Handling

Handle errors in streams