

Overview

Streaming enables real-time response generation using Server-Sent Events (SSE). Instead of waiting for the complete response, you receive content incrementally as it’s generated, providing a better user experience.

How It Works

When you set stream: true, the API:
  1. Returns an SSE stream instead of a single JSON response
  2. Sends events incrementally as content is generated
  3. Includes a JSON payload with the event type and data in each event
  4. Closes the connection when the response is complete

Enable Streaming

Add "stream": true to your request:
curl https://api.concentrate.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-5.2",
    "input": "Write a short story about a robot",
    "stream": true
  }'
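
The body of this response is a standard SSE stream. The TypeScript sketch below shows one way to consume it; it assumes a runtime with fetch and ReadableStream support (Node 18+, Deno, or a browser) and only handles the text delta events documented on this page.

// Sketch: request a streamed response and accumulate text deltas as they arrive.
async function streamStory(apiKey: string): Promise<string> {
  const response = await fetch('https://api.concentrate.ai/v1/responses', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-5.2',
      input: 'Write a short story about a robot',
      stream: true
    })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let fullText = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop()!; // keep any partial line for the next chunk

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const event = JSON.parse(line.slice(6));
      if (event.type === 'response.output_text.delta') {
        fullText += event.delta; // incremental text from the model
      }
    }
  }

  return fullText;
}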

Event Types

The stream emits different event types as content is generated:

response.created

Sent at the beginning of the response stream.
{
  "type": "response.created",
  "sequence_number": 0,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "in_progress",
    "model": "openai/gpt-5.2",
    "output": [],
    "usage": null
  }
}

response.in_progress

Sent periodically during response generation to indicate progress.
{
  "type": "response.in_progress",
  "sequence_number": 3,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "in_progress",
    "model": "openai/gpt-5.2",
    "output": [...],
    "usage": null
  }
}

response.output_item.added

Signals the start of a new output item (message or reasoning block).
{
  "type": "response.output_item.added",
  "sequence_number": 1,
  "output_index": 0,
  "item": {
    "type": "message",
    "id": "msg_xyz789",
    "status": "in_progress",
    "role": "assistant",
    "content": []
  }
}

response.content_part.added

Signals the start of a new content part within an output item.
{
  "type": "response.content_part.added",
  "sequence_number": 2,
  "output_index": 0,
  "content_index": 0,
  "item_id": "msg_xyz789",
  "part": {
    "type": "output_text",
    "text": ""
  }
}

response.output_text.delta

Contains incremental text content. This is where the actual text appears.
{
  "type": "response.output_text.delta",
  "sequence_number": 3,
  "output_index": 0,
  "content_index": 0,
  "item_id": "msg_xyz789",
  "delta": "Once upon a time"
}

response.output_text.done

Sent when a text content part is complete.
{
  "type": "response.output_text.done",
  "sequence_number": 4,
  "output_index": 0,
  "content_index": 0,
  "item_id": "msg_xyz789",
  "text": "Once upon a time there was a robot..."
}

response.content_part.done

Signals the completion of a content part.
{
  "type": "response.content_part.done",
  "sequence_number": 5,
  "output_index": 0,
  "content_index": 0,
  "item_id": "msg_xyz789",
  "part": {
    "type": "output_text",
    "text": "Once upon a time there was a robot..."
  }
}

response.output_item.done

Signals the completion of an output item.
{
  "type": "response.output_item.done",
  "sequence_number": 6,
  "output_index": 0,
  "item": {
    "type": "message",
    "id": "msg_xyz789",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "output_text",
        "text": "Once upon a time there was a robot..."
      }
    ]
  }
}

response.completed

Sent when the entire response is complete; it includes final usage information.
{
  "type": "response.completed",
  "sequence_number": 7,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "completed",
    "model": "openai/gpt-5.2",
    "output": [...],
    "usage": {
      "input_tokens": 12,
      "input_tokens_details": {
        "cached_tokens": 0
      },
      "output_tokens": 156,
      "output_tokens_details": {
        "reasoning_tokens": 0
      },
      "total_tokens": 168
    }
  }
}

response.failed

Sent when the response generation fails due to an error.
{
  "type": "response.failed",
  "sequence_number": 5,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "failed",
    "model": "openai/gpt-5.2",
    "output": [...],
    "error": {
      "code": "provider_error",
      "message": "Provider openai/gpt-5.2 became unavailable"
    },
    "usage": {
      "input_tokens": 12,
      "input_tokens_details": {
        "cached_tokens": 0
      },
      "output_tokens": 45,
      "output_tokens_details": {
        "reasoning_tokens": 0
      },
      "total_tokens": 57
    }
  }
}

response.incomplete

Sent when the response generation stops before completion (e.g., due to token limits or content filters).
{
  "type": "response.incomplete",
  "sequence_number": 8,
  "response": {
    "id": "resp_abc123",
    "object": "response",
    "created_at": 1702934400,
    "status": "incomplete",
    "model": "openai/gpt-5.2",
    "output": [...],
    "incomplete_details": {
      "reason": "max_output_tokens"
    },
    "usage": {
      "input_tokens": 12,
      "input_tokens_details": {
        "cached_tokens": 0
      },
      "output_tokens": 1000,
      "output_tokens_details": {
        "reasoning_tokens": 0
      },
      "total_tokens": 1012
    }
  }
}

error

Sent when an error occurs during streaming.
{
  "type": "error",
  "sequence_number": 5,
  "code": "provider_error",
  "message": "Provider openai/gpt-5.2 became unavailable",
  "param": null
}
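
Taken together, a client usually dispatches on the type field. The sketch below is illustrative rather than an official SDK type; only the type strings come from the events documented above, and the console output is arbitrary.

// Illustrative dispatcher over the documented event types (Node.js console output).
type StreamEvent = { type: string; [key: string]: any };

function handleEvent(event: StreamEvent): void {
  switch (event.type) {
    case 'response.created':
      console.log('stream started:', event.response.id);
      break;
    case 'response.output_text.delta':
      process.stdout.write(event.delta); // incremental text
      break;
    case 'response.completed':
      console.log('\nusage:', event.response.usage);
      break;
    case 'response.failed':
      console.error('generation failed:', event.response.error);
      break;
    case 'response.incomplete':
      console.warn('stopped early:', event.response.incomplete_details);
      break;
    case 'error':
      console.error(`stream error ${event.code}: ${event.message}`);
      break;
    default:
      // progress and bookkeeping events (in_progress, output_item.*, content_part.*)
      break;
  }
}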

Reasoning Model Events

For models with reasoning capabilities (e.g., o1, command-a-reasoning), additional event types are emitted:
  • response.reasoning_summary_part.added - Signals the start of a new reasoning summary part
  • response.reasoning_summary_part.done - Signals the completion of a reasoning summary part
  • response.reasoning_summary_text.delta - Incremental reasoning summary text
  • response.reasoning_summary_text.done - Completed reasoning summary
  • response.reasoning_text.delta - Incremental detailed reasoning text
  • response.reasoning_text.done - Completed detailed reasoning
Example reasoning summary part event:
{
  "type": "response.reasoning_summary_part.added",
  "sequence_number": 4,
  "output_index": 0,
  "summary_index": 0,
  "item_id": "reason_xyz123",
  "part": {
    "type": "summary_text",
    "text": ""
  }
}
Example reasoning summary text delta:
{
  "type": "response.reasoning_summary_text.delta",
  "sequence_number": 5,
  "output_index": 0,
  "summary_index": 0,
  "item_id": "reason_xyz123",
  "delta": "Analyzing the problem"
}
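
Reasoning deltas stream alongside the normal text deltas, so clients typically accumulate them into a separate buffer (for example, to show reasoning in a collapsible panel). A minimal sketch; the variable names are illustrative:

// Accumulate reasoning summary text separately from the final answer text.
let reasoningSummary = '';
let answerText = '';

function onEvent(event: { type: string; delta?: string }): void {
  if (event.type === 'response.reasoning_summary_text.delta') {
    reasoningSummary += event.delta ?? ''; // the model's reasoning summary
  } else if (event.type === 'response.output_text.delta') {
    answerText += event.delta ?? '';       // the user-facing answer
  }
}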

Tool Calling Events

When the model calls a tool, you’ll receive these events:
  • response.function_call_arguments.delta - Incremental tool arguments (streaming)
  • response.function_call_arguments.done - Complete tool call with final arguments
Example tool calling event:
{
  "type": "response.function_call_arguments.delta",
  "sequence_number": 5,
  "item_id": "func_xyz789",
  "output_index": 0,
  "call_id": "call_abc123",
  "delta": "{\"location\": \"San"
}
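
Because the arguments arrive as partial JSON fragments, accumulate the deltas per call_id and parse only when the done event arrives. The sketch below assumes the done event carries the same call_id; see the Tool Calling page for the authoritative event shapes.

// Accumulate streamed tool-call argument fragments, keyed by call_id.
const pendingArgs = new Map<string, string>();

function onToolEvent(event: any): void {
  if (event.type === 'response.function_call_arguments.delta') {
    const current = pendingArgs.get(event.call_id) ?? '';
    pendingArgs.set(event.call_id, current + event.delta);
  } else if (event.type === 'response.function_call_arguments.done') {
    const raw = pendingArgs.get(event.call_id) ?? '';
    const args = JSON.parse(raw); // the accumulated string is valid JSON once done
    console.log('tool call ready:', event.call_id, args);
    pendingArgs.delete(event.call_id);
  }
}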
See Tool Calling for complete streaming examples.

Complete Stream Example

Here’s a full example showing all events in order:
event: response.created
data: {"type":"response.created","sequence_number":0,"response":{"id":"resp_abc123","status":"in_progress"}}

event: response.output_item.added
data: {"type":"response.output_item.added","sequence_number":1,"output_index":0,"item":{"type":"message","id":"msg_xyz789","status":"in_progress","role":"assistant","content":[]}}

event: response.content_part.added
data: {"type":"response.content_part.added","sequence_number":2,"output_index":0,"content_index":0,"item_id":"msg_xyz789","part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":3,"output_index":0,"content_index":0,"item_id":"msg_xyz789","delta":"Once"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":4,"output_index":0,"content_index":0,"item_id":"msg_xyz789","delta":" upon"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":5,"output_index":0,"content_index":0,"item_id":"msg_xyz789","delta":" a"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","sequence_number":6,"output_index":0,"content_index":0,"item_id":"msg_xyz789","delta":" time"}

event: response.output_text.done
data: {"type":"response.output_text.done","sequence_number":7,"output_index":0,"content_index":0,"item_id":"msg_xyz789","text":"Once upon a time"}

event: response.content_part.done
data: {"type":"response.content_part.done","sequence_number":8,"output_index":0,"content_index":0,"item_id":"msg_xyz789"}

event: response.output_item.done
data: {"type":"response.output_item.done","sequence_number":9,"output_index":0,"item":{"type":"message","id":"msg_xyz789","status":"completed"}}

event: response.completed
data: {"type":"response.completed","sequence_number":10,"response":{"id":"resp_abc123","status":"completed","usage":{"input_tokens":8,"output_tokens":4,"total_tokens":12}}}

Client Disconnect Handling

The API automatically detects when clients disconnect and aborts the request:
  • No additional charges for tokens after disconnect
  • Resources are immediately freed
  • Generation stops as soon as disconnect is detected
Client disconnects are handled gracefully. You will only be charged for tokens generated before the disconnect.
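
On the client side, the usual way to disconnect deliberately is to abort the underlying request, for example with an AbortController. This is a sketch; when and how you trigger the abort is up to your application:

// Abort a streaming request from the client; generation stops and you are
// only charged for tokens produced before the disconnect.
const controller = new AbortController();

const responsePromise = fetch('https://api.concentrate.ai/v1/responses', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ model: 'gpt-5.2', input: 'Write a short story', stream: true }),
  signal: controller.signal // ties the stream to the controller
});

// Later, e.g. when the user clicks "Stop"; pending reads reject with an AbortError.
controller.abort();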

Error Handling in Streams

Errors during streaming are sent as special error events:
{
  "type": "error",
  "sequence_number": 5,
  "code": "provider_error",
  "message": "Provider openai/gpt-5.2 became unavailable",
  "param": null
}
Common error codes:
  • provider_error: The upstream provider failed
  • rate_limit_error: Rate limit exceeded
  • insufficient_credits: Not enough credits to complete
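
How to react to these codes is up to you; the sketch below shows one possible policy and is not a requirement of the API:

// Example policy for reacting to an in-stream error event.
function onStreamError(event: { code: string; message: string }): 'retry' | 'fail' {
  switch (event.code) {
    case 'rate_limit_error':
      return 'retry'; // back off before retrying (see Best Practices below)
    case 'provider_error':
      return 'retry'; // the upstream provider failed; a later retry may succeed
    case 'insufficient_credits':
      return 'fail';  // retrying will not help until credits are added
    default:
      return 'fail';
  }
}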

Best Practices

SSE events may arrive in partial chunks. Buffer incomplete JSON before parsing:
let buffer = '';

// `chunks` is the sequence of decoded text chunks read from the response stream
for (const chunk of chunks) {
  buffer += chunk;
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the incomplete trailing line for the next chunk

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      // Process event
    }
  }
}
Implement exponential backoff for reconnection on network failures:
import time

max_retries = 3
retry_delay = 1

for attempt in range(max_retries):
    try:
        # Attempt streaming request
        break
    except Exception as e:
        if attempt < max_retries - 1:
            time.sleep(retry_delay * (2 ** attempt))
        else:
            raise
For the best user experience, render content as it arrives:
let fullText = '';

for await (const event of streamEvents) {
  if (event.type === 'response.output_text.delta') {
    fullText += event.delta;
    updateUI(fullText); // Update display in real-time
  }
}
Configure appropriate timeouts for long-running streams:
import requests

response = requests.post(
    url,
    json=payload,
    stream=True,
    timeout=(5, 60)  # (connect timeout, read timeout) in seconds
)
Monitor token usage to enforce custom limits (final counts arrive in the response.completed event):
let tokenCount = 0;
const MAX_TOKENS = 1000;

for await (const event of streamEvents) {
  if (event.type === 'response.completed' && event.response.usage) {
    tokenCount = event.response.usage.total_tokens;
    if (tokenCount > MAX_TOKENS) {
      // Warn user or take action
    }
  }
}

Framework Examples

React (client-side streaming UI):

import { useState } from 'react';

function StreamingChat() {
  const [content, setContent] = useState('');
  const [loading, setLoading] = useState(false);

  const streamMessage = async (input: string) => {
    setLoading(true);
    setContent('');

    const response = await fetch('https://api.concentrate.ai/v1/responses', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'gpt-5.2',
        input,
        stream: true
      })
    });

    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop()!; // keep any incomplete line for the next read

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const event = JSON.parse(line.slice(6));

          if (event.type === 'response.output_text.delta') {
            setContent(prev => prev + event.delta);
          }
        }
      }
    }

    setLoading(false);
  };

  return (
    <div>
      <div>{content}</div>
      {loading && <div>Generating...</div>}
    </div>
  );
}
Express (Node.js streaming proxy):

const express = require('express');
const fetch = require('node-fetch');

const app = express();

app.get('/stream', async (req, res) => {
  const response = await fetch('https://api.concentrate.ai/v1/responses', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-5.2',
      input: req.query.input,
      stream: true
    })
  });

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  response.body.pipe(res);
});

app.listen(3000); // choose any available port
FastAPI (Python streaming proxy):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import httpx

app = FastAPI()

@app.get("/stream")
async def stream_response(input: str):
    async def generate():
        async with httpx.AsyncClient() as client:
            async with client.stream(
                "POST",
                "https://api.concentrate.ai/v1/responses",
                headers={
                    "Authorization": "Bearer YOUR_API_KEY",
                    "Content-Type": "application/json"
                },
                json={
                    "model": "gpt-5.2",
                    "input": input,
                    "stream": True
                }
            ) as response:
                async for chunk in response.aiter_bytes():
                    yield chunk

    return StreamingResponse(
        generate(),
        media_type="text/event-stream"
    )

Create Response

Main API endpoint documentation

Error Handling

Handle errors in streams