Overview
Streaming enables real-time response generation using Server-Sent Events (SSE). Instead of waiting for the complete response, you receive content incrementally as it’s generated, providing a better user experience.
How It Works
When you set "stream": true, the API:
- Returns an SSE stream instead of JSON
- Sends events incrementally as content is generated
- Each event contains a JSON payload with type and data
- Closes the connection when complete
Enable Streaming
Add"stream": true to your request:
Event Types
The stream emits different event types as content is generated:
response.created
Sent at the beginning of the response stream.
response.in_progress
Sent periodically during response generation to indicate progress.
response.output_item.added
Signals the start of a new output item (message or reasoning block).
response.content_part.added
Signals the start of a new content part within an output item.
response.output_text.delta
Contains incremental text content. This is where the actual text appears.
response.output_text.done
Sent when a text content part is complete.
response.content_part.done
Signals the completion of a content part.
response.output_item.done
Signals the completion of an output item.
response.completed
Sent when the entire response is complete; includes final usage information.
response.failed
Sent when the response generation fails due to an error.
response.incomplete
Sent when the response generation stops before completion (e.g., due to token limits or content filters).
error
Sent when an error occurs during streaming.
Reasoning Model Events
For models with reasoning capabilities (e.g., o1, command-a-reasoning), additional event types are emitted:
- response.reasoning_summary_part.added - Signals the start of a new reasoning summary part
- response.reasoning_summary_part.done - Signals the completion of a reasoning summary part
- response.reasoning_summary_text.delta - Incremental reasoning summary text
- response.reasoning_summary_text.done - Completed reasoning summary
- response.reasoning_text.delta - Incremental detailed reasoning text
- response.reasoning_text.done - Completed detailed reasoning
Example reasoning summary part event:
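The original payload is not reproduced on this page, so the snippet below is a sketch only; every field other than type (item_id, output_index, summary_index, part) is an assumption modeled on the event name, not a guaranteed schema.

```json
{
  "type": "response.reasoning_summary_part.added",
  "item_id": "item_123",
  "output_index": 0,
  "summary_index": 0,
  "part": {
    "type": "summary_text",
    "text": ""
  }
}
```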
Tool Calling Events
When the model calls a tool, you’ll receive these events:
- response.function_call_arguments.delta - Incremental tool arguments (streaming)
- response.function_call_arguments.done - Complete tool call with final arguments
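As a rough illustration, two consecutive tool-call events might carry payloads like the ones below. The wire framing and all field names other than type are assumptions for illustration only.

```
data: {"type":"response.function_call_arguments.delta","item_id":"fc_123","delta":"{\"city\":\"Par"}

data: {"type":"response.function_call_arguments.done","item_id":"fc_123","arguments":"{\"city\":\"Paris\"}"}
```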
Complete Stream Example
Here’s a full example showing all events in order:
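The exact payloads are not reproduced on this page, so the sequence below is an illustrative sketch of the event order described above. Each line is one SSE data: frame; every field other than type is an assumption.

```
data: {"type":"response.created"}

data: {"type":"response.in_progress"}

data: {"type":"response.output_item.added","output_index":0}

data: {"type":"response.content_part.added","output_index":0,"content_index":0}

data: {"type":"response.output_text.delta","delta":"Hello"}

data: {"type":"response.output_text.delta","delta":" world"}

data: {"type":"response.output_text.done","text":"Hello world"}

data: {"type":"response.content_part.done","output_index":0,"content_index":0}

data: {"type":"response.output_item.done","output_index":0}

data: {"type":"response.completed","usage":{"input_tokens":12,"output_tokens":2}}
```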
Client Disconnect Handling
The API automatically detects when clients disconnect and aborts the request:
- No additional charges for tokens after disconnect
- Resources are immediately freed
- Generation stops as soon as disconnect is detected
Client disconnects are handled gracefully. You will only be charged for tokens generated before the disconnect.
Error Handling in Streams
Errors during streaming are sent as special error events:
- provider_error: The upstream provider failed
- rate_limit_error: Rate limit exceeded
- insufficient_credits: Not enough credits to complete
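A sketch of what an error event might look like on the wire; the exact field names are not specified on this page and are assumed here.

```
data: {"type":"error","error":{"code":"rate_limit_error","message":"Rate limit exceeded"}}
```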
Best Practices
Buffer partial chunks
SSE events may arrive in partial chunks. Buffer incomplete JSON before parsing:
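A minimal TypeScript sketch of chunk buffering. It assumes events arrive as data: lines separated by blank lines (a simplification of the SSE format) and builds on the fetch response from the earlier example.

```typescript
// Accumulate bytes in a buffer and only parse complete "data:" lines.
async function* sseEvents(body: ReadableStream<Uint8Array>): AsyncGenerator<any> {
  const decoder = new TextDecoder();
  const reader = body.getReader();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Events are separated by a blank line; keep any trailing partial event in the buffer.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const event of events) {
      for (const line of event.split("\n")) {
        if (line.startsWith("data:")) {
          yield JSON.parse(line.slice(5).trim()); // only parse complete JSON payloads
        }
      }
    }
  }
}

// Usage with the fetch response from the earlier example:
// for await (const event of sseEvents(response.body!)) { console.log(event.type); }
```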
Handle reconnection
Implement exponential backoff for reconnection on network failures:
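A sketch of exponential backoff around the streaming request. Whether the API supports resuming a partial response is not covered on this page, so this example simply retries the request from the start; the helper name and limits are illustrative.

```typescript
// Retry the streaming request with exponential backoff on network errors.
async function streamWithRetry(makeRequest: () => Promise<Response>, maxRetries = 5): Promise<Response> {
  let attempt = 0;
  while (true) {
    try {
      const response = await makeRequest();
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return response;
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      // 1s, 2s, 4s, ... capped at 30s, plus a little jitter.
      const delay = Math.min(1000 * 2 ** attempt, 30_000) + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delay));
      attempt++;
    }
  }
}
```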
Display content incrementally
For the best user experience, render content as it arrives:
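For example, in a Node.js context, appending each text delta as it arrives. This builds on the sseEvents helper and fetch response sketched above; the delta and usage field names are assumptions.

```typescript
let fullText = "";
for await (const event of sseEvents(response.body!)) {
  if (event.type === "response.output_text.delta") {
    fullText += event.delta;            // accumulate the complete message
    process.stdout.write(event.delta);  // render immediately instead of waiting for completion
  }
  if (event.type === "response.completed") {
    console.log("\n--- done ---");
  }
}
```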
Set timeouts
Configure appropriate timeouts for long-running streams:
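One way to do this with fetch is an AbortController whose timer is reset whenever data arrives, so the limit applies to idle time rather than total stream length. The endpoint, request body, and 30-second threshold are illustrative assumptions, and sseEvents is the helper sketched above.

```typescript
const controller = new AbortController();
let idleTimer = setTimeout(() => controller.abort(), 30_000); // abort if nothing arrives for 30s

const res = await fetch("https://api.concentrate.ai/v1/responses", {  // assumed endpoint
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CONCENTRATE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ model: "your-model-id", input: "Hello", stream: true }),
  signal: controller.signal,
});

for await (const event of sseEvents(res.body!)) {
  clearTimeout(idleTimer);
  idleTimer = setTimeout(() => controller.abort(), 30_000); // reset the idle timeout on every event
  // ...handle the event; if the timeout fires, the pending read rejects with an AbortError...
}
clearTimeout(idleTimer);
```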
Monitor token usage
Track tokens in real-time to implement custom limits:
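Exact counts typically arrive only in the final response.completed usage payload, so a common approach is to approximate during the stream and reconcile at the end. The field names, the 4-characters-per-token heuristic, and the reuse of the AbortController from the timeout example are all assumptions.

```typescript
const MAX_OUTPUT_TOKENS = 1000;  // example custom limit
let approxTokens = 0;

for await (const event of sseEvents(response.body!)) {
  if (event.type === "response.output_text.delta") {
    approxTokens += Math.ceil(event.delta.length / 4); // rough heuristic: ~4 chars per token
    if (approxTokens > MAX_OUTPUT_TOKENS) {
      controller.abort();  // disconnecting stops generation and billing, per the docs above
      break;
    }
  }
  if (event.type === "response.completed") {
    console.log("final usage:", event.usage); // authoritative counts, if provided
  }
}
```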
Framework Examples
React
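The original example is not reproduced here; below is a hedged sketch of a React hook that streams text into state. The /api/stream proxy route is an assumption (in a real app you would typically proxy through your own server to keep the API key off the client), and sseEvents is the helper sketched under "Buffer partial chunks".

```tsx
import { useState } from "react";
import { sseEvents } from "./sse"; // hypothetical module exporting the helper sketched above

export function useStreamingResponse() {
  const [text, setText] = useState("");

  async function run(prompt: string) {
    setText("");
    const res = await fetch("/api/stream", {     // assumed server-side proxy route
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: prompt }),
    });
    for (await (const event of sseEvents(res.body!))) {
    }
  }

  return { text, run };
}
```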
Express.js (Server-Side)
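Likewise, a hedged sketch of an Express.js proxy that forwards the upstream SSE bytes to the browser. The route name, upstream URL, and body fields are assumptions; it also cancels the upstream request when the browser disconnects, which per the docs above stops generation and billing.

```typescript
import express from "express";

const app = express();
app.use(express.json());

app.post("/api/stream", async (req, res) => {
  const upstream = await fetch("https://api.concentrate.ai/v1/responses", {  // assumed endpoint
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.CONCENTRATE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "your-model-id", input: req.body.input, stream: true }),
  });

  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");

  // Pipe upstream SSE bytes straight through to the client.
  const reader = upstream.body!.getReader();
  // Stop the upstream request if the browser disconnects.
  req.on("close", () => reader.cancel().catch(() => {}));
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    res.write(value);
  }
  res.end();
});

app.listen(3000);
```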
FastAPI (Python)
Related Documentation
Create Response
Main API endpoint documentation
Error Handling
Handle errors in streams