The /api/v2/aipAgents/.../streamingContinue endpoint is sending complete responses as a single chunk instead of streaming progressively as documented. This occurs consistently across different response lengths when using XMLHttpRequest in React Native.
Environment
- Platform: React Native (Expo SDK 54)
- Client Library: Native XMLHttpRequest (no SDK wrapper)
- API Version: v2 (preview=true)
- Agent Version: 12.0
- Endpoint:
POST /api/v2/aipAgents/agents/{agentRid}/sessions/{sessionRid}/streamingContinue?preview=true
Expected Behavior
According to the API documentation:
"Returns a stream of the Agent response text (formatted using markdown) for clients to consume as the response is generated."
Expected: Multiple progressive chunks arriving as the AI generates the response.
Actual Behavior
The entire response arrives as a single chunk after the complete AI generation is finished, regardless of response length.
Test Results
Test 1: Short Response (216 characters)
Request sent: 2025-11-23T16:26:13.096Z
Chunk 1 received: 16144ms later (216 chars)
Total chunks: 1
Response: "Done! Your 'Doc appointment' is scheduled..."
Test 2: Longer Response (981 characters)
Request sent: 2025-11-23T16:33:04.879Z
Chunk 1 received: 4993ms later (981 chars)
Total chunks: 1
Response: "The sky appears blue because of Rayleigh scattering..."
Implementation Details
Request Configuration
// Using native XMLHttpRequest for progressive chunk reception
const xhr = new XMLHttpRequest();
xhr.open('POST', url, true);
xhr.setRequestHeader('Content-Type', 'application/json');
xhr.setRequestHeader('Authorization', `Bearer ${AUTH_TOKEN}`);
xhr.setRequestHeader('Accept', 'text/plain, text/event-stream, */*');
xhr.timeout = 60000;

// Track progress to receive chunks as they arrive
let lastProcessedIndex = 0;
xhr.onprogress = (event) => {
  const responseText = xhr.responseText;
  // Process only new content since last progress event
  if (responseText.length > lastProcessedIndex) {
    const newContent = responseText.substring(lastProcessedIndex);
    lastProcessedIndex = responseText.length;
    console.log(`Chunk received: ${newContent.length} chars`);
    onChunk(newContent); // Callback to update UI
  }
};

xhr.send(JSON.stringify(requestBody));
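The delta logic inside onprogress can be factored into a pure helper (a refactoring sketch, not part of the API; the name extractNewChunk is ours), which makes the chunk accounting unit-testable in isolation:

```typescript
// Pure helper mirroring the onprogress diffing above: given the full
// responseText accumulated so far and the index already processed,
// return the newly arrived slice and the updated index.
// (Hypothetical helper name; not part of the API.)
function extractNewChunk(
  responseText: string,
  lastProcessedIndex: number
): { chunk: string; nextIndex: number } {
  if (responseText.length <= lastProcessedIndex) {
    return { chunk: '', nextIndex: lastProcessedIndex };
  }
  return {
    chunk: responseText.substring(lastProcessedIndex),
    nextIndex: responseText.length,
  };
}
```

Inside onprogress this becomes a single call: `({ chunk, nextIndex: lastProcessedIndex } = extractNewChunk(xhr.responseText, lastProcessedIndex))`.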
Request Body
{
  "messageId": "uuid-v4-generated",
  "userInput": {
    "text": "[User Context]\nUser ID: <anonymized>\nTimezone: America/Chicago\n\n[User Message]\n<user's message>"
  }
}
Response Headers (Anonymized)
{
  "content-type": "application/json",
  "server": "envoy",
  "server-timing": "server;dur=1412.117",
  "x-envoy-upstream-service-time": "1412"
}
Key Observations
- Single Chunk Delivery: Both short (216 chars) and long (981 chars) responses arrive as exactly 1 chunk
- Complete Buffering: The entire response appears to be buffered server-side before transmission
- Timing Pattern: The first byte arrives only after the full response is ready (5-16 s), with no partial data in either test
- Content Type: The response carries content-type: application/json rather than a streaming media type such as text/event-stream
- XMLHttpRequest Working: onprogress fires correctly when data arrives, confirming that the client-side implementation is correct
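To further rule out XMLHttpRequest-side buffering, the same test could be run with a WHATWG streaming reader where one is available (for example Expo's fetch with response.body; we have not verified streaming support in our RN environment, so treat this as a sketch). The consumer below works against any ReadableStream, so it can be exercised with a synthetic stream without a network:

```typescript
// Consume a ReadableStream of bytes, invoking onChunk for each decoded
// piece as it arrives; returns the number of chunks seen.
async function consumeStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void
): Promise<number> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let chunkCount = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunkCount++;
    // stream: true carries multi-byte characters across chunk boundaries
    onChunk(decoder.decode(value, { stream: true }));
  }
  return chunkCount;
}
```

Against the real endpoint this would be wired up roughly as `const res = await fetch(url, options); await consumeStream(res.body!, onChunk);` (assuming the runtime exposes response.body).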
Code Extract
Full Streaming Function
export async function streamContinueSession(
  agentRid: string,
  sessionRid: string,
  userMessage: string,
  userId: string,
  timezone: string,
  onChunk: (text: string) => void,
  onComplete: () => void,
  onError: (error: string) => void
): Promise<void> {
  const url = `${API_BASE_URL}/api/v2/aipAgents/agents/${agentRid}/sessions/${sessionRid}/streamingContinue?preview=true`;
  const requestBody = {
    messageId: generateUUID(),
    userInput: {
      text: `[User Context]\nUser ID: ${userId}\nTimezone: ${timezone}\n\n[User Message]\n${userMessage}`
    }
  };

  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    let lastProcessedIndex = 0;
    let chunkCount = 0;

    // Fires whenever new response data arrives; process only the delta
    xhr.onprogress = (event) => {
      const responseText = xhr.responseText;
      if (responseText.length > lastProcessedIndex) {
        const newContent = responseText.substring(lastProcessedIndex);
        lastProcessedIndex = responseText.length;
        chunkCount++;
        console.log(`Chunk ${chunkCount}: ${newContent.length} chars`);
        if (newContent.trim()) {
          onChunk(newContent);
        }
      }
    };

    xhr.onload = () => {
      if (xhr.status >= 200 && xhr.status < 300) {
        console.log(`Total chunks received: ${chunkCount}`);
        onComplete();
        resolve();
      } else {
        onError(`Error: ${xhr.status}`);
        reject(new Error(`HTTP ${xhr.status}`));
      }
    };

    xhr.onerror = () => {
      onError('Network error');
      reject(new Error('Network error'));
    };

    // Without an ontimeout handler the promise would never settle on timeout
    xhr.ontimeout = () => {
      onError('Request timed out');
      reject(new Error('Timeout'));
    };

    xhr.open('POST', url, true);
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.setRequestHeader('Authorization', `Bearer ${AUTH_TOKEN}`);
    xhr.setRequestHeader('Accept', 'text/plain, text/event-stream, */*');
    xhr.timeout = 60000;
    xhr.send(JSON.stringify(requestBody));
  });
}
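For completeness, generateUUID above is just any UUID v4 generator. A minimal fallback sketch (crypto.randomUUID() is preferable where the runtime provides it; in React Native it may need a polyfill):

```typescript
// Minimal RFC 4122 version-4 UUID generator (illustrative fallback;
// prefer crypto.randomUUID() where available).
function generateUUID(): string {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = (Math.random() * 16) | 0;
    const v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}
```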
Questions
- Is progressive streaming supported? Does the API actually stream chunks as they're generated, or does it buffer the complete response?
- Response format: Should we expect text/plain, text/event-stream, or another content type for true streaming?
- Configuration needed? Are there specific request headers, parameters, or agent configurations required to enable progressive streaming?
- Alternative endpoints? Is there a different endpoint that provides true progressive streaming?
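On the response-format question: if the endpoint does emit text/event-stream, the client would additionally need to parse SSE framing. A minimal sketch, assuming standard "data:" lines separated by blank lines (the actual framing of this endpoint is unknown to us):

```typescript
// Parse Server-Sent Events framing from an accumulating text buffer.
// Events are separated by a blank line; each "data:" line contributes
// one line of the event payload. Returns completed events plus the
// unconsumed remainder to carry into the next call.
function parseSSE(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  let rest = buffer;
  let sep: number;
  while ((sep = rest.indexOf('\n\n')) !== -1) {
    const block = rest.slice(0, sep);
    rest = rest.slice(sep + 2);
    const data = block
      .split('\n')
      .filter((line) => line.startsWith('data:'))
      .map((line) => line.slice(5).trimStart())
      .join('\n');
    if (data) events.push(data);
  }
  return { events, rest };
}
```

Each chunk from onprogress would be appended to a buffer and fed through this parser, keeping `rest` between calls so events split across chunks are handled.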
Reproduction Steps
- Create an AIP Agent in Agent Studio
- Create a session using POST /api/v2/aipAgents/agents/{agentRid}/sessions
- Send a message using the code above to the streamingContinue endpoint
- Monitor xhr.onprogress events
- Observe that only 1 chunk is received, containing the complete response
Impact
While the current behavior is functional, it prevents us from providing real-time feedback to users as the AI generates responses. Users experience a loading state for the full generation time (5-16 seconds) before seeing any text, rather than seeing progressive text generation.
Request
Could you please clarify:
- Whether progressive streaming is supported in the current API version
- If there are specific configurations or headers needed to enable it
- If this is expected behavior or a potential issue
Thank you for your assistance!
