Gadget Code Streaming Responses
Status: ✅ IMPLEMENTED — Full end-to-end streaming responses operational
Last Updated: May 7, 2026
Quick Reference
Streaming Path: AI Provider → @gadget/ai → gadget-drone → gadget-code (backend) → Frontend IDE
Key Concepts:
- IAiStreamChunk: Unified chunk type with type: 'thinking'|'response'|'toolCall'
- Backend Aggregation: Tokens buffered in DroneSession, persisted at mode changes
- Frontend In-Place Updates: Blocks updated by index, not appended (prevents DOM flooding)
- Blocks Array: ChatTurn.blocks[] stores ordered thinking/responding/tool blocks
Critical Files:
- packages/ai/src/api.ts — Stream chunk interface
- gadget-code/src/lib/drone-session.ts — Aggregation logic
- gadget-code/frontend/src/pages/ChatSessionView.tsx — Frontend state management
- gadget-code/frontend/src/components/ChatTurn.tsx — Block rendering
Overview
Gadget Code implements real-time streaming responses from AI providers (OpenAI, Ollama) through the entire system stack. As the AI model generates tokens, they flow immediately from the provider → drone → backend → frontend, where they are displayed to the user with minimal latency.
The system supports three types of streaming content:
- Thinking tokens: Reasoning/output from models with thinking capabilities
- Response tokens: The model's primary response content
- Tool calls: Function/tool invocations requested by the model
Architecture & Data Flow
Complete Streaming Path
┌─────────────────┐
│ AI Provider │ (OpenAI / Ollama SDK)
│ (Streaming) │
└────────┬────────┘
│ IAiStreamChunk
│ (type: 'thinking'|'response'|'toolCall')
▼
┌─────────────────┐
│ @gadget/ai │ (packages/ai/src/openai.ts or ollama.ts)
│ streamCallback │
└────────┬────────┘
│ Calls streamCallback(chunk) for each token
▼
┌─────────────────┐
│ gadget-drone │ (gadget-drone/src/services/agent.ts)
│ AgentService │
└────────┬────────┘
│ Socket.IO events
│ thinking(content)
│ response(content)
│ toolCall(callId, name, params, response)
▼
┌─────────────────┐
│ gadget-code │ (gadget-code/src/lib/drone-session.ts)
│ DroneSession │
└────────┬────────┘
│ Aggregates tokens by mode
│ Persists to MongoDB at mode changes
│ Routes events to CodeSession
▼
┌─────────────────┐
│ gadget-code │ (gadget-code/src/lib/code-session.ts)
│ CodeSession │
└────────┬────────┘
│ Socket.IO events to IDE
▼
┌─────────────────┐
│ Frontend IDE │ (gadget-code/frontend/src/pages/ChatSessionView.tsx)
│ Chat Turn │
└─────────────────┘
Event Flow Details
- AI Provider → @gadget/ai
  - OpenAI: Uses stream: true in chat.completions.create(), iterates SSE chunks
  - Ollama: Uses stream: true in client.chat(), iterates async chunks
  - Both call streamCallback(chunk: IAiStreamChunk) for each token
- @gadget/ai → gadget-drone
  - streamCallback emits Socket.IO events based on chunk type:

    switch (chunk.type) {
      case 'thinking': socket.emit("thinking", chunk.data); break;
      case 'response': socket.emit("response", chunk.data); break;
      case 'toolCall': socket.emit("toolCall", callId, name, params, data); break;
    }

- gadget-drone → gadget-code:backend
  - Events arrive at DroneSession via Socket.IO
  - DroneSession aggregates tokens in memory (see Aggregation below)
  - At mode changes or tool calls, flushes to MongoDB
  - Routes events to corresponding CodeSession via SocketService.getCodeSessionByChatSessionId()
- gadget-code:backend → Frontend IDE
  - CodeSession forwards events to IDE socket
  - ChatSessionView receives events and updates React state
  - ChatTurn component renders blocks with Markdown
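The last two hops (drone to backend to IDE) amount to an event relay. The sketch below is an illustrative stand-in using plain Node event emitters; the real code uses Socket.IO sockets inside the DroneSession and CodeSession classes, and aggregates before forwarding:

```typescript
import { EventEmitter } from "node:events";

// Illustrative relay: forward streaming events from the drone connection
// to the IDE connection. EventEmitter stands in for the Socket.IO
// sockets used by DroneSession and CodeSession.
const droneSocket = new EventEmitter(); // drone → backend
const ideSocket = new EventEmitter();   // backend → frontend IDE

for (const event of ["thinking", "response"] as const) {
  droneSocket.on(event, (content: string) => {
    // the real DroneSession aggregates here before persisting
    // (see Aggregation & Persistence)
    ideSocket.emit(event, content);
  });
}

droneSocket.on(
  "toolCall",
  (callId: string, name: string, params: string, response: string) => {
    ideSocket.emit("toolCall", callId, name, params, response);
  }
);
```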
IAiStreamChunk Interface
Defined in packages/ai/src/api.ts:
interface IAiStreamChunk {
type: 'thinking' | 'response' | 'toolCall';
data: string;
toolCallId?: string;
toolName?: string;
params?: string;
}
type IAiResponseStreamFn = (chunk: IAiStreamChunk) => Promise<void>;
The type field determines how the chunk is processed and displayed. The data field contains the token content. For tool calls, toolCallId, toolName, and params provide metadata.
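As a hedged sketch of the consumer side, a streamCallback might buffer tokens by type. The interface mirrors the definition above; the two buffer arrays are illustrative and not part of the actual API:

```typescript
// Sketch of an IAiResponseStreamFn consumer (buffers are illustrative).
interface IAiStreamChunk {
  type: "thinking" | "response" | "toolCall";
  data: string;
  toolCallId?: string;
  toolName?: string;
  params?: string;
}

type IAiResponseStreamFn = (chunk: IAiStreamChunk) => Promise<void>;

const thinking: string[] = [];
const response: string[] = [];

const streamCallback: IAiResponseStreamFn = async (chunk) => {
  switch (chunk.type) {
    case "thinking":
      thinking.push(chunk.data); // reasoning tokens
      break;
    case "response":
      response.push(chunk.data); // primary answer tokens
      break;
    case "toolCall":
      // toolCallId, toolName, and params carry the tool metadata
      console.log(`tool call ${chunk.toolName}(${chunk.params ?? ""})`);
      break;
  }
};
```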
Aggregation & Persistence
Where Aggregation Lives
Location: gadget-code/src/lib/drone-session.ts
Class: DroneSession
Private Field: streamingBuffers: Map<string, IStreamingBuffer>
IStreamingBuffer Interface
interface IStreamingBuffer {
currentMode: 'thinking' | 'responding' | null;
thinkingContent: string;
respondingContent: string;
lastBlockCreatedAt?: Date;
}
How Aggregation Works
- Token Arrival: When onThinking() or onResponse() is called:
  - Get or create buffer for the current turnId
  - Check if currentMode matches the incoming token type
  - If mode changed, flush previous buffer to database first
  - Append token content to appropriate field (thinkingContent or respondingContent)
- Mode Transition Detection:
  - thinking event while in responding mode → flush responding, start thinking
  - response event while in thinking mode → flush thinking, start responding
  - Tool call events → flush current buffer, add tool block immediately
- Database Persistence:
  - Flushes occur at mode transitions (not on every token)
  - ChatTurn.blocks array is updated with aggregated content
  - Tool calls are persisted immediately as separate blocks
- Work Order Completion:
  - onWorkOrderComplete() flushes any remaining buffered content
  - Ensures no tokens are lost if streaming ends abruptly
Example Flow
Token Stream: [think: "Hmm"] [think: " let"] [think: " me"] [resp: "Sure"] [tool: search_google] [resp: " I'll"]
Buffer State:
1. After "Hmm": { mode: 'thinking', thinking: "Hmm" }
2. After " let": { mode: 'thinking', thinking: "Hmm let" }
3. After " me": { mode: 'thinking', thinking: "Hmm let me" }
4. After "Sure": FLUSH thinking → DB, { mode: 'responding', responding: "Sure" }
5. After tool call: FLUSH responding → DB, ADD tool block → DB, { mode: null }
6. After " I'll": { mode: 'responding', responding: " I'll" }
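The flow above can be replayed with a minimal aggregator sketch. Here flushes[] stands in for the MongoDB writes, and all names besides the mode values are illustrative:

```typescript
// Minimal aggregator replaying the example token stream above.
type Mode = "thinking" | "responding" | null;

const buf = { currentMode: null as Mode, thinkingContent: "", respondingContent: "" };
const flushes: { mode: "thinking" | "responding" | "tool"; content: string }[] = [];

function flush(): void {
  if (buf.currentMode === "thinking" && buf.thinkingContent) {
    flushes.push({ mode: "thinking", content: buf.thinkingContent });
    buf.thinkingContent = "";
  } else if (buf.currentMode === "responding" && buf.respondingContent) {
    flushes.push({ mode: "responding", content: buf.respondingContent });
    buf.respondingContent = "";
  }
  buf.currentMode = null;
}

function onToken(mode: "thinking" | "responding", token: string): void {
  if (buf.currentMode !== null && buf.currentMode !== mode) flush(); // mode transition
  buf.currentMode = mode;
  if (mode === "thinking") buf.thinkingContent += token;
  else buf.respondingContent += token;
}

function onToolCall(name: string): void {
  flush(); // tool calls flush the current buffer immediately
  flushes.push({ mode: "tool", content: name }); // stands in for the tool block
}

["Hmm", " let", " me"].forEach((t) => onToken("thinking", t));
onToken("responding", "Sure");
onToolCall("search_google");
onToken("responding", " I'll");
// flushes now holds thinking "Hmm let me", responding "Sure", tool "search_google";
// " I'll" stays buffered until the next flush or work order completion.
```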
ChatTurn Data Model
IChatTurnBlock Interface
Defined in packages/api/src/interfaces/chat-turn.ts:
interface IChatTurnBlockThinking {
mode: 'thinking';
createdAt: Date;
content: string;
}
interface IChatTurnBlockResponding {
mode: 'responding';
createdAt: Date;
content: string;
}
interface IChatTurnBlockTool {
mode: 'tool';
createdAt: Date;
content: IChatToolCall; // { callId, name, parameters, response }
}
type IChatTurnBlock = IChatTurnBlockThinking | IChatTurnBlockResponding | IChatTurnBlockTool;
IChatTurn Changes
The IChatTurn interface now uses a blocks array instead of flat thinking and response strings:
interface IChatTurn {
// ... other fields ...
blocks: IChatTurnBlock[]; // NEW: ordered blocks of thinking/responding/tool
toolCalls: IChatToolCall[]; // Still maintained for detailed tool call data
// ... other fields ...
}
Removed: thinking?: string and response?: string fields (replaced by blocks)
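If older code still expects the removed flat fields, they can be derived from blocks[]. The flatten() helper below is hypothetical, not part of the codebase; the block shapes follow IChatTurnBlock:

```typescript
// Hypothetical helper: derive the removed flat strings from blocks[].
type Block =
  | { mode: "thinking"; content: string }
  | { mode: "responding"; content: string }
  | { mode: "tool"; content: { name: string } };

function flatten(blocks: Block[]): { thinking: string; response: string } {
  let thinking = "";
  let response = "";
  for (const b of blocks) {
    if (b.mode === "thinking") thinking += b.content;
    else if (b.mode === "responding") response += b.content;
    // tool blocks are skipped: they live in toolCalls[] with full detail
  }
  return { thinking, response };
}
```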
Mongoose Schema
In gadget-code/src/models/chat-turn.ts:
const ChatTurnBlockSchema = new Schema<IChatTurnBlock>({
mode: {
type: String,
enum: ['thinking', 'responding', 'tool'],
required: true
},
createdAt: { type: Date, default: Date.now, required: true },
content: { type: Schema.Types.Mixed, required: true },
}, { _id: false });
const ChatTurnSchema = new Schema({
// ...
blocks: { type: [ChatTurnBlockSchema], default: [], required: true },
// ...
});
Frontend Event Handling & Rendering
Event Reception
Location: gadget-code/frontend/src/pages/ChatSessionView.tsx
The ChatSessionView component maintains streaming state:
interface StreamingState {
currentMode: 'thinking' | 'responding' | null;
thinkingContent: string;
respondingContent: string;
currentBlockIndex: number | null; // Tracks which block is being updated
}
Event Handlers
- handleThinking(content: string):
  - If mode changed from responding → thinking, flush responding block
  - Aggregate thinking content
  - Update current block in place (or create new if mode transition)
- handleResponse(content: string):
  - If mode changed from thinking → responding, flush thinking block
  - Aggregate response content
  - Update current block in place (or create new if mode transition)
- handleToolCall(callId, name, params, response):
  - Flush any current streaming buffer
  - Add tool block immediately (no aggregation for tools)
  - Reset streaming state
- handleWorkOrderComplete():
  - Flush any remaining buffered content
  - Clean up streaming state for this turn
In-Place Block Updates
The key optimization: blocks are updated in place during streaming, not appended.
// In scheduleUpdate() (oldBlocks is a working copy of oldTurn.blocks):
if (currentBlockIndex !== null && oldTurn.blocks[currentBlockIndex]) {
// Same mode → update existing block
if (oldBlocks[currentBlockIndex].mode === updateBlock.mode) {
oldBlocks[currentBlockIndex] = updateBlock; // In-place update
newTurn.blocks = oldBlocks;
} else {
// Mode changed → append new block
newTurn.blocks = [...oldTurn.blocks, ...turnUpdates.blocks];
state.currentBlockIndex = oldTurn.blocks.length;
}
}
This prevents the DOM from being flooded with duplicate blocks on each token.
ChatTurn Component Rendering
Location: gadget-code/frontend/src/components/ChatTurn.tsx
The component renders the blocks array:
{turn.blocks.map((block, idx) => {
if (block.mode === 'thinking') {
return (
<div key={idx} className="mb-3">
<div className="text-xs text-text-muted mb-1 font-mono">Thinking</div>
<div
className="p-3 bg-bg-secondary rounded text-sm text-text-muted whitespace-pre-wrap font-mono text-xs"
dangerouslySetInnerHTML={{ __html: marked.parse(block.content) }}
/>
</div>
);
} else if (block.mode === 'responding') {
return (
<div key={idx} className="mb-3">
<div
className="text-text-primary"
dangerouslySetInnerHTML={{ __html: marked.parse(block.content) }}
/>
</div>
);
} else if (block.mode === 'tool') {
const toolCall = block.content;
return (
<div key={idx} className="mb-3">
<div className="flex items-center gap-2 text-xs font-mono text-text-secondary">
<span className="text-brand">●</span>
<span>{toolCall.name}</span>
{toolCall.response && <span className="text-green-500">✓</span>}
</div>
</div>
);
}
})}
Styling
- Thinking blocks: Muted text (text-text-muted), monospace font, secondary background
- Responding blocks: Standard primary text, Markdown rendering
- Tool blocks: One-line summary with ● indicator (brand color), green checkmark if response exists
Markdown Rendering
Uses the marked library with breaks: true to honor line breaks:
import { marked } from "marked";
marked.setOptions({
breaks: true, // Critical: renders \n as <br>
});
Key Implementation Files
| Component | File Path | Responsibility |
|---|---|---|
| AI Interface | packages/ai/src/api.ts | IAiStreamChunk, IAiResponseStreamFn types |
| OpenAI Provider | packages/ai/src/openai.ts | Streaming from OpenAI SDK, calls streamCallback |
| Ollama Provider | packages/ai/src/ollama.ts | Streaming from Ollama SDK, calls streamCallback |
| Drone Agent | gadget-drone/src/services/agent.ts | Routes stream chunks to Socket.IO events |
| Backend Aggregation | gadget-code/src/lib/drone-session.ts | Buffers tokens, persists at mode changes |
| Backend Routing | gadget-code/src/lib/code-session.ts | Forwards events to IDE socket |
| Frontend State | gadget-code/frontend/src/pages/ChatSessionView.tsx | Manages streaming state, in-place updates |
| Frontend Rendering | gadget-code/frontend/src/components/ChatTurn.tsx | Renders blocks with Markdown |
| Data Model | packages/api/src/interfaces/chat-turn.ts | IChatTurnBlock types |
| Mongoose Schema | gadget-code/src/models/chat-turn.ts | MongoDB persistence schema |
Design Decisions
Why Aggregate in Backend?
- Database Efficiency: Writing to MongoDB on every token would overwhelm the database
- Network Efficiency: Fewer, larger updates instead of thousands of micro-updates
- Mode Awareness: Backend can detect mode transitions and structure data appropriately
Why In-Place Updates in Frontend?
- Performance: Updating existing DOM nodes is cheaper than creating new ones
- Memory: Prevents accumulation of hundreds of nearly-identical block objects
- Correctness: Ensures the UI reflects the actual streaming state (one active block per mode)
Why Blocks Array Instead of Strings?
- Temporal Ordering: Preserves the exact sequence of thinking/responding/tool events
- Reconstruction: Can replay the agent's "thought process" exactly as it happened
- Analytics: Easy to query for patterns (e.g., "how many mode transitions per turn?")
- Query Efficiency: No need for complex MongoDB aggregations to reconstruct turns
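For instance, the "mode transitions per turn" query above reduces to a linear scan over blocks[]. The helper below is a sketch; countModeTransitions is a hypothetical name, not an existing function:

```typescript
// Illustrative analytics helper: count mode transitions within one turn.
type BlockMode = "thinking" | "responding" | "tool";

function countModeTransitions(blocks: { mode: BlockMode }[]): number {
  let transitions = 0;
  for (let i = 1; i < blocks.length; i++) {
    if (blocks[i].mode !== blocks[i - 1].mode) transitions++;
  }
  return transitions;
}
```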
Testing & Verification
Manual Testing Steps
- Start backend: cd gadget-code && pnpm dev:backend
- Start frontend: cd gadget-code/frontend && pnpm dev
- Start drone: cd ~/workspace && pnpm --filter gadget-drone dev
- Create chat session, submit prompt
- Observe:
- Thinking content streams in (muted, monospace)
- Response content streams in (standard text)
- Tool calls appear as one-line summaries
- Mode transitions create new blocks
- Final display matches streaming sequence
What to Look For
✅ Correct behavior:
- Content streams in real-time (not all at once at the end)
- Thinking blocks are visually distinct (muted, monospace)
- Tool calls break between thinking/responding blocks
- No duplicate blocks (each mode has one active block during streaming)
- Markdown renders correctly with line breaks
❌ Incorrect behavior:
- Hundreds of blocks with duplicate content (aggregation broken)
- All content appears at once (streaming not working)
- Thinking and response mixed in same block (mode detection broken)
- Tool calls not appearing (tool call events not routed)
Future Enhancements
Potential improvements not yet implemented:
- Token Count Streaming: Emit token counts with each chunk for real-time stats
- Thinking/Response Mode Labels: Optional headers to explicitly label block types
- Block Collapse/Expand: Persist collapsed state for thinking blocks
- Streaming Cursor: Visual indicator (blinking cursor) at end of active streaming block
- Subagent Streaming: Extend streaming to subagent processes
Troubleshooting
No Streaming Updates
Symptoms: UI shows spinner but no content until work order completes
Causes:
- streamCallback not being called in AI provider
- Socket.IO events not emitted from drone
- Event handlers not registered in DroneSession or CodeSession
Debug: Check gadget-drone.log for "stream chunk received" entries
Duplicate Blocks
Symptoms: UI shows many blocks with progressively longer content
Causes:
- currentBlockIndex not being tracked in frontend
- scheduleUpdate always appending instead of updating in place
- Mode transitions not detected properly
Debug: Inspect turn.blocks array in React DevTools
Mode Transitions Not Working
Symptoms: Thinking and response content mixed in same block
Causes:
- Chunk type field not set correctly in AI provider
- Mode detection logic in event handlers broken
- Buffer flush not occurring on mode change
Debug: Log state.currentMode in event handlers
Related Documentation
- Architecture — Overall system architecture
- Socket Protocol — Socket.IO event definitions
- ChatTurn Interface — TypeScript types
- ChatTurn Model — Mongoose schema