gadget/docs/streaming-responses.md
Rob Colbert 11bdd5e3b0 make reasoning effort configurable; remove sign up concept
- Implemented reasoning effort setting in SESSION panel of Chat Session View
- Removed all ability to "sign up" for an account
2026-05-08 11:40:30 -04:00


# Gadget Code Streaming Responses
**Status:** **IMPLEMENTED** — Full end-to-end streaming responses operational
**Last Updated:** May 7, 2026
## Quick Reference
**Streaming Path:** AI Provider → @gadget/ai → gadget-drone → gadget-code (backend) → Frontend IDE
**Key Concepts:**
- **IAiStreamChunk**: Unified chunk type with `type: 'thinking'|'response'|'toolCall'`
- **Backend Aggregation**: Tokens buffered in `DroneSession`, persisted at mode changes
- **Frontend In-Place Updates**: Blocks updated by index, not appended (prevents DOM flooding)
- **Blocks Array**: `ChatTurn.blocks[]` stores ordered thinking/responding/tool blocks
**Critical Files:**
- `packages/ai/src/api.ts` — Stream chunk interface
- `gadget-code/src/lib/drone-session.ts` — Aggregation logic
- `gadget-code/frontend/src/pages/ChatSessionView.tsx` — Frontend state management
- `gadget-code/frontend/src/components/ChatTurn.tsx` — Block rendering
## Overview
Gadget Code implements real-time streaming responses from AI providers (OpenAI, Ollama) through the entire system stack. As the AI model generates tokens, they flow immediately from the provider → drone → backend → frontend, where they are displayed to the user with minimal latency.
The system supports three types of streaming content:
- **Thinking tokens**: Reasoning/output from models with thinking capabilities
- **Response tokens**: The model's primary response content
- **Tool calls**: Function/tool invocations requested by the model
## Architecture & Data Flow
### Complete Streaming Path
```
┌─────────────────┐
│   AI Provider   │  (OpenAI / Ollama SDK)
│   (Streaming)   │
└────────┬────────┘
         │ IAiStreamChunk
         │ (type: 'thinking'|'response'|'toolCall')
         ▼
┌─────────────────┐
│   @gadget/ai    │  (packages/ai/src/openai.ts or ollama.ts)
│ streamCallback  │
└────────┬────────┘
         │ Calls streamCallback(chunk) for each token
         ▼
┌─────────────────┐
│  gadget-drone   │  (gadget-drone/src/services/agent.ts)
│  AgentService   │
└────────┬────────┘
         │ Socket.IO events
         │   thinking(content)
         │   response(content)
         │   toolCall(callId, name, params, response)
         ▼
┌─────────────────┐
│   gadget-code   │  (gadget-code/src/lib/drone-session.ts)
│  DroneSession   │
└────────┬────────┘
         │ Aggregates tokens by mode
         │ Persists to MongoDB at mode changes
         │ Routes events to CodeSession
         ▼
┌─────────────────┐
│   gadget-code   │  (gadget-code/src/lib/code-session.ts)
│   CodeSession   │
└────────┬────────┘
         │ Socket.IO events to IDE
         ▼
┌─────────────────┐
│  Frontend IDE   │  (gadget-code/frontend/src/pages/ChatSessionView.tsx)
│   Chat Turn     │
└─────────────────┘
```
### Event Flow Details
1. **AI Provider → @gadget/ai**
- OpenAI: Uses `stream: true` in `chat.completions.create()`, iterates SSE chunks
- Ollama: Uses `stream: true` in `client.chat()`, iterates async chunks
- Both call `streamCallback(chunk: IAiStreamChunk)` for each token
2. **@gadget/ai → gadget-drone**
- `streamCallback` emits Socket.IO events based on chunk type:
```typescript
switch (chunk.type) {
  case 'thinking': socket.emit("thinking", chunk.data); break;
  case 'response': socket.emit("response", chunk.data); break;
  case 'toolCall': socket.emit("toolCall", callId, name, params, data); break;
}
```
3. **gadget-drone → gadget-code:backend**
- Events arrive at `DroneSession` via Socket.IO
- `DroneSession` aggregates tokens in memory (see Aggregation below)
- At mode changes or tool calls, flushes to MongoDB
- Routes events to corresponding `CodeSession` via `SocketService.getCodeSessionByChatSessionId()`
4. **gadget-code:backend → Frontend IDE**
- `CodeSession` forwards events to IDE socket
- `ChatSessionView` receives events and updates React state
- `ChatTurn` component renders blocks with Markdown
## IAiStreamChunk Interface
Defined in `packages/ai/src/api.ts`:
```typescript
interface IAiStreamChunk {
  type: 'thinking' | 'response' | 'toolCall';
  data: string;
  toolCallId?: string;
  toolName?: string;
  params?: string;
}
type IAiResponseStreamFn = (chunk: IAiStreamChunk) => Promise<void>;
```
The `type` field determines how the chunk is processed and displayed. The `data` field contains the token content. For tool calls, `toolCallId`, `toolName`, and `params` provide metadata.
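As a concrete illustration, a consumer of this interface can fan tokens out by `type`. The `makeCollector` helper below is a minimal sketch, not code from `@gadget/ai`; only the interface and callback type are taken from the definitions above:

```typescript
// Copied from the definitions above
interface IAiStreamChunk {
  type: 'thinking' | 'response' | 'toolCall';
  data: string;
  toolCallId?: string;
  toolName?: string;
  params?: string;
}

type IAiResponseStreamFn = (chunk: IAiStreamChunk) => Promise<void>;

// Hypothetical collector illustrating how a streamCallback dispatches on type
function makeCollector() {
  const out = { thinking: '', response: '', toolCalls: [] as string[] };
  const cb: IAiResponseStreamFn = async (chunk) => {
    switch (chunk.type) {
      case 'thinking': out.thinking += chunk.data; break;
      case 'response': out.response += chunk.data; break;
      case 'toolCall': out.toolCalls.push(chunk.toolName ?? ''); break;
    }
  };
  return { out, cb };
}
```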
## Aggregation & Persistence
### Where Aggregation Lives
**Location:** `gadget-code/src/lib/drone-session.ts`
**Class:** `DroneSession`
**Private Field:** `streamingBuffers: Map<string, IStreamingBuffer>`
### IStreamingBuffer Interface
```typescript
interface IStreamingBuffer {
  currentMode: 'thinking' | 'responding' | null;
  thinkingContent: string;
  respondingContent: string;
  lastBlockCreatedAt?: Date;
}
```
### How Aggregation Works
1. **Token Arrival**: When `onThinking()` or `onResponse()` is called:
- Get or create buffer for the current `turnId`
- Check if `currentMode` matches the incoming token type
- If mode changed, flush previous buffer to database first
- Append token content to appropriate field (`thinkingContent` or `respondingContent`)
2. **Mode Transition Detection**:
- `thinking` event while in `responding` mode → flush responding, start thinking
- `response` event while in `thinking` mode → flush thinking, start responding
- Tool call events → flush current buffer, add tool block immediately
3. **Database Persistence**:
- Flushes occur at mode transitions (not on every token)
- `ChatTurn.blocks` array is updated with aggregated content
- Tool calls are persisted immediately as separate blocks
4. **Work Order Completion**:
- `onWorkOrderComplete()` flushes any remaining buffered content
- Ensures no tokens are lost if streaming ends abruptly
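The buffering rules above can be sketched as a small state machine. `persistBlock`, `flush`, and `onToken` below are illustrative stand-ins for the real `DroneSession` methods and MongoDB writes, not the actual implementation:

```typescript
// Simplified buffer shape, following IStreamingBuffer above
interface IStreamingBuffer {
  currentMode: 'thinking' | 'responding' | null;
  thinkingContent: string;
  respondingContent: string;
}

// Stand-in for the MongoDB write of an aggregated block
const persisted: { mode: string; content: string }[] = [];
function persistBlock(mode: 'thinking' | 'responding', content: string) {
  if (content.length > 0) persisted.push({ mode, content });
}

// Flush whichever buffer is active, then reset
function flush(buf: IStreamingBuffer) {
  if (buf.currentMode === 'thinking') persistBlock('thinking', buf.thinkingContent);
  if (buf.currentMode === 'responding') persistBlock('responding', buf.respondingContent);
  buf.thinkingContent = '';
  buf.respondingContent = '';
  buf.currentMode = null;
}

// On each token: flush on mode change, then append to the active buffer
function onToken(buf: IStreamingBuffer, mode: 'thinking' | 'responding', token: string) {
  if (buf.currentMode !== null && buf.currentMode !== mode) flush(buf);
  buf.currentMode = mode;
  if (mode === 'thinking') buf.thinkingContent += token;
  else buf.respondingContent += token;
}
```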
### Example Flow
```
Token Stream: [think: "Hmm"] [think: " let"] [think: " me"] [resp: "Sure"] [tool: search_google] [resp: " I'll"]
Buffer State:
1. After "Hmm": { mode: 'thinking', thinking: "Hmm" }
2. After " let": { mode: 'thinking', thinking: "Hmm let" }
3. After " me": { mode: 'thinking', thinking: "Hmm let me" }
4. After "Sure": FLUSH thinking → DB, { mode: 'responding', responding: "Sure" }
5. After tool call: FLUSH responding → DB, ADD tool block → DB, { mode: null }
6. After " I'll": { mode: 'responding', responding: " I'll" }
```
## ChatTurn Data Model
### IChatTurnBlock Interface
Defined in `packages/api/src/interfaces/chat-turn.ts`:
```typescript
interface IChatTurnBlockThinking {
  mode: 'thinking';
  createdAt: Date;
  content: string;
}

interface IChatTurnBlockResponding {
  mode: 'responding';
  createdAt: Date;
  content: string;
}

interface IChatTurnBlockTool {
  mode: 'tool';
  createdAt: Date;
  content: IChatToolCall; // { callId, name, parameters, response }
}

type IChatTurnBlock = IChatTurnBlockThinking | IChatTurnBlockResponding | IChatTurnBlockTool;
```
### IChatTurn Changes
The `IChatTurn` interface now uses a `blocks` array instead of flat `thinking` and `response` strings:
```typescript
interface IChatTurn {
  // ... other fields ...
  blocks: IChatTurnBlock[]; // NEW: ordered blocks of thinking/responding/tool
  toolCalls: IChatToolCall[]; // Still maintained for detailed tool call data
  // ... other fields ...
}
```
**Removed:** `thinking?: string` and `response?: string` fields (replaced by blocks)
### Mongoose Schema
In `gadget-code/src/models/chat-turn.ts`:
```typescript
const ChatTurnBlockSchema = new Schema<IChatTurnBlock>({
  mode: {
    type: String,
    enum: ['thinking', 'responding', 'tool'],
    required: true
  },
  createdAt: { type: Date, default: Date.now, required: true },
  content: { type: Schema.Types.Mixed, required: true },
}, { _id: false });

const ChatTurnSchema = new Schema({
  // ...
  blocks: { type: [ChatTurnBlockSchema], default: [], required: true },
  // ...
});
```
## Frontend Event Handling & Rendering
### Event Reception
**Location:** `gadget-code/frontend/src/pages/ChatSessionView.tsx`
The `ChatSessionView` component maintains streaming state:
```typescript
interface StreamingState {
  currentMode: 'thinking' | 'responding' | null;
  thinkingContent: string;
  respondingContent: string;
  currentBlockIndex: number | null; // Tracks which block is being updated
}
```
### Event Handlers
1. **handleThinking(content: string)**:
- If mode changed from responding → thinking, flush responding block
- Aggregate thinking content
- Update current block in place (or create new if mode transition)
2. **handleResponse(content: string)**:
- If mode changed from thinking → responding, flush thinking block
- Aggregate response content
- Update current block in place (or create new if mode transition)
3. **handleToolCall(callId, name, params, response)**:
- Flush any current streaming buffer
- Add tool block immediately (no aggregation for tools)
- Reset streaming state
4. **handleWorkOrderComplete()**:
- Flush any remaining buffered content
- Clean up streaming state for this turn
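Step 3 can be sketched as a pure function over the streaming state and blocks array. `applyToolCall` and the simplified `Block`/`ToolCall` shapes below are assumptions for illustration, not the actual `ChatSessionView` code:

```typescript
// Simplified shapes for illustration only
interface StreamingState {
  currentMode: 'thinking' | 'responding' | null;
  thinkingContent: string;
  respondingContent: string;
  currentBlockIndex: number | null;
}

interface ToolCall { callId: string; name: string; params: string; response?: string }
type Block =
  | { mode: 'thinking' | 'responding'; content: string }
  | { mode: 'tool'; content: ToolCall };

// Flush any buffered content, append the tool block, reset streaming state
function applyToolCall(
  state: StreamingState,
  blocks: Block[],
  call: ToolCall
): [StreamingState, Block[]] {
  const out = [...blocks];
  if (state.currentMode === 'thinking' && state.thinkingContent) {
    out.push({ mode: 'thinking', content: state.thinkingContent });
  } else if (state.currentMode === 'responding' && state.respondingContent) {
    out.push({ mode: 'responding', content: state.respondingContent });
  }
  out.push({ mode: 'tool', content: call }); // tool blocks added immediately, no aggregation
  return [
    { currentMode: null, thinkingContent: '', respondingContent: '', currentBlockIndex: null },
    out,
  ];
}
```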
### In-Place Block Updates
The key optimization: **blocks are updated in place during streaming, not appended**.
```typescript
// In scheduleUpdate():
if (currentBlockIndex !== null && oldTurn.blocks[currentBlockIndex]) {
  // Same mode → update existing block
  if (oldBlocks[currentBlockIndex].mode === updateBlock.mode) {
    oldBlocks[currentBlockIndex] = updateBlock; // In-place update
    newTurn.blocks = oldBlocks;
  } else {
    // Mode changed → append new block
    newTurn.blocks = [...oldTurn.blocks, ...turnUpdates.blocks];
    state.currentBlockIndex = oldTurn.blocks.length;
  }
}
```
This prevents the DOM from being flooded with duplicate blocks on each token.
### ChatTurn Component Rendering
**Location:** `gadget-code/frontend/src/components/ChatTurn.tsx`
The component renders the `blocks` array:
```typescript
{turn.blocks.map((block, idx) => {
  if (block.mode === 'thinking') {
    return (
      <div key={idx} className="mb-3">
        <div className="text-xs text-text-muted mb-1 font-mono">Thinking</div>
        <div
          className="p-3 bg-bg-secondary rounded text-sm text-text-muted whitespace-pre-wrap font-mono text-xs"
          dangerouslySetInnerHTML={{ __html: marked.parse(block.content) }}
        />
      </div>
    );
  } else if (block.mode === 'responding') {
    return (
      <div key={idx} className="mb-3">
        <div
          className="text-text-primary"
          dangerouslySetInnerHTML={{ __html: marked.parse(block.content) }}
        />
      </div>
    );
  } else if (block.mode === 'tool') {
    const toolCall = block.content;
    return (
      <div key={idx} className="mb-3">
        <div className="flex items-center gap-2 text-xs font-mono text-text-secondary">
          <span className="text-brand">●</span>
          <span>{toolCall.name}</span>
          {toolCall.response && <span className="text-green-500">✓</span>}
        </div>
      </div>
    );
  }
})}
```
### Styling
- **Thinking blocks**: Muted text (`text-text-muted`), monospace font, secondary background
- **Responding blocks**: Standard primary text, Markdown rendering
- **Tool blocks**: One-line summary with ● indicator (brand color), green checkmark if response exists
### Markdown Rendering
Uses the `marked` library with `breaks: true` to honor line breaks:
```typescript
import { marked } from "marked";
marked.setOptions({
  breaks: true, // Critical: renders \n as <br>
});
```
## Key Implementation Files
| Component | File Path | Responsibility |
|-----------|-----------|----------------|
| **AI Interface** | `packages/ai/src/api.ts` | `IAiStreamChunk`, `IAiResponseStreamFn` types |
| **OpenAI Provider** | `packages/ai/src/openai.ts` | Streaming from OpenAI SDK, calls `streamCallback` |
| **Ollama Provider** | `packages/ai/src/ollama.ts` | Streaming from Ollama SDK, calls `streamCallback` |
| **Drone Agent** | `gadget-drone/src/services/agent.ts` | Routes stream chunks to Socket.IO events |
| **Backend Aggregation** | `gadget-code/src/lib/drone-session.ts` | Buffers tokens, persists at mode changes |
| **Backend Routing** | `gadget-code/src/lib/code-session.ts` | Forwards events to IDE socket |
| **Frontend State** | `gadget-code/frontend/src/pages/ChatSessionView.tsx` | Manages streaming state, in-place updates |
| **Frontend Rendering** | `gadget-code/frontend/src/components/ChatTurn.tsx` | Renders blocks with Markdown |
| **Data Model** | `packages/api/src/interfaces/chat-turn.ts` | `IChatTurnBlock` types |
| **Mongoose Schema** | `gadget-code/src/models/chat-turn.ts` | MongoDB persistence schema |
## Design Decisions
### Why Aggregate in Backend?
1. **Database Efficiency**: Writing to MongoDB on every token would overwhelm the database
2. **Network Efficiency**: Fewer, larger updates instead of thousands of micro-updates
3. **Mode Awareness**: Backend can detect mode transitions and structure data appropriately
### Why In-Place Updates in Frontend?
1. **Performance**: Updating existing DOM nodes is cheaper than creating new ones
2. **Memory**: Prevents accumulation of hundreds of nearly-identical block objects
3. **Correctness**: Ensures the UI reflects the actual streaming state (one active block per mode)
### Why Blocks Array Instead of Strings?
1. **Temporal Ordering**: Preserves the exact sequence of thinking/responding/tool events
2. **Reconstruction**: Can replay the agent's "thought process" exactly as it happened
3. **Analytics**: Easy to query for patterns (e.g., "how many mode transitions per turn?")
4. **Query Efficiency**: No need for complex MongoDB aggregations to reconstruct turns
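As an example of the analytics point, with blocks stored in order, counting mode transitions per turn is a one-pass scan over the array; the `countModeTransitions` helper is illustrative, not part of the codebase:

```typescript
type BlockMode = 'thinking' | 'responding' | 'tool';

// Count how many times consecutive blocks change mode within one turn
function countModeTransitions(blocks: { mode: BlockMode }[]): number {
  let transitions = 0;
  for (let i = 1; i < blocks.length; i++) {
    if (blocks[i].mode !== blocks[i - 1].mode) transitions++;
  }
  return transitions;
}
```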
## Testing & Verification
### Manual Testing Steps
1. Start backend: `cd gadget-code && pnpm dev:backend`
2. Start frontend: `cd gadget-code/frontend && pnpm dev`
3. Start drone: `cd ~/workspace && pnpm --filter gadget-drone dev`
4. Create chat session, submit prompt
5. Observe:
- Thinking content streams in (muted, monospace)
- Response content streams in (standard text)
- Tool calls appear as one-line summaries
- Mode transitions create new blocks
- Final display matches streaming sequence
### What to Look For
**Correct behavior:**
- Content streams in real-time (not all at once at the end)
- Thinking blocks are visually distinct (muted, monospace)
- Tool calls break between thinking/responding blocks
- No duplicate blocks (each mode has one active block during streaming)
- Markdown renders correctly with line breaks
**Incorrect behavior:**
- Hundreds of blocks with duplicate content (aggregation broken)
- All content appears at once (streaming not working)
- Thinking and response mixed in same block (mode detection broken)
- Tool calls not appearing (tool call events not routed)
## Reasoning Effort
The reasoning effort setting controls how much a model thinks before responding. See [Reasoning Effort](./reasoning-effort.md) for full documentation.
Key integration points with streaming:
- When reasoning effort is **Off** (`false`), no thinking tokens are produced
- When set to **Low/Medium/High**, the model allocates corresponding thinking depth
- Thinking tokens stream through the same path as response tokens but with `type: 'thinking'`
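A hedged sketch of how the setting shapes the stream: when effort is Off, no `thinking` chunks are emitted at all. The `ReasoningEffort` type and helpers below are illustrative, not the actual setting's API (see the reasoning-effort doc for that):

```typescript
// Hypothetical representation of the setting: Off is `false`, otherwise a level
type ReasoningEffort = false | 'low' | 'medium' | 'high';

// Whether a run at this effort level emits 'thinking' chunks
function producesThinking(effort: ReasoningEffort): boolean {
  return effort !== false;
}

// Chunk types a consumer should expect, given the effort setting
function expectedChunkTypes(effort: ReasoningEffort): string[] {
  return producesThinking(effort)
    ? ['thinking', 'response', 'toolCall']
    : ['response', 'toolCall'];
}
```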
## Future Enhancements
Potential improvements not yet implemented:
1. **Token Count Streaming**: Emit token counts with each chunk for real-time stats
2. **Thinking/Response Mode Labels**: Optional headers to explicitly label block types
3. **Block Collapse/Expand**: Persist collapsed state for thinking blocks
4. **Streaming Cursor**: Visual indicator (blinking cursor) at end of active streaming block
5. **Subagent Streaming**: Extend streaming to subagent processes
## Troubleshooting
### No Streaming Updates
**Symptoms:** UI shows spinner but no content until work order completes
**Causes:**
1. `streamCallback` not being called in AI provider
2. Socket.IO events not emitted from drone
3. Event handlers not registered in `DroneSession` or `CodeSession`
**Debug:** Check `gadget-drone.log` for "stream chunk received" entries
### Duplicate Blocks
**Symptoms:** UI shows many blocks with progressively longer content
**Causes:**
1. `currentBlockIndex` not being tracked in frontend
2. `scheduleUpdate` always appending instead of updating in place
3. Mode transitions not detected properly
**Debug:** Inspect `turn.blocks` array in React DevTools
### Mode Transitions Not Working
**Symptoms:** Thinking and response content mixed in same block
**Causes:**
1. Chunk `type` field not set correctly in AI provider
2. Mode detection logic in event handlers broken
3. Buffer flush not occurring on mode change
**Debug:** Log `state.currentMode` in event handlers
## Related Documentation
- [Architecture](./architecture.md) — Overall system architecture
- [Socket Protocol](./socket-protocol.md) — Socket.IO event definitions
- [ChatTurn Interface](../packages/api/src/interfaces/chat-turn.ts) — TypeScript types
- [ChatTurn Model](../gadget-code/src/models/chat-turn.ts) — Mongoose schema
- [Reasoning Effort](./reasoning-effort.md) — How thinking/reasoning is controlled