From b481183c99a10112e08bc0e18b028035a0420672 Mon Sep 17 00:00:00 2001
From: Rob Colbert
Date: Thu, 7 May 2026 12:30:58 -0400
Subject: [PATCH] doc updates for streaming response task

---
 docs/architecture.md        |   1 -
 docs/streaming-responses.md | 117 ++++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+), 1 deletion(-)
 create mode 100644 docs/streaming-responses.md

diff --git a/docs/architecture.md b/docs/architecture.md
index 6fc2788..43735ce 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -726,5 +726,4 @@ gadget/
 ---
 
 **Last Updated:** April 29, 2026
-**Branch:** `feature/socket-protocol`
 **Status:** Production-ready foundation with complete Chat Session UI
diff --git a/docs/streaming-responses.md b/docs/streaming-responses.md
new file mode 100644
index 0000000..357a17d
--- /dev/null
+++ b/docs/streaming-responses.md
@@ -0,0 +1,117 @@
+# Gadget Code Streaming Responses
+
+The [Architecture](./architecture.md) and [Socket Protocol](./socket-protocol.md) documents describe the Gadget Code system and the socket protocol we use for transmitting events, respectively.
+
+The User selects a Project and Drone from a list of available Projects and Drones. The User then starts a chat session with the Drone to work on the Project. When the User submits a prompt:
+
+1. gadget-code:frontend generates the `submitPrompt` event and sends it to [gadget-code:backend](../gadget-code/src/web-app.ts).
+2. gadget-code:backend wraps the prompt as a Work Order, selects the [gadget-drone](../gadget-drone/src/gadget-drone.ts) process locked to the chat session (CodeSession), and forwards the Work Order to the drone for processing.
+3. The drone executes the Agentic Workflow Loop, calling the specified AI SDK's `chat` endpoint to process a prompt until done.
+
+As long as the response contains tool calls, the loop continues, inserting the tool call response output into the context and calling the AI SDK again for additional processing.
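The Agentic Workflow Loop described above can be sketched as follows. This is a minimal TypeScript sketch, not the real implementation: the `ChatMessage`/`ChatResponse` shapes and the `chat`/`runTool` callbacks are hypothetical stand-ins for the actual SDK wrappers.

```typescript
// Hypothetical message/response shapes standing in for the real SDK types.
interface ToolCall { name: string; args: Record<string, unknown>; }
interface ChatMessage { role: "user" | "assistant" | "tool"; content: string; }
interface ChatResponse { content: string; toolCalls: ToolCall[]; }

// Stand-ins for the AI SDK call and for tool execution.
type ChatFn = (context: ChatMessage[]) => ChatResponse;
type ToolFn = (call: ToolCall) => string;

// Agentic Workflow Loop: keep calling the SDK while the response
// contains tool calls, feeding each tool's output back into the context.
function runWorkOrder(prompt: string, chat: ChatFn, runTool: ToolFn): string {
  const context: ChatMessage[] = [{ role: "user", content: prompt }];
  for (;;) {
    const response = chat(context);
    context.push({ role: "assistant", content: response.content });
    if (response.toolCalls.length === 0) {
      return response.content; // no further tool calls: the turn is done
    }
    for (const call of response.toolCalls) {
      context.push({ role: "tool", content: runTool(call) });
    }
  }
}
```

In the real system the final return value is what gets communicated back to gadget-code:backend at the end of the turn.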
+
+When there are no further tool calls, the loop ends and communicates the final results of the turn back to gadget-code:backend, which then forwards the information to the gadget-code:frontend (IDE) that sent the command.
+
+The system currently performs all of the work and sends back a single response at the conclusion of the work.
+
+## Streaming Responses
+
+The OpenAI and Ollama SDKs both support streaming responses. We want to refactor our Agentic Workflow Loop to use _streaming responses_:
+
+- As thinking/reasoning tokens are emitted, we want gadget-drone to emit the `thinking` event to the backend on its DroneSession so the client IDE can update while the model is processing the work.
+- As response tokens are emitted from the streaming response, we want to emit our own `response` event so the IDE can update while the model is processing the work.
+- As tool calls are requested by the agent, we want to emit our `toolCall` event so the IDE can update with information about the tool call in real time.
+
+### Streaming: OpenAI
+
+We will need to add streaming response processing to [openai.ts](../packages/ai/src/openai.ts). The `generate` and `chat` methods currently accept the `streamCallback` parameter, which is a callback that gets called with each token and tool call as it's emitted from the SDK.
+
+We need to verify that the full path for these events is implemented: from gadget-drone generating the event, through gadget-code:backend (web server) routing the message to the IDE CodeSession that wants it based on the Work Order's chat session information, to gadget-code:frontend (IDE) receiving the event.
+
+### Streaming: Ollama
+
+We will need to add streaming response processing to [ollama.ts](../packages/ai/src/ollama.ts).
+The `generate` and `chat` methods currently accept the `streamCallback` parameter, which is a callback that gets called with each token and tool call as it's emitted from the SDK.
+
+We need to verify that the full path for these events is implemented: from gadget-drone generating the event, through gadget-code:backend (web server) routing the message to the IDE CodeSession that wants it based on the Work Order's chat session information, to gadget-code:frontend (IDE) receiving the event.
+
+## IDE Processing
+
+When the User submits a prompt for processing as a Work Order, the IDE creates a ChatTurn component instance in the Chat View and uses the Turn component to display these streaming response updates as they arrive.
+
+The Turn component will display a spinner in its header area while processing the work order/prompt.
+
+Within the Agent's portion of the Turn display, it should look like the agent is having its own conversation. Updates should occur in modes or phases that change based on the _most recent_ content received. The agent's response, while processing a prompt, has three modes:
+
+- **Idle**: The agent's content display begins as Idle when the Turn is created. This allows it to 'sense' a transition to either `thinking` or `responding`.
+- **Thinking**: When transitioning to the `thinking` mode, a new thinking div is added to receive the token(s). We continue streaming thinking tokens into the thinking div for as long as the tokens are still thinking tokens. When a response-mode token is received, we transition TO response mode, insert a new response div, and proceed.
+- **Responding**: When transitioning to the `responding` mode, a new response div is added to receive the token(s). We continue streaming further response-mode tokens into this div for as long as the tokens are still response-mode tokens.
+If a thinking token is received, we transition TO the thinking mode, insert a new thinking div, and proceed.
+
+If the content received is in the same mode as the current agent response mode (thinking/responding), then this is an append operation. As content arrives, we append it to the current thinking/responding content and update its display. We do this as efficiently as possible, avoiding copies wherever we can, as these messages can arrive at a high rate.
+
+### IDE Tool Call Displays
+
+When a `toolCall` message is received, break out of the current thinking/responding block to append a tool call div/display in the agent's response. We will use this to provide SUMMARY information about the tool call:
+
+- An ● indicator showing success status (green for success, red for error, yellow for warning, etc.)
+- The name of the tool called (e.g., `search_google`)
+
+● search_google
+
+We don't summarize successful or erroneous responses here in the tool call indicator in the chat message flow. We will be adding a separate view to display these messages in full detail later in the SESSION panel of the Chat Session view.
+
+## Persistence
+
+The primary goal of this application is to make the agentic engineering process as **observable** as possible. If there is data available from a work order's Agentic Workflow Loop and processing, we:
+
+1. Normalize that data to our own interfaces as defined by:
+   1. [@gadget/ai](../packages/ai/) and [@gadget/api](../packages/api/);
+   1. the [ChatTurn](../packages/api/src/interfaces/chat-turn.ts) interface; and
+   1. the [ChatTurn](../gadget-code/src/models/chat-turn.ts) model.
+2. Store that normalized data into our own database (MongoDB) within the current `ChatTurn` record being managed by gadget-code:backend (web server).
+
+The `gadget-code:backend` (web) server will be responsible for persisting this data to our database.
+As events arrive FROM `gadget-drone` indicating progress within a Work Order (`thinking`, `response`, `toolCall`), we will update the `ChatTurn` record with the new data. This is why stateful sessions exist for [DroneSession](../gadget-code/src/lib/drone-session.ts) and [CodeSession](../gadget-code/src/lib/code-session.ts).
+
+While receiving thinking tokens, we should aggregate that content into the session object in memory instead of calling the database with every micro-update. At mode changes, such as when changing from thinking => responding or responding => thinking, we perform one update to write that block's aggregated content to the `ChatTurn` as its type of content (thinking, responding, tool).
+
+We need to adapt the persistence within `ChatTurn` to record this stateful/modal flow. When the User loads an existing `ChatSession` and its associated turns for display, they should receive the _same display_ as they saw while working.
+
+### Changes to ChatTurn Model
+
+We need to add fields to our [ChatTurn](../packages/api/src/interfaces/chat-turn.ts) interface and [ChatTurn](../gadget-code/src/models/chat-turn.ts) model that will allow us to store this stateful information for display in the UI.
+
+Currently, there are simple string/text fields for `thinking` and `response`. This is insufficient for reconstructing the actual flow that happened while processing the work order.
+
+We will repurpose the `response` field to become an array of objects, each object representing a `thinking`/`responding`/`toolCall` block. The `thinking` and `responding` blocks store the aggregated content that was received while processing the work order, in order. The `toolCall` block stores information about each tool call that happens, in order.
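The aggregate-in-memory, flush-on-mode-change behavior described above might look like the following sketch. The `TurnBlock` shape and the `persist` callback are illustrative assumptions, not the actual gadget-code:backend API.

```typescript
type BlockMode = "thinking" | "responding" | "tool";

// Assumed block shape: a mode, a timestamp, and the aggregated content.
interface TurnBlock {
  mode: BlockMode;
  createdAt: Date;
  content: string;
}

// Buffers streamed tokens in memory and flushes one aggregated block per
// mode change, instead of hitting the database on every micro-update.
class TurnAggregator {
  private blocks: TurnBlock[] = [];
  private current: TurnBlock | null = null;

  // `persist` stands in for the database write against the ChatTurn record.
  constructor(private persist: (block: TurnBlock) => void) {}

  append(mode: BlockMode, content: string): void {
    if (this.current && this.current.mode === mode) {
      this.current.content += content; // same mode: append in memory only
      return;
    }
    this.flush(); // mode change: write the finished block once
    this.current = { mode, createdAt: new Date(), content };
  }

  // Call at mode changes and at the end of the turn.
  flush(): void {
    if (this.current) {
      this.blocks.push(this.current);
      this.persist(this.current);
      this.current = null;
    }
  }

  getBlocks(): TurnBlock[] {
    return this.blocks;
  }
}
```

One database write per block, at each mode boundary, keeps persistence cheap even when tokens arrive at a high rate.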
+
+Each object will have a `mode` (`thinking`, `responding`, `tool`), a `createdAt` timestamp, and a `content` field, which is the content of the event for display in the UI. The `content` field can be a string, such as when storing thinking and responding text content, or an object that records tool call information and metadata for later analysis by the User and by analytics tools.
+
+## Chat Turn Component Updates
+
+The [Chat Turn Component](../gadget-code/frontend/src/components/ChatTurn.tsx) will require updates to enhance the Agent's portion of the turn display. The Agent's block will now resemble:
+
+```
+Thinking: [content streams in]
+
+● search_google
+● search_google
+
+Thinking: [content streams in]
+
+Responding: [content streams in]
+
+● search_google
+
+Responding: [content streams in]
+```
+
+We don't label the thinking and responding blocks with a header. Instead, we use style to indicate the type of block presented:
+
+- Thinking is muted
+- Responding is standard formatting
+- Tool calls are a one-line element that summarizes the call
+
+## Markdown
+
+All text content within a Chat Turn display is in Markdown format. The User writes Markdown, and the Agent responds with Markdown. We will use the [marked]() package to render the Markdown text to HTML for display.
+
+Caveat: The marked library sets its `breaks` option to `false` by default. This is not the behavior we want. When rendering Markdown content for display in the Chat Turn component, for either the User or the Agent, `breaks` must be set to `true` to honor line breaks and display them appropriately.
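A sketch of how persisted blocks could be mapped back to the styled display described above when reloading a `ChatSession`. The `StoredBlock` shape follows the `mode`/`createdAt`/`content` fields described in this document; the CSS class names are hypothetical placeholders, not the component's real styles.

```typescript
type BlockMode = "thinking" | "responding" | "tool";

interface StoredBlock {
  mode: BlockMode;
  createdAt: Date;
  // string for thinking/responding text; for tool blocks we assume the
  // tool name is recoverable from the stored metadata object
  content: string | { name: string };
}

// Maps a persisted block to its style treatment: thinking is muted,
// responding is standard, and tool calls become a one-line summary.
function renderBlock(block: StoredBlock): { cssClass: string; text: string } {
  switch (block.mode) {
    case "thinking":
      return { cssClass: "turn-thinking-muted", text: block.content as string };
    case "responding":
      return { cssClass: "turn-response", text: block.content as string };
    case "tool": {
      const call = block.content as { name: string };
      return { cssClass: "turn-tool-call", text: `● ${call.name}` };
    }
  }
}
```

Because the blocks are stored in order, replaying them through a mapping like this reproduces the same display the User saw while the work order was being processed.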