# Gadget Code Socket Protocol Reference

This document serves as a "Cheat Sheet" for AI agents and developers working on the Gadget Code real-time messaging system.

## 1. Components & Connections

| Component | Role | Protocol | Auth Method |
|-----------|------|----------|-------------|
| `gadget-code:web` | Hub / Router / Server | Socket.IO Server | N/A |
| `gadget-code:ide` | Frontend Control Surface | Socket.IO Client | JWT Token |
| `gadget-drone` | Worker / AWL Runner | Socket.IO Client | Drone Registration ID |

---

## 2. Event Map Overview

Defined in `packages/api/src/messages/socket.ts`.

### IDE -> Web (Client to Server)

* `requestSessionLock`: Request to exclusive-lock a drone for a project session.
* `requestWorkspaceMode`: Request a mode change (Idle, User, Agent).
* `submitPrompt`: Submit a user prompt for agent processing.

### Drone -> Web (Client to Server)

* `thinking`: Stream reasoning/thought process text.
* `response`: Stream natural language response text.
* `toolCall`: Emit a specific tool execution event with result.
* `workOrderComplete`: Signal that a prompt processing turn is finished.
* `requestCrashRecovery`: Inbound from drone on restart if it finds a stalled work order.
* `requestTermination`: Acknowledgment from drone that termination request was received.

### Web -> Drone (Server to Client)

* `processWorkOrder`: Command to start processing a specific prompt/turn.
* `crashRecoveryResponse`: Command to `discard` or `retry` a stalled work order.
* `requestTermination`: Command to immediately terminate the drone process.

### Web -> IDE (Server to Client)

* `sessionUpdated`: Notify the IDE that a chat session property has changed (e.g. auto-generated name).

---

## 3. Core Sequences & Routing

### 3.1 Prompt Submission Flow

1. **IDE** emits `submitPrompt(content)`.
2. **Web (`CodeSession.ts`)**:
    * Creates a `ChatTurn` document (status: `processing`).
    * Increments the chat session's `stats.turnCount`.
    * Finds the target `DroneSession`.
    * Caches the updated session and signals the **IDE** to enter Processing state.
    * Emits `processWorkOrder` to the **Drone**.
    * On first prompt (name is still the default), calls AI API to auto-generate session name.
    * Emits `sessionUpdated({ name })` to **IDE** if the name changed.
3. **Drone (`gadget-drone.ts`)**:
    * Writes a local `.gadget/work-order.json` cache (for crash recovery).
    * Calls `AgentService.process()`.
    * Emits streaming events back to **Web**.

### 3.2 Result Streaming Flow

1. **Drone** emits `thinking(text)`, `response(text)`, or `toolCall(id, name, args, result)`.
2. **Web (`DroneSession.ts`)**:
    * Locates the associated `CodeSession` via `SocketService.getCodeSessionByChatSessionId()`.
    * Updates the `ChatTurn` document in MongoDB incrementally.
    * Forwards the event to the **IDE**.
3. **IDE**: Updates the UI in real-time.

### 3.3 Session Termination

1. **Drone** emits `workOrderComplete(turnId, success, message)`.
2. **Web (`DroneSession.ts`)**:
    * Sets `ChatTurn` status to `finished` or `error`.
    * Forwards event to **IDE**.
    * Clears `currentTurnId` from the drone session.

### 3.4 Drone Termination Flow

1. **User** clicks "Terminate" button in Drone Manager UI.
2. **IDE** calls `POST /api/v1/drone/registration/:id/terminate`.
3. **Web (`DroneService.ts`)**:
    * Checks if drone is already offline → returns error if so.
    * Looks up `DroneSession` via `SocketService.getDroneSession()`.
    * If drone not connected → marks as offline immediately, returns success.
    * Emits `requestTermination` to drone socket with callback.
    * Starts 10-second timeout.
4. **Web (`DroneSession.ts`)**:
    * Receives `requestTermination` event.
    * Logs the termination request.
    * Forwards `requestTermination` to drone socket (passthrough).
5. **Drone (`gadget-drone.ts`)**:
    * Receives `requestTermination` from platform.
    * Calls callback with `success: true`.
    * Sends `SIGINT` to self, triggering graceful shutdown.
    * Updates status to `Offline` during shutdown.
6. **Web (`DroneService.ts`)**:
    * Drone accepts termination → polls DB every 500ms waiting for `Offline` status.
    * Drone goes offline → resolves with success.
    * Timeout expires (10s) → forces status to `Offline`, resolves with success.

---

## 4. Message Signatures (TS Reference)

### IDE -> Web

```typescript
type RequestSessionLockMessage = (
  registration: IDroneRegistration,
  project: IProject,
  chatSession: IChatSession,
  cb: (success: boolean, chatSessionId: string) => void
) => void;

type SubmitPromptMessage = (prompt: string) => void;
```

### Web -> Drone

```typescript
type ProcessWorkOrderMessage = (
  registration: IDroneRegistration,
  project: IProject,
  chatSession: IChatSession,
  turn: IChatTurn,
  cb: (success: boolean, message?: string) => void
) => void;

type RequestTerminationMessage = (
  cb: (success: boolean) => void
) => void;
```

### Web -> IDE

```typescript
type SessionUpdatedMessage = (
  updates: Partial<IChatSession>
) => void;
```

### Drone -> Web (Streaming)

```typescript
type ThinkingMessage = (content: string) => void;

type ResponseMessage = (content: string) => void;

type ToolCallMessage = (
  callId: string,
  name: string,
  params: string, // JSON.stringify
  response: string // JSON.stringify
) => void;

type WorkOrderCompleteMessage = (
  workOrderId: string,
  success: boolean,
  message?: string
) => void;

type RequestTerminationMessage = (
  cb: (success: boolean) => void
) => void;
```

---

## 5. Session Implementation Guide (Web Server)

The web server (`gadget-code:web`) implements two wrapper classes in `src/lib/`:

### `CodeSession.ts`

Manages the IDE socket.

* **Logic**: Maps User ID -> Socket ID.
* **Routing**: When an IDE sends a message, `CodeSession` finds the selected drone's `DroneSession` and forwards the command.

### `DroneSession.ts`

Manages the Drone socket.

* **Logic**: Maps Drone Registration ID -> Socket ID.
* **Routing**: When a drone streams, `DroneSession` looks up the `chatSessionId` in the `SocketService` index to find the return path to the IDE.
* **Session Lookup**: `SocketService` maintains a `droneRegistrationIndex` Map that maps `registration._id` → `DroneSession` for efficient lookup by registration ID.

### Session Indexing Architecture

The `SocketService` maintains multiple indexes for efficient session lookup:

1. **`droneSessions`**: Map - Primary storage by socket ID
2. **`droneRegistrationIndex`**: Map - Lookup by drone registration
3. **`codeSessions`**: Map - Primary storage by socket ID
4. **`codeSessionUserIndex`**: Map - Lookup by user ID
5. **`chatSessionIndex`**: Map - Reverse lookup from chat session to IDE

All indexes are kept in sync during connection and disconnection.

---

## 6. Workspace Crash Recovery

1. **Drone** starts -> checks for `.gadget/work-order.json`.
2. If found, emits `requestCrashRecovery({ workspaceId, turnId, chatSessionId })`.
3. **Web (`DroneSession.ts`)**:
    * Checks DB for `ChatTurn` status.
    * If turn is already `finished`, responds with `{ action: "discard" }`.
    * If turn is `processing`, responds with `{ action: "retry" }` and schedules a new `processWorkOrder` after a delay.
4. **Drone**: Deletes local cache upon receiving any `crashRecoveryResponse`.

---

## 7. Extending the Protocol

To add a new message:

1. Add the message type to `packages/api/src/messages/ide.ts`, `drone.ts`, or `web.ts`.
2. Register it in `ClientToServerEvents` or `ServerToClientEvents` in `packages/api/src/messages/socket.ts`.
3. Re-export from `packages/api/src/index.ts`.
4. Implement the sender (emit) in the Client (`ide` or `drone`) or Server (`CodeSession`/`DroneSession`).
5. Implement the handler in the corresponding class or frontend component.
6. Implement the forward-path routing if needed.

---

## 8. Reconnection & Message Queuing

### 8.1 Problem Statement

When the browser refreshes during work order processing:

1. Old `CodeSession` disconnects, but `DroneSession` continues routing to it
2. Drone emits events but they go to a disconnected socket
3. New `CodeSession` connects but isn't linked to the active chat session
4. Messages are lost; IDE never receives streaming updates

### 8.2 Solution Architecture

**Three-phase approach:**

1. **Redis Message Queue** (`src/lib/message-queue.ts`)
    - Messages enqueued when routing fails (disconnected socket)
    - FIFO ordering with RPUSH/LPOP
    - 30-minute TTL (1800 seconds)
    - Max 1000 messages (drop oldest)
    - Aggregates adjacent thinking/response messages during drain
2. **Redis Tab Lock** (`src/lib/tab-lock.ts`)
    - Prevents concurrent tab access to same chat session
    - 1-minute timeout (requires heartbeat renewal)
    - Includes socket ID and user ID for validation
    - Auto-cleanup of stale locks
3. **Auto-Reconnection** (`CodeSession.checkAndReestablishActiveSession()`)
    - On connect, checks for active processing turn in DB
    - If found, attempts to acquire tab lock
    - On success, re-establishes chat session index
    - Drains queued messages from Redis
    - Aggregates and delivers messages to client

### 8.3 Message Queue Flow

```
Drone emits thinking() → DroneSession.onThinking()
    ↓
SocketService.getCodeSessionByChatSessionId() throws (disconnected)
    ↓
MessageQueue.enqueue(chatSessionId, { type: 'thinking', args: [...] })
    ↓
[30 minutes later] Queue expires automatically
    OR
[On reconnect] MessageQueue.drain() → aggregateMessages() → deliver
```

### 8.4 Tab Lock Flow

```
IDE connects → CodeSession.register()
    ↓
checkAndReestablishActiveSession()
    ↓
Find active chat session with processing turn
    ↓
TabLock.acquire(chatSessionId, userId, socketId)
    ↓
Success: Register chat session, drain queue, emit status
    OR
Failure: Emit 'tabLockDenied' → IDE navigates away
```

### 8.5 Frontend Reconciliation

The frontend handles reconnection gracefully:

1. **Load history first** - Fetch chat session and turns from DB
2. **Connect socket** - Establish WebSocket connection
3. **Backend auto-reconnects** - If processing turn found, backend re-establishes
4. **Receive queued messages** - Aggregated messages delivered in order
5. **Handle duplicates** - Frontend merges with existing history

### 8.6 Single Tab Enforcement

Only one tab can control a chat session at a time:

- First tab acquires Redis lock
- Subsequent tabs receive `tabLockDenied` event
- UI shows "Chat session open in another browser tab"
- User must navigate away or close the duplicate tab

### 8.7 Status Indicators

The status bar shows connection state:

- **Connected** (green ●) - Socket connected, receiving messages
- **Connecting** (yellow ●) - Attempting to connect
- **Error** (red ●) - Connection failed
- **Disconnected** (gray ●) - No active connection

Status messages inform the user:

- "Connecting..." - Initial connection
- "Reconnecting to active session..." - Auto-reconnect in progress
- "Reconnected" - Successfully reconnected
- "Chat session is open in another browser tab" - Tab lock denied
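
To make the drain-time aggregation from Section 8.2 more concrete, the sketch below shows one way adjacent `thinking`/`response` messages could be merged before delivery. It is an illustration under stated assumptions, not the code in `src/lib/message-queue.ts`: the `QueuedMessage` shape is hypothetical (modeled on the `{ type, args }` object shown in Section 8.3), and it assumes the streamed text is the first element of `args`.

```typescript
// Hypothetical shape of a queued message, modeled on the
// `{ type: 'thinking', args: [...] }` object in Section 8.3.
// The real type in src/lib/message-queue.ts may differ.
type QueuedMessage = {
  type: "thinking" | "response" | "toolCall" | "workOrderComplete";
  args: unknown[];
};

// Merge runs of adjacent `thinking` or `response` messages into a single
// message by concatenating their text (assumed to be args[0]). Any other
// message type passes through unchanged and ends the current run, so the
// relative order of events is preserved.
function aggregateMessages(messages: QueuedMessage[]): QueuedMessage[] {
  const out: QueuedMessage[] = [];
  for (const msg of messages) {
    const last = out[out.length - 1];
    const mergeable = msg.type === "thinking" || msg.type === "response";
    if (mergeable && last !== undefined && last.type === msg.type) {
      // Build a new object instead of mutating, so the input stays intact.
      out[out.length - 1] = {
        type: last.type,
        args: [String(last.args[0] ?? "") + String(msg.args[0] ?? "")],
      };
    } else {
      out.push({ type: msg.type, args: [...msg.args] });
    }
  }
  return out;
}

// Example: five queued messages collapse to three on drain.
const drained: QueuedMessage[] = [
  { type: "thinking", args: ["Read"] },
  { type: "thinking", args: ["ing file"] },
  { type: "toolCall", args: ["call-1", "readFile", "{}", "{}"] },
  { type: "response", args: ["Done"] },
  { type: "response", args: ["."] },
];
const aggregated = aggregateMessages(drained);
console.log(aggregated.map((m) => m.type).join(","));
```

Merging only adjacent messages of the same type keeps the `toolCall` between the thinking and response text, which matters if the IDE renders events in arrival order.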