# Gadget Code Socket Protocol Reference

This document serves as a "Cheat Sheet" for AI agents and developers working on the Gadget Code real-time messaging system.

## 1. Components & Connections

| Component | Role | Protocol | Auth Method |
|-----------|------|----------|-------------|
| `gadget-code:web` | Hub / Router / Server | Socket.IO Server | N/A |
| `gadget-code:ide` | Frontend Control Surface | Socket.IO Client | JWT Token |
| `gadget-drone` | Worker / AWL Runner | Socket.IO Client | Drone Registration ID |

---
## 2. Event Map Overview

Defined in `packages/api/src/messages/socket.ts`.

### IDE -> Web (Client to Server)
* `requestSessionLock`: Request an exclusive lock on a drone for a project session.
* `requestWorkspaceMode`: Request a mode change (Idle, User, Agent).
* `submitPrompt`: Submit a user prompt for agent processing.

### Drone -> Web (Client to Server)
* `thinking`: Stream reasoning/thought-process text.
* `response`: Stream natural-language response text.
* `toolCall`: Emit a specific tool execution event together with its result.
* `workOrderComplete`: Signal that a prompt processing turn is finished.
* `requestCrashRecovery`: Sent by the drone on restart if it finds a stalled work order.
* `requestTermination`: Acknowledgment from the drone that a termination request was received.

### Web -> Drone (Server to Client)
* `processWorkOrder`: Command to start processing a specific prompt/turn.
* `crashRecoveryResponse`: Command to `discard` or `retry` a stalled work order.
* `requestTermination`: Command to immediately terminate the drone process.

### Web -> IDE (Server to Client)
* `sessionUpdated`: Notify the IDE that a chat session property has changed (e.g. an auto-generated name).

---
## 3. Core Sequences & Routing

### 3.1 Prompt Submission Flow
1. **IDE** emits `submitPrompt(content)`.
2. **Web (`CodeSession.ts`)**:
   * Creates a `ChatTurn` document (status: `processing`).
   * Increments the chat session's `stats.turnCount`.
   * Finds the target `DroneSession`.
   * Caches the updated session and signals the **IDE** to enter the Processing state.
   * Emits `processWorkOrder` to the **Drone**.
   * On the first prompt (while the name is still the default), calls the AI API to auto-generate a session name.
   * Emits `sessionUpdated({ name })` to the **IDE** if the name changed.
3. **Drone (`gadget-drone.ts`)**:
   * Writes a local `.gadget/work-order.json` cache (for crash recovery).
   * Calls `AgentService.process()`.
   * Emits streaming events back to **Web**.
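
The server-side half of this flow can be sketched with in-memory stand-ins. This is a simplified illustration, not the code in `CodeSession.ts`: the `ChatTurn` and `ChatSession` shapes, the `DEFAULT_NAME` value, the `generateName` callback, and the `Emitted` log are all assumptions made for the example, and the real handler is asynchronous because it talks to MongoDB, the drone socket, and an AI API.

```typescript
// Hypothetical shapes; the real documents carry more fields.
type TurnStatus = "processing" | "finished" | "error";
interface ChatTurn { id: string; prompt: string; status: TurnStatus; }
interface ChatSession { id: string; name: string; stats: { turnCount: number }; }

// Records each emitted event so the ordering can be inspected.
interface Emitted { target: "ide" | "drone"; event: string; payload: unknown; }

const DEFAULT_NAME = "New Session"; // assumed default session name

function handleSubmitPrompt(
  session: ChatSession,
  prompt: string,
  generateName: (prompt: string) => string, // stand-in for the AI naming call
  log: Emitted[],
): ChatTurn {
  // Create the turn in the `processing` state and bump the turn counter.
  const turn: ChatTurn = {
    id: `turn-${session.stats.turnCount + 1}`,
    prompt,
    status: "processing",
  };
  session.stats.turnCount += 1;

  // Hand the work order to the drone.
  log.push({ target: "drone", event: "processWorkOrder", payload: turn });

  // Only while the name is still the default: generate one and notify the IDE.
  if (session.name === DEFAULT_NAME) {
    const name = generateName(prompt);
    if (name !== session.name) {
      session.name = name;
      log.push({ target: "ide", event: "sessionUpdated", payload: { name } });
    }
  }
  return turn;
}
```

Calling `handleSubmitPrompt` twice on the same session shows the naming step firing only once: the second call adds a `processWorkOrder` entry to the log but no `sessionUpdated`.
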
### 3.2 Result Streaming Flow
1. **Drone** emits `thinking(content)`, `response(content)`, or `toolCall(callId, name, params, response)`.
2. **Web (`DroneSession.ts`)**:
   * Locates the associated `CodeSession` via `SocketService.getCodeSessionByChatSessionId()`.
   * Updates the `ChatTurn` document in MongoDB incrementally.
   * Forwards the event to the **IDE**.
3. **IDE**: Updates the UI in real time.
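
A minimal sketch of the persist-then-forward step, using plain `Map`s in place of MongoDB and Socket.IO. The `StreamRouter` class, `TurnRecord`, and `linkIde` are hypothetical names introduced for illustration; only the overall behavior (append each chunk to the turn, then forward it to the IDE linked to the chat session) comes from the flow above.

```typescript
// Hypothetical stand-in for the streamed text fields of a ChatTurn document.
interface TurnRecord { thinking: string; response: string; }
interface StreamEvent { type: "thinking" | "response"; text: string; }

class StreamRouter {
  // chatSessionId -> callback that delivers an event to the IDE socket
  private ideByChatSession = new Map<string, (e: StreamEvent) => void>();
  // turnId -> accumulated text
  readonly turns = new Map<string, TurnRecord>();

  linkIde(chatSessionId: string, deliver: (e: StreamEvent) => void): void {
    this.ideByChatSession.set(chatSessionId, deliver);
  }

  // Returns false when no IDE is linked, so the caller can decide what to do.
  onDroneEvent(chatSessionId: string, turnId: string, event: StreamEvent): boolean {
    // Append the chunk to the stored turn first, so history survives a lost socket.
    const turn = this.turns.get(turnId) ?? { thinking: "", response: "" };
    turn[event.type] += event.text;
    this.turns.set(turnId, turn);

    // Then forward the event along the return path to the IDE.
    const deliver = this.ideByChatSession.get(chatSessionId);
    if (!deliver) return false;
    deliver(event);
    return true;
  }
}
```

Persisting before forwarding means the stored turn stays complete even when forwarding fails, which is what lets the IDE reload history after a refresh.
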
### 3.3 Work Order Completion
1. **Drone** emits `workOrderComplete(turnId, success, message)`.
2. **Web (`DroneSession.ts`)**:
   * Sets the `ChatTurn` status to `finished` or `error`.
   * Forwards the event to the **IDE**.
   * Clears `currentTurnId` from the drone session.
### 3.4 Drone Termination Flow
1. **User** clicks the "Terminate" button in the Drone Manager UI.
2. **IDE** calls `POST /api/v1/drone/registration/:id/terminate`.
3. **Web (`DroneService.ts`)**:
   * Checks whether the drone is already offline → returns an error if so.
   * Looks up the `DroneSession` via `SocketService.getDroneSession()`.
   * If the drone is not connected → marks it as offline immediately and returns success.
   * Emits `requestTermination` to the drone socket with a callback.
   * Starts a 10-second timeout.
4. **Web (`DroneSession.ts`)**:
   * Receives the `requestTermination` event.
   * Logs the termination request.
   * Forwards `requestTermination` to the drone socket (passthrough).
5. **Drone (`gadget-drone.ts`)**:
   * Receives `requestTermination` from the platform.
   * Calls the callback with `success: true`.
   * Sends `SIGINT` to itself, triggering a graceful shutdown.
   * Updates its status to `Offline` during shutdown.
6. **Web (`DroneService.ts`)**:
   * Drone accepts termination → polls the DB every 500 ms, waiting for the `Offline` status.
   * Drone goes offline → resolves with success.
   * Timeout expires (10 s) → forces the status to `Offline` and resolves with success.
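
The wait logic in step 6 can be sketched as a poll loop with a deadline. This is an illustration under stated assumptions, not the code in `DroneService.ts`: the `TerminationDeps` interface and every function on it are hypothetical injection points, `requestTermination` is assumed to always settle (resolving `false` if the drone never acknowledges), and an unacknowledged request is assumed to fall straight through to the forced path, which the steps above do not specify.

```typescript
type DroneStatus = "Online" | "Offline";

// Hypothetical dependencies, injected so the loop can run without a DB or socket.
interface TerminationDeps {
  requestTermination: () => Promise<boolean>; // resolves with the drone's callback value
  getStatus: () => Promise<DroneStatus>;      // reads the drone status from the DB
  forceOffline: () => Promise<void>;          // writes `Offline` to the DB
  sleep: (ms: number) => Promise<void>;
  now: () => number;                          // current time in milliseconds
}

const POLL_INTERVAL_MS = 500;
const TIMEOUT_MS = 10000;

// Both outcomes count as success for the caller; the return value only says how.
async function terminateDrone(deps: TerminationDeps): Promise<"offline" | "forced"> {
  const deadline = deps.now() + TIMEOUT_MS;
  const accepted = await deps.requestTermination();

  // Poll until the drone reports Offline or the deadline passes.
  while (accepted && deps.now() < deadline) {
    if ((await deps.getStatus()) === "Offline") return "offline";
    await deps.sleep(POLL_INTERVAL_MS);
  }

  // Timeout (or no acknowledgment): force the status.
  await deps.forceOffline();
  return "forced";
}
```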
---

## 4. Message Signatures (TS Reference)

### IDE -> Web
```typescript
type RequestSessionLockMessage = (
  registration: IDroneRegistration,
  project: IProject,
  chatSession: IChatSession,
  cb: (success: boolean, chatSessionId: string) => void
) => void;

type SubmitPromptMessage = (prompt: string) => void;
```
### Web -> Drone
```typescript
type ProcessWorkOrderMessage = (
  registration: IDroneRegistration,
  project: IProject,
  chatSession: IChatSession,
  turn: IChatTurn,
  cb: (success: boolean, message?: string) => void
) => void;

type RequestTerminationMessage = (
  cb: (success: boolean) => void
) => void;
```
### Web -> IDE
```typescript
type SessionUpdatedMessage = (
  updates: Partial<IChatSession>
) => void;
```
### Drone -> Web (Streaming)
```typescript
type ThinkingMessage = (content: string) => void;

type ResponseMessage = (content: string) => void;

type ToolCallMessage = (
  callId: string,
  name: string,
  params: string,   // JSON-encoded (JSON.stringify)
  response: string  // JSON-encoded (JSON.stringify)
) => void;

type WorkOrderCompleteMessage = (
  workOrderId: string,
  success: boolean,
  message?: string
) => void;

// Same signature as the Web -> Drone message of the same name.
type RequestTerminationMessage = (
  cb: (success: boolean) => void
) => void;
```
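
Because `params` and `response` travel as JSON strings, both ends need a matching encode/decode step. The helpers below are a hypothetical sketch (neither `encodeToolCall` nor `decodeToolCall` appears in the source) and assume both values are JSON-serializable.

```typescript
// Argument tuple matching the ToolCallMessage signature above.
type ToolCallArgs = [callId: string, name: string, params: string, response: string];

// Drone side: serialize structured values before emitting.
// Values must be JSON-serializable; JSON.stringify(undefined) does not yield a string.
function encodeToolCall(callId: string, name: string, params: unknown, response: unknown): ToolCallArgs {
  return [callId, name, JSON.stringify(params), JSON.stringify(response)];
}

// Receiving side: parse the strings back into values.
// JSON.parse throws on malformed input, so real handlers may want a try/catch.
function decodeToolCall([callId, name, params, response]: ToolCallArgs) {
  return {
    callId,
    name,
    params: JSON.parse(params) as unknown,
    response: JSON.parse(response) as unknown,
  };
}

// With a typed Socket.IO client, the tuple could be spread into the emit call:
//   socket.emit("toolCall", ...encodeToolCall("call-1", "readFile", { path: "README.md" }, { ok: true }));
```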
---

## 5. Session Implementation Guide (Web Server)

The web server (`gadget-code:web`) implements two wrapper classes in `src/lib/`:

### `CodeSession.ts`
Manages the IDE socket.
* **Logic**: Maps User ID -> Socket ID.
* **Routing**: When an IDE sends a message, `CodeSession` finds the selected drone's `DroneSession` and forwards the command.

### `DroneSession.ts`
Manages the Drone socket.
* **Logic**: Maps Drone Registration ID -> Socket ID.
* **Routing**: When a drone streams, `DroneSession` looks up the `chatSessionId` in the `SocketService` index to find the return path to the IDE.
* **Session Lookup**: `SocketService` maintains a `droneRegistrationIndex` Map that maps `registration._id` → `DroneSession` for efficient lookup by registration ID.
### Session Indexing Architecture

The `SocketService` maintains multiple indexes for efficient session lookup. All of them are kept in sync during connection and disconnection:

1. **`droneSessions`**: `Map<socket.id, DroneSession>` - Primary storage by socket ID
2. **`droneRegistrationIndex`**: `Map<registration._id, DroneSession>` - Lookup by drone registration
3. **`codeSessions`**: `Map<socket.id, CodeSession>` - Primary storage by socket ID
4. **`codeSessionUserIndex`**: `Map<user._id, CodeSession>` - Lookup by user ID
5. **`chatSessionIndex`**: `Map<chatSessionId, CodeSession>` - Reverse lookup from chat session to IDE
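
One way to keep such indexes consistent is to route every connect and disconnect through a single class. The sketch below reuses the five index names from the list above, but the `SessionIndex` class, the `*Like` interfaces, and the method names are hypothetical. The guard on removal (delete a secondary entry only if it still points at the departing session) is an assumption made here to cover a browser refresh, where the new socket can register before the old one disconnects.

```typescript
interface DroneSessionLike { socketId: string; registrationId: string; }
interface CodeSessionLike { socketId: string; userId: string; chatSessionId?: string; }

class SessionIndex {
  readonly droneSessions = new Map<string, DroneSessionLike>();
  readonly droneRegistrationIndex = new Map<string, DroneSessionLike>();
  readonly codeSessions = new Map<string, CodeSessionLike>();
  readonly codeSessionUserIndex = new Map<string, CodeSessionLike>();
  readonly chatSessionIndex = new Map<string, CodeSessionLike>();

  addDrone(session: DroneSessionLike): void {
    this.droneSessions.set(session.socketId, session);
    this.droneRegistrationIndex.set(session.registrationId, session);
  }

  removeDrone(socketId: string): void {
    const session = this.droneSessions.get(socketId);
    if (!session) return;
    this.droneSessions.delete(socketId);
    if (this.droneRegistrationIndex.get(session.registrationId) === session) {
      this.droneRegistrationIndex.delete(session.registrationId);
    }
  }

  addCode(session: CodeSessionLike): void {
    this.codeSessions.set(session.socketId, session);
    this.codeSessionUserIndex.set(session.userId, session);
  }

  linkChatSession(socketId: string, chatSessionId: string): boolean {
    const session = this.codeSessions.get(socketId);
    if (!session) return false;
    session.chatSessionId = chatSessionId;
    this.chatSessionIndex.set(chatSessionId, session);
    return true;
  }

  removeCode(socketId: string): void {
    const session = this.codeSessions.get(socketId);
    if (!session) return;
    this.codeSessions.delete(socketId);
    // Leave entries that a newer connection has already taken over.
    if (this.codeSessionUserIndex.get(session.userId) === session) {
      this.codeSessionUserIndex.delete(session.userId);
    }
    const chatId = session.chatSessionId;
    if (chatId !== undefined && this.chatSessionIndex.get(chatId) === session) {
      this.chatSessionIndex.delete(chatId);
    }
  }
}
```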
---

## 6. Workspace Crash Recovery

1. On startup, the **Drone** checks for `.gadget/work-order.json`.
2. If found, it emits `requestCrashRecovery({ workspaceId, turnId, chatSessionId })`.
3. **Web (`DroneSession.ts`)**:
   * Checks the DB for the `ChatTurn` status.
   * If the turn is already `finished`, responds with `{ action: "discard" }`.
   * If the turn is `processing`, responds with `{ action: "retry" }` and schedules a new `processWorkOrder` after a delay.
4. **Drone**: Deletes the local cache upon receiving any `crashRecoveryResponse`.
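
The server's decision in step 3 reduces to a small function of the turn status. In this sketch the function names and the `scheduleRetry` callback are hypothetical, and discarding a turn that is missing or in the `error` state is an assumption: the steps above only specify the `finished` and `processing` cases.

```typescript
type TurnStatus = "processing" | "finished" | "error";
type CrashRecoveryAction = "discard" | "retry";

// Only a turn still marked `processing` is retried.
// Assumption: a missing turn or an `error` turn is discarded.
function decideCrashRecovery(status: TurnStatus | undefined): CrashRecoveryAction {
  return status === "processing" ? "retry" : "discard";
}

// Builds the crashRecoveryResponse payload. `scheduleRetry` stands in for
// "wait, then emit processWorkOrder again" on the real server.
function handleCrashRecovery(
  status: TurnStatus | undefined,
  scheduleRetry: () => void,
): { action: CrashRecoveryAction } {
  const action = decideCrashRecovery(status);
  if (action === "retry") scheduleRetry();
  return { action };
}
```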
---

## 7. Extending the Protocol

To add a new message:
1. Add the message type to `packages/api/src/messages/ide.ts`, `drone.ts`, or `web.ts`.
2. Register it in `ClientToServerEvents` or `ServerToClientEvents` in `packages/api/src/messages/socket.ts`.
3. Re-export from `packages/api/src/index.ts`.
4. Implement the sender (emit) in the Client (`ide` or `drone`) or Server (`CodeSession`/`DroneSession`).
5. Implement the handler in the corresponding class or frontend component.
6. Implement the forward-path routing if needed.
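
Steps 1, 2, and 5 might look like the following for an invented message. `requestDroneStatus` and its signature are purely hypothetical, and the `ClientToServerEvents` interface here is a cut-down stand-in for the real one in `socket.ts`. Typing the handler object as the event map lets the compiler flag a missing or mismatched handler.

```typescript
// Step 1: a hypothetical new message type (it would live in ide.ts, drone.ts, or web.ts).
type RequestDroneStatusMessage = (cb: (status: string) => void) => void;

// Existing message, repeated so the sketch is self-contained.
type SubmitPromptMessage = (prompt: string) => void;

// Step 2: register the message in the event map.
interface ClientToServerEvents {
  submitPrompt: SubmitPromptMessage;
  requestDroneStatus: RequestDroneStatusMessage;
}

// Step 5: handlers, one per registered event.
// Omitting `requestDroneStatus` here would be a compile-time error.
const receivedPrompts: string[] = [];
const handlers: ClientToServerEvents = {
  submitPrompt: (prompt) => {
    receivedPrompts.push(prompt);
  },
  requestDroneStatus: (cb) => {
    cb("idle"); // placeholder status value
  },
};

// With a Socket.IO socket typed by these maps, registration would be along the lines of:
//   socket.on("requestDroneStatus", handlers.requestDroneStatus);
```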
---

## 8. Reconnection & Message Queuing

### 8.1 Problem Statement

When the browser refreshes during work order processing:
1. The old `CodeSession` disconnects, but `DroneSession` continues routing to it.
2. The drone keeps emitting events, but they go to a disconnected socket.
3. A new `CodeSession` connects but is not linked to the active chat session.
4. Messages are lost, and the IDE never receives streaming updates.
### 8.2 Solution Architecture

**Three-phase approach:**

1. **Redis Message Queue** (`src/lib/message-queue.ts`)
   - Messages enqueued when routing fails (disconnected socket)
   - FIFO ordering with RPUSH/LPOP
   - 30-minute TTL (1800 seconds)
   - Max 1000 messages (drop oldest)
   - Aggregates adjacent thinking/response messages during drain

2. **Redis Tab Lock** (`src/lib/tab-lock.ts`)
   - Prevents concurrent tab access to the same chat session
   - 1-minute timeout (requires heartbeat renewal)
   - Includes socket ID and user ID for validation
   - Auto-cleanup of stale locks

3. **Auto-Reconnection** (`CodeSession.checkAndReestablishActiveSession()`)
   - On connect, checks for an active processing turn in the DB
   - If found, attempts to acquire the tab lock
   - On success, re-establishes the chat session index
   - Drains queued messages from Redis
   - Aggregates and delivers messages to the client
### 8.3 Message Queue Flow

```
Drone emits thinking() → DroneSession.onThinking()
    ↓
SocketService.getCodeSessionByChatSessionId() throws (disconnected)
    ↓
MessageQueue.enqueue(chatSessionId, { type: 'thinking', args: [...] })
    ↓
[30 minutes later] Queue expires automatically
    OR
[On reconnect] MessageQueue.drain() → aggregateMessages() → deliver
```
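
The queue semantics (FIFO, cap of 1000 with drop-oldest, aggregation of adjacent `thinking`/`response` chunks on drain) can be illustrated without Redis. The sketch below swaps the Redis list for an in-memory array and omits the 30-minute TTL entirely. The `InMemoryMessageQueue` class and the `QueuedMessage` shape are assumptions, and this `aggregateMessages` is an illustrative reimplementation rather than the function in `message-queue.ts`.

```typescript
interface QueuedMessage {
  type: "thinking" | "response" | "toolCall" | "workOrderComplete";
  args: unknown[];
}

const MAX_MESSAGES = 1000;

class InMemoryMessageQueue {
  private queues = new Map<string, QueuedMessage[]>();

  enqueue(chatSessionId: string, message: QueuedMessage): void {
    const queue = this.queues.get(chatSessionId) ?? [];
    queue.push(message);                               // append, like RPUSH
    while (queue.length > MAX_MESSAGES) queue.shift(); // drop the oldest
    this.queues.set(chatSessionId, queue);
  }

  // Returns all queued messages in order and clears the queue.
  drain(chatSessionId: string): QueuedMessage[] {
    const queue = this.queues.get(chatSessionId) ?? [];
    this.queues.delete(chatSessionId);
    return queue;
  }
}

// Merges runs of adjacent thinking (or response) chunks into one message each.
function aggregateMessages(messages: QueuedMessage[]): QueuedMessage[] {
  const out: QueuedMessage[] = [];
  for (const msg of messages) {
    const last = out[out.length - 1];
    const mergeable = msg.type === "thinking" || msg.type === "response";
    if (last && mergeable && last.type === msg.type) {
      last.args = [String(last.args[0] ?? "") + String(msg.args[0] ?? "")];
    } else {
      out.push({ type: msg.type, args: [...msg.args] }); // copy, so the input stays untouched
    }
  }
  return out;
}
```

Aggregating on drain rather than on enqueue keeps the enqueue path cheap while the drone is streaming, and lets the reconnecting IDE receive a handful of large chunks instead of hundreds of small ones.
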
### 8.4 Tab Lock Flow

```
IDE connects → CodeSession.register()
    ↓
checkAndReestablishActiveSession()
    ↓
Find active chat session with processing turn
    ↓
TabLock.acquire(chatSessionId, userId, socketId)
    ↓
Success: Register chat session, drain queue, emit status
    OR
Failure: Emit 'tabLockDenied' → IDE navigates away
```
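
The lock semantics (one holder per chat session, a 1-minute expiry that a heartbeat must renew, stale locks reclaimable) can be sketched in memory. The `InMemoryTabLock` class, its `renew` and `release` methods, and the injected clock are assumptions for illustration. The source does not say how `tab-lock.ts` talks to Redis; a Redis-backed version would commonly build on `SET key value NX PX 60000`.

```typescript
interface TabLockEntry { userId: string; socketId: string; expiresAt: number; }

const LOCK_TTL_MS = 60000;

class InMemoryTabLock {
  private locks = new Map<string, TabLockEntry>();
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  acquire(chatSessionId: string, userId: string, socketId: string): boolean {
    const current = this.locks.get(chatSessionId);
    // Denied only if another socket holds a lock that has not expired.
    if (current && current.expiresAt > this.now() && current.socketId !== socketId) return false;
    this.locks.set(chatSessionId, { userId, socketId, expiresAt: this.now() + LOCK_TTL_MS });
    return true;
  }

  // Heartbeat: extends the lock, but only for the socket that holds it.
  renew(chatSessionId: string, socketId: string): boolean {
    const current = this.locks.get(chatSessionId);
    if (!current || current.socketId !== socketId || current.expiresAt <= this.now()) return false;
    current.expiresAt = this.now() + LOCK_TTL_MS;
    return true;
  }

  release(chatSessionId: string, socketId: string): void {
    if (this.locks.get(chatSessionId)?.socketId === socketId) this.locks.delete(chatSessionId);
  }
}
```

A second tab calling `acquire` while the first tab's lock is live gets `false`, which corresponds to the `tabLockDenied` branch in the flow above. Once the first tab stops renewing for more than a minute, the same call succeeds.
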
### 8.5 Frontend Reconciliation

The frontend handles reconnection gracefully:

1. **Load history first** - Fetch the chat session and turns from the DB
2. **Connect socket** - Establish the Socket.IO connection
3. **Backend auto-reconnects** - If a processing turn is found, the backend re-establishes the session
4. **Receive queued messages** - Aggregated messages are delivered in order
5. **Handle duplicates** - The frontend merges them with the existing history
### 8.6 Single Tab Enforcement

Only one tab can control a chat session at a time:

- First tab acquires the Redis lock
- Subsequent tabs receive a `tabLockDenied` event
- UI shows "Chat session open in another browser tab"
- User must navigate away or close the duplicate tab
### 8.7 Status Indicators

The status bar shows connection state:

- **Connected** (green ●) - Socket connected, receiving messages
- **Connecting** (yellow ●) - Attempting to connect
- **Error** (red ●) - Connection failed
- **Disconnected** (gray ●) - No active connection

Status messages inform the user:
- "Connecting..." - Initial connection
- "Reconnecting to active session..." - Auto-reconnect in progress
- "Reconnected" - Successfully reconnected
- "Chat session is open in another browser tab" - Tab lock denied