# Gadget Code Architecture Review

**Date:** April 29, 2026

**Scope:** Socket.IO Communication System for Agentic Workflow Loop

## Executive Summary

The Gadget Code architecture is **80% complete** with solid foundations, but has critical gaps preventing end-to-end prompt processing. The Socket.IO infrastructure is properly structured, but message handlers lack implementation, data models have inconsistencies, and the agentic workflow loop cannot yet execute.

**Primary Blocker:** A prompt submitted from the IDE cannot reach the drone's AgentService for processing, and results cannot flow back to persist in ChatTurn documents.

### Architectural Decision: Socket.IO Only (No Bull Queue)

**Decision:** Bull queue will **not** be used. All message routing uses Socket.IO with directed delivery.

**Rationale:**
- Better performance for real-time agentic workflows
- Eliminates Redis dependency for end users
- Simpler deployment model

**Recovery Strategy:** Workspace persistence via `.gadget/` directory (see Section 7: Workspace Persistence Architecture).

---

## 1. Architecture Soundness

### ✅ What's Working Well

1. **Socket.IO Server Setup** (`gadget-code/src/services/socket.ts`)
   - Proper authentication middleware distinguishing Code (IDE) vs Drone sessions
   - Session management via `CodeSession` and `DroneSession` classes
   - Clean separation of concerns with session types

2. **Event Interface Definitions** (`packages/api/src/messages/*.ts`)
   - `ClientToServerEvents` and `ServerToClientEvents` properly typed
   - Message signatures match between IDE↔Web↔Drone
   - Callback-based request/response pattern is sound

3. **Data Model Foundation** (`packages/api/src/interfaces/*.ts`)
   - `IChatTurn`, `IChatSession`, `IChatToolCall` capture AWL state
   - `WorkspaceMode` enum correctly models mutual exclusion
   - Socket routing architecture is correct

### ❌ Critical Design Issues

#### Issue 1: Duplicate `DroneStatus` Enum

**Location:** `packages/api/src/interfaces/drone-registration.ts` vs `gadget-drone/src/services/platform.ts`

Both files define `DroneStatus` with identical values, but TypeScript treats the two enum declarations as distinct types. The drone imports from its local copy, while `@gadget/api` exports its own. This causes type mismatches when passing registrations between packages.

**Fix:** Remove `DroneStatus` from `gadget-drone/src/services/platform.ts` and import from `@gadget/api`.

#### Issue 2: Conflicting `IAiProvider` Interfaces

**Location:**
- `packages/api/src/interfaces/ai-provider.ts` defines `IAiProvider extends Document` with `apiType: "ollama" | "openai"` and `models: IAiModel[]`
- `packages/ai/src/api.ts` defines `IAiProvider` with `sdk: "ollama" | "openai"` and no Mongoose dependencies

**Impact:** `gadget-drone/src/services/agent.ts:74` fails TypeScript compilation:

```typescript
const api = this.getApi(provider); // Error: ObjectId | IAiProvider not assignable
```

The `IChatTurn.provider` field is typed as `IAiProvider | Types.ObjectId` (from `@gadget/api`), but `@gadget/ai` expects a different shape.

**Fix:**
1. Keep `@gadget/api` as the Mongoose document interface (database layer)
2. Keep `@gadget/ai` as the runtime config interface (AI SDK layer)
3. Add a mapper in `gadget-drone/src/services/ai.ts` that converts `IAiProvider | ObjectId` → `IAiProvider` before calling `createAiApi()`

#### Issue 3: Missing `callId` in Tool Call Message

**Location:** `packages/api/src/messages/drone.ts:27-30`

```typescript
export type ToolCallMessage = (
  name: string,
  params: string,
  response: string,
) => void;
```

But `IChatToolCall` in `packages/api/src/interfaces/chat-turn.ts` requires `callId: string`.
The socket message doesn't match the persistence model.

**Fix:** Add `callId: string` as first parameter to `ToolCallMessage`.

---

## 2. Completeness Analysis

### 2.1 Socket.IO Message Flow

| Message | IDE→Web | Web→Drone | Drone→Web | Web→IDE | Status |
|---------|---------|-----------|-----------|---------|--------|
| `requestSessionLock` | ✅ Sent | ✅ Routed | ✅ Received | ❌ Not implemented | Partial |
| `requestWorkspaceMode` | ✅ Sent | ✅ Routed | ✅ Received | ❌ Not implemented | Partial |
| `submitPrompt` | ✅ Sent | ❌ Not handled | ❌ Not sent | N/A | **Broken** |
| `processWorkOrder` | N/A | ✅ Sent | ✅ Received | N/A | ✅ Complete |
| `thinking` | N/A | ❌ Not routed | ✅ Sent | ❌ Not emitted | **Broken** |
| `response` | N/A | ❌ Not routed | ✅ Sent | ❌ Not emitted | **Broken** |
| `toolCall` | N/A | ❌ Not routed | ✅ Sent | ❌ Not emitted | **Broken** |

**Assessment:** The forward path (IDE→Drone) is blocked at `submitPrompt`. The return path (Drone→IDE) has no routing logic.

### 2.2 gadget-code:web Implementation Gaps

#### Missing: `submitPrompt` Handler

**File:** `gadget-code/src/lib/code-session.ts:58-60`

```typescript
async onSubmitPrompt(content: string): Promise<void> {
  this.log.debug("prompt received", { content });
}
```

**Required Implementation:**
1. Create `ChatTurn` document with status `Processing`
2. Build work order from `ChatSession`, `Project`, `IAiProvider`, and prompt
3. Find target drone's `DroneSession` via `SocketService.getDroneSession()`
4. Emit `processWorkOrder` to drone with full context
5. Drone acknowledges → update `ChatTurn` with drone job ID

#### Missing: Drone→IDE Event Routing

**File:** `gadget-code/src/lib/drone-session.ts` (no message handlers registered)

**Required Implementation:**

```typescript
// In DroneSession.register()
this.socket.on("thinking", this.onThinking.bind(this));
this.socket.on("response", this.onResponse.bind(this));
this.socket.on("toolCall", this.onToolCall.bind(this));
this.socket.on("workOrderComplete", this.onWorkOrderComplete.bind(this));
```

Each handler must:
1. Find the corresponding `CodeSession` by `chatSessionId`
2. Forward the event to the IDE socket
3. Update the `ChatTurn` document with new data

#### Missing: ChatTurn Persistence Updates

**File:** `gadget-code/src/models/chat-turn.ts` exists but is not being updated during AWL execution.

**Required:** Create a `TurnUpdateService` that:
- Listens for streaming events (`thinking`, `response`, `toolCall`)
- Applies incremental updates to the active `ChatTurn`
- Handles token counting and duration tracking

### 2.3 gadget-drone Implementation Gaps

#### Missing: Work Order Acknowledgment Flow

**File:** `gadget-drone/src/gadget-drone.ts:209-229`

```typescript
async onProcessWorkOrder(...) {
  const order: IAgentWorkOrder = { ... };
  cb(true);                    // accepts immediately
  AgentService.process(order); // fires without waiting
}
```

**Issue:** No error handling if `AgentService.process()` throws. No status update back to web service if processing fails.

**Fix:** Wrap in try/catch, emit error event to web service on failure.

#### Missing: Socket Event Emissions

**File:** `gadget-drone/src/services/agent.ts:70-98`

The AWL loop has comments `/* emit turn-tool-call socket message */` but no actual `socket.emit()` calls.

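
As an illustration of what could stand in for those placeholder comments, here is a minimal sketch rather than the project's implementation. `AwlEmitter`, `AwlChunk`, and `emitTurnEvents` are hypothetical names, the chunk shape is an assumption, and the `toolCall` payload assumes `callId` has already been added as the first argument (the Issue 3 fix). The event names are the ones listed in the message flow table, and the `workOrderComplete` arguments are assumed to be `(turnId, success, message?)`.

```typescript
// Hypothetical sketch: none of these names exist in the Gadget codebase.

/** The narrow slice of a socket that the loop needs; a real socket may need a thin adapter. */
interface AwlEmitter {
  emit(event: string, ...args: unknown[]): unknown;
}

/** Assumed shape of the items produced by one pass of the AWL loop. */
type AwlChunk =
  | { type: "thinking"; content: string }
  | { type: "response"; content: string }
  | { type: "toolCall"; callId: string; name: string; params: string; response: string };

/**
 * Forwards every chunk as a socket event, then reports completion.
 * If the chunk source or the emitter throws, a failed completion is reported instead.
 */
function emitTurnEvents(
  socket: AwlEmitter,
  turnId: string,
  chunks: Iterable<AwlChunk>,
): boolean {
  try {
    for (const chunk of chunks) {
      switch (chunk.type) {
        case "thinking":
          socket.emit("thinking", chunk.content);
          break;
        case "response":
          socket.emit("response", chunk.content);
          break;
        case "toolCall":
          // callId first, assuming the Issue 3 fix has been applied
          socket.emit("toolCall", chunk.callId, chunk.name, chunk.params, chunk.response);
          break;
      }
    }
    socket.emit("workOrderComplete", turnId, true);
    return true;
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    socket.emit("workOrderComplete", turnId, false, message);
    return false;
  }
}
```

Depending on a narrow emitter interface instead of the concrete socket class is a design choice, not a requirement: it lets the loop be exercised in a unit test with a fake emitter that records calls, with no Socket.IO server running.
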

**Required:** Pass socket reference into `AgentService.process()` and emit:
- `thinking` when reasoning content arrives
- `response` when text content streams
- `toolCall` after each tool execution
- `workOrderComplete` when loop exits

#### Missing: Workspace Mode Management

**File:** `gadget-drone/src/gadget-drone.ts:168-188`

`onRequestSessionLock` sets `workspaceMode = User` but never transitions to `Agent` mode before processing. The AWL should:
1. Emit `requestWorkspaceMode(agent)` before starting
2. Wait for acknowledgment
3. Run the loop
4. Emit `requestWorkspaceMode(idle)` when complete

### 2.4 Data Model Inconsistencies

#### ChatTurn Schema Mismatch

**File:** `gadget-code/src/models/chat-turn.ts:70-76`

Schema defines `stats.thinkingTokens` but interface `IChatTurnStats` in `packages/api/src/interfaces/chat-turn.ts:24` uses `thinkingTokenCount`.

**Fix:** Standardize on `thinkingTokenCount` in both places.

#### Missing User Reference in Context Messages

**File:** `gadget-drone/src/services/agent.ts:101-120`

```typescript
buildSessionContext(workOrder: IAgentWorkOrder): IContextChatMessage[] {
  const user: IUser = workOrder.turn.session.user as IUser;
  // ...
  messages.push({
    // ...
    user: {
      _id: user._id.toHexString(), // Breaks if session.user is ObjectId
      username: user.email,
      displayName: user.displayName,
    },
  });
}
```

**Issue:** `workOrder.turn.session` is typed as `IChatSession | Types.ObjectId`. If it's an ObjectId, accessing `.user` fails.

**Fix:** Populate `session.user` before creating work order, or fetch user separately in drone.

---

## 3. Conflicts and Redundancies

### 3.1 Documentation Conflicts

**Work Order Interface Discrepancy:**
- `gadget-code/docs/agentic-workflow-loop.md:21-52` defines `IWorkOrder` with `provider.apiKey` and `context: IChatMessage[]`
- `gadget-drone/docs/agentic-workflow-loop.md:15-62` defines `IWorkOrder` with `provider.sdk` and `chatSession.context`
- Actual implementation in `gadget-drone/src/services/agent.ts:24-28` uses `IAgentWorkOrder` with `turn: IChatTurn` and `context: IChatTurn[]`

**Resolution:** Delete both markdown docs' interface definitions. Reference `@gadget/api` interfaces only. Update docs to match `IAgentWorkOrder`.

### 3.2 Bull Queue vs Socket.IO (Resolved)

**Documentation states:**
- `gadget-drone/docs/agentic-workflow-loop.md:10-12`: "Each Gadget Drone registered by the User implements a named Bull job queue"
- `gadget-drone/AGENTS.md`: "Queue: Bull queue named `gadget-drone`, job type `prompt`"

**Reality:** `gadget-drone/src/gadget-drone.ts` uses **Socket.IO** for work order delivery, not Bull. There's no Bull queue setup in the drone.

**Decision:** ✅ **Option A (Socket.IO only)**: Bull references are legacy and must be removed from all documentation.

**Recovery from Drone Crash:** Handled via workspace persistence in `.gadget/` directory (see Section 7). When a drone restarts:
1. It validates/creates `.gadget/workspace.json` with workspace UUID
2. Web service reads workspace state to route retry to same directory
3. Agent can resume from last persisted ChatTurn state

### 3.3 Redundant Service Layers

**Observation:** `gadget-code/src/services/api-client.ts` exists alongside direct Mongoose model usage.

**Check:** `gadget-code/src/controllers/api/v1/drone.ts` likely duplicates `DroneService` methods.

**Action:** Audit API controllers. If they just proxy service methods, remove them and call services directly from Socket handlers.

---

## 4. Implementation Roadmap

### Phase 1: Fix Type Errors (1-2 hours)

**Task 1.1:** Resolve `IAiProvider` conflict

```typescript
// In gadget-drone/src/services/agent.ts
import { Types } from "mongoose";
import { IAiProvider as AiProviderConfig } from "@gadget/ai";
import { IAiProvider as DbAiProvider } from "@gadget/api";

// Add mapper: converts the database document into the runtime config shape
function mapDbProviderToConfig(provider: DbAiProvider | Types.ObjectId): AiProviderConfig {
  if (provider instanceof Types.ObjectId) {
    throw new Error("Provider must be populated");
  }
  return {
    _id: provider._id.toHexString(),
    name: provider.name,
    sdk: provider.apiType, // note: apiType → sdk
    baseUrl: provider.baseUrl,
    apiKey: provider.apiKey,
  };
}
```

**Task 1.2:** Fix `DroneStatus` duplication

```bash
# Delete from gadget-drone/src/services/platform.ts
# Import from @gadget/api instead
```

**Task 1.3:** Fix `ChatTurnStats` field names

```bash
# Align schema and interface on thinkingTokenCount
```

### Phase 2: Implement Prompt Submission (3-4 hours)

**Task 2.1:** Implement `CodeSession.onSubmitPrompt()`

```typescript
async onSubmitPrompt(content: string): Promise<void> {
  const turn = new ChatTurn({
    createdAt: new Date(),
    user: this.user._id,
    session: this.chatSession._id,
    project: this.project?._id,
    provider: this.chatSession.provider, // must populate
    llm: this.chatSession.selectedModel,
    mode: this.chatSession.mode,
    status: ChatTurnStatus.Processing,
    prompts: { user: content },
    toolCalls: [],
    stats: { /* zeros */ },
  });
  await turn.save();

  const droneSession = SocketService.getDroneSession(this.selectedDrone);
  droneSession.socket.emit(
    "processWorkOrder",
    this.selectedDrone, // the drone registration tracked in Task 2.2
    this.project,
    this.chatSession,
    turn,
    async (success: boolean) => {
      // The turn is already Processing; only a rejection needs recording
      if (!success) {
        turn.status = ChatTurnStatus.Error;
        await turn.save();
      }
    },
  );
}
```

**Task 2.2:** Add drone selection to `CodeSession`
- Track `selectedDrone: IDroneRegistration`
- Track `chatSession: IChatSession`
- Track `project: IProject`

### Phase 3: Implement Event Routing (3-4 hours)

**Task 3.1:** Add DroneSession event handlers

```typescript
// In DroneSession.register()
this.socket.on("thinking", (content: string) => this.onThinking(content));
this.socket.on("response", (content: string) => this.onResponse(content));
// Once Issue 3 is fixed, callId becomes the first argument here
this.socket.on("toolCall", (name, params, response) =>
  this.onToolCall(name, params, response));
this.socket.on("workOrderComplete", (turnId, success, message) =>
  this.onWorkOrderComplete(turnId, success, message));
```

**Task 3.2:** Implement routing logic

```typescript
async onThinking(content: string): Promise<void> {
  const codeSession = SocketService.getCodeSessionByChatSessionId(
    this.chatSessionId
  );
  codeSession.socket.emit("thinking", content);

  // Update ChatTurn
  await ChatTurn.findByIdAndUpdate(this.currentTurnId, { thinking: content });
}
```

**Task 3.3:** Add `getCodeSessionByChatSessionId()` to `SocketService`
- Maintain reverse index: `chatSessionId → CodeSession`

### Phase 4: Emit Events from AWL (2-3 hours)

**Task 4.1:** Pass socket into `AgentService.process()`

```typescript
// In gadget-drone/src/gadget-drone.ts
await AgentService.process(order, this.socket);
```

**Task 4.2:** Add emissions to AWL loop

```typescript
// In AgentService.process()
for await (const chunk of response.stream) {
  if (chunk.type === "thinking") {
    socket.emit("thinking", chunk.content);
  } else if (chunk.type === "response") {
    socket.emit("response", chunk.content);
  }
}

for (const toolCall of response.toolCalls) {
  const result = await executeTool(toolCall);
  // Once Issue 3 is fixed, pass the call ID as the first argument
  socket.emit("toolCall", toolCall.name, toolCall.arguments, result);
}

socket.emit("workOrderComplete", turn._id, true);
```

### Phase 5: Workspace Persistence (4-6 hours) ⚠️ **CRITICAL PATH**

**Task 5.1:** Create `.gadget/` directory structure on drone startup

```typescript
// In gadget-drone/src/gadget-drone.ts, before registration
import crypto from "node:crypto";
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

async validateWorkspace(): Promise<void> {
  const gadgetDir = path.join(process.cwd(), '.gadget');
  const workspaceFile = path.join(gadgetDir, 'workspace.json');

  if (!fs.existsSync(gadgetDir)) {
    await fs.promises.mkdir(gadgetDir, { recursive: true });
  }

  let workspaceData: WorkspaceData;
  if (fs.existsSync(workspaceFile)) {
    // Validate existing workspace
    workspaceData = JSON.parse(await fs.promises.readFile(workspaceFile, 'utf-8'));
    this.log.info('validated existing workspace', { workspaceId: workspaceData.workspaceId });
  } else {
    // Create new workspace (all fields required by WorkspaceData, Section 7.3)
    workspaceData = {
      workspaceId: crypto.randomUUID(),
      createdAt: new Date().toISOString(),
      hostname: os.hostname(),
      workspaceDir: process.cwd(),
      projects: [],
      chatSession: null,
      lockedProject: null,
      registration: null,
    };
    await fs.promises.writeFile(workspaceFile, JSON.stringify(workspaceData, null, 2));
    this.log.info('created new workspace', { workspaceId: workspaceData.workspaceId });
  }

  this.gadgetDir = gadgetDir; // used by the work order cache (Task 5.2)
  this.workspaceData = workspaceData;
}
```

**Task 5.2:** Write work order cache during processing

```typescript
// In onProcessWorkOrder()
async onProcessWorkOrder(...) {
  const workOrderFile = path.join(this.gadgetDir, 'work-order.json');

  // Write cache BEFORE processing
  await fs.promises.writeFile(workOrderFile, JSON.stringify({
    turnId: turn._id.toHexString(),
    chatSessionId: chatSession._id.toHexString(),
    projectId: project._id.toHexString(),
    receivedAt: new Date().toISOString(),
  }, null, 2));

  try {
    await AgentService.process(order, this.socket);
  } finally {
    // Remove cache AFTER completion
    await fs.promises.unlink(workOrderFile);
  }
}
```

**Task 5.3:** Update drone registration to include workspaceId

```typescript
// In PlatformService.register()
interface IDroneDefinition {
  hostname: string;
  workspaceDir: string;
  workspaceId: string; // NEW: persistent workspace identifier
}
```

**Task 5.4:** Web service stores workspaceId with ChatSession

```typescript
// In packages/api/src/interfaces/chat-session.ts
export interface IChatSession extends Document {
  // ... existing fields ...
  workspaceId: string; // NEW: route retries to correct workspace
}
```

### Phase 6: End-to-End Test (2 hours)

**Test Scenario:**
1. Start drone: `pnpm --filter gadget-drone dev`
2. Start web: `pnpm --filter gadget-code dev:backend`
3. Start IDE: `pnpm --filter gadget-code dev:frontend`
4. Login, create project, select drone
5. Submit prompt: "Create a hello world function"
6. Verify:
   - ChatTurn created in MongoDB
   - Drone receives `processWorkOrder`
   - IDE receives `thinking`/`response` events
   - ChatTurn updated with results

**Test Drone Recovery:**
1. Kill drone mid-turn (Ctrl+C)
2. Verify `.gadget/work-order.json` exists with turn data
3. Restart drone in same directory
4. Verify drone reports workspaceId to web service
5. Web service can route retry to same workspace

---

## 5. Risk Assessment

### High Risk

1. **No Streaming in @gadget/ai**
   - `AiApi.chat()` returns `Promise`, not async iterable
   - Cannot stream tokens in real-time without refactoring
   - **Mitigation:** The `streamCallback` parameter already exists in the signature; implement it in the Ollama/OpenAI clients

2. **No Error Propagation**
   - If drone crashes mid-turn, IDE hangs forever
   - **Mitigation:** Add timeout + heartbeat mechanism

3. **No Workspace Persistence Layer** ⚠️ **CRITICAL**
   - Drone restart loses all context: which workspace, which projects, which chat session
   - Cannot retry work orders without knowing original workspace directory
   - **Mitigation:** Implement `.gadget/` directory persistence (see Section 7)

### Medium Risk

1. **Session State Not Persisted**
   - `CodeSession` and `DroneSession` are in-memory
   - Server restart loses all active sessions
   - **Mitigation:** Store session state in Redis

2. **No Concurrency Control**
   - Multiple prompts can queue for same drone
   - Drone processes one at a time but doesn't reject extras
   - **Mitigation:** Check `DroneStatus.Busy` before accepting work

### Low Risk

1. **TypeScript Strict Mode Violations**
   - Several `any` and missing null checks
   - Build passes but runtime errors possible
   - **Mitigation:** Enable `noUncheckedIndexedAccess` in drone

---

## 6. Recommended Next Steps

1. **Fix TypeScript errors** in `gadget-drone/src/services/agent.ts` (Phase 1)
2. **Implement `submitPrompt` handler** (Phase 2, Task 2.1)
3. **Add basic event routing** (Phase 3, minimal viable path)
4. **Test end-to-end** with stubbed tool calls
5. **Iterate** on streaming, error handling, and persistence

---

## Appendix A: File Inventory

### Core Socket Implementation
- `gadget-code/src/services/socket.ts`: Socket.IO server setup ✅
- `gadget-code/src/lib/socket-session.ts`: Base session class ✅
- `gadget-code/src/lib/code-session.ts`: IDE session (partial)
- `gadget-code/src/lib/drone-session.ts`: Drone session (minimal)
- `gadget-drone/src/gadget-drone.ts`: Drone client ✅

### Data Models
- `packages/api/src/interfaces/*.ts`: TypeScript interfaces ✅
- `gadget-code/src/models/*.ts`: Mongoose schemas ✅
- `gadget-drone/src/models/`: None (drone is stateless)

### Message Definitions
- `packages/api/src/messages/socket.ts`: Event map ✅
- `packages/api/src/messages/ide.ts`: IDE→Web messages ✅
- `packages/api/src/messages/drone.ts`: Drone messages (incomplete)

### AI Integration
- `packages/ai/src/api.ts`: AI interface ✅
- `packages/ai/src/ollama.ts`: Ollama client ✅
- `packages/ai/src/openai.ts`: OpenAI client ✅
- `gadget-drone/src/services/ai.ts`: AI service wrapper ✅
- `gadget-drone/src/services/agent.ts`: AWL implementation (partial)

---

## Appendix B: Build Status

| Package | Build Status | Notes |
|---------|-------------|-------|
| `@gadget/api` | ✅ Passes | Type definitions only |
| `@gadget/ai` | ✅ Passes | AI SDK abstraction |
| `gadget-code` | ✅ Passes | Web server builds |
| `gadget-drone` | ❌ Fails | Type errors in `agent.ts:74,102` |

**Blocking Errors:**

```
src/services/agent.ts(74,9): Argument of type 'ObjectId | IAiProvider' is not assignable to parameter of type 'IAiProvider'.
src/services/agent.ts(102,48): Property 'user' does not exist on type 'ObjectId | IChatSession'.
```

---

## 7. Workspace Persistence Architecture

### 7.1 Design Goals

1. **No External Dependencies:** End users should not need to run Redis, MongoDB, or other infrastructure just to run `gadget-drone`
2. **Crash Recovery:** When a drone crashes mid-work-order, it must be able to resume in the same workspace with the same project state
3. **Workspace Identity:** Each workspace directory needs a persistent, unique identifier that survives drone restarts
4. **State Visibility:** Both the drone and web service must be able to inspect workspace state at any time

### 7.2 Directory Structure

```
/
├── .gadget/
│   ├── workspace.json    # Persistent workspace identity & state
│   ├── work-order.json   # Active work order cache (deleted when complete)
│   └── logs/
│       └── drone.log     # Drone execution logs
├── /                     # Project directories managed by this workspace
├── /
└── ...
```

### 7.3 File Specifications

#### `.gadget/workspace.json`

**Created:** When drone starts in a directory (new or existing workspace)

**Updated:** When chat session lock acquired/released, projects added/removed

**Deleted:** Never (only if user manually deletes workspace)

```typescript
interface WorkspaceData {
  workspaceId: string;   // UUID v4, immutable once created
  createdAt: string;     // ISO 8601 timestamp
  hostname: string;      // Machine hostname where drone runs
  workspaceDir: string;  // Absolute path to workspace directory

  // Active session state (null when idle)
  chatSession: {
    _id: string;         // MongoDB ChatSession._id
    name: string;        // Session name for display
    lockedAt: string;    // ISO 8601 timestamp
  } | null;

  // Project currently being worked on (null when idle)
  lockedProject: {
    _id: string;         // MongoDB Project._id
    slug: string;        // Project slug (directory name)
    gitUrl: string;      // Remote git URL
    lockedAt: string;    // ISO 8601 timestamp
  } | null;

  // All projects cloned into this workspace
  projects: Array<{
    _id: string;
    slug: string;
    gitUrl: string;
    clonedAt: string;
    lastSyncAt: string;
  }>;

  // Drone registration (updated each startup)
  registration: {
    _id: string;          // MongoDB DroneRegistration._id
    status: string;       // Current drone status
    registeredAt: string; // ISO 8601 timestamp
  } | null;
}
```

**Example:**

```json
{
  "workspaceId": "550e8400-e29b-41d4-a716-446655440000",
  "createdAt": "2026-04-29T19:30:00.000Z",
  "hostname": "rob-dev-machine",
  "workspaceDir": "/home/rob/projects/my-gadget-workspace",
  "chatSession": {
    "_id": "65f8a9b2c3d4e5f6a7b8c9d0",
    "name": "Fix authentication bug",
    "lockedAt": "2026-04-29T20:15:00.000Z"
  },
  "lockedProject": {
    "_id": "65f8a9b2c3d4e5f6a7b8c9d1",
    "slug": "auth-service",
    "gitUrl": "https://github.com/user/auth-service.git",
    "lockedAt": "2026-04-29T20:15:00.000Z"
  },
  "projects": [
    {
      "_id": "65f8a9b2c3d4e5f6a7b8c9d1",
      "slug": "auth-service",
      "gitUrl": "https://github.com/user/auth-service.git",
      "clonedAt": "2026-04-29T19:30:00.000Z",
      "lastSyncAt": "2026-04-29T20:15:00.000Z"
    }
  ],
  "registration": {
    "_id": "65f8a9b2c3d4e5f6a7b8c9d2",
    "status": "busy",
    "registeredAt": "2026-04-29T19:30:00.000Z"
  }
}
```

#### `.gadget/work-order.json`

**Created:** When `processWorkOrder` message received

**Updated:** Not updated (immutable cache)

**Deleted:** When work order completes (success or error)

```typescript
interface WorkOrderCache {
  turnId: string;        // ChatTurn._id for persistence updates
  chatSessionId: string; // For routing events back to IDE
  projectId: string;     // For file operations
  workOrderId: string;   // Unique ID for this work order instance
  receivedAt: string;    // ISO 8601 timestamp
  prompt: string;        // User's prompt (for retry context)
  status: 'processing' | 'completed' | 'error';
  error?: string;        // Error message if status === 'error'
}
```

**Purpose:** If drone crashes while this file exists, the web service knows:
- Which ChatTurn was being processed
- Which workspace to route the retry to
- What prompt needs to be re-processed

### 7.4 Drone Startup Sequence

```typescript
// Pseudocode for gadget-drone.ts startup
async start(): Promise<void> {
  // Step 1: Validate/create workspace (BEFORE anything else)
  await this.validateWorkspace();

  // Step 2: Get user credentials
  const credentials = await this.getUserCredentials();

  // Step 3: Register with platform (includes workspaceId)
  this.registration = await PlatformService.register(
    credentials.email,
    credentials.password,
    process.cwd(),
    this.workspaceData.workspaceId, // NEW parameter
  );

  // Step 4: Update workspace.json with registration
  this.workspaceData.registration = {
    _id: this.registration._id.toHexString(),
    status: 'starting',
    registeredAt: new Date().toISOString(),
  };
  await this.writeWorkspaceData();

  // Step 5: Connect Socket.IO
  await this.connectSocket();

  // Step 6: Check for incomplete work order (crash recovery)
  await this.checkCrashRecovery();

  // Step 7: Mark as available
  await PlatformService.setStatus(DroneStatus.Available);
  this.workspaceData.registration!.status = 'available';
  await this.writeWorkspaceData();
}

async checkCrashRecovery(): Promise<void> {
  const workOrderFile = path.join(this.gadgetDir, 'work-order.json');

  if (fs.existsSync(workOrderFile)) {
    const cache = JSON.parse(await fs.promises.readFile(workOrderFile, 'utf-8'));
    this.log.warn('incomplete work order found - crash recovery needed', {
      turnId: cache.turnId,
      prompt: cache.prompt,
    });

    // Notify web service that this workspace has pending recovery
    this.socket.emit('requestCrashRecovery', {
      workspaceId: this.workspaceData.workspaceId,
      turnId: cache.turnId,
      chatSessionId: cache.chatSessionId,
    });

    // DO NOT delete work-order.json yet - wait for web service instruction
  }
}
```

### 7.5 Web Service: Crash Recovery Flow

When web service receives `requestCrashRecovery`:

1. **Fetch ChatTurn** by `turnId`
2. **Check Turn Status:**
   - If `status === 'finished'`: Acknowledge, tell drone to delete cache (turn completed before crash notification)
   - If `status === 'processing'`: Queue retry for this workspace
3. **Route Retry:** When retrying, filter drones by `workspaceId` to ensure same workspace handles it
4. **Acknowledge:** Tell drone it can delete `work-order.json`

```typescript
// In gadget-code/src/lib/drone-session.ts
async onRequestCrashRecovery(data: {
  workspaceId: string;
  turnId: string;
  chatSessionId: string;
}): Promise<void> {
  const turn = await ChatTurn.findById(data.turnId);

  if (!turn) {
    this.socket.emit('crashRecoveryResponse', {
      turnId: data.turnId,
      action: 'discard', // Turn doesn't exist, delete cache
    });
    return;
  }

  if (turn.status === ChatTurnStatus.Finished) {
    this.socket.emit('crashRecoveryResponse', {
      turnId: data.turnId,
      action: 'discard', // Already done, delete cache
    });
    return;
  }

  // Turn is still processing - mark for retry
  turn.status = ChatTurnStatus.Error;
  turn.response = 'Drone crashed during processing - retrying';
  await turn.save();

  this.socket.emit('crashRecoveryResponse', {
    turnId: data.turnId,
    action: 'retry',
    retryDelay: 5000, // Wait 5 seconds before retry
  });

  // Schedule retry (will route to same workspaceId)
  setTimeout(() => {
    this.retryWorkOrder(turn);
  }, 5000);
}
```

### 7.6 Workspace-Aware Drone Selection

When selecting a drone for a work order:

```typescript
// In gadget-code/src/lib/code-session.ts
async onSubmitPrompt(content: string): Promise<void> {
  // ... create ChatTurn ...

  // Prefer drone in same workspace (for continuity)
  let targetDrone: DroneSession | undefined;

  if (this.chatSession.workspaceId) {
    // Try to find drone in same workspace
    targetDrone = SocketService.getDroneSessionByWorkspaceId(
      this.chatSession.workspaceId
    );

    if (!targetDrone) {
      this.log.warn('workspace drone unavailable, selecting alternative');
      // Fall through to any available drone
    }
  }

  if (!targetDrone) {
    // Select any available drone for this user
    targetDrone = SocketService.getAvailableDroneForUser(this.user);
  }

  // Include workspaceId in work order for persistence
  targetDrone.socket.emit('processWorkOrder', {
    // ... existing fields ...
    workspaceId: this.chatSession.workspaceId,
  });
}
```

### 7.7 Implementation Checklist

- [ ] Create `WorkspaceService` in `gadget-drone/src/services/workspace.ts`
- [ ] Implement `validateWorkspace()` and `writeWorkspaceData()`
- [ ] Update `PlatformService.register()` to accept `workspaceId`
- [ ] Add `workspaceId` field to `IDroneRegistration` interface and model
- [ ] Add `workspaceId` field to `IChatSession` interface and model
- [ ] Implement `work-order.json` cache write/remove in `onProcessWorkOrder()`
- [ ] Implement `requestCrashRecovery` socket handler in web service
- [ ] Implement `crashRecoveryResponse` socket handler in drone
- [ ] Add workspace-aware drone selection in `CodeSession.onSubmitPrompt()`
- [ ] Remove all Bull queue references from documentation

---

**Document Status:** Complete

**Next Review:** After Phase 2 implementation