gadget/docs/architecture-stats.md

Gadget Code Architecture Review

Date: April 29, 2026
Scope: Socket.IO Communication System for Agentic Workflow Loop

Executive Summary

The Gadget Code architecture is 80% complete with solid foundations, but has critical gaps preventing end-to-end prompt processing. The Socket.IO infrastructure is properly structured, but message handlers lack implementation, data models have inconsistencies, and the agentic workflow loop cannot yet execute.

Primary Blocker: A prompt submitted from the IDE cannot reach the drone's AgentService for processing, and results cannot flow back to persist in ChatTurn documents.

Architectural Decision: Socket.IO Only (No Bull Queue)

Decision: Bull queue will not be used. All message routing uses Socket.IO with directed delivery.

Rationale:

  • Better performance for real-time agentic workflows
  • Eliminates Redis dependency for end users
  • Simpler deployment model

Recovery Strategy: Workspace persistence via .gadget/ directory (see Section 7: Workspace Persistence Architecture).


1. Architecture Soundness

What's Working Well

  1. Socket.IO Server Setup (gadget-code/src/services/socket.ts)

    • Proper authentication middleware distinguishing Code (IDE) vs Drone sessions
    • Session management via CodeSession and DroneSession classes
    • Clean separation of concerns with session types
  2. Event Interface Definitions (packages/api/src/messages/*.ts)

    • ClientToServerEvents and ServerToClientEvents properly typed
    • Message signatures match between IDE↔Web↔Drone
    • Callback-based request/response pattern is sound
  3. Data Model Foundation (packages/api/src/interfaces/*.ts)

    • IChatTurn, IChatSession, IChatToolCall capture AWL state
    • WorkspaceMode enum correctly models mutual exclusion
    • Socket routing architecture is correct

Critical Design Issues

Issue 1: Duplicate DroneStatus Enum

Location: packages/api/src/interfaces/drone-registration.ts vs gadget-drone/src/services/platform.ts

Both files define DroneStatus with identical values. The drone imports from its local copy, but @gadget/api exports a different type. This causes type mismatches when passing registrations between packages.

Fix: Remove DroneStatus from gadget-drone/src/services/platform.ts and import from @gadget/api.

Issue 2: Conflicting IAiProvider Interfaces

Location:

  • packages/api/src/interfaces/ai-provider.ts defines IAiProvider extends Document with apiType: "ollama" | "openai" and models: IAiModel[]
  • packages/ai/src/api.ts defines IAiProvider with sdk: "ollama" | "openai" and no Mongoose dependencies

Impact: gadget-drone/src/services/agent.ts:74 fails TypeScript compilation:

const api = this.getApi(provider); // Error: ObjectId | IAiProvider not assignable

The IChatTurn.provider field is typed as IAiProvider | Types.ObjectId (from @gadget/api), but @gadget/ai expects a different shape.

Fix:

  1. Keep @gadget/api as the Mongoose document interface (database layer)
  2. Keep @gadget/ai as the runtime config interface (AI SDK layer)
  3. Add a mapper in gadget-drone/src/services/ai.ts that converts IAiProvider | ObjectId (from @gadget/api) → IAiProvider (from @gadget/ai) before calling createAiApi()

Issue 3: Missing callId in Tool Call Message

Location: packages/api/src/messages/drone.ts:27-30

export type ToolCallMessage = (
  name: string,
  params: string,
  response: string,
) => void;

But IChatToolCall in packages/api/src/interfaces/chat-turn.ts requires callId: string. The socket message doesn't match the persistence model.

Fix: Add callId: string as first parameter to ToolCallMessage.
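
A minimal sketch of the revised signature follows. Only callId is confirmed by this review as a field of IChatToolCall; the ToolCallRecord shape and the collectToolCalls helper are hypothetical stand-ins used for illustration.

```typescript
// Revised socket message signature: callId comes first so it lines up with IChatToolCall.
type ToolCallMessage = (
  callId: string,
  name: string,
  params: string,
  response: string,
) => void;

// Assumed shape of the persisted record; only callId is confirmed by this review.
interface ToolCallRecord {
  callId: string;
  name: string;
  params: string;
  response: string;
}

// Hypothetical helper: builds a handler that stores every tool call it receives.
function collectToolCalls(sink: ToolCallRecord[]): ToolCallMessage {
  return (callId, name, params, response) => {
    sink.push({ callId, name, params, response });
  };
}
```

With the identifier carried in the message, the web service can match each socket event to exactly one persisted tool call instead of relying on arrival order.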


2. Completeness Analysis

2.1 Socket.IO Message Flow

| Message | IDE→Web | Web→Drone | Drone→Web | Web→IDE | Status |
| --- | --- | --- | --- | --- | --- |
| requestSessionLock | Sent | Routed | Received | Not implemented | Partial |
| requestWorkspaceMode | Sent | Routed | Received | Not implemented | Partial |
| submitPrompt | Sent | Not handled | Not sent | N/A | Broken |
| processWorkOrder | N/A | Sent | Received | N/A | Complete |
| thinking | N/A | Not routed | Sent | Not emitted | Broken |
| response | N/A | Not routed | Sent | Not emitted | Broken |
| toolCall | N/A | Not routed | Sent | Not emitted | Broken |

Assessment: The forward path (IDE→Drone) is blocked at submitPrompt. The return path (Drone→IDE) has no routing logic.

2.2 gadget-code:web Implementation Gaps

Missing: submitPrompt Handler

File: gadget-code/src/lib/code-session.ts:58-60

async onSubmitPrompt(content: string): Promise<void> {
  this.log.debug("prompt received", { content });
}

Required Implementation:

  1. Create ChatTurn document with status Processing
  2. Build work order from ChatSession, Project, IAiProvider, and prompt
  3. Find target drone's DroneSession via SocketService.getDroneSession()
  4. Emit processWorkOrder to drone with full context
  5. Drone acknowledges → update ChatTurn with drone job ID

Missing: Drone→IDE Event Routing

File: gadget-code/src/lib/drone-session.ts — no message handlers registered

Required Implementation:

// In DroneSession.register()
this.socket.on("thinking", this.onThinking.bind(this));
this.socket.on("response", this.onResponse.bind(this));
this.socket.on("toolCall", this.onToolCall.bind(this));
this.socket.on("workOrderComplete", this.onWorkOrderComplete.bind(this));

Each handler must:

  1. Find the corresponding CodeSession by chatSessionId
  2. Forward the event to the IDE socket
  3. Update the ChatTurn document with new data

Missing: ChatTurn Persistence Updates

File: gadget-code/src/models/chat-turn.ts exists but is not being updated during AWL execution.

Required: Create a TurnUpdateService that:

  • Listens for streaming events (thinking, response, toolCall)
  • Applies incremental updates to the active ChatTurn
  • Handles token counting and duration tracking
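
One way such a service could accumulate streaming events is sketched below, in memory and without Mongoose. The TurnState shape, the responseTokenCount field, and the whitespace-based estimateTokens function are assumptions made for illustration; a real implementation would persist each returned state to the ChatTurn document and would likely use provider-reported token counts.

```typescript
// Assumed, simplified view of the ChatTurn fields touched while streaming.
interface TurnState {
  thinking: string;
  response: string;
  toolCalls: { callId: string; name: string; params: string; response: string }[];
  stats: { thinkingTokenCount: number; responseTokenCount: number; durationMs: number };
}

type TurnEvent =
  | { type: "thinking"; content: string }
  | { type: "response"; content: string }
  | { type: "toolCall"; callId: string; name: string; params: string; response: string };

// Crude whitespace-based estimate; provider-reported counts would replace this.
function estimateTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

class TurnUpdateService {
  private readonly turns = new Map<string, { state: TurnState; startedAt: number }>();

  // Registers a turn and records when processing started (milliseconds).
  begin(turnId: string, startedAt: number): void {
    this.turns.set(turnId, {
      startedAt,
      state: {
        thinking: "",
        response: "",
        toolCalls: [],
        stats: { thinkingTokenCount: 0, responseTokenCount: 0, durationMs: 0 },
      },
    });
  }

  // Applies one streaming event and returns the updated state, ready to persist.
  apply(turnId: string, event: TurnEvent, now: number): TurnState {
    const entry = this.turns.get(turnId);
    if (!entry) throw new Error("unknown turn: " + turnId);
    const state = entry.state;
    if (event.type === "thinking") {
      state.thinking += event.content;
      state.stats.thinkingTokenCount += estimateTokens(event.content);
    } else if (event.type === "response") {
      state.response += event.content;
      state.stats.responseTokenCount += estimateTokens(event.content);
    } else {
      const { callId, name, params, response } = event;
      state.toolCalls.push({ callId, name, params, response });
    }
    state.stats.durationMs = now - entry.startedAt;
    return state;
  }
}
```

Timestamps are passed in rather than read from the clock, which keeps the accumulation logic deterministic and easy to unit test.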

2.3 gadget-drone Implementation Gaps

Missing: Work Order Acknowledgment Flow

File: gadget-drone/src/gadget-drone.ts:209-229

async onProcessWorkOrder(...) {
  const order: IAgentWorkOrder = { ... };
  cb(true); // accepts immediately
  AgentService.process(order); // fires without waiting
}

Issue: No error handling if AgentService.process() throws. No status update back to web service if processing fails.

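Fix: Await AgentService.process() inside try/catch (or attach a .catch() handler, because a try/catch around an un-awaited call never sees an asynchronous failure), and emit an error event to the web service on failure.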
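
The fix can be sketched as follows. WorkOrder, DroneSocket, and the injected runAgent function are minimal stand-ins, not the real types, and reporting failure through workOrderComplete with success set to false is one option among several; a dedicated error event would work equally well.

```typescript
// Minimal stand-ins for the real types used by the drone.
interface WorkOrder {
  turnId: string;
}

interface DroneSocket {
  emit(event: string, ...args: unknown[]): void;
}

// Accepts the order immediately, then reports the outcome once processing settles.
// runAgent stands in for AgentService.process().
async function handleWorkOrder(
  order: WorkOrder,
  socket: DroneSocket,
  cb: (accepted: boolean) => void,
  runAgent: (order: WorkOrder) => Promise<void>,
): Promise<void> {
  cb(true);
  try {
    // Awaiting here is what lets the catch block see asynchronous failures.
    await runAgent(order);
    socket.emit("workOrderComplete", order.turnId, true);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    socket.emit("workOrderComplete", order.turnId, false, message);
  }
}
```

Because the function never rejects, the caller can invoke it without awaiting and still be sure the web service hears about every failure.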

Missing: Socket Event Emissions

File: gadget-drone/src/services/agent.ts:70-98

The AWL loop has comments /* emit turn-tool-call socket message */ but no actual socket.emit() calls.

Required: Pass socket reference into AgentService.process() and emit:

  • thinking when reasoning content arrives
  • response when text content streams
  • toolCall after each tool execution
  • workOrderComplete when loop exits

Missing: Workspace Mode Management

File: gadget-drone/src/gadget-drone.ts:168-188

onRequestSessionLock sets workspaceMode = User but never transitions to Agent mode before processing. The AWL should:

  1. Emit requestWorkspaceMode(agent) before starting
  2. Wait for acknowledgment
  3. Run the loop
  4. Emit requestWorkspaceMode(idle) when complete
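
The four steps above can be wrapped in a small helper so the workspace is always handed back, even when the loop throws. This is a sketch under assumptions: requestMode stands in for emitting requestWorkspaceMode and awaiting its acknowledgment, and the lowercase mode names are illustrative rather than the actual WorkspaceMode enum values.

```typescript
type WorkspaceModeName = "idle" | "user" | "agent";

// requestMode stands in for emitting requestWorkspaceMode and awaiting the acknowledgment.
async function withAgentMode<T>(
  requestMode: (mode: WorkspaceModeName) => Promise<boolean>,
  runLoop: () => Promise<T>,
): Promise<T> {
  const granted = await requestMode("agent");
  if (!granted) {
    throw new Error("agent mode was not acknowledged");
  }
  try {
    return await runLoop();
  } finally {
    // Always release the workspace, including when the loop throws.
    await requestMode("idle");
  }
}
```

Placing the release in a finally block means a failed turn cannot leave the workspace stuck in agent mode.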

2.4 Data Model Inconsistencies

ChatTurn Schema Mismatch

File: gadget-code/src/models/chat-turn.ts:70-76

Schema defines stats.thinkingTokens but interface IChatTurnStats in packages/api/src/interfaces/chat-turn.ts:24 uses thinkingTokenCount.

Fix: Standardize on thinkingTokenCount in both places.

Missing User Reference in Context Messages

File: gadget-drone/src/services/agent.ts:101-120

buildSessionContext(workOrder: IAgentWorkOrder): IContextChatMessage[] {
  const user: IUser = workOrder.turn.session.user as IUser;
  // ...
  messages.push({
    // ...
    user: {
      _id: user._id.toHexString(), // Breaks if session.user is ObjectId
      username: user.email,
      displayName: user.displayName,
    },
  });
}

Issue: workOrder.turn.session is typed as IChatSession | Types.ObjectId. If it's an ObjectId, accessing .user fails.

Fix: Populate session.user before creating work order, or fetch user separately in drone.
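
If the drone keeps receiving unions of documents and ObjectIds, a type guard makes the populated case explicit before any field access. The sketch below uses ObjectIdLike as a dependency-free stand-in for Types.ObjectId, and UserDoc, SessionDoc, isPopulated, and requireSessionUser are hypothetical names.

```typescript
// Stand-in for mongoose's Types.ObjectId so the sketch has no dependencies.
class ObjectIdLike {
  private readonly hex: string;
  constructor(hex: string) {
    this.hex = hex;
  }
  toHexString(): string {
    return this.hex;
  }
}

interface UserDoc {
  _id: ObjectIdLike;
  email: string;
  displayName: string;
}

interface SessionDoc {
  _id: ObjectIdLike;
  user: UserDoc | ObjectIdLike;
}

// Narrows a "document or ObjectId" union to the populated document.
function isPopulated<T extends object>(ref: T | ObjectIdLike): ref is T {
  return !(ref instanceof ObjectIdLike);
}

// Fails loudly when the work order was built without populating the references.
function requireSessionUser(session: SessionDoc | ObjectIdLike): UserDoc {
  if (!isPopulated<SessionDoc>(session)) {
    throw new Error("session must be populated");
  }
  if (!isPopulated<UserDoc>(session.user)) {
    throw new Error("session.user must be populated");
  }
  return session.user;
}
```

A guard like this turns the compile error at agent.ts(102,48) into an explicit runtime check with a readable message.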


3. Conflicts and Redundancies

3.1 Documentation Conflicts

Work Order Interface Discrepancy:

  • gadget-code/docs/agentic-workflow-loop.md:21-52 defines IWorkOrder with provider.apiKey and context: IChatMessage[]
  • gadget-drone/docs/agentic-workflow-loop.md:15-62 defines IWorkOrder with provider.sdk and chatSession.context
  • Actual implementation in gadget-drone/src/services/agent.ts:24-28 uses IAgentWorkOrder with turn: IChatTurn and context: IChatTurn[]

Resolution: Delete both markdown docs' interface definitions. Reference @gadget/api interfaces only. Update docs to match IAgentWorkOrder.

3.2 Bull Queue vs Socket.IO (Resolved)

Documentation states:

  • gadget-drone/docs/agentic-workflow-loop.md:10-12: "Each Gadget Drone registered by the User implements a named Bull job queue"
  • gadget-drone/AGENTS.md: "Queue: Bull queue named gadget-drone, job type prompt"

Reality: gadget-drone/src/gadget-drone.ts uses Socket.IO for work order delivery, not Bull. There's no Bull queue setup in the drone.

Decision: Option A (Socket.IO only) — Bull references are legacy and must be removed from all documentation.

Recovery from Drone Crash: Handled via workspace persistence in .gadget/ directory (see Section 7). When a drone restarts:

  1. It validates/creates .gadget/workspace.json with workspace UUID
  2. Web service reads workspace state to route retry to same directory
  3. Agent can resume from last persisted ChatTurn state

3.3 Redundant Service Layers

Observation: gadget-code/src/services/api-client.ts exists alongside direct Mongoose model usage.

Check: gadget-code/src/controllers/api/v1/drone.ts likely duplicates DroneService methods.

Action: Audit API controllers — if they just proxy service methods, remove and call services directly from Socket handlers.


4. Implementation Roadmap

Phase 1: Fix Type Errors (1-2 hours)

Task 1.1: Resolve IAiProvider conflict

// In gadget-drone/src/services/agent.ts
import { Types } from "mongoose";
import { IAiProvider as AiProviderConfig } from "@gadget/ai";
import { IAiProvider as DbAiProvider } from "@gadget/api";

// Converts the database document into the runtime config expected by @gadget/ai
function mapDbProviderToConfig(provider: DbAiProvider | Types.ObjectId): AiProviderConfig {
  if (provider instanceof Types.ObjectId) {
    // An unpopulated reference carries no connection details
    throw new Error("Provider must be populated");
  }
  return {
    _id: provider._id.toHexString(),
    name: provider.name,
    sdk: provider.apiType, // note: apiType → sdk
    baseUrl: provider.baseUrl,
    apiKey: provider.apiKey,
  };
}

Task 1.2: Fix DroneStatus duplication

# Delete from gadget-drone/src/services/platform.ts
# Import from @gadget/api instead

Task 1.3: Fix ChatTurnStats field names

# Align schema and interface on thinkingTokenCount

Phase 2: Implement Prompt Submission (3-4 hours)

Task 2.1: Implement CodeSession.onSubmitPrompt()

async onSubmitPrompt(content: string): Promise<void> {
  const turn = new ChatTurn({
    createdAt: new Date(),
    user: this.user._id,
    session: this.chatSession._id,
    project: this.project?._id,
    provider: this.chatSession.provider, // must populate
    llm: this.chatSession.selectedModel,
    mode: this.chatSession.mode,
    status: ChatTurnStatus.Processing,
    prompts: { user: content },
    toolCalls: [],
    stats: { /* zeros */ }
  });
  await turn.save();

  const droneSession = SocketService.getDroneSession(this.selectedDrone);
  if (!droneSession) {
    // No connected drone: fail the turn instead of leaving it stuck in Processing
    turn.status = ChatTurnStatus.Error;
    await turn.save();
    return;
  }

  droneSession.socket.emit(
    "processWorkOrder",
    this.selectedDrone, // registration tracked on the session (see Task 2.2)
    this.project,
    this.chatSession,
    turn,
    async (success: boolean) => {
      // The turn is already Processing; only a rejection needs a status change
      if (!success) {
        turn.status = ChatTurnStatus.Error;
        await turn.save();
      }
    }
  );
}

Task 2.2: Add drone selection to CodeSession

  • Track selectedDrone: IDroneRegistration
  • Track chatSession: IChatSession
  • Track project: IProject

Phase 3: Implement Event Routing (3-4 hours)

Task 3.1: Add DroneSession event handlers

// In DroneSession.register()
this.socket.on("thinking", (content: string) => this.onThinking(content));
this.socket.on("response", (content: string) => this.onResponse(content));
this.socket.on("toolCall", (name, params, response) => 
  this.onToolCall(name, params, response));
this.socket.on("workOrderComplete", (turnId, success, message) =>
  this.onWorkOrderComplete(turnId, success, message));

Task 3.2: Implement routing logic

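async onThinking(content: string): Promise<void> {
  const codeSession = SocketService.getCodeSessionByChatSessionId(
    this.chatSessionId
  );
  // The IDE may have disconnected; still persist the update in that case
  codeSession?.socket.emit("thinking", content);

  // Update ChatTurn. This overwrites the field, so it assumes content carries
  // the full reasoning text so far; append instead if the drone sends deltas.
  await ChatTurn.findByIdAndUpdate(this.currentTurnId, {
    thinking: content
  });
}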

Task 3.3: Add getCodeSessionByChatSessionId() to SocketService

  • Maintain reverse index: chatSessionId → CodeSession
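
A reverse index of this kind could look like the sketch below. CodeSessionLike is a minimal stand-in for CodeSession, and the bind and unbind method names are hypothetical; only getCodeSessionByChatSessionId is named in this review.

```typescript
// Minimal stand-in for CodeSession; only the fields the index needs.
interface CodeSessionLike {
  socketId: string;
  chatSessionId?: string;
}

class CodeSessionIndex {
  private readonly byChatSession = new Map<string, CodeSessionLike>();

  // Call when an IDE session locks a chat session.
  bind(session: CodeSessionLike, chatSessionId: string): void {
    this.unbind(session); // drop any earlier binding held by this session
    session.chatSessionId = chatSessionId;
    this.byChatSession.set(chatSessionId, session);
  }

  // Call on disconnect so stale sockets are never used for routing.
  unbind(session: CodeSessionLike): void {
    const current = session.chatSessionId;
    if (current !== undefined && this.byChatSession.get(current) === session) {
      this.byChatSession.delete(current);
    }
    delete session.chatSessionId;
  }

  getCodeSessionByChatSessionId(chatSessionId: string): CodeSessionLike | undefined {
    return this.byChatSession.get(chatSessionId);
  }
}
```

The identity check in unbind prevents a late disconnect from one socket from evicting a newer session that has since taken over the same chat session.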

Phase 4: Emit Events from AWL (2-3 hours)

Task 4.1: Pass socket into AgentService.process()

// In gadget-drone/src/gadget-drone.ts
await AgentService.process(order, this.socket);

Task 4.2: Add emissions to AWL loop

// In AgentService.process()
for await (const chunk of response.stream) {
  if (chunk.type === "thinking") {
    socket.emit("thinking", chunk.content);
  } else if (chunk.type === "response") {
    socket.emit("response", chunk.content);
  }
}

for (const toolCall of response.toolCalls) {
  const result = await executeTool(toolCall);
  socket.emit("toolCall", toolCall.name, toolCall.arguments, result);
}

socket.emit("workOrderComplete", turn._id, true);

Phase 5: Workspace Persistence (4-6 hours) ⚠️ CRITICAL PATH

Task 5.1: Create .gadget/ directory structure on drone startup

// In gadget-drone/src/gadget-drone.ts, before registration
// Requires Node's built-in fs, os, path, and crypto modules
async validateWorkspace(): Promise<void> {
  const gadgetDir = path.join(process.cwd(), '.gadget');
  const workspaceFile = path.join(gadgetDir, 'workspace.json');
  
  // recursive: true makes mkdir a no-op when the directory already exists
  await fs.promises.mkdir(gadgetDir, { recursive: true });
  
  let workspaceData: WorkspaceData;
  if (fs.existsSync(workspaceFile)) {
    // Validate existing workspace
    workspaceData = JSON.parse(await fs.promises.readFile(workspaceFile, 'utf-8'));
    this.log.info('validated existing workspace', { 
      workspaceId: workspaceData.workspaceId 
    });
  } else {
    // Create new workspace with every field required by WorkspaceData (Section 7.3)
    workspaceData = {
      workspaceId: crypto.randomUUID(),
      createdAt: new Date().toISOString(),
      hostname: os.hostname(),
      workspaceDir: process.cwd(),
      chatSession: null,
      lockedProject: null,
      projects: [],
      registration: null,
    };
    await fs.promises.writeFile(workspaceFile, JSON.stringify(workspaceData, null, 2));
    this.log.info('created new workspace', { 
      workspaceId: workspaceData.workspaceId 
    });
  }
  
  // gadgetDir is read later by onProcessWorkOrder() and checkCrashRecovery()
  this.gadgetDir = gadgetDir;
  this.workspaceData = workspaceData;
}

Task 5.2: Write work order cache during processing

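// In onProcessWorkOrder()
async onProcessWorkOrder(...) {
  const workOrderFile = path.join(this.gadgetDir, 'work-order.json');
  
  // Write cache BEFORE processing (fields follow WorkOrderCache in Section 7.3)
  await fs.promises.writeFile(workOrderFile, JSON.stringify({
    turnId: turn._id.toHexString(),
    chatSessionId: chatSession._id.toHexString(),
    projectId: project._id.toHexString(),
    workOrderId: crypto.randomUUID(),
    receivedAt: new Date().toISOString(),
    prompt: turn.prompts.user,
    status: 'processing',
  }, null, 2));
  
  try {
    await AgentService.process(order, this.socket);
  } finally {
    // Remove cache AFTER completion (success or error); force ignores a missing file.
    // A hard crash skips this block, which is what leaves the file for recovery.
    await fs.promises.rm(workOrderFile, { force: true });
  }
}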

Task 5.3: Update drone registration to include workspaceId

// In PlatformService.register()
interface IDroneDefinition {
  hostname: string;
  workspaceDir: string;
  workspaceId: string; // NEW: persistent workspace identifier
}

Task 5.4: Web service stores workspaceId with ChatSession

// In packages/api/src/interfaces/chat-session.ts
export interface IChatSession extends Document {
  // ... existing fields ...
  workspaceId: string; // NEW: route retries to correct workspace
}

Phase 6: End-to-End Test (2 hours)

Test Scenario:

  1. Start drone: pnpm --filter gadget-drone dev
  2. Start web: pnpm --filter gadget-code dev:backend
  3. Start IDE: pnpm --filter gadget-code dev:frontend
  4. Login, create project, select drone
  5. Submit prompt: "Create a hello world function"
  6. Verify:
    • ChatTurn created in MongoDB
    • Drone receives processWorkOrder
    • IDE receives thinking/response events
    • ChatTurn updated with results

Test Drone Recovery:

  1. Kill drone mid-turn (Ctrl+C)
  2. Verify .gadget/work-order.json exists with turn data
  3. Restart drone in same directory
  4. Verify drone reports workspaceId to web service
  5. Web service can route retry to same workspace

5. Risk Assessment

High Risk

  1. No Streaming in @gadget/ai

    • AiApi.chat() returns Promise<IAiChatResponse>, not async iterable
    • Cannot stream tokens in real-time without refactoring
    • Mitigation: The streamCallback parameter already exists in the signature; implement it in the Ollama and OpenAI clients
  2. No Error Propagation

    • If drone crashes mid-turn, IDE hangs forever
    • Mitigation: Add timeout + heartbeat mechanism
  3. No Workspace Persistence Layer ⚠️ CRITICAL

    • Drone restart loses all context: which workspace, which projects, which chat session
    • Cannot retry work orders without knowing original workspace directory
    • Mitigation: Implement .gadget/ directory persistence (see Section 7)
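
For the timeout and heartbeat mitigation listed under "No Error Propagation", the web service could track the last heartbeat per turn and periodically sweep for silent drones. The HeartbeatMonitor class and its method names below are hypothetical; timestamps are passed in so the logic stays deterministic.

```typescript
// Tracks the last heartbeat per turn; timestamps are supplied by the caller.
class HeartbeatMonitor {
  private readonly lastSeen = new Map<string, number>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Record a heartbeat (or any streaming event) for a turn.
  beat(turnId: string, now: number): void {
    this.lastSeen.set(turnId, now);
  }

  // Stop tracking a turn once it completes.
  finish(turnId: string): void {
    this.lastSeen.delete(turnId);
  }

  // Returns the turns whose drone has been silent for longer than timeoutMs.
  findStale(now: number): string[] {
    const stale: string[] = [];
    this.lastSeen.forEach((seenAt, turnId) => {
      if (now - seenAt > this.timeoutMs) {
        stale.push(turnId);
      }
    });
    return stale;
  }
}
```

A periodic sweep could call findStale(Date.now()), mark each returned turn as failed, and notify the IDE so it stops waiting.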

Medium Risk

  1. Session State Not Persisted

    • CodeSession and DroneSession are in-memory
    • Server restart loses all active sessions
    • Mitigation: Store session state in Redis
  2. No Concurrency Control

    • Multiple prompts can queue for same drone
    • Drone processes one at a time but doesn't reject extras
    • Mitigation: Check DroneStatus.Busy before accepting work
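
The busy check for concurrency control can be sketched as a small gate on the drone. WorkOrderGate is a hypothetical name, and the string statuses mirror the values shown in the workspace.json example rather than the actual DroneStatus enum.

```typescript
type DroneStatusName = "available" | "busy";

// Hypothetical gate: accepts one work order at a time and rejects the rest.
class WorkOrderGate {
  private current: DroneStatusName = "available";
  private activeTurnId: string | null = null;

  // Returns false when another turn is already being processed.
  tryAccept(turnId: string): boolean {
    if (this.current === "busy") {
      return false;
    }
    this.current = "busy";
    this.activeTurnId = turnId;
    return true;
  }

  // Releases the gate; releases for any other turn are ignored as stale.
  release(turnId: string): void {
    if (this.activeTurnId !== turnId) {
      return;
    }
    this.current = "available";
    this.activeTurnId = null;
  }

  get status(): DroneStatusName {
    return this.current;
  }
}
```

The result of tryAccept can be passed straight to the processWorkOrder acknowledgment callback, so the web service learns about a rejection immediately.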

Low Risk

  1. TypeScript Strict Mode Violations
    • Several uses of any and missing null checks
    • Build passes but runtime errors possible
    • Mitigation: Enable noUncheckedIndexedAccess in drone

6. Recommended Next Steps

  1. Fix TypeScript errors in gadget-drone/src/services/agent.ts (Phase 1)
  2. Implement submitPrompt handler (Phase 2, Task 2.1)
  3. Add basic event routing (Phase 3, minimal viable path)
  4. Test end-to-end with stubbed tool calls
  5. Iterate on streaming, error handling, and persistence

Appendix A: File Inventory

Core Socket Implementation

  • gadget-code/src/services/socket.ts — Socket.IO server setup
  • gadget-code/src/lib/socket-session.ts — Base session class
  • gadget-code/src/lib/code-session.ts — IDE session (partial)
  • gadget-code/src/lib/drone-session.ts — Drone session (minimal)
  • gadget-drone/src/gadget-drone.ts — Drone client

Data Models

  • packages/api/src/interfaces/*.ts — TypeScript interfaces
  • gadget-code/src/models/*.ts — Mongoose schemas
  • gadget-drone/src/models/ — None (drone is stateless)

Message Definitions

  • packages/api/src/messages/socket.ts — Event map
  • packages/api/src/messages/ide.ts — IDE→Web messages
  • packages/api/src/messages/drone.ts — Drone messages (incomplete)

AI Integration

  • packages/ai/src/api.ts — AI interface
  • packages/ai/src/ollama.ts — Ollama client
  • packages/ai/src/openai.ts — OpenAI client
  • gadget-drone/src/services/ai.ts — AI service wrapper
  • gadget-drone/src/services/agent.ts — AWL implementation (partial)

Appendix B: Build Status

| Package | Build Status | Notes |
| --- | --- | --- |
| @gadget/api | Passes | Type definitions only |
| @gadget/ai | Passes | AI SDK abstraction |
| gadget-code | Passes | Web server builds |
| gadget-drone | Fails | Type errors in agent.ts:74,102 |

Blocking Errors:

src/services/agent.ts(74,9): Argument of type 'ObjectId | IAiProvider' 
  is not assignable to parameter of type 'IAiProvider'.
src/services/agent.ts(102,48): Property 'user' does not exist on type 
  'ObjectId | IChatSession'.

7. Workspace Persistence Architecture

7.1 Design Goals

  1. No External Dependencies: End users should not need to run Redis, MongoDB, or other infrastructure just to run gadget-drone
  2. Crash Recovery: When a drone crashes mid-work-order, it must be able to resume in the same workspace with the same project state
  3. Workspace Identity: Each workspace directory needs a persistent, unique identifier that survives drone restarts
  4. State Visibility: Both the drone and web service must be able to inspect workspace state at any time

7.2 Directory Structure

<workspace-directory>/
├── .gadget/
│   ├── workspace.json       # Persistent workspace identity & state
│   ├── work-order.json      # Active work order cache (deleted when complete)
│   └── logs/
│       └── drone.log        # Drone execution logs
├── <project-slug-1>/        # Project directories managed by this workspace
├── <project-slug-2>/
└── ...

7.3 File Specifications

.gadget/workspace.json

Created: When drone starts in a directory (new or existing workspace)
Updated: When chat session lock acquired/released, projects added/removed
Deleted: Never (only if user manually deletes workspace)

interface WorkspaceData {
  workspaceId: string;        // UUID v4, immutable once created
  createdAt: string;          // ISO 8601 timestamp
  hostname: string;           // Machine hostname where drone runs
  workspaceDir: string;       // Absolute path to workspace directory
  
  // Active session state (null when idle)
  chatSession: {
    _id: string;              // MongoDB ChatSession._id
    name: string;             // Session name for display
    lockedAt: string;         // ISO 8601 timestamp
  } | null;
  
  // Project currently being worked on (null when idle)
  lockedProject: {
    _id: string;              // MongoDB Project._id
    slug: string;             // Project slug (directory name)
    gitUrl: string;           // Remote git URL
    lockedAt: string;         // ISO 8601 timestamp
  } | null;
  
  // All projects cloned into this workspace
  projects: Array<{
    _id: string;
    slug: string;
    gitUrl: string;
    clonedAt: string;
    lastSyncAt: string;
  }>;
  
  // Drone registration (updated each startup)
  registration: {
    _id: string;              // MongoDB DroneRegistration._id
    status: string;           // Current drone status
    registeredAt: string;     // ISO 8601 timestamp
  } | null;
}

Example:

{
  "workspaceId": "550e8400-e29b-41d4-a716-446655440000",
  "createdAt": "2026-04-29T19:30:00.000Z",
  "hostname": "rob-dev-machine",
  "workspaceDir": "/home/rob/projects/my-gadget-workspace",
  "chatSession": {
    "_id": "65f8a9b2c3d4e5f6a7b8c9d0",
    "name": "Fix authentication bug",
    "lockedAt": "2026-04-29T20:15:00.000Z"
  },
  "lockedProject": {
    "_id": "65f8a9b2c3d4e5f6a7b8c9d1",
    "slug": "auth-service",
    "gitUrl": "https://github.com/user/auth-service.git",
    "lockedAt": "2026-04-29T20:15:00.000Z"
  },
  "projects": [
    {
      "_id": "65f8a9b2c3d4e5f6a7b8c9d1",
      "slug": "auth-service",
      "gitUrl": "https://github.com/user/auth-service.git",
      "clonedAt": "2026-04-29T19:30:00.000Z",
      "lastSyncAt": "2026-04-29T20:15:00.000Z"
    }
  ],
  "registration": {
    "_id": "65f8a9b2c3d4e5f6a7b8c9d2",
    "status": "busy",
    "registeredAt": "2026-04-29T19:30:00.000Z"
  }
}

.gadget/work-order.json

Created: When processWorkOrder message received
Updated: Not updated (immutable cache)
Deleted: When work order completes (success or error)

interface WorkOrderCache {
  turnId: string;             // ChatTurn._id for persistence updates
  chatSessionId: string;      // For routing events back to IDE
  projectId: string;          // For file operations
  workOrderId: string;        // Unique ID for this work order instance
  receivedAt: string;         // ISO 8601 timestamp
  prompt: string;             // User's prompt (for retry context)
  status: 'processing' | 'completed' | 'error';
  error?: string;             // Error message if status === 'error'
}

Purpose: If drone crashes while this file exists, the web service knows:

  • Which ChatTurn was being processed
  • Which workspace to route the retry to
  • What prompt needs to be re-processed
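
Because the cache may be read after a crash, it is worth validating before trusting it. The sketch below repeats the WorkOrderCache interface from above and adds a hypothetical parseWorkOrderCache helper that returns null for corrupt or incomplete content.

```typescript
// Copied from the WorkOrderCache specification above.
interface WorkOrderCache {
  turnId: string;
  chatSessionId: string;
  projectId: string;
  workOrderId: string;
  receivedAt: string;
  prompt: string;
  status: "processing" | "completed" | "error";
  error?: string;
}

const REQUIRED_STRING_FIELDS = [
  "turnId",
  "chatSessionId",
  "projectId",
  "workOrderId",
  "receivedAt",
  "prompt",
];

// Hypothetical helper: returns null when the content is corrupt or incomplete,
// for example when the drone crashed while writing the file.
function parseWorkOrderCache(raw: string): WorkOrderCache | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) {
    return null;
  }
  const record = data as Record<string, unknown>;
  const hasStrings = REQUIRED_STRING_FIELDS.every((key) => typeof record[key] === "string");
  const state = record["status"];
  const hasStatus = state === "processing" || state === "completed" || state === "error";
  const errorOk = record["error"] === undefined || typeof record["error"] === "string";
  return hasStrings && hasStatus && errorOk ? (data as WorkOrderCache) : null;
}
```

A null result would let the drone discard the file locally instead of sending a recovery request built from partial data.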

7.4 Drone Startup Sequence

// Pseudocode for gadget-drone.ts startup
async start(): Promise<void> {
  // Step 1: Validate/create workspace (BEFORE anything else)
  await this.validateWorkspace();
  
  // Step 2: Get user credentials
  const credentials = await this.getUserCredentials();
  
  // Step 3: Register with platform (includes workspaceId)
  this.registration = await PlatformService.register(
    credentials.email,
    credentials.password,
    process.cwd(),
    this.workspaceData.workspaceId,  // NEW parameter
  );
  
  // Step 4: Update workspace.json with registration
  this.workspaceData.registration = {
    _id: this.registration._id.toHexString(),
    status: 'starting',
    registeredAt: new Date().toISOString(),
  };
  await this.writeWorkspaceData();
  
  // Step 5: Connect Socket.IO
  await this.connectSocket();
  
  // Step 6: Check for incomplete work order (crash recovery)
  await this.checkCrashRecovery();
  
  // Step 7: Mark as available
  await PlatformService.setStatus(DroneStatus.Available);
  this.workspaceData.registration!.status = 'available';
  await this.writeWorkspaceData();
}

async checkCrashRecovery(): Promise<void> {
  const workOrderFile = path.join(this.gadgetDir, 'work-order.json');
  
  if (fs.existsSync(workOrderFile)) {
    const cache = JSON.parse(await fs.promises.readFile(workOrderFile, 'utf-8'));
    
    this.log.warn('incomplete work order found - crash recovery needed', {
      turnId: cache.turnId,
      prompt: cache.prompt,
    });
    
    // Notify web service that this workspace has pending recovery
    this.socket.emit('requestCrashRecovery', {
      workspaceId: this.workspaceData.workspaceId,
      turnId: cache.turnId,
      chatSessionId: cache.chatSessionId,
    });
    
    // DO NOT delete work-order.json yet - wait for web service instruction
  }
}

7.5 Web Service: Crash Recovery Flow

When web service receives requestCrashRecovery:

  1. Fetch ChatTurn by turnId
  2. Check Turn Status:
    • If status === 'finished': Acknowledge, tell drone to delete cache (turn completed before crash notification)
    • If status === 'processing': Queue retry for this workspace
  3. Route Retry: When retrying, filter drones by workspaceId to ensure same workspace handles it
  4. Acknowledge: Tell drone it can delete work-order.json

// In gadget-code/src/lib/drone-session.ts
async onRequestCrashRecovery(data: {
  workspaceId: string;
  turnId: string;
  chatSessionId: string;
}): Promise<void> {
  const turn = await ChatTurn.findById(data.turnId);
  
  if (!turn) {
    this.socket.emit('crashRecoveryResponse', {
      turnId: data.turnId,
      action: 'discard', // Turn doesn't exist, delete cache
    });
    return;
  }
  
  if (turn.status === ChatTurnStatus.Finished) {
    this.socket.emit('crashRecoveryResponse', {
      turnId: data.turnId,
      action: 'discard', // Already done, delete cache
    });
    return;
  }
  
  // Turn is still processing - mark for retry
  turn.status = ChatTurnStatus.Error;
  turn.response = 'Drone crashed during processing - retrying';
  await turn.save();
  
  this.socket.emit('crashRecoveryResponse', {
    turnId: data.turnId,
    action: 'retry',
    retryDelay: 5000, // Wait 5 seconds before retry
  });
  
  // Schedule retry (will route to same workspaceId)
  setTimeout(() => {
    this.retryWorkOrder(turn);
  }, 5000);
}

7.6 Workspace-Aware Drone Selection

When selecting a drone for a work order:

// In gadget-code/src/lib/code-session.ts
async onSubmitPrompt(content: string): Promise<void> {
  // ... create ChatTurn ...
  
  // Prefer drone in same workspace (for continuity)
  let targetDrone: DroneSession | undefined;
  
  if (this.chatSession.workspaceId) {
    // Try to find drone in same workspace
    targetDrone = SocketService.getDroneSessionByWorkspaceId(
      this.chatSession.workspaceId
    );
    
    if (!targetDrone) {
      this.log.warn('workspace drone unavailable, selecting alternative');
      // Fall through to any available drone
    }
  }
  
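  if (!targetDrone) {
    // Select any available drone for this user
    targetDrone = SocketService.getAvailableDroneForUser(this.user);
  }
  
  if (!targetDrone) {
    // No drone is connected for this user: fail fast instead of dereferencing undefined
    throw new Error('no drone available for this user');
  }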
  
  // Include workspaceId in work order for persistence
  targetDrone.socket.emit('processWorkOrder', {
    // ... existing fields ...
    workspaceId: this.chatSession.workspaceId,
  });
}

7.7 Implementation Checklist

  • Create WorkspaceService in gadget-drone/src/services/workspace.ts
  • Implement validateWorkspace() and writeWorkspaceData()
  • Update PlatformService.register() to accept workspaceId
  • Add workspaceId field to IDroneRegistration interface and model
  • Add workspaceId field to IChatSession interface and model
  • Implement work-order.json cache write/remove in onProcessWorkOrder()
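  • Emit requestCrashRecovery from the drone on startup and handle crashRecoveryResponse there
  • Implement the requestCrashRecovery socket handler in the web service (DroneSession.onRequestCrashRecovery), which replies with crashRecoveryResponse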
  • Add workspace-aware drone selection in CodeSession.onSubmitPrompt()
  • Remove all Bull queue references from documentation

Document Status: Complete
Next Review: After Phase 2 implementation