gadget/docs/session-heartbeat.md
2026-05-08 14:27:37 -04:00

# Session Heartbeat & Lock Release
## Problem
A drone can be left permanently locked if the IDE disconnects or
navigates away without explicitly releasing the session lock. Once locked,
the drone rejects all new lock requests until it is restarted.

Two mechanisms solve this:
1. **`releaseSessionLock`** — An explicit message to unlock a drone from a
chat session. Sent deliberately by the IDE on view cleanup, and as a
fallback by the backend on socket disconnect.
2. **`sessionHeartbeat`** — A periodic keepalive from IDE → drone. The
drone restarts a 60-second timer on each heartbeat. If the timer expires
before the next heartbeat arrives, the drone automatically releases its
lock and returns to the `Syncing` state.
---
## Protocol
Two new messages, both flowing `IDE → Web → Drone`:
```
┌──────────────┐    releaseSessionLock     ┌──────────────┐    releaseSessionLock     ┌──────────────┐
│              │ ────────────────────────► │              │ ────────────────────────► │              │
│     IDE      │     sessionHeartbeat      │     Web      │     sessionHeartbeat      │    Drone     │
│  (Browser)   │ ────────────────────────► │  (Backend)   │ ────────────────────────► │   (Worker)   │
│              │ ◄──────────────────────── │              │ ◄──────────────────────── │              │
│              │          cb(ack)          │              │          cb(ack)          │              │
└──────────────┘                           └──────────────┘                           └──────────────┘
```
### `releaseSessionLock`
| Direction | Type | Purpose |
|-----------|------|---------|
| IDE → Web | `ClientToServerEvents.releaseSessionLock` | IDE releases a held lock |
| Web → Drone | `ServerToClientEvents.releaseSessionLock` | Web forwards to drone |

**Signature:**
```typescript
type ReleaseSessionLockMessage = (
  registration: IDroneRegistration,
  project: IProject,
  chatSession: IChatSession,
  cb: (success: boolean) => void,
) => void;
```
The callback is simpler than the one for `requestSessionLock`: it reports
only a boolean success flag and carries no payload.
### `sessionHeartbeat`
| Direction | Type | Purpose |
|-----------|------|---------|
| IDE → Web | `ClientToServerEvents.sessionHeartbeat` | Periodic keepalive |
| Web → Drone | `ServerToClientEvents.sessionHeartbeat` | Forwarded to drone |

**Signature:**
```typescript
type SessionHeartbeatMessage = (cb: (ack: boolean) => void) => void;
```
---
## Implementation Layer by Layer
### 1. Shared Types — `packages/api/src/messages/ide.ts`
Defines `ReleaseSessionLockCallback`, `ReleaseSessionLockMessage`,
`SessionHeartbeatCallback`, and `SessionHeartbeatMessage`.
### 2. Socket Event Maps — `packages/api/src/messages/socket.ts`
Both messages are registered in `ClientToServerEvents` (IDE → Web) and
`ServerToClientEvents` (Web → Drone).
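
As a rough illustration, the shared types and event maps might fit together as sketched below. The `IDroneRegistration`, `IProject`, and `IChatSession` shapes are placeholders (their real definitions are not shown in this document), the callback aliases are inferred from the signatures above, and `droneHandlers` is a hypothetical object that exists only to show the maps type-checking.

```typescript
// Placeholder shapes: the real interfaces are defined elsewhere.
interface IDroneRegistration { id: string }
interface IProject { id: string }
interface IChatSession { id: string }

// Callback aliases, inferred from the documented signatures.
type ReleaseSessionLockCallback = (success: boolean) => void;
type SessionHeartbeatCallback = (ack: boolean) => void;

type ReleaseSessionLockMessage = (
  registration: IDroneRegistration,
  project: IProject,
  chatSession: IChatSession,
  cb: ReleaseSessionLockCallback,
) => void;
type SessionHeartbeatMessage = (cb: SessionHeartbeatCallback) => void;

// Both messages appear in both maps because the backend relays them.
interface ClientToServerEvents {
  releaseSessionLock: ReleaseSessionLockMessage; // IDE -> Web
  sessionHeartbeat: SessionHeartbeatMessage;
}
interface ServerToClientEvents {
  releaseSessionLock: ReleaseSessionLockMessage; // Web -> Drone
  sessionHeartbeat: SessionHeartbeatMessage;
}

// Hypothetical drone-side handler table that type-checks against the map.
const droneHandlers: ServerToClientEvents = {
  releaseSessionLock: (_registration, _project, _chatSession, cb) => cb(true),
  sessionHeartbeat: (cb) => cb(true),
};
```

Keeping one message type per event and reusing it in both maps means a signature change is caught by the compiler on every hop of the relay.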
### 3. Frontend Socket Client — `gadget-code/frontend/src/lib/socket.ts`
```typescript
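// Shape only: the members added to the frontend socket client.
declare class SocketClient {
  private heartbeatInterval: ReturnType<typeof setInterval> | null; // null while stopped
  releaseSessionLock(
    registration: IDroneRegistration,
    project: IProject,
    chatSession: IChatSession,
  ): Promise<boolean>;
  startSessionHeartbeat(): void;
  stopSessionHeartbeat(): void;
}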
```
- `releaseSessionLock()` wraps `socket.emit("releaseSessionLock", ...)` in
a Promise.
- `startSessionHeartbeat()` starts a `setInterval` that fires every 19
  seconds and emits `sessionHeartbeat` with an ack callback.
- `stopSessionHeartbeat()` clears the interval.
- `disconnect()` automatically calls `stopSessionHeartbeat()`.
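
The behavior listed above could be implemented roughly as follows. This is a sketch under assumptions, not the actual client: `EmitterLike` is a hypothetical stand-in for the real socket, the payload types are left as `unknown`, the interval is injectable only to make the class easy to exercise, and stopping any previous interval inside `startSessionHeartbeat()` is an added safeguard that the document does not mention.

```typescript
// Hypothetical stand-in for the subset of the socket this sketch needs.
interface EmitterLike {
  emit(event: string, ...args: unknown[]): void;
}

const HEARTBEAT_INTERVAL_MS = 19_000; // interval stated in this document

class SocketClientSketch {
  private heartbeatInterval: ReturnType<typeof setInterval> | null = null;
  private readonly socket: EmitterLike;
  private readonly intervalMs: number;

  constructor(socket: EmitterLike, intervalMs: number = HEARTBEAT_INTERVAL_MS) {
    this.socket = socket;
    this.intervalMs = intervalMs;
  }

  // Wraps the ack-style emit in a Promise that resolves with the success flag.
  releaseSessionLock(registration: unknown, project: unknown, chatSession: unknown): Promise<boolean> {
    return new Promise((resolve) => {
      this.socket.emit("releaseSessionLock", registration, project, chatSession,
        (success: boolean) => resolve(success));
    });
  }

  startSessionHeartbeat(): void {
    this.stopSessionHeartbeat(); // assumption: never run two intervals at once
    this.heartbeatInterval = setInterval(() => {
      this.socket.emit("sessionHeartbeat", (ack: boolean) => {
        if (!ack) console.warn("sessionHeartbeat was not acknowledged");
      });
    }, this.intervalMs);
  }

  stopSessionHeartbeat(): void {
    if (this.heartbeatInterval !== null) {
      clearInterval(this.heartbeatInterval);
      this.heartbeatInterval = null;
    }
  }

  // Disconnecting always stops the heartbeat first.
  disconnect(): void {
    this.stopSessionHeartbeat();
  }
}
```

Note that the Promise never settles if the ack never arrives; a real client may want a timeout around it.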
### 4. ChatSessionView — `gadget-code/frontend/src/pages/ChatSessionView.tsx`
- After mount, once `session` and `project` have loaded: starts the
  heartbeat.
- On unmount: stops the heartbeat, then sends `releaseSessionLock` using
  the drone registration stored in `localStorage` under
  `dtp_drone_registration`.
- Uses `sessionRef` / `projectRef` so that the unmount closure sees the
  latest state rather than the values captured at mount.
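
The mount and unmount behavior can be sketched without React as a function that returns its own cleanup, which is the shape an effect body takes. Everything here is illustrative: `startSessionLifecycle`, `LockClient`, `Ref`, and `KeyValueStore` are hypothetical names, and the sketch assumes the `dtp_drone_registration` entry holds JSON.

```typescript
// Hypothetical minimal shapes standing in for React refs, the socket
// client, and localStorage.
interface Ref<T> { current: T | null }
interface LockClient {
  startSessionHeartbeat(): void;
  stopSessionHeartbeat(): void;
  releaseSessionLock(registration: unknown, project: unknown, chatSession: unknown): Promise<boolean>;
}
interface KeyValueStore { getItem(key: string): string | null }

// Starts the heartbeat and returns the cleanup to run on unmount.
function startSessionLifecycle(
  client: LockClient,
  sessionRef: Ref<unknown>,
  projectRef: Ref<unknown>,
  storage: KeyValueStore,
): () => void {
  client.startSessionHeartbeat();
  return () => {
    client.stopSessionHeartbeat();
    // Read the refs at cleanup time so the closure sees the latest values,
    // not the ones that existed when the lifecycle started.
    const session = sessionRef.current;
    const project = projectRef.current;
    const raw = storage.getItem("dtp_drone_registration");
    if (!session || !project || raw === null) return; // nothing to release
    const registration: unknown = JSON.parse(raw); // assumes a JSON-encoded value
    void client.releaseSessionLock(registration, project, session);
  };
}
```

Reading `sessionRef.current` inside the cleanup, rather than closing over a `session` variable, is what lets a release sent after a session switch name the session that is actually open.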
### 5. Backend CodeSession — `gadget-code/src/lib/code-session.ts`
**`onReleaseSessionLock(registration, project, chatSession, cb)`:**
1. Looks up the `DroneSession` via `SocketService.getDroneSession(registration)`.
2. Forwards `releaseSessionLock` to the drone socket.
3. When the drone's callback reports success: calls
   `SocketService.unregisterChatSession()`, clears
   `droneSession.chatSessionId`, and clears the local `selectedDrone`,
   `chatSession`, and `project`.
4. Calls `cb(success)`.

**`onSessionHeartbeat(cb)`:**
1. Guards `this.selectedDrone` — returns `cb(false)` if not set.
2. Looks up `DroneSession` via `SocketService.getDroneSession()`.
3. Forwards heartbeat to drone socket with the ack callback.
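
The two backend handlers above might look roughly like this. The sketch uses hypothetical minimal shapes (`DroneSocket`, `DroneSession`, `SessionRegistry`) in place of the real classes, and the behavior when no `DroneSession` is found is an assumption: the release path treats it as a successful no-op and the heartbeat path as a failed ack.

```typescript
// Hypothetical minimal shapes; the real classes carry far more state.
type Ack = (ok: boolean) => void;
interface DroneSocket { emit(event: string, ...args: unknown[]): void }
interface DroneSession { socket: DroneSocket; chatSessionId: string | null }
interface SessionRegistry {
  getDroneSession(registration: unknown): DroneSession | undefined;
  unregisterChatSession(chatSession: unknown): void;
}

class CodeSessionSketch {
  selectedDrone: unknown = null;
  chatSession: unknown = null;
  project: unknown = null;
  private readonly registry: SessionRegistry;

  constructor(registry: SessionRegistry) {
    this.registry = registry;
  }

  onReleaseSessionLock(registration: unknown, project: unknown, chatSession: unknown, cb: Ack): void {
    const droneSession = this.registry.getDroneSession(registration);
    if (!droneSession) return cb(true); // assumption: nothing to release
    droneSession.socket.emit("releaseSessionLock", registration, project, chatSession,
      (success: boolean) => {
        if (success) {
          // Only forget the session once the drone confirms the release.
          this.registry.unregisterChatSession(chatSession);
          droneSession.chatSessionId = null;
          this.selectedDrone = null;
          this.chatSession = null;
          this.project = null;
        }
        cb(success);
      });
  }

  onSessionHeartbeat(cb: Ack): void {
    if (!this.selectedDrone) return cb(false); // no drone selected
    const droneSession = this.registry.getDroneSession(this.selectedDrone);
    if (!droneSession) return cb(false); // assumption: missing session is a failed ack
    droneSession.socket.emit("sessionHeartbeat", cb); // drone acks straight through
  }
}
```

Passing the IDE's `cb` directly to the drone emit means the IDE sees the drone's own acknowledgement rather than one synthesized by the backend.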
### 6. Backend Disconnect — `gadget-code/src/services/socket.ts`
When a `CodeSession` disconnects:
1. Retrieve the `CodeSession` from `codeSessions` **before** deleting it
   (fixes an existing bug where the session was read after deletion).
2. Read the `disconnectingCodeSession.selectedDroneId` getter to check
   whether a drone was selected.
3. If so, look up the `DroneSession` in `droneRegistrationIndex`.
4. Emit `releaseSessionLock` to the drone (fire-and-forget: no callback is
   needed because the IDE socket is already going away).
5. Clean up `codeSessionUserIndex` and `chatSessionIndex`.
6. Delete the entry from the `codeSessions` map.

This is a safety net for cases where the IDE closes without sending a
deliberate release (browser crash, tab close, network failure).
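
The ordering of the steps above is the important part, and can be sketched as follows. The map names come from this document, but what each map is keyed by, the `CodeSessionLike` and `DroneLink` shapes, and the simplified emit payload are all assumptions made for illustration.

```typescript
// Hypothetical minimal shapes for the entries held in the maps.
interface DroneLink { emit(event: string, ...args: unknown[]): void }
interface CodeSessionLike {
  userId: string;
  chatSessionId: string | null;
  selectedDroneId: string | null;
}

class SocketServiceSketch {
  // Assumed keys: socket id, drone id, user id, chat session id.
  codeSessions = new Map<string, CodeSessionLike>();
  droneRegistrationIndex = new Map<string, DroneLink>();
  codeSessionUserIndex = new Map<string, string>();
  chatSessionIndex = new Map<string, string>();

  onSocketDisconnect(socketId: string): void {
    // 1. Read the session first; after delete() this lookup returns undefined.
    const disconnecting = this.codeSessions.get(socketId);
    if (!disconnecting) return;

    // 2-4. Fire-and-forget release toward the selected drone, if any.
    //      The payload is simplified; the real message carries more context.
    const droneId = disconnecting.selectedDroneId;
    const drone = droneId !== null ? this.droneRegistrationIndex.get(droneId) : undefined;
    drone?.emit("releaseSessionLock", droneId, disconnecting.chatSessionId);

    // 5. Clean up the secondary indexes.
    this.codeSessionUserIndex.delete(disconnecting.userId);
    if (disconnecting.chatSessionId !== null) {
      this.chatSessionIndex.delete(disconnecting.chatSessionId);
    }

    // 6. Only now remove the session itself.
    this.codeSessions.delete(socketId);
  }
}
```

Because the session is deleted last, a second disconnect event for the same socket finds nothing and returns without emitting again.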
### 7. Drone — `gadget-drone/src/gadget-drone.ts`
**State:**
```typescript
private heartbeatTimer: ReturnType<typeof setTimeout> | null = null;
```
**`onReleaseSessionLock(registration, project, chatSession, cb)`:**
1. Validates the registration (it must match the drone's own).
2. If no lock is held: `cb(true)`, since there is nothing to do.
3. If the lock is held by a different session: logs a warning but still
   releases (the caller is assumed to know what it is doing).
4. Clears `sessionLock`, sets `workspaceMode = Syncing`.
5. Emits `"session lock released"` status.
6. `cb(true)`.

**`onSessionHeartbeat(cb)`:**
1. Clears the existing `heartbeatTimer`, if one is set.
2. Sets a new 60-second `heartbeatTimer` whose handler clears
   `sessionLock`, sets `workspaceMode = Syncing`, and emits a status
   message about the heartbeat timeout.
3. The timeout handler checks `isShuttingDown` first and does nothing
   while the drone is shutting down.
4. Calls `cb(true)` to acknowledge immediately, without waiting for the
   timer.
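
The two drone handlers above can be sketched as a small class. This is an illustration under assumptions: registration validation is omitted, the lock is reduced to a chat session id string, `"Locked"` is a placeholder for whatever mode the drone uses while a lock is held (this document only names `Syncing`), status emission is modeled as a `statusLog` array, and the timeout is injectable so the timer can be exercised quickly.

```typescript
type Ack = (ok: boolean) => void;

const HEARTBEAT_TIMEOUT_MS = 60_000; // timeout stated in this document

class DroneSketch {
  sessionLock: string | null = null; // simplified: id of the locking chat session
  workspaceMode: "Syncing" | "Locked" = "Syncing"; // "Locked" is a placeholder name
  statusLog: string[] = []; // stands in for status emission
  private heartbeatTimer: ReturnType<typeof setTimeout> | null = null;
  private isShuttingDown = false;
  private readonly timeoutMs: number;

  constructor(timeoutMs: number = HEARTBEAT_TIMEOUT_MS) {
    this.timeoutMs = timeoutMs;
  }

  onReleaseSessionLock(chatSessionId: string, cb: Ack): void {
    if (this.sessionLock === null) return cb(true); // no lock held: successful no-op
    if (this.sessionLock !== chatSessionId) {
      console.warn(`lock held by ${this.sessionLock}, released by ${chatSessionId}`);
    }
    this.releaseLock("session lock released");
    cb(true);
  }

  onSessionHeartbeat(cb: Ack): void {
    // Restart the countdown: every heartbeat buys another full timeout.
    if (this.heartbeatTimer !== null) clearTimeout(this.heartbeatTimer);
    this.heartbeatTimer = setTimeout(() => {
      this.heartbeatTimer = null;
      if (this.isShuttingDown) return; // do nothing during shutdown
      this.releaseLock("session lock released: heartbeat timeout");
    }, this.timeoutMs);
    cb(true); // acknowledge immediately
  }

  stop(): void {
    this.isShuttingDown = true;
    if (this.heartbeatTimer !== null) clearTimeout(this.heartbeatTimer);
    this.heartbeatTimer = null;
  }

  private releaseLock(status: string): void {
    this.sessionLock = null;
    this.workspaceMode = "Syncing";
    this.statusLog.push(status);
  }
}
```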

**Shutdown:** Heartbeat timer is cleared in `stop()` so it doesn't fire
during graceful shutdown.

---
## Timing
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Heartbeat interval | 19 seconds | Three heartbeats (at 19, 38, and 57 seconds) fit inside one 60-second timeout window |
| Heartbeat timeout | 60 seconds | Tolerates 2 consecutive missed heartbeats, with about 3 seconds left over for network jitter |
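
The arithmetic behind these two values can be checked with a short calculation. The helper names below are hypothetical, and the model is idealized: it ignores delivery latency and assumes heartbeats are sent at exact multiples of the interval after the last one the drone received.

```typescript
const HEARTBEAT_INTERVAL_S = 19;
const HEARTBEAT_TIMEOUT_S = 60;

// Largest n such that heartbeat n + 1, sent (n + 1) * interval seconds after
// the last one received, still arrives strictly before the timeout.
function missedHeartbeatsTolerated(intervalS: number, timeoutS: number): number {
  return Math.max(0, Math.ceil(timeoutS / intervalS) - 2);
}

// Time left between that rescuing heartbeat and the timeout firing.
function slackSeconds(intervalS: number, timeoutS: number): number {
  const missed = missedHeartbeatsTolerated(intervalS, timeoutS);
  return timeoutS - (missed + 1) * intervalS;
}

console.log(missedHeartbeatsTolerated(HEARTBEAT_INTERVAL_S, HEARTBEAT_TIMEOUT_S)); // 2
console.log(slackSeconds(HEARTBEAT_INTERVAL_S, HEARTBEAT_TIMEOUT_S)); // 3
// With a 20-second interval the third heartbeat would land exactly on the
// timeout, so only one miss would be safely tolerated.
console.log(missedHeartbeatsTolerated(20, HEARTBEAT_TIMEOUT_S)); // 1
```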
---
## Edge Cases
| Scenario | Behavior |
|----------|----------|
| User navigates from ChatSession to Project Manager | `releaseSessionLock` sent on unmount, heartbeat stopped |
| User closes browser tab | Socket disconnect fires backend-initiated `releaseSessionLock` |
| User closes browser entirely | Socket disconnect fires backend-initiated `releaseSessionLock` |
| Network drops, socket reconnects | Heartbeat resumes normally, drone timer resets each heartbeat |
| Network drops for >60 seconds | Drone auto-releases lock, IDE detects socket disconnect |
| Backend process restarts | Drone detects the socket disconnect and reconnects; if heartbeats do not resume, the heartbeat timeout eventually fires |
| Drone crashes | IDE heartbeat callbacks stop firing → IDE detects socket disconnect |
| Multiple rapid session switches | Cleanup fires per-session, old lock released before new one acquired |
| No lock held, release requested | All handlers return `cb(true)` — successful no-op |
| Wrong session tries to release | Drone logs warning but still releases (disconnect path may not carry full session context) |
| Heartbeat arrives with no lock | Drone resets timer anyway — harmless |
| Deliberate release + disconnect race | Both paths emit `releaseSessionLock` — duplicate is handled gracefully (second release finds no lock, returns `true`) |
---
## Always Release Held Locks
Every code path that acquires a `sessionLock` must also release it:

| Where the lock is acquired | Where it must be released | Mechanism |
|---------------|-------------------|-----------|
| `ProjectManager.tsx` creates session + locks drone | `ChatSessionView` unmounts | `releaseSessionLock` in cleanup effect |
| `ProjectManager.tsx` opens existing session | `ChatSessionView` unmounts | `releaseSessionLock` in cleanup effect |
| Backend re-lock on socket reconnect | Backend disconnect handler | `releaseSessionLock` in `SocketService.onSocketDisconnect` |
| Any path (heartbeat fails) | Drone auto-release | 60-second `heartbeatTimer` timeout |

**Rule:** If you add a new code path that calls `requestSessionLock`, you
must also ensure a corresponding `releaseSessionLock` path exists. The
heartbeat timeout is a last resort; never rely on it as the primary
release mechanism.

---
## Verification Checklist
- [ ] `releaseSessionLock` message defined in `ide.ts`, registered in `socket.ts`
- [ ] `sessionHeartbeat` message defined in `ide.ts`, registered in `socket.ts`
- [ ] Frontend `SocketClient` has `releaseSessionLock()`, `startSessionHeartbeat()`, `stopSessionHeartbeat()`
- [ ] `ChatSessionView` starts heartbeat on load, stops + releases on unmount
- [ ] `CodeSession` registers and handles both messages
- [ ] `SocketService.onSocketDisconnect` sends `releaseSessionLock` when a code session drops
- [ ] Existing bug in disconnect handler (reading session after delete) is fixed
- [ ] `GadgetDrone` registers and handles both messages
- [ ] Drone clears `sessionLock` and resets to `Syncing` on release or heartbeat timeout
- [ ] Heartbeat timer is cleaned up during `stop()`
- [ ] All packages build without errors