gadget/docs/reasoning-effort.md
Rob Colbert 11bdd5e3b0 make reasoning effort configurable; remove sign up concept
- Implemented reasoning effort setting in SESSION panel of Chat Session View
- Removed all ability to "sign up" for an account
2026-05-08 11:40:30 -04:00


# Reasoning Effort
**Status:** **IMPLEMENTED**
**Last Updated:** May 8, 2026
## Overview
Reasoning effort controls how much an AI model "thinks" before responding. Models with reasoning capabilities (like DeepSeek-R1, QwQ, OpenAI o1/o3) can produce internal chain-of-thought tokens before generating their final answer. The reasoning effort setting lets users balance between speed and thoroughness.
## User Setting
The reasoning effort is configured per chat session via a dropdown in the Session sidebar:
| Value | Effect |
|----------|-----------------------------------------------------|
| **Off** | No thinking output. Model responds immediately. |
| **Low** | Minimal thinking. Faster responses, less depth. |
| **Medium** | Balanced thinking. Default reasoning depth. |
| **High** | Maximum thinking. Slower but more thorough. |
The dropdown is **disabled** when the selected model does not have `hasThinking: true` in its capabilities.
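
This gating can be sketched as a small predicate over the model's capabilities. The `ModelCapabilities` shape and the helper name here are illustrative, assuming only the `hasThinking` flag described above:

```typescript
// Sketch: gate the Reasoning dropdown on the selected model's capabilities.
// Only `hasThinking` comes from this doc; the rest is illustrative.
interface ModelCapabilities {
  hasThinking?: boolean;
}

interface SelectedModel {
  capabilities?: ModelCapabilities;
}

function isReasoningDropdownEnabled(model: SelectedModel | null): boolean {
  // Disabled when no model is selected or the model lacks thinking support.
  return model?.capabilities?.hasThinking === true;
}
```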
## Data Flow
```
User selects "High" in Reasoning dropdown
  → PUT /api/v1/chat-sessions/:id { reasoningEffort: "high" }
  → Stored in MongoDB ChatSession.reasoningEffort
  → When creating a turn:
      ChatTurn.reasoningEffort = ChatSession.reasoningEffort (snapshotted)
  → Drone receives work order with populated turn
  → agent.ts reads turn.reasoningEffort, maps "off" → false
  → Passes to AiService.chat() as params.reasoning
  → Provider SDK receives the appropriate parameter
```
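The snapshot step in this flow can be sketched as follows. The simplified `ChatSession`/`ChatTurn` shapes and the `createTurn` helper are illustrative; only the `reasoningEffort` field and the copy-on-create behavior come from this doc:

```typescript
// Sketch of the snapshot step: the turn copies the session's current
// setting so later session edits don't affect an in-flight turn.
type ReasoningEffort = "off" | "low" | "medium" | "high";

interface ChatSession {
  reasoningEffort: ReasoningEffort;
}

interface ChatTurn {
  reasoningEffort: ReasoningEffort;
}

function createTurn(session: ChatSession): ChatTurn {
  // Snapshot, not a reference: the turn keeps its own copy.
  return { reasoningEffort: session.reasoningEffort };
}
```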
## Provider Mapping
Each AI provider uses a different parameter name for reasoning effort. The `@gadget/ai` abstraction handles the translation:
| Provider | Parameter | Values |
|----------|------------------|---------------------------------|
| Ollama | `think` | `false`, `"low"`, `"medium"`, `"high"` |
| OpenAI | `reasoning_effort` | `"low"`, `"medium"`, `"high"` |
### Mapping Logic (in `gadget-drone/src/services/agent.ts`)
```typescript
const reasoningEffort = turn.reasoningEffort || "off";
const reasoning: boolean | "low" | "medium" | "high" =
  reasoningEffort === "off" ? false : reasoningEffort;
```
- `"off"` → `false` (disables thinking entirely)
- `"low"` → `"low"` (minimal thinking)
- `"medium"` → `"medium"` (balanced)
- `"high"` → `"high"` (maximum thinking)
### Ollama Implementation (`packages/ai/src/ollama.ts`)
```typescript
const response = await this.client.chat({
  model: model.modelId,
  messages,
  stream: true,
  think: model.params.reasoning, // boolean | "low" | "medium" | "high"
});
```
When `think` is `false`, the Ollama SDK disables thinking. When set to a string level, the model allocates corresponding effort.
### OpenAI Implementation (`packages/ai/src/openai.ts`)
```typescript
const response = await this.client.chat.completions.create({
  model: model.modelId,
  messages,
  tools,
  stream: true,
  ...(typeof model.params.reasoning === "string"
    ? { reasoning_effort: model.params.reasoning }
    : {}),
});
```
The `reasoning_effort` parameter is only passed when the value is a string (`"low"`, `"medium"`, `"high"`). When `false`, the parameter is omitted — standard non-reasoning models would reject it.
## Streaming Thinking Content
When reasoning effort is enabled and the model produces thinking tokens, they are streamed back in real-time:
1. **Provider SDK** emits thinking tokens in stream chunks
2. **Provider implementation** (`ollama.ts` / `openai.ts`) maps them to `IAiStreamChunk` with `type: 'thinking'`
3. **Drone** forwards via Socket.IO as `thinking(content)` events
4. **Frontend** renders thinking content in distinct muted blocks
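
Step 3 can be sketched with a minimal emitter interface. The `thinking` event name and `IAiStreamChunk` chunk type come from this doc; the `Emitter` interface and `forwardChunk` helper are illustrative stand-ins for the drone's Socket.IO wiring:

```typescript
// Sketch: forward thinking chunks from the AI stream to the client socket.
// Only chunks tagged 'thinking' become `thinking(content)` events.
interface Emitter {
  emit(event: string, ...args: unknown[]): void;
}

interface AiStreamChunk {
  type: "thinking" | "content";
  data: string;
}

function forwardChunk(socket: Emitter, chunk: AiStreamChunk): void {
  if (chunk.type === "thinking") {
    socket.emit("thinking", chunk.data);
  }
}
```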
### Thinking Chunk Handling
**Ollama:**
```typescript
if (chunk.message.thinking) {
  await streamCallback({
    type: 'thinking',
    data: chunk.message.thinking,
  });
}
```
**OpenAI:**
```typescript
if ('reasoning' in delta && delta.reasoning) {
  await streamCallback({
    type: 'thinking',
    data: delta.reasoning as string,
  });
}
```
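On the receiving end, step 4 amounts to accumulating `thinking(content)` events into a buffer that the muted block renders from. The `TurnView` state shape and reducer below are illustrative, not the actual IDE code:

```typescript
// Sketch: accumulate streamed thinking tokens separately from the answer,
// so the UI can render thinking in a distinct muted block.
interface TurnView {
  thinking: string;
  answer: string;
}

function onThinking(view: TurnView, content: string): TurnView {
  return { ...view, thinking: view.thinking + content };
}
```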
## Type Definitions
### `ReasoningEffort` (in `packages/api/src/interfaces/chat-session.ts`)
```typescript
export type ReasoningEffort = "off" | "low" | "medium" | "high";
```
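Since this union crosses an HTTP boundary (the PUT above), a runtime guard can validate incoming values. The guard itself is an illustration, not part of the package:

```typescript
// Sketch: runtime validation for the ReasoningEffort union.
const EFFORTS = ["off", "low", "medium", "high"] as const;
type ReasoningEffort = (typeof EFFORTS)[number];

function isReasoningEffort(value: unknown): value is ReasoningEffort {
  return typeof value === "string" && (EFFORTS as readonly string[]).includes(value);
}
```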
### `IAiModelConfig.params.reasoning` (in `packages/ai/src/api.ts`)
```typescript
params: {
  reasoning: boolean | "high" | "medium" | "low";
  // ...
}
```
Note: The `IAiModelConfig` type uses `boolean | "high" | "medium" | "low"` (no `"off"`). The `"off"` value from the user-facing setting is mapped to `false` before reaching the AI provider layer.
### Mongoose Schema
**ChatSession** (`gadget-code/src/models/chat-session.ts`):
```typescript
reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}
```
**ChatTurn** (`gadget-code/src/models/chat-turn.ts`):
```typescript
reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}
```
## Model Capability Detection
The `hasThinking` capability is detected during model probing:
- **Ollama**: checks if model capabilities array includes `"reasoning"`
- **OpenAI**: checks if model features include `"reasoning_effort"` or fallback detection by model ID (`o1`, `o3`, `reasoning`)
The frontend uses this capability flag to enable/disable the Reasoning dropdown.
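
The OpenAI fallback path can be sketched as an ID heuristic. The `o1`/`o3`/`reasoning` markers come from this doc; the function name and exact pattern are illustrative, not the probe's actual implementation:

```typescript
// Sketch: infer hasThinking from the model ID when feature metadata
// is unavailable. Matches IDs like "o1", "o1-mini", "o3-mini", or any
// ID containing "reasoning".
function detectThinkingByModelId(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return /^o[13](-|$)/.test(id) || id.includes("reasoning");
}
```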
## Related Documentation
- [Streaming Responses](./streaming-responses.md) — How thinking tokens are streamed to the IDE
- [Socket Protocol](./socket-protocol.md) — Socket.IO event definitions
- [Architecture](./architecture.md) — Overall system architecture