gadget/docs/reasoning-effort.md
Rob Colbert 11bdd5e3b0 make reasoning effort configurable; remove sign up concept
- Implemented reasoning effort setting in SESSION panel of Chat Session View
- Removed all ability to "sign up" for an account
2026-05-08 11:40:30 -04:00


# Reasoning Effort
**Status:** **IMPLEMENTED**
**Last Updated:** May 8, 2026
## Overview
Reasoning effort controls how much an AI model "thinks" before responding. Models with reasoning capabilities (like DeepSeek-R1, QwQ, OpenAI o1/o3) can produce internal chain-of-thought tokens before generating their final answer. The reasoning effort setting lets users balance between speed and thoroughness.
## User Setting
The reasoning effort is configured per chat session via a dropdown in the Session sidebar:
| Value | Effect |
|----------|-----------------------------------------------------|
| **Off** | No thinking output. Model responds immediately. |
| **Low** | Minimal thinking. Faster responses, less depth. |
| **Medium** | Balanced thinking. Default reasoning depth. |
| **High** | Maximum thinking. Slower but more thorough. |
The dropdown is **disabled** when the selected model does not have `hasThinking: true` in its capabilities.
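
This gating can be sketched as a small predicate over the model's capabilities. The `ModelCapabilities` shape and the helper name here are illustrative, assuming only the `hasThinking` flag described above:

```typescript
// Sketch: gate the Reasoning dropdown on the selected model's capabilities.
// Only `hasThinking` comes from this doc; the rest is illustrative.
interface ModelCapabilities {
  hasThinking?: boolean;
}

interface SelectedModel {
  capabilities?: ModelCapabilities;
}

function isReasoningDropdownEnabled(model: SelectedModel | null): boolean {
  // Disabled when no model is selected or the model lacks thinking support.
  return model?.capabilities?.hasThinking === true;
}
```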
## Data Flow
```
User selects "High" in Reasoning dropdown
  → PUT /api/v1/chat-sessions/:id { reasoningEffort: "high" }
  → Stored in MongoDB ChatSession.reasoningEffort
  → When creating a turn:
      ChatTurn.reasoningEffort = ChatSession.reasoningEffort (snapshotted)
  → Drone receives work order with populated turn
  → agent.ts reads turn.reasoningEffort, maps "off" → false
  → Passes to AiService.chat() as params.reasoning
  → Provider SDK receives the appropriate parameter
```
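The snapshot step in this flow can be sketched as follows. The simplified `ChatSession`/`ChatTurn` shapes and the `createTurn` helper are illustrative; only the `reasoningEffort` field and the copy-on-create behavior come from this doc:

```typescript
// Sketch of the snapshot step: the turn copies the session's current
// setting so later session edits don't affect an in-flight turn.
type ReasoningEffort = "off" | "low" | "medium" | "high";

interface ChatSession {
  reasoningEffort: ReasoningEffort;
}

interface ChatTurn {
  reasoningEffort: ReasoningEffort;
}

function createTurn(session: ChatSession): ChatTurn {
  // Snapshot, not a reference: the turn keeps its own copy.
  return { reasoningEffort: session.reasoningEffort };
}
```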
## Provider Mapping
Each AI provider uses a different parameter name for reasoning effort. The `@gadget/ai` abstraction handles the translation:
| Provider | Parameter | Values |
|----------|------------------|---------------------------------|
| Ollama | `think` | `false`, `"low"`, `"medium"`, `"high"` |
| OpenAI | `reasoning_effort` | `"low"`, `"medium"`, `"high"` |
### Mapping Logic (in `gadget-drone/src/services/agent.ts`)
```typescript
const reasoningEffort = turn.reasoningEffort || "off";
const reasoning: boolean | "low" | "medium" | "high" =
  reasoningEffort === "off" ? false : reasoningEffort;
```
- `"off"` → `false` (disables thinking entirely)
- `"low"` → `"low"` (minimal thinking)
- `"medium"` → `"medium"` (balanced)
- `"high"` → `"high"` (maximum thinking)
### Ollama Implementation (`packages/ai/src/ollama.ts`)
```typescript
const response = await this.client.chat({
  model: model.modelId,
  messages,
  stream: true,
  think: model.params.reasoning, // boolean | "low" | "medium" | "high"
});
```
When `think` is `false`, the Ollama SDK disables thinking. When set to a string level, the model allocates corresponding effort.
### OpenAI Implementation (`packages/ai/src/openai.ts`)
```typescript
const response = await this.client.chat.completions.create({
  model: model.modelId,
  messages,
  tools,
  stream: true,
  ...(typeof model.params.reasoning === "string"
    ? { reasoning_effort: model.params.reasoning }
    : {}),
});
```
The `reasoning_effort` parameter is only passed when the value is a string (`"low"`, `"medium"`, `"high"`). When `false`, the parameter is omitted — standard non-reasoning models would reject it.
## Streaming Thinking Content
When reasoning effort is enabled and the model produces thinking tokens, they are streamed back in real-time:
1. **Provider SDK** emits thinking tokens in stream chunks
2. **Provider implementation** (`ollama.ts` / `openai.ts`) maps them to `IAiStreamChunk` with `type: 'thinking'`
3. **Drone** forwards via Socket.IO as `thinking(content)` events
4. **Frontend** renders thinking content in distinct muted blocks
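
Step 3 can be sketched with a minimal emitter interface. The `thinking` event name and `IAiStreamChunk` chunk type come from this doc; the `Emitter` interface and `forwardChunk` helper are illustrative stand-ins for the drone's Socket.IO wiring:

```typescript
// Sketch: forward thinking chunks from the AI stream to the client socket.
// Only chunks tagged 'thinking' become `thinking(content)` events.
interface Emitter {
  emit(event: string, ...args: unknown[]): void;
}

interface AiStreamChunk {
  type: "thinking" | "content";
  data: string;
}

function forwardChunk(socket: Emitter, chunk: AiStreamChunk): void {
  if (chunk.type === "thinking") {
    socket.emit("thinking", chunk.data);
  }
}
```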
### Thinking Chunk Handling
**Ollama:**
```typescript
if (chunk.message.thinking) {
  await streamCallback({
    type: 'thinking',
    data: chunk.message.thinking,
  });
}
```
**OpenAI:**
```typescript
if ('reasoning' in delta && delta.reasoning) {
  await streamCallback({
    type: 'thinking',
    data: delta.reasoning as string,
  });
}
```
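On the receiving end, step 4 amounts to accumulating `thinking(content)` events into a buffer that the muted block renders from. The `TurnView` state shape and reducer below are illustrative, not the actual IDE code:

```typescript
// Sketch: accumulate streamed thinking tokens separately from the answer,
// so the UI can render thinking in a distinct muted block.
interface TurnView {
  thinking: string;
  answer: string;
}

function onThinking(view: TurnView, content: string): TurnView {
  return { ...view, thinking: view.thinking + content };
}
```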
## Type Definitions
### `ReasoningEffort` (in `packages/api/src/interfaces/chat-session.ts`)
```typescript
export type ReasoningEffort = "off" | "low" | "medium" | "high";
```
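Since this union crosses an HTTP boundary (the PUT above), a runtime guard can validate incoming values. The guard itself is an illustration, not part of the package:

```typescript
// Sketch: runtime validation for the ReasoningEffort union.
const EFFORTS = ["off", "low", "medium", "high"] as const;
type ReasoningEffort = (typeof EFFORTS)[number];

function isReasoningEffort(value: unknown): value is ReasoningEffort {
  return typeof value === "string" && (EFFORTS as readonly string[]).includes(value);
}
```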
### `IAiModelConfig.params.reasoning` (in `packages/ai/src/api.ts`)
```typescript
params: {
  reasoning: boolean | "high" | "medium" | "low";
  // ...
}
```
Note: The `IAiModelConfig` type uses `boolean | "high" | "medium" | "low"` (no `"off"`). The `"off"` value from the user-facing setting is mapped to `false` before reaching the AI provider layer.
### Mongoose Schema
**ChatSession** (`gadget-code/src/models/chat-session.ts`):
```typescript
reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}
```
**ChatTurn** (`gadget-code/src/models/chat-turn.ts`):
```typescript
reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}
```
## Model Capability Detection
The `hasThinking` capability is detected during model probing:
- **Ollama**: checks if model capabilities array includes `"reasoning"`
- **OpenAI**: checks if model features include `"reasoning_effort"` or fallback detection by model ID (`o1`, `o3`, `reasoning`)
The frontend uses this capability flag to enable/disable the Reasoning dropdown.
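
The OpenAI fallback path can be sketched as an ID heuristic. The `o1`/`o3`/`reasoning` markers come from this doc; the function name and exact pattern are illustrative, not the probe's actual implementation:

```typescript
// Sketch: infer hasThinking from the model ID when feature metadata
// is unavailable. Matches IDs like "o1", "o1-mini", "o3-mini", or any
// ID containing "reasoning".
function detectThinkingByModelId(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return /^o[13](-|$)/.test(id) || id.includes("reasoning");
}
```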
## Related Documentation
- [Streaming Responses](./streaming-responses.md) — How thinking tokens are streamed to the IDE
- [Socket Protocol](./socket-protocol.md) — Socket.IO event definitions
- [Architecture](./architecture.md) — Overall system architecture