# Reasoning Effort

**Status:** ✅ **IMPLEMENTED**
**Last Updated:** May 8, 2026

## Overview

Reasoning effort controls how much an AI model "thinks" before responding. Models with reasoning capabilities (such as DeepSeek-R1, QwQ, and OpenAI o1/o3) can produce internal chain-of-thought tokens before generating their final answer. The reasoning effort setting lets users balance speed against thoroughness.

## User Setting

Reasoning effort is configured per chat session via a dropdown in the Session sidebar:

| Value      | Effect                                          |
|------------|-------------------------------------------------|
| **Off**    | No thinking output. Model responds immediately. |
| **Low**    | Minimal thinking. Faster responses, less depth. |
| **Medium** | Balanced thinking. Default reasoning depth.     |
| **High**   | Maximum thinking. Slower but more thorough.     |

The dropdown is **disabled** when the selected model does not have `hasThinking: true` in its capabilities.

## Data Flow

```
User selects "High" in Reasoning dropdown
  → PUT /api/v1/chat-sessions/:id { reasoningEffort: "high" }
  → Stored in MongoDB ChatSession.reasoningEffort
  → When creating a turn: ChatTurn.reasoningEffort = ChatSession.reasoningEffort (snapshotted)
  → Drone receives work order with populated turn
  → agent.ts reads turn.reasoningEffort, maps "off" → false
  → Passes to AiService.chat() as params.reasoning
  → Provider SDK receives the appropriate parameter
```

## Provider Mapping

Each AI provider uses a different parameter name for reasoning effort. The `@gadget/ai` abstraction handles the translation:

| Provider | Parameter          | Values                                 |
|----------|--------------------|----------------------------------------|
| Ollama   | `think`            | `false`, `"low"`, `"medium"`, `"high"` |
| OpenAI   | `reasoning_effort` | `"low"`, `"medium"`, `"high"`          |

### Mapping Logic (in `gadget-drone/src/services/agent.ts`)

```typescript
const reasoningEffort = turn.reasoningEffort || "off";
const reasoning: boolean | "low" | "medium" | "high" =
  reasoningEffort === "off" ? false : reasoningEffort;
```

- `"off"` → `false` (disables thinking entirely)
- `"low"` → `"low"` (minimal thinking)
- `"medium"` → `"medium"` (balanced)
- `"high"` → `"high"` (maximum thinking)

### Ollama Implementation (`packages/ai/src/ollama.ts`)

```typescript
const response = await this.client.chat({
  model: model.modelId,
  messages,
  stream: true,
  think: model.params.reasoning, // boolean | "low" | "medium" | "high"
});
```

When `think` is `false`, the Ollama SDK disables thinking. When set to a string level, the model allocates the corresponding effort.

### OpenAI Implementation (`packages/ai/src/openai.ts`)

```typescript
const response = await this.client.chat.completions.create({
  model: model.modelId,
  messages,
  tools,
  stream: true,
  ...(typeof model.params.reasoning === "string"
    ? { reasoning_effort: model.params.reasoning }
    : {}),
});
```

The `reasoning_effort` parameter is passed only when the value is a string (`"low"`, `"medium"`, `"high"`). When the value is `false`, the parameter is omitted, since standard non-reasoning models would reject it.

## Streaming Thinking Content

When reasoning effort is enabled and the model produces thinking tokens, they are streamed back in real time:

1. **Provider SDK** emits thinking tokens in stream chunks
2. **Provider implementation** (`ollama.ts` / `openai.ts`) maps them to `IAiStreamChunk` with `type: 'thinking'`
3. **Drone** forwards via Socket.IO as `thinking(content)` events
4. **Frontend** renders thinking content in distinct muted blocks

### Thinking Chunk Handling

**Ollama:**

```typescript
if (chunk.message.thinking) {
  await streamCallback({
    type: 'thinking',
    data: chunk.message.thinking,
  });
}
```

**OpenAI:**

```typescript
if ('reasoning' in delta && delta.reasoning) {
  await streamCallback({
    type: 'thinking',
    data: delta.reasoning as string,
  });
}
```

## Type Definitions

### `ReasoningEffort` (in `packages/api/src/interfaces/chat-session.ts`)

```typescript
export type ReasoningEffort = "off" | "low" | "medium" | "high";
```

### `IAiModelConfig.params.reasoning` (in `packages/ai/src/api.ts`)

```typescript
params: {
  reasoning: boolean | "high" | "medium" | "low";
  // ...
}
```

Note: The `IAiModelConfig` type uses `boolean | "high" | "medium" | "low"` (no `"off"`). The `"off"` value from the user-facing setting is mapped to `false` before reaching the AI provider layer.

### Mongoose Schema

**ChatSession** (`gadget-code/src/models/chat-session.ts`):

```typescript
reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}
```

**ChatTurn** (`gadget-code/src/models/chat-turn.ts`):

```typescript
reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}
```

## Model Capability Detection

The `hasThinking` capability is detected during model probing:

- **Ollama**: checks whether the model's capabilities array includes `"reasoning"`
- **OpenAI**: checks whether the model's features include `"reasoning_effort"`, with fallback detection by model ID (`o1`, `o3`, `reasoning`)

The frontend uses this capability flag to enable or disable the Reasoning dropdown.

## Related Documentation

- [Streaming Responses](./streaming-responses.md) – How thinking tokens are streamed to the IDE
- [Socket Protocol](./socket-protocol.md) – Socket.IO event definitions
- [Architecture](./architecture.md) – Overall system architecture
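## Appendix: End-to-End Mapping Sketch

The translation described above (user-facing `"off"` → internal `false` → provider-specific parameter) can be sketched as a single helper. This is an illustrative, self-contained sketch: the function names (`toReasoning`, `buildProviderReasoningParams`) are hypothetical and do not appear in the codebase, which splits this logic between `agent.ts` and the provider implementations.

```typescript
// User-facing setting, as stored on ChatSession/ChatTurn.
type ReasoningEffort = "off" | "low" | "medium" | "high";

// Internal representation used by the AI layer: "off" becomes `false`.
type Reasoning = boolean | "low" | "medium" | "high";

type Provider = "ollama" | "openai";

function toReasoning(effort: ReasoningEffort): Reasoning {
  return effort === "off" ? false : effort;
}

// Hypothetical consolidation of the per-provider parameter translation.
function buildProviderReasoningParams(
  provider: Provider,
  effort: ReasoningEffort
): Record<string, unknown> {
  const reasoning = toReasoning(effort);
  switch (provider) {
    case "ollama":
      // Ollama accepts `think: false | "low" | "medium" | "high"` directly.
      return { think: reasoning };
    case "openai":
      // OpenAI only accepts string levels; omit the parameter entirely when
      // reasoning is off, since non-reasoning models reject `reasoning_effort`.
      return typeof reasoning === "string"
        ? { reasoning_effort: reasoning }
        : {};
  }
}

// Examples:
// buildProviderReasoningParams("ollama", "off")  → { think: false }
// buildProviderReasoningParams("openai", "off")  → {}
// buildProviderReasoningParams("openai", "high") → { reasoning_effort: "high" }
```

The design choice worth noting is the asymmetry: Ollama models tolerate `think: false`, so the parameter is always sent, while for OpenAI the parameter must be absent (not `false`) when reasoning is disabled.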