gadget/docs/reasoning-effort.md
Rob Colbert 11bdd5e3b0 make reasoning effort configurable; remove sign up concept
- Implemented reasoning effort setting in SESSION panel of Chat Session View
- Removed all ability to "sign up" for an account
2026-05-08 11:40:30 -04:00


Reasoning Effort

Status: IMPLEMENTED
Last Updated: May 8, 2026

Overview

Reasoning effort controls how much an AI model "thinks" before responding. Models with reasoning capabilities (like DeepSeek-R1, QwQ, OpenAI o1/o3) can produce internal chain-of-thought tokens before generating their final answer. The reasoning effort setting lets users balance between speed and thoroughness.

User Setting

The reasoning effort is configured per chat session via a dropdown in the Session sidebar:

| Value  | Effect                                           |
|--------|--------------------------------------------------|
| Off    | No thinking output. Model responds immediately.  |
| Low    | Minimal thinking. Faster responses, less depth.  |
| Medium | Balanced thinking. Default reasoning depth.      |
| High   | Maximum thinking. Slower but more thorough.      |

The dropdown is disabled when the selected model does not have hasThinking: true in its capabilities.
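In sketch form, the enable/disable check reduces to a single capability test. The hasThinking field is from the doc; the interface and function names here are illustrative, not taken from the frontend code:

```typescript
// Assumed shape of a model's capabilities object; only hasThinking matters here.
interface ModelCapabilities {
  hasThinking?: boolean;
}

// The Reasoning dropdown is enabled only when the selected model explicitly
// advertises thinking support.
function isReasoningDropdownEnabled(capabilities: ModelCapabilities): boolean {
  return capabilities.hasThinking === true;
}
```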

Data Flow

User selects "High" in Reasoning dropdown
  → PUT /api/v1/chat-sessions/:id { reasoningEffort: "high" }
  → Stored in MongoDB ChatSession.reasoningEffort
  → When creating a turn:
      ChatTurn.reasoningEffort = ChatSession.reasoningEffort  (snapshotted)
  → Drone receives work order with populated turn
  → agent.ts reads turn.reasoningEffort, maps "off" → false
  → Passes to AiService.chat() as params.reasoning
  → Provider SDK receives the appropriate parameter
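The first hop of the flow above — persisting the user's choice — can be sketched as a small request builder. The endpoint path and body shape come from the data flow; the helper name is hypothetical:

```typescript
type ReasoningEffort = "off" | "low" | "medium" | "high";

// Hypothetical helper: builds the PUT request that persists the user's
// reasoning-effort choice for a chat session.
function buildReasoningUpdate(sessionId: string, effort: ReasoningEffort) {
  return {
    url: `/api/v1/chat-sessions/${sessionId}`,
    method: "PUT" as const,
    body: JSON.stringify({ reasoningEffort: effort }),
  };
}
```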

Provider Mapping

Each AI provider uses a different parameter name for reasoning effort. The @gadget/ai abstraction handles the translation:

| Provider | Parameter        | Values                              |
|----------|------------------|-------------------------------------|
| Ollama   | think            | false, "low", "medium", "high"      |
| OpenAI   | reasoning_effort | "low", "medium", "high"             |

Mapping Logic (in gadget-drone/src/services/agent.ts)

const reasoningEffort = turn.reasoningEffort || "off";
const reasoning: boolean | "low" | "medium" | "high" =
  reasoningEffort === "off" ? false : reasoningEffort;

  • "off" → false (disables thinking entirely)
  • "low" → "low" (minimal thinking)
  • "medium" → "medium" (balanced)
  • "high" → "high" (maximum thinking)

Ollama Implementation (packages/ai/src/ollama.ts)

const response = await this.client.chat({
  model: model.modelId,
  messages,
  stream: true,
  think: model.params.reasoning,  // boolean | "low" | "medium" | "high"
});

When think is false, the Ollama SDK disables thinking. When set to a string level, the model allocates corresponding effort.

OpenAI Implementation (packages/ai/src/openai.ts)

const response = await this.client.chat.completions.create({
  model: model.modelId,
  messages,
  tools,
  stream: true,
  ...(typeof model.params.reasoning === "string"
    ? { reasoning_effort: model.params.reasoning }
    : {}),
});

The reasoning_effort parameter is only passed when the value is a string ("low", "medium", "high"). When false, the parameter is omitted — standard non-reasoning models would reject it.

Streaming Thinking Content

When reasoning effort is enabled and the model produces thinking tokens, they are streamed back in real-time:

  1. Provider SDK emits thinking tokens in stream chunks
  2. Provider implementation (ollama.ts / openai.ts) maps them to IAiStreamChunk with type: 'thinking'
  3. Drone forwards via Socket.IO as thinking(content) events
  4. Frontend renders thinking content in distinct muted blocks
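The hand-off in steps 2–3 can be sketched as a pure mapping from stream chunk to socket event. The IAiStreamChunk fields and the thinking event name come from the steps above; the "content" chunk type and the "message" event name for non-thinking output are assumptions:

```typescript
// Minimal chunk shape, per the streaming steps above; additional chunk types
// in the real IAiStreamChunk are omitted.
interface IAiStreamChunk {
  type: "thinking" | "content";
  data: string;
}

// Maps a provider stream chunk to the Socket.IO event the drone forwards.
// "thinking" matches step 3; "message" is an assumed name for regular output.
function toSocketEvent(chunk: IAiStreamChunk): { event: string; content: string } {
  return chunk.type === "thinking"
    ? { event: "thinking", content: chunk.data }
    : { event: "message", content: chunk.data };
}
```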

Thinking Chunk Handling

Ollama:

if (chunk.message.thinking) {
  await streamCallback({
    type: 'thinking',
    data: chunk.message.thinking,
  });
}

OpenAI:

if ('reasoning' in delta && delta.reasoning) {
  await streamCallback({
    type: 'thinking',
    data: delta.reasoning as string,
  });
}

Type Definitions

ReasoningEffort (in packages/api/src/interfaces/chat-session.ts)

export type ReasoningEffort = "off" | "low" | "medium" | "high";
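A runtime guard matching this type could look like the following — an assumed helper for validating the PUT request body, not taken from the codebase:

```typescript
// Single source of truth for the allowed values; the type is derived from it.
const REASONING_EFFORTS = ["off", "low", "medium", "high"] as const;
type ReasoningEffort = (typeof REASONING_EFFORTS)[number];

// Narrows an unknown request-body value to ReasoningEffort.
function isReasoningEffort(value: unknown): value is ReasoningEffort {
  return (
    typeof value === "string" &&
    (REASONING_EFFORTS as readonly string[]).includes(value)
  );
}
```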

IAiModelConfig.params.reasoning (in packages/ai/src/api.ts)

params: {
  reasoning: boolean | "high" | "medium" | "low";
  // ...
}

Note: The IAiModelConfig type uses boolean | "high" | "medium" | "low" (no "off"). The "off" value from the user-facing setting is mapped to false before reaching the AI provider layer.

Mongoose Schema

ChatSession (gadget-code/src/models/chat-session.ts):

reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}

ChatTurn (gadget-code/src/models/chat-turn.ts):

reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}

Model Capability Detection

The hasThinking capability is detected during model probing:

  • Ollama: checks if model capabilities array includes "reasoning"
  • OpenAI: checks if model features include "reasoning_effort" or fallback detection by model ID (o1, o3, reasoning)

The frontend uses this capability flag to enable/disable the Reasoning dropdown.
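The detection rules above can be sketched as follows. The actual probe code is not shown in this doc, so the function signature and field names are illustrative; only the rules themselves (Ollama "reasoning" capability, OpenAI "reasoning_effort" feature, model-ID fallback) come from the text:

```typescript
// Illustrative probe-result shape; real probing code may differ.
interface ProbeInfo {
  modelId: string;
  capabilities?: string[]; // Ollama capability strings
  features?: string[];     // OpenAI feature strings
}

// Applies the capability-detection rules described above.
function detectHasThinking(provider: "ollama" | "openai", info: ProbeInfo): boolean {
  if (provider === "ollama") {
    // Ollama: model advertises "reasoning" in its capabilities array.
    return info.capabilities?.includes("reasoning") ?? false;
  }
  // OpenAI: explicit feature flag first, else fall back to model-ID heuristics.
  if (info.features?.includes("reasoning_effort")) return true;
  return /\b(o1|o3)\b|reasoning/.test(info.modelId);
}
```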