gadget/docs/reasoning-effort.md
Rob Colbert 11bdd5e3b0 make reasoning effort configurable; remove sign up concept
- Implemented reasoning effort setting in SESSION panel of Chat Session View
- Removed all ability to "sign up" for an account
2026-05-08 11:40:30 -04:00


Reasoning Effort

Status: IMPLEMENTED
Last Updated: May 8, 2026

Overview

Reasoning effort controls how much an AI model "thinks" before responding. Models with reasoning capabilities (like DeepSeek-R1, QwQ, OpenAI o1/o3) can produce internal chain-of-thought tokens before generating their final answer. The reasoning effort setting lets users balance between speed and thoroughness.

User Setting

The reasoning effort is configured per chat session via a dropdown in the Session sidebar:

| Value  | Effect                                           |
|--------|--------------------------------------------------|
| Off    | No thinking output. Model responds immediately.  |
| Low    | Minimal thinking. Faster responses, less depth.  |
| Medium | Balanced thinking. Default reasoning depth.      |
| High   | Maximum thinking. Slower but more thorough.      |

The dropdown is disabled when the selected model does not have hasThinking: true in its capabilities.
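In sketch form, the enable/disable check reduces to a single capability test. The hasThinking field is from the doc; the interface and function names here are illustrative, not taken from the frontend code:

```typescript
// Assumed shape of a model's capabilities object; only hasThinking matters here.
interface ModelCapabilities {
  hasThinking?: boolean;
}

// The Reasoning dropdown is enabled only when the selected model explicitly
// advertises thinking support.
function isReasoningDropdownEnabled(capabilities: ModelCapabilities): boolean {
  return capabilities.hasThinking === true;
}
```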

Data Flow

User selects "High" in Reasoning dropdown
  → PUT /api/v1/chat-sessions/:id { reasoningEffort: "high" }
  → Stored in MongoDB ChatSession.reasoningEffort
  → When creating a turn:
      ChatTurn.reasoningEffort = ChatSession.reasoningEffort  (snapshotted)
  → Drone receives work order with populated turn
  → agent.ts reads turn.reasoningEffort, maps "off" → false
  → Passes to AiService.chat() as params.reasoning
  → Provider SDK receives the appropriate parameter
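The first hop of the flow above — persisting the user's choice — can be sketched as a small request builder. The endpoint path and body shape come from the data flow; the helper name is hypothetical:

```typescript
type ReasoningEffort = "off" | "low" | "medium" | "high";

// Hypothetical helper: builds the PUT request that persists the user's
// reasoning-effort choice for a chat session.
function buildReasoningUpdate(sessionId: string, effort: ReasoningEffort) {
  return {
    url: `/api/v1/chat-sessions/${sessionId}`,
    method: "PUT" as const,
    body: JSON.stringify({ reasoningEffort: effort }),
  };
}
```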

Provider Mapping

Each AI provider uses a different parameter name for reasoning effort. The @gadget/ai abstraction handles the translation:

| Provider | Parameter        | Values                              |
|----------|------------------|-------------------------------------|
| Ollama   | think            | false, "low", "medium", "high"      |
| OpenAI   | reasoning_effort | "low", "medium", "high"             |

Mapping Logic (in gadget-drone/src/services/agent.ts)

const reasoningEffort = turn.reasoningEffort || "off";
const reasoning: boolean | "low" | "medium" | "high" =
  reasoningEffort === "off" ? false : reasoningEffort;

  • "off" → false (disables thinking entirely)
  • "low" → "low" (minimal thinking)
  • "medium" → "medium" (balanced)
  • "high" → "high" (maximum thinking)

Ollama Implementation (packages/ai/src/ollama.ts)

const response = await this.client.chat({
  model: model.modelId,
  messages,
  stream: true,
  think: model.params.reasoning,  // boolean | "low" | "medium" | "high"
});

When think is false, the Ollama SDK disables thinking. When set to a string level, the model allocates corresponding effort.

OpenAI Implementation (packages/ai/src/openai.ts)

const response = await this.client.chat.completions.create({
  model: model.modelId,
  messages,
  tools,
  stream: true,
  ...(typeof model.params.reasoning === "string"
    ? { reasoning_effort: model.params.reasoning }
    : {}),
});

The reasoning_effort parameter is only passed when the value is a string ("low", "medium", "high"). When false, the parameter is omitted — standard non-reasoning models would reject it.

Streaming Thinking Content

When reasoning effort is enabled and the model produces thinking tokens, they are streamed back in real-time:

  1. Provider SDK emits thinking tokens in stream chunks
  2. Provider implementation (ollama.ts / openai.ts) maps them to IAiStreamChunk with type: 'thinking'
  3. Drone forwards via Socket.IO as thinking(content) events
  4. Frontend renders thinking content in distinct muted blocks
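The hand-off in steps 2–3 can be sketched as a pure mapping from stream chunk to socket event. The IAiStreamChunk fields and the thinking event name come from the steps above; the "content" chunk type and the "message" event name for non-thinking output are assumptions:

```typescript
// Minimal chunk shape, per the streaming steps above; additional chunk types
// in the real IAiStreamChunk are omitted.
interface IAiStreamChunk {
  type: "thinking" | "content";
  data: string;
}

// Maps a provider stream chunk to the Socket.IO event the drone forwards.
// "thinking" matches step 3; "message" is an assumed name for regular output.
function toSocketEvent(chunk: IAiStreamChunk): { event: string; content: string } {
  return chunk.type === "thinking"
    ? { event: "thinking", content: chunk.data }
    : { event: "message", content: chunk.data };
}
```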

Thinking Chunk Handling

Ollama:

if (chunk.message.thinking) {
  await streamCallback({
    type: 'thinking',
    data: chunk.message.thinking,
  });
}

OpenAI:

if ('reasoning' in delta && delta.reasoning) {
  await streamCallback({
    type: 'thinking',
    data: delta.reasoning as string,
  });
}

Type Definitions

ReasoningEffort (in packages/api/src/interfaces/chat-session.ts)

export type ReasoningEffort = "off" | "low" | "medium" | "high";
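A runtime guard matching this type could look like the following — an assumed helper for validating the PUT request body, not taken from the codebase:

```typescript
// Single source of truth for the allowed values; the type is derived from it.
const REASONING_EFFORTS = ["off", "low", "medium", "high"] as const;
type ReasoningEffort = (typeof REASONING_EFFORTS)[number];

// Narrows an unknown request-body value to ReasoningEffort.
function isReasoningEffort(value: unknown): value is ReasoningEffort {
  return (
    typeof value === "string" &&
    (REASONING_EFFORTS as readonly string[]).includes(value)
  );
}
```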

IAiModelConfig.params.reasoning (in packages/ai/src/api.ts)

params: {
  reasoning: boolean | "high" | "medium" | "low";
  // ...
}

Note: The IAiModelConfig type uses boolean | "high" | "medium" | "low" (no "off"). The "off" value from the user-facing setting is mapped to false before reaching the AI provider layer.

Mongoose Schema

ChatSession (gadget-code/src/models/chat-session.ts):

reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}

ChatTurn (gadget-code/src/models/chat-turn.ts):

reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}

Model Capability Detection

The hasThinking capability is detected during model probing:

  • Ollama: checks if model capabilities array includes "reasoning"
  • OpenAI: checks if model features include "reasoning_effort" or fallback detection by model ID (o1, o3, reasoning)

The frontend uses this capability flag to enable/disable the Reasoning dropdown.
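The detection rules above can be sketched as follows. The actual probe code is not shown in this doc, so the function signature and field names are illustrative; only the rules themselves (Ollama "reasoning" capability, OpenAI "reasoning_effort" feature, model-ID fallback) come from the text:

```typescript
// Illustrative probe-result shape; real probing code may differ.
interface ProbeInfo {
  modelId: string;
  capabilities?: string[]; // Ollama capability strings
  features?: string[];     // OpenAI feature strings
}

// Applies the capability-detection rules described above.
function detectHasThinking(provider: "ollama" | "openai", info: ProbeInfo): boolean {
  if (provider === "ollama") {
    // Ollama: model advertises "reasoning" in its capabilities array.
    return info.capabilities?.includes("reasoning") ?? false;
  }
  // OpenAI: explicit feature flag first, else fall back to model-ID heuristics.
  if (info.features?.includes("reasoning_effort")) return true;
  return /\b(o1|o3)\b|reasoning/.test(info.modelId);
}
```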