# Reasoning Effort
**Status:** ✅ **IMPLEMENTED**

**Last Updated:** May 8, 2026
## Overview

Reasoning effort controls how much an AI model "thinks" before responding. Models with reasoning capabilities (like DeepSeek-R1, QwQ, OpenAI o1/o3) can produce internal chain-of-thought tokens before generating their final answer. The reasoning effort setting lets users balance between speed and thoroughness.
## User Setting

The reasoning effort is configured per chat session via a dropdown in the Session sidebar:

| Value      | Effect                                          |
|------------|-------------------------------------------------|
| **Off**    | No thinking output. Model responds immediately. |
| **Low**    | Minimal thinking. Faster responses, less depth. |
| **Medium** | Balanced thinking. Default reasoning depth.     |
| **High**   | Maximum thinking. Slower but more thorough.     |

The dropdown is **disabled** when the selected model does not have `hasThinking: true` in its capabilities.
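The gating rule can be sketched as a small predicate; the helper name and the trimmed-down model shape below are illustrative, not the actual frontend code:

```typescript
// Simplified model shape for illustration; the real config type has more fields.
interface ModelCapabilities {
  hasThinking: boolean;
}

interface ChatModel {
  modelId: string;
  capabilities: ModelCapabilities;
}

// Hypothetical helper: the Reasoning dropdown is enabled only for models
// that advertise thinking support during capability probing.
function isReasoningSelectable(model: ChatModel): boolean {
  return model.capabilities.hasThinking === true;
}
```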
## Data Flow

```
User selects "High" in Reasoning dropdown
  → PUT /api/v1/chat-sessions/:id { reasoningEffort: "high" }
  → Stored in MongoDB ChatSession.reasoningEffort
  → When creating a turn:
      ChatTurn.reasoningEffort = ChatSession.reasoningEffort (snapshotted)
  → Drone receives work order with populated turn
  → agent.ts reads turn.reasoningEffort, maps "off" → false
  → Passes to AiService.chat() as params.reasoning
  → Provider SDK receives the appropriate parameter
```
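The snapshot step in the flow above can be sketched as follows. The types and the `createTurn` factory are simplified stand-ins for the real Mongoose models, shown only to illustrate why the per-turn copy matters:

```typescript
type ReasoningEffort = "off" | "low" | "medium" | "high";

// Simplified stand-ins for the ChatSession / ChatTurn documents.
interface ChatSession {
  reasoningEffort: ReasoningEffort;
}

interface ChatTurn {
  reasoningEffort: ReasoningEffort;
  prompt: string;
}

// Hypothetical factory: the turn copies the session's current setting,
// so later edits to the session do not retroactively change old turns.
function createTurn(session: ChatSession, prompt: string): ChatTurn {
  return { prompt, reasoningEffort: session.reasoningEffort };
}
```

Because the value is copied at turn creation, changing the session setting later affects only subsequent turns.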
## Provider Mapping

Each AI provider uses a different parameter name for reasoning effort. The `@gadget/ai` abstraction handles the translation:

| Provider | Parameter          | Values                                 |
|----------|--------------------|----------------------------------------|
| Ollama   | `think`            | `false`, `"low"`, `"medium"`, `"high"` |
| OpenAI   | `reasoning_effort` | `"low"`, `"medium"`, `"high"`          |
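The translation can be sketched as one mapping function per provider. The function names are illustrative; the real logic lives inside `@gadget/ai`:

```typescript
type Reasoning = boolean | "low" | "medium" | "high";

// Ollama: `think` accepts the unified value directly, including `false`.
function toOllamaParams(reasoning: Reasoning): { think: Reasoning } {
  return { think: reasoning };
}

// OpenAI: `reasoning_effort` only exists for string levels; when reasoning
// is disabled the parameter must be omitted entirely.
function toOpenAiParams(
  reasoning: Reasoning
): { reasoning_effort?: "low" | "medium" | "high" } {
  return typeof reasoning === "string" ? { reasoning_effort: reasoning } : {};
}
```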
### Mapping Logic (in `gadget-drone/src/services/agent.ts`)

```typescript
const reasoningEffort = turn.reasoningEffort || "off";
const reasoning: boolean | "low" | "medium" | "high" =
  reasoningEffort === "off" ? false : reasoningEffort;
```

- `"off"` → `false` (disables thinking entirely)
- `"low"` → `"low"` (minimal thinking)
- `"medium"` → `"medium"` (balanced)
- `"high"` → `"high"` (maximum thinking)
### Ollama Implementation (`packages/ai/src/ollama.ts`)

```typescript
const response = await this.client.chat({
  model: model.modelId,
  messages,
  stream: true,
  think: model.params.reasoning, // boolean | "low" | "medium" | "high"
});
```

When `think` is `false`, the Ollama SDK disables thinking. When set to a string level, the model allocates corresponding effort.
### OpenAI Implementation (`packages/ai/src/openai.ts`)

```typescript
const response = await this.client.chat.completions.create({
  model: model.modelId,
  messages,
  tools,
  stream: true,
  ...(typeof model.params.reasoning === "string"
    ? { reasoning_effort: model.params.reasoning }
    : {}),
});
```

The `reasoning_effort` parameter is only passed when the value is a string (`"low"`, `"medium"`, `"high"`). When `false`, the parameter is omitted, since standard non-reasoning models would reject it.
## Streaming Thinking Content

When reasoning effort is enabled and the model produces thinking tokens, they are streamed back in real time:

1. **Provider SDK** emits thinking tokens in stream chunks
2. **Provider implementation** (`ollama.ts` / `openai.ts`) maps them to `IAiStreamChunk` with `type: 'thinking'`
3. **Drone** forwards via Socket.IO as `thinking(content)` events
4. **Frontend** renders thinking content in distinct muted blocks
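From the consumer's side, the pipeline above can be sketched as a small router. The chunk shape is a simplified version of `IAiStreamChunk`, and `routeChunks` is an illustrative helper, not drone code:

```typescript
// Simplified chunk shape; the real IAiStreamChunk has more chunk types.
interface AiStreamChunk {
  type: "thinking" | "content";
  data: string;
}

// Hypothetical sink: accumulates thinking tokens and answer tokens into
// separate buffers, mirroring the muted-block rendering in the UI.
function routeChunks(chunks: AiStreamChunk[]): { thinking: string; content: string } {
  let thinking = "";
  let content = "";
  for (const chunk of chunks) {
    if (chunk.type === "thinking") thinking += chunk.data;
    else content += chunk.data;
  }
  return { thinking, content };
}
```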
### Thinking Chunk Handling

**Ollama:**

```typescript
if (chunk.message.thinking) {
  await streamCallback({
    type: 'thinking',
    data: chunk.message.thinking,
  });
}
```

**OpenAI:**

```typescript
if ('reasoning' in delta && delta.reasoning) {
  await streamCallback({
    type: 'thinking',
    data: delta.reasoning as string,
  });
}
```
## Type Definitions

### `ReasoningEffort` (in `packages/api/src/interfaces/chat-session.ts`)

```typescript
export type ReasoningEffort = "off" | "low" | "medium" | "high";
```
### `IAiModelConfig.params.reasoning` (in `packages/ai/src/api.ts`)

```typescript
params: {
  reasoning: boolean | "high" | "medium" | "low";
  // ...
}
```

Note: The `IAiModelConfig` type uses `boolean | "high" | "medium" | "low"` (no `"off"`). The `"off"` value from the user-facing setting is mapped to `false` before reaching the AI provider layer.
### Mongoose Schema

**ChatSession** (`gadget-code/src/models/chat-session.ts`):

```typescript
reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}
```

**ChatTurn** (`gadget-code/src/models/chat-turn.ts`):

```typescript
reasoningEffort: {
  type: String,
  enum: ["off", "low", "medium", "high"],
  default: "off",
}
```
## Model Capability Detection

The `hasThinking` capability is detected during model probing:

- **Ollama**: checks if the model's capabilities array includes `"reasoning"`
- **OpenAI**: checks if the model's features include `"reasoning_effort"`, or falls back to detection by model ID (`o1`, `o3`, `reasoning`)

The frontend uses this capability flag to enable/disable the Reasoning dropdown.
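The OpenAI-side check can be sketched as below; `detectHasThinking` is an illustrative helper, and only the `reasoning_effort` feature check and the `o1`/`o3`/`reasoning` ID hints come from this document:

```typescript
// Hypothetical probe helper: prefer an explicit feature flag, fall back to
// a model-ID heuristic when no features are reported.
function detectHasThinking(modelId: string, features: string[] = []): boolean {
  if (features.includes("reasoning_effort")) return true;
  const id = modelId.toLowerCase();
  return ["o1", "o3", "reasoning"].some((hint) => id.includes(hint));
}
```

Note that substring matching is a coarse heuristic; a production probe would want word-boundary checks to avoid false positives on unrelated model names.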
## Related Documentation

- [Streaming Responses](./streaming-responses.md) — How thinking tokens are streamed to the IDE
- [Socket Protocol](./socket-protocol.md) — Socket.IO event definitions
- [Architecture](./architecture.md) — Overall system architecture