Two layers for thinking-mode control:
1. Per-config default (Settings → LLM)
New "Reasoning effort" Select in the Add/Edit dialog with
off/low/medium/high/max + a budget hint per option (~2k, ~8k,
~24k, ~64k thinking tokens). Saved row meta line surfaces the
level inline so it's visible without opening the editor.
2. Per-message override (composer chip)
New ReasoningChip next to the model picker. Click cycles through
the same five levels. Hidden chrome when off (muted "think" pill);
sodium-amber active style with the level label when set.
Persisted to crema.ai.reasoning so a refresh keeps the operator's
intent, wiped together with the conversation on Clear.
When sending, withReasoning() merges reasoning_effort into the request
body as a top-level field. The proxy forwards it untouched to OpenAI /
DeepSeek (native field) and translates to Anthropic's thinking block
server-side.
reasoningEffortRef sidesteps a useCallback ordering issue —
regenerateLast/continueLast are declared before the state hook, so
they read the ref instead of a stale closure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>