Wire @crema/llm-providers-ui: multi-provider picker + AI persistence

Replaces the single-base-URL LLM settings with the new providers lib (OpenAI, Anthropic, DeepSeek, Qwen, LM Studio). Settings/LLM hosts the catalog-aware card; the /ai route builds adapters via buildAdapter() and resolves API keys from the arcadia vault per-call (direct mode). Anthropic skips the /v1/models probe (no such endpoint) and uses catalog defaults; failed probes for keyed providers fall back to the catalog instead of dropping to mock.

AI conversation now persists across navigation and refresh via a new crema.ai.live localStorage key (separate from the compact-snapshot key). useChat hydrates from initialMessages on mount, saves on every change, and "Clear conversation" wipes both state and storage.

Vite needs explicit resolve.alias for @crema/llm-ui and @crema/llm-providers-ui: when a sibling lib imports another @crema/*, tsconfigPaths can't resolve it (the importing file isn't in this project's tsconfig scope).

Adds docs/LLM_PROXY_CONTRACT.md describing the POST /api/v1/ai/llm/chat endpoint the backend needs for proxy mode (keys never leave the server). Direct mode works against today's arcadia; proxy mode unblocks once that endpoint ships.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 22:50:23 +10:00
parent a907e25a7c
commit 7ba415d78e
6 changed files with 439 additions and 221 deletions
--- a/docs/LLM_PROXY_CONTRACT.md
+++ b/docs/LLM_PROXY_CONTRACT.md
@@ -0,0 +1,158 @@
+# LLM Proxy Contract
+
+> **Status: not yet implemented on the backend.** This document is the contract that `lib-llm-providers-ui` expects from arcadia. Implement `POST /api/v1/ai/llm/chat` server-side to make `mode: "proxy"` work in the client.
+
+## Why a proxy?
+
+The Settings UI ships in two transport modes:
+
+- **`direct`** — the browser fetches the API key from arcadia's vault (`GET /api/v1/secrets/:name`), then calls OpenAI/Anthropic/DeepSeek/Qwen directly. Works today, but the key briefly lives in browser memory and the prompt contents go straight to the upstream provider with no opportunity for arcadia to log, meter, or rewrite them.
+- **`proxy`** — the browser sends the chat request to arcadia, which reads the secret server-side and calls the upstream provider. Keys never leave arcadia. This is what production should use.
+
+This contract only covers the proxy mode.
+
+## Endpoint
+
+```
+POST /api/v1/ai/llm/chat
+Authorization: Bearer <arcadia session token>
+X-Tenant-ID:   <tenant id>
+Content-Type:  application/json
+```
+
+The path is `/api/v1/ai/llm/chat` so it lives under the existing `/api/v1/ai/*` scope (next to `embeddings`, `tools`, `llm/usage`).
+
+## Request body
+
+The shape is OpenAI's chat-completion request, **plus** two arcadia-specific fields:
+
+```json
+{
+  "provider":    "openai",
+  "secret_name": "llm-openai-api-key",
+  "model":       "gpt-4o-mini",
+  "messages": [
+    { "role": "system", "content": "You are a helpful assistant." },
+    { "role": "user",   "content": "Hello!" }
+  ],
+  "stream":      true,
+  "max_tokens":  1024,
+  "temperature": 0.7,
+  "tools": [
+    {
+      "type": "function",
+      "function": {
+        "name": "search_docs",
+        "description": "...",
+        "parameters": { "type": "object", "properties": {} }
+      }
+    }
+  ],
+  "tool_choice": "auto"
+}
+```
+
+### Provider-specific fields
+
+| Field         | Type                                            | Notes |
+|---------------|-------------------------------------------------|-------|
+| `provider`    | `"openai" \| "anthropic" \| "deepseek" \| "qwen" \| "lmstudio"` | Selects the upstream backend. |
+| `secret_name` | `string` (optional for `lmstudio`)              | Name of the vault secret holding the upstream API key. The proxy resolves it via the same `Arcadia.Secrets.get/3` used for tenant-facing reads. |
+
+The proxy must:
+1. Authenticate the arcadia session.
+2. Resolve `secret_name` for the current tenant, falling back to the platform-level secret if needed. Refuse the call if the secret is disabled, expired, or IP-blocked. The existing `Arcadia.Secrets.get/3` already returns the right error codes.
+3. Map the request to the upstream's native shape (Anthropic's `/v1/messages` differs from OpenAI's `/v1/chat/completions`).
+4. Forward it with the resolved key as the upstream's expected auth header (`Authorization: Bearer <key>` for OpenAI/DeepSeek/Qwen, `x-api-key: <key>` + `anthropic-version: 2023-06-01` for Anthropic).
+5. Stream the response back as **OpenAI-shape SSE** regardless of upstream. (See "Response — streaming" below.)
+6. Record a usage row via the existing `POST /api/v1/ai/llm/usage` after each completion.
+
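Steps 3 and 4 above can be sketched as a pure mapping from the proxy request to an upstream request. This is an illustrative TypeScript sketch only, not arcadia code: `buildUpstreamRequest` and its types are hypothetical names, only the `openai` and `anthropic` providers are covered, tools are omitted, and the `1024` fallback for `max_tokens` is an assumption (Anthropic's `/v1/messages` requires the field).

```typescript
// Hypothetical types and helper. Only the endpoint paths and the auth header
// names come from the contract; everything else is illustrative.
type Provider = "openai" | "anthropic";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ProxyChatRequest {
  provider: Provider;
  model: string;
  messages: ChatMessage[];
  stream?: boolean;
  max_tokens?: number;
  temperature?: number;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
  body: Record<string, unknown>;
}

function buildUpstreamRequest(req: ProxyChatRequest, apiKey: string): UpstreamRequest {
  if (req.provider === "anthropic") {
    // Anthropic takes the system prompt as a top-level field, not as a message.
    const system = req.messages
      .filter((m) => m.role === "system")
      .map((m) => m.content)
      .join("\n");
    const body: Record<string, unknown> = {
      model: req.model,
      messages: req.messages
        .filter((m) => m.role !== "system")
        .map((m) => ({ role: m.role, content: m.content })),
      // Assumed default: /v1/messages rejects requests without max_tokens.
      max_tokens: req.max_tokens ?? 1024,
      stream: req.stream ?? false,
    };
    if (system !== "") body.system = system;
    if (req.temperature !== undefined) body.temperature = req.temperature;
    return {
      url: "https://api.anthropic.com/v1/messages",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body,
    };
  }

  // OpenAI upstream: forward the chat fields, dropping the arcadia-specific ones.
  const body: Record<string, unknown> = {
    model: req.model,
    messages: req.messages,
    stream: req.stream ?? false,
  };
  if (req.max_tokens !== undefined) body.max_tokens = req.max_tokens;
  if (req.temperature !== undefined) body.temperature = req.temperature;
  return {
    url: "https://api.openai.com/v1/chat/completions",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "content-type": "application/json",
    },
    body,
  };
}
```

Keeping the mapping free of I/O means it can be unit-tested without stubbing the upstream.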
+## Response — non-streaming (`stream: false`)
+
+OpenAI chat-completion shape, returned as a single JSON document:
+
+```json
+{
+  "id": "chatcmpl-...",
+  "object": "chat.completion",
+  "created": 1714512000,
+  "model": "gpt-4o-mini",
+  "choices": [
+    {
+      "index": 0,
+      "finish_reason": "stop",
+      "message": {
+        "role": "assistant",
+        "content": "Hi there!",
+        "tool_calls": null
+      }
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 12,
+    "completion_tokens": 4,
+    "total_tokens": 16
+  }
+}
+```
+
+For Anthropic upstream, translate `usage.input_tokens` / `output_tokens` → `prompt_tokens` / `completion_tokens`, set `total_tokens` to their sum, concatenate the `text` content blocks into a single `content` string, and surface any `tool_use` blocks via `tool_calls`.
+
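A minimal sketch of the Anthropic translation described above, in TypeScript for illustration. `toOpenAICompletion` and its types are hypothetical names, and the `stop_reason` to `finish_reason` mapping beyond `"stop"` is an assumption that the contract does not spell out.

```typescript
// Hypothetical types covering only the Anthropic fields this sketch reads.
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface AnthropicMessage {
  id: string;
  model: string;
  content: AnthropicBlock[];
  stop_reason: string | null;
  usage: { input_tokens: number; output_tokens: number };
}

interface OpenAIToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

function toOpenAICompletion(msg: AnthropicMessage, created: number) {
  let text = "";
  const toolCalls: OpenAIToolCall[] = [];
  for (const block of msg.content) {
    if (block.type === "text") {
      // Text blocks are concatenated into one content string.
      text += block.text;
    } else {
      // OpenAI carries tool arguments as a JSON-encoded string.
      toolCalls.push({
        id: block.id,
        type: "function",
        function: { name: block.name, arguments: JSON.stringify(block.input) },
      });
    }
  }

  // Assumed mapping; anything unrecognised is reported as "stop".
  const finishReason =
    msg.stop_reason === "tool_use"
      ? "tool_calls"
      : msg.stop_reason === "max_tokens"
        ? "length"
        : "stop";

  return {
    id: msg.id,
    object: "chat.completion",
    created,
    model: msg.model,
    choices: [
      {
        index: 0,
        finish_reason: finishReason,
        message: {
          role: "assistant",
          content: text,
          tool_calls: toolCalls.length > 0 ? toolCalls : null,
        },
      },
    ],
    usage: {
      prompt_tokens: msg.usage.input_tokens,
      completion_tokens: msg.usage.output_tokens,
      total_tokens: msg.usage.input_tokens + msg.usage.output_tokens,
    },
  };
}
```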
+## Response — streaming (`stream: true`)
+
+Server-Sent Events, one event per delta, terminated with `data: [DONE]`. Every `data:` line except the terminator carries JSON in OpenAI's chat-completion *chunk* (delta) shape:
+
+```
+data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1714512000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
+
+data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1714512000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}
+
+data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1714512000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}
+
+data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1714512000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
+
+data: [DONE]
+```
+
+For Anthropic upstream, translate `content_block_delta` events of type `text_delta` into delta `content` strings, and `message_stop` into the `finish_reason: "stop"` event. For tool calls, translate a `content_block_start` event whose block type is `tool_use` (carrying the id and name), together with the JSON arguments streamed as `input_json_delta` deltas, into OpenAI-shape `delta.tool_calls` entries.
+
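The text-only part of that translation might look like the sketch below. `toOpenAIChunkLine` and `ChunkMeta` are hypothetical names; tool-call events and the initial `role` chunk are deliberately left out, so this is not a complete translator.

```typescript
// Hypothetical event type: only the fields this sketch reads.
interface AnthropicStreamEvent {
  type: string;
  delta?: { type?: string; text?: string };
}

// Values that stay constant across every chunk of one completion.
interface ChunkMeta {
  id: string;
  model: string;
  created: number;
}

// Returns one SSE frame ("data: ...\n\n") or null when the event has no
// OpenAI-shape equivalent in this sketch (ping, message_start, and so on).
function toOpenAIChunkLine(event: AnthropicStreamEvent, meta: ChunkMeta): string | null {
  let delta: { content?: string };
  let finishReason: "stop" | null;

  if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
    delta = { content: event.delta.text ?? "" };
    finishReason = null;
  } else if (event.type === "message_stop") {
    delta = {};
    finishReason = "stop";
  } else {
    return null;
  }

  const chunk = {
    id: meta.id,
    object: "chat.completion.chunk",
    created: meta.created,
    model: meta.model,
    choices: [{ index: 0, delta, finish_reason: finishReason }],
  };
  return `data: ${JSON.stringify(chunk)}\n\n`;
}
```

The caller would still need to emit the final `data: [DONE]` frame after the `message_stop` chunk.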
+The client uses the OpenAI parser in `@crema/llm-ui` (`OpenAICompatibleAdapter.stream()`), so any deviation from this shape will manifest as missing tokens or hung streams.
+
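To make the shape requirement concrete, here is a rough TypeScript sketch of how an OpenAI-shape SSE body can be consumed. It is not the actual `OpenAICompatibleAdapter.stream()` implementation: `collectContent` is a hypothetical helper that works on a fully buffered body, whereas a real client would read the stream incrementally.

```typescript
// Hypothetical helper: concatenates delta.content across all chunks of a
// buffered SSE body and stops at the [DONE] terminator.
function collectContent(sseBody: string): string {
  let out = "";
  for (const line of sseBody.split(/\r?\n/)) {
    // Blank separator lines and non-data fields are skipped.
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    const chunk = JSON.parse(payload) as {
      choices: { delta: { content?: string } }[];
    };
    // A chunk with an empty delta (the finish_reason chunk) contributes nothing.
    out += chunk.choices[0]?.delta.content ?? "";
  }
  return out;
}
```

Fed the four sample chunks from the streaming example, this yields `"Hi there"`. A payload that is not valid JSON makes `JSON.parse` throw, which is one way a malformed upstream translation would surface.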
+## Errors
+
+Use the existing `ArcadiaWeb.FallbackController` envelope:
+
+```json
+{ "error": { "code": "secret_disabled", "message": "Secret is disabled" } }
+```
+
+Specific codes the client distinguishes:
+
+| HTTP | code                    | When |
+|------|-------------------------|------|
+| 401  | `unauthorized`          | Missing / invalid arcadia session. |
+| 403  | `secret_disabled`       | Vault returned `:disabled`. |
+| 410  | `secret_expired`        | Vault returned `:expired`. |
+| 410  | `secret_consumed`       | Read-once secret already consumed. |
+| 403  | `ip_not_allowed`        | Caller IP blocked by the vault allowlist. |
+| 404  | `unknown_provider`      | `provider` field not in the supported set. |
+| 502  | `upstream_unavailable`  | Upstream returned 5xx or timed out. |
+| 429  | `rate_limited`          | Either arcadia or upstream returned 429. Pass through `Retry-After` if present. |
+
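On the client side, the envelope and the codes in the table could be handled roughly as follows. This is a sketch under assumptions: `parseProxyError` is a hypothetical helper, and treating only `rate_limited` and `upstream_unavailable` as retryable is an illustrative policy, not something the contract mandates.

```typescript
// Codes taken from the table above.
const KNOWN_CODES = [
  "unauthorized",
  "secret_disabled",
  "secret_expired",
  "secret_consumed",
  "ip_not_allowed",
  "unknown_provider",
  "upstream_unavailable",
  "rate_limited",
] as const;

type ProxyErrorCode = (typeof KNOWN_CODES)[number];

interface ProxyError {
  code: ProxyErrorCode | "unknown";
  message: string;
  retryable: boolean;
}

// Returns null when the body is not an { error: { code, message } } envelope.
function parseProxyError(body: unknown): ProxyError | null {
  if (typeof body !== "object" || body === null) return null;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return null;
  const { code, message } = err as { code?: unknown; message?: unknown };
  if (typeof code !== "string" || typeof message !== "string") return null;

  const known = (KNOWN_CODES as readonly string[]).includes(code);
  return {
    code: known ? (code as ProxyErrorCode) : "unknown",
    message,
    // Assumed policy: only transient failures are worth retrying.
    retryable: code === "rate_limited" || code === "upstream_unavailable",
  };
}
```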
+## Auth
+
+The proxy must verify the arcadia session bearer the same way the rest of `/api/v1/*` does. The vault read uses the **caller's tenant context**, so platform-admin sessions can use platform-level secrets and tenant sessions can use their own — no special privilege required beyond what `/api/v1/secrets/:name` already enforces.
+
+## Usage tracking
+
+After each completion (success or failure), write a row via the existing `POST /api/v1/ai/llm/usage` (or call the equivalent context module directly inside the proxy). Required fields on that endpoint already include model, prompt_tokens, completion_tokens, latency_ms — the proxy can fill them from the upstream response.
+
+## Test fixture
+
+A minimal Mix test in `apps/arcadia_core/test/arcadia_web/controllers/api/ai_controller_test.exs` should cover:
+
+- 200 with stream off, OpenAI upstream stubbed via Bypass.
+- 200 with stream on, Anthropic upstream stubbed; assert SSE chunks carry OpenAI-shape JSON.
+- 403 when the named secret is disabled.
+- 404 when `provider: "unknown"`.
+- Usage row written on the success cases.