arcadia-admin/docs/LLM_PROXY_CONTRACT.md
jules 938143f3f5 refactor: rename service references arcadia-app → arcadia-core
The Phoenix auth/identity/tenancy backend repo is being renamed
arcadia-app → arcadia-core (its primary OTP app is already arcadia_core).
Updates prose, doc paths, and git.sky-ai.com repo URLs. Deliberately
leaves the Rust crate arcadia-app-client and host arcadia-app.internal
(handled separately), and the kept namespace (issuer/release "arcadia").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-11 13:40:25 +10:00


LLM Proxy Contract

Status: implemented. Backend lives in arcadia-core at apps/arcadia_core/lib/arcadia/ai/llm_proxy* (see commit 75669f1). This document remains the contract that lib-llm-providers-ui and app/lib/arcadia/llm-proxy.ts expect from arcadia — keep it in sync if either side changes.

Why a proxy?

The Settings UI ships in two transport modes:

  • direct — the browser fetches the API key from arcadia's vault (GET /api/v1/secrets/:name), then calls OpenAI/Anthropic/DeepSeek/Qwen directly. Works today, but the key briefly lives in browser memory and the prompt contents go straight to the upstream provider with no opportunity for arcadia to log, meter, or rewrite them.
  • proxy — the browser sends the chat request to arcadia, which reads the secret server-side and calls the upstream provider. Keys never leave arcadia. This is what production should use.

This contract only covers the proxy mode.

Endpoint

POST /api/v1/ai/llm/chat
Authorization: Bearer <arcadia session token>
X-Tenant-ID:   <tenant id>
Content-Type:  application/json

The path is /api/v1/ai/llm/chat so it lives under the existing /api/v1/ai/* scope (next to embeddings, tools, llm/usage).
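Here is a minimal TypeScript sketch of how a client might assemble this request for fetch(). The helper name buildProxyRequestInit and the CHAT_PATH constant are hypothetical. Only the path and the three headers come from this contract.

```typescript
// Hypothetical client-side helper: builds the fetch() init for the
// proxy endpoint with the three headers this contract requires.
interface ProxyRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Path taken from the contract; the constant name is illustrative.
const CHAT_PATH = "/api/v1/ai/llm/chat";

function buildProxyRequestInit(
  sessionToken: string, // arcadia session token (not an upstream API key)
  tenantId: string,
  payload: unknown, // the JSON request body described below
): ProxyRequestInit {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${sessionToken}`,
      "X-Tenant-ID": tenantId,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}
```

A caller would pass the result straight to fetch, for example `fetch(new URL(CHAT_PATH, arcadiaBaseUrl), init)`, where arcadiaBaseUrl stands for whatever origin serves arcadia. No upstream key appears anywhere on the client side.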

Request body

The shape is OpenAI's chat-completion request, plus two arcadia-specific fields:

{
  "provider":    "openai",
  "secret_name": "llm-openai-api-key",
  "model":       "gpt-4o-mini",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Hello!" }
  ],
  "stream":      true,
  "max_tokens":  1024,
  "temperature": 0.7,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "search_docs",
        "description": "...",
        "parameters": { "type": "object", "properties": {} }
      }
    }
  ],
  "tool_choice": "auto"
}
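The body above can be typed on the client roughly as follows. This is a sketch: the TypeScript type names are hypothetical and the message and tool_choice types are simplified (tool-result messages and the object form of tool_choice are omitted). Only the JSON field names come from this contract.

```typescript
// Simplified type sketch of the proxy request body (OpenAI chat-completion
// request plus the two arcadia-specific fields).
type Provider = "openai" | "anthropic" | "deepseek" | "qwen" | "lmstudio";

interface ChatMessage {
  role: "system" | "user" | "assistant"; // tool-result messages omitted
  content: string;
}

interface FunctionTool {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters: Record<string, unknown>; // JSON Schema object
  };
}

interface LlmProxyChatRequest {
  provider: Provider; // arcadia-specific
  secret_name?: string; // arcadia-specific; optional only for lmstudio
  model: string;
  messages: ChatMessage[];
  stream?: boolean;
  max_tokens?: number;
  temperature?: number;
  tools?: FunctionTool[];
  tool_choice?: "auto" | "none" | "required"; // object form omitted
}

// The example body from this section, expressed against the type.
const example: LlmProxyChatRequest = {
  provider: "openai",
  secret_name: "llm-openai-api-key",
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  stream: true,
  max_tokens: 1024,
  temperature: 0.7,
  tools: [
    {
      type: "function",
      function: {
        name: "search_docs",
        description: "...",
        parameters: { type: "object", properties: {} },
      },
    },
  ],
  tool_choice: "auto",
};
```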

Arcadia-specific fields

| Field | Type | Notes |
|---|---|---|
| provider | one of "openai", "anthropic", "deepseek", "qwen", "lmstudio" | Selects the upstream backend. |
| secret_name | string (optional for lmstudio) | Name of the vault secret holding the upstream API key. The proxy resolves it via the same Secrets.get/3 used for tenant-facing reads. |
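A check of these two fields might look like the following sketch. The supported set, the lmstudio exception, and the unknown_provider code come from this contract. The function names and the secret_name_required code are hypothetical, because the contract does not define an error for a missing secret_name.

```typescript
// Supported providers, as listed in the field table.
const SUPPORTED_PROVIDERS = ["openai", "anthropic", "deepseek", "qwen", "lmstudio"] as const;
type Provider = (typeof SUPPORTED_PROVIDERS)[number];

function isSupportedProvider(value: unknown): value is Provider {
  return typeof value === "string" &&
    (SUPPORTED_PROVIDERS as readonly string[]).includes(value);
}

// Returns null when the arcadia-specific fields are acceptable.
// "unknown_provider" is a contract error code (404);
// "secret_name_required" is HYPOTHETICAL, used here only for illustration.
function checkArcadiaFields(body: { provider?: unknown; secret_name?: unknown }):
  "unknown_provider" | "secret_name_required" | null {
  if (!isSupportedProvider(body.provider)) return "unknown_provider";
  const hasSecret = typeof body.secret_name === "string" && body.secret_name.length > 0;
  // Every provider except lmstudio needs a vault secret to authenticate upstream.
  if (body.provider !== "lmstudio" && !hasSecret) return "secret_name_required";
  return null;
}
```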

The proxy must:

  1. Authenticate the arcadia session.
  2. Resolve secret_name for the current tenant, falling back to the platform-level secret. Refuse the call if the secret is disabled, expired, already consumed, or IP-blocked. The existing Arcadia.Secrets.get/3 already returns the right error codes.
  3. Map the request to the upstream's native shape (Anthropic's /v1/messages differs from OpenAI's /v1/chat/completions).
  4. Forward it with the resolved key as the upstream's expected auth header (Authorization: Bearer <key> for OpenAI/DeepSeek/Qwen, x-api-key: <key> + anthropic-version: 2023-06-01 for Anthropic).
  5. Stream the response back as OpenAI-shape SSE regardless of upstream. (See "Response — streaming" below.)
  6. Record a usage row via the existing POST /api/v1/ai/llm/usage after each completion.
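Steps 3 and 4 are where providers differ most. The sketch below shows both in TypeScript for illustration only, since the real proxy is Elixir under apps/arcadia_core/lib/arcadia/ai/. The function names, the 1024 default for max_tokens, and the lmstudio handling are assumptions. The header names and the anthropic-version value come from step 4. The Anthropic body fields (top-level system, required max_tokens, input_schema for tools) follow Anthropic's public Messages API.

```typescript
type Provider = "openai" | "anthropic" | "deepseek" | "qwen" | "lmstudio";

interface OpenAIMessage { role: "system" | "user" | "assistant"; content: string }
interface OpenAITool {
  type: "function";
  function: { name: string; description?: string; parameters: object };
}
interface ChatRequest {
  model: string;
  messages: OpenAIMessage[];
  stream?: boolean;
  max_tokens?: number;
  temperature?: number;
  tools?: OpenAITool[];
}

// Step 4: the auth header(s) each upstream expects.
function upstreamAuthHeaders(provider: Provider, key: string | undefined): Record<string, string> {
  if (provider === "anthropic") {
    if (!key) throw new Error("anthropic requires an API key");
    return { "x-api-key": key, "anthropic-version": "2023-06-01" };
  }
  if (!key) {
    // ASSUMPTION: a keyless lmstudio server needs no auth header.
    if (provider === "lmstudio") return {};
    throw new Error(`${provider} requires an API key`);
  }
  // OpenAI, DeepSeek, Qwen (and, by assumption, keyed lmstudio).
  return { Authorization: `Bearer ${key}` };
}

// Step 3: reshape an OpenAI-style request for Anthropic's /v1/messages.
// Anthropic takes the system prompt as a top-level field, requires
// max_tokens, and names the tool schema input_schema.
function toAnthropicBody(req: ChatRequest, defaultMaxTokens = 1024): Record<string, unknown> {
  const system = req.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  const body: Record<string, unknown> = {
    model: req.model,
    max_tokens: req.max_tokens ?? defaultMaxTokens, // default is hypothetical
    messages: req.messages
      .filter((m) => m.role !== "system")
      .map((m) => ({ role: m.role, content: m.content })),
  };
  if (system) body.system = system;
  if (req.stream !== undefined) body.stream = req.stream;
  if (req.temperature !== undefined) body.temperature = req.temperature;
  if (req.tools && req.tools.length > 0) {
    body.tools = req.tools.map((t) => ({
      name: t.function.name,
      description: t.function.description,
      input_schema: t.function.parameters,
    }));
  }
  return body;
}
```

Keeping the header logic in one function means that adding a provider touches a single branch. The request mapping stays separate because only Anthropic needs a different body shape among the listed providers.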

Response — non-streaming (stream: false)

OpenAI chat-completion shape, returned as a single JSON document:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1714512000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Hi there!",
        "tool_calls": null
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 4,
    "total_tokens": 16
  }
}

For Anthropic upstream, translate usage.input_tokens / usage.output_tokens into prompt_tokens / completion_tokens (with total_tokens as their sum), and combine content blocks into a single string (or surface tool_use blocks via tool_calls).
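The translation above could be sketched as follows. Function and type names are hypothetical. The stop_reason to finish_reason mapping is an assumption, because the contract itself only specifies the usage and content translation. The created timestamp is passed in so the function stays deterministic.

```typescript
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface AnthropicMessage {
  id: string;
  model: string;
  content: AnthropicBlock[];
  stop_reason: string | null;
  usage: { input_tokens: number; output_tokens: number };
}

interface OpenAIToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// ASSUMED mapping from Anthropic stop_reason to OpenAI finish_reason.
const FINISH_REASONS: Record<string, string> = {
  end_turn: "stop",
  stop_sequence: "stop",
  max_tokens: "length",
  tool_use: "tool_calls",
};

function toOpenAICompletion(msg: AnthropicMessage, created: number) {
  // Concatenate all text blocks into one content string.
  const text = msg.content
    .filter((b): b is Extract<AnthropicBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("");
  // tool_use blocks become OpenAI tool_calls with JSON-string arguments.
  const toolCalls = msg.content
    .filter((b): b is Extract<AnthropicBlock, { type: "tool_use" }> => b.type === "tool_use")
    .map((b): OpenAIToolCall => ({
      id: b.id,
      type: "function",
      function: { name: b.name, arguments: JSON.stringify(b.input) },
    }));
  const { input_tokens, output_tokens } = msg.usage;
  return {
    id: msg.id,
    object: "chat.completion",
    created,
    model: msg.model,
    choices: [{
      index: 0,
      finish_reason: FINISH_REASONS[msg.stop_reason ?? ""] ?? "stop",
      message: {
        role: "assistant",
        content: text,
        tool_calls: toolCalls.length > 0 ? toolCalls : null,
      },
    }],
    usage: {
      prompt_tokens: input_tokens,
      completion_tokens: output_tokens,
      total_tokens: input_tokens + output_tokens,
    },
  };
}
```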

Response — streaming (stream: true)

Server-Sent Events, one event per delta, terminated with data: [DONE]. Each data: line is JSON of OpenAI's chat-completion delta shape:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1714512000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1714512000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1714512000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1714512000,"model":"gpt-4o-mini","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
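To show the framing the client depends on, here is a minimal consumer. It splits the body into events on blank lines, reads each data: payload, stops at [DONE], and appends delta.content. This is not the @crema/llm-ui implementation, and the function name is hypothetical. For simplicity it parses a fully buffered body, whereas a real client would process the stream incrementally.

```typescript
interface ChunkChoice {
  index: number;
  delta: { role?: string; content?: string };
  finish_reason: string | null;
}

interface Chunk {
  object: "chat.completion.chunk";
  choices: ChunkChoice[];
}

function collectContent(sseText: string): { content: string; finishReason: string | null; done: boolean } {
  let content = "";
  let finishReason: string | null = null;
  // Events are separated by a blank line.
  for (const event of sseText.split(/\r?\n\r?\n/)) {
    // Only "data:" lines carry payload; one optional leading space is dropped.
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data === "") continue;
    if (data === "[DONE]") return { content, finishReason, done: true };
    const chunk = JSON.parse(data) as Chunk;
    for (const choice of chunk.choices) {
      if (choice.delta.content) content += choice.delta.content;
      if (choice.finish_reason) finishReason = choice.finish_reason;
    }
  }
  // No [DONE] seen: the stream ended early (the "hung stream" symptom).
  return { content, finishReason, done: false };
}
```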

For Anthropic upstream, translate content_block_delta events of type text_delta into delta.content strings, and message_stop into the finish_reason: "stop" event. For tool calls, translate each content_block_start of type tool_use (carrying id and name), together with the input_json_delta fragments that stream its JSON arguments, into OpenAI-shape delta.tool_calls entries.
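A stateful translator for these rules might look like the sketch below. It takes Anthropic events already JSON-parsed from their data: lines and returns zero or more OpenAI-shape chunk payloads. The class name is hypothetical. Emitting the initial role chunk on message_start is an assumption that mirrors the first chunk in the example above. Events the contract does not mention (ping, content_block_stop, message_delta) produce nothing here.

```typescript
type AnthropicEvent =
  | { type: "message_start"; message: { id: string; model: string } }
  | { type: "content_block_start"; index: number;
      content_block: { type: "text"; text: string } | { type: "tool_use"; id: string; name: string } }
  | { type: "content_block_delta"; index: number;
      delta: { type: "text_delta"; text: string } | { type: "input_json_delta"; partial_json: string } }
  | { type: "message_stop" }
  | { type: "ping" | "content_block_stop" | "message_delta" };

interface OpenAIDelta {
  role?: "assistant";
  content?: string;
  tool_calls?: Array<{
    index: number;
    id?: string;
    type?: "function";
    function: { name?: string; arguments: string };
  }>;
}

class AnthropicToOpenAIStream {
  private readonly created: number;
  private id = "";
  private model = "";
  // Anthropic content-block index -> OpenAI tool_calls index.
  private readonly toolIndexByBlock = new Map<number, number>();

  constructor(created: number) {
    this.created = created;
  }

  translate(ev: AnthropicEvent): object[] {
    switch (ev.type) {
      case "message_start":
        this.id = ev.message.id;
        this.model = ev.message.model;
        return [this.chunk({ role: "assistant", content: "" }, null)];
      case "content_block_start": {
        if (ev.content_block.type !== "tool_use") return [];
        const toolIndex = this.toolIndexByBlock.size;
        this.toolIndexByBlock.set(ev.index, toolIndex);
        return [this.chunk({ tool_calls: [{ index: toolIndex, id: ev.content_block.id,
          type: "function", function: { name: ev.content_block.name, arguments: "" } }] }, null)];
      }
      case "content_block_delta": {
        if (ev.delta.type === "text_delta") return [this.chunk({ content: ev.delta.text }, null)];
        // input_json_delta: append an argument fragment to the matching tool call.
        const toolIndex = this.toolIndexByBlock.get(ev.index);
        if (toolIndex === undefined) return [];
        return [this.chunk({ tool_calls: [{ index: toolIndex,
          function: { arguments: ev.delta.partial_json } }] }, null)];
      }
      case "message_stop":
        return [this.chunk({}, "stop")];
      default:
        return [];
    }
  }

  private chunk(delta: OpenAIDelta, finishReason: string | null): object {
    return {
      id: this.id,
      object: "chat.completion.chunk",
      created: this.created,
      model: this.model,
      choices: [{ index: 0, delta, finish_reason: finishReason }],
    };
  }
}
```

The proxy would serialize each returned object as a data: line and write data: [DONE] after the message_stop chunk. The index map matters because OpenAI numbers tool calls among tool calls only, while Anthropic's index counts every content block, text included.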

The client uses the OpenAI parser in @crema/llm-ui (OpenAICompatibleAdapter.stream()), so any deviation from this shape will manifest as missing tokens or hung streams.

Errors

Use the existing ArcadiaWeb.FallbackController envelope:

{ "error": { "code": "secret_disabled", "message": "Secret is disabled" } }

Specific codes the client distinguishes:

| HTTP | Code | When |
|---|---|---|
| 401 | unauthorized | Missing or invalid arcadia session. |
| 403 | secret_disabled | Vault returned :disabled. |
| 410 | secret_expired | Vault returned :expired. |
| 410 | secret_consumed | Read-once secret already consumed. |
| 403 | ip_not_allowed | Caller IP blocked by the vault allowlist. |
| 404 | unknown_provider | The provider field is not in the supported set. |
| 502 | upstream_unavailable | Upstream returned 5xx or timed out. |
| 429 | rate_limited | Either arcadia or upstream returned 429. Pass through Retry-After if present. |
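The table can be kept as data so that both sides agree on statuses. The sketch below also adds a helper for the Retry-After pass-through. The names are hypothetical, while the codes and statuses are copied from the table. Retry-After may be either delta-seconds or an HTTP-date (RFC 9110), so both forms are handled.

```typescript
// Error codes and HTTP statuses from the contract's error table.
const ERROR_STATUS = {
  unauthorized: 401,
  secret_disabled: 403,
  secret_expired: 410,
  secret_consumed: 410,
  ip_not_allowed: 403,
  unknown_provider: 404,
  upstream_unavailable: 502,
  rate_limited: 429,
} as const;

type ProxyErrorCode = keyof typeof ERROR_STATUS;

// Builds the FallbackController-style envelope for a given code.
function errorResponse(code: ProxyErrorCode, message: string) {
  return { status: ERROR_STATUS[code], body: { error: { code, message } } };
}

// Milliseconds to wait before retrying, or null if absent or unparseable.
function retryAfterMs(header: string | null, now: number = Date.now()): number | null {
  if (header === null) return null;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // delta-seconds
  const date = Date.parse(trimmed); // HTTP-date
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}
```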

Auth

The proxy must verify the arcadia session bearer token the same way the rest of /api/v1/* does. The vault read uses the caller's tenant context, so platform-admin sessions can use platform-level secrets and tenant sessions can use their own. No privilege is required beyond what /api/v1/secrets/:name already enforces.

Usage tracking

After each completion (success or failure), write a row via the existing POST /api/v1/ai/llm/usage (or call the equivalent context module directly inside the proxy). The endpoint's required fields already include model, prompt_tokens, completion_tokens, and latency_ms. The proxy can take the model and token counts from the upstream response and measure latency_ms itself.
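The row could be assembled as follows. This is a hypothetical sketch: the field names match the required fields listed above, but falling back to zero tokens and the requested model when the upstream call failed is an assumption. In streaming mode the counts are not in a single response body, so the proxy would have to collect them from the stream before building the row.

```typescript
interface UsageRow {
  model: string;
  prompt_tokens: number;
  completion_tokens: number;
  latency_ms: number;
}

// Minimal view of an OpenAI-shape completion; null when the call failed.
interface CompletionLike {
  model: string;
  usage?: { prompt_tokens: number; completion_tokens: number };
}

function buildUsageRow(
  requestedModel: string,
  completion: CompletionLike | null,
  startedAtMs: number,
  finishedAtMs: number,
): UsageRow {
  return {
    // Prefer the model the upstream reports; fall back to the requested one.
    model: completion?.model ?? requestedModel,
    // ASSUMPTION: failed calls are recorded with zero tokens.
    prompt_tokens: completion?.usage?.prompt_tokens ?? 0,
    completion_tokens: completion?.usage?.completion_tokens ?? 0,
    // Latency is measured by the proxy, not reported by the upstream.
    latency_ms: Math.max(0, Math.round(finishedAtMs - startedAtMs)),
  };
}
```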

Test fixture

A minimal Mix test in apps/arcadia_core/test/arcadia_web/controllers/api/ai_controller_test.exs should cover:

  • 200 with stream off, OpenAI upstream stubbed via Bypass.
  • 200 with stream on, Anthropic upstream stubbed; assert SSE chunks carry OpenAI-shape JSON.
  • 403 when the named secret is disabled.
  • 404 when provider: "unknown".
  • Usage row written on the success cases.
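The streaming case's "OpenAI-shape JSON" assertion can be made concrete with a structural check like the one below. It is written in TypeScript for a client-side or contract test, and the function name is hypothetical. The ExUnit test itself would express the same checks in Elixir against each decoded data: payload.

```typescript
// Returns true when a parsed "data:" payload has the chunk shape the
// OpenAI-compatible parser expects.
function isOpenAIChunk(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.object !== "chat.completion.chunk") return false;
  if (typeof v.id !== "string" || typeof v.model !== "string" || typeof v.created !== "number") {
    return false;
  }
  if (!Array.isArray(v.choices) || v.choices.length === 0) return false;
  // Every choice needs a numeric index, a delta object, and a
  // finish_reason that is either null or a string.
  return v.choices.every((c: unknown) => {
    if (typeof c !== "object" || c === null) return false;
    const choice = c as Record<string, unknown>;
    const fr = choice.finish_reason;
    return typeof choice.index === "number" &&
      typeof choice.delta === "object" && choice.delta !== null &&
      (fr === null || typeof fr === "string");
  });
}
```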