docs: RAG.md — combined story for the two RAG surfaces

The browser RAG (@crema/lexical-rag-ui) and the server RAG (arcadia-search) coexist in this app, and the agent picks between them via tool descriptions. The two libs each have their own README, but neither covers the picking decision or how they relate. This doc fills that gap. - At-a-glance table: lib, engine, runtime location, corpus size, update cadence, auth, agent tool, what it's best for. - The system-prompt snippet that drives tool picking, with notes on observed DeepSeek V3 behavior. - Per-surface deep dive (build path, tools, storage, limits, why it exists). - Why both run by default (always-on fallback, A/B regression). - Decision checklist for adding new corpora. - "What lives where" cheat sheet pointing at the right file in the right repo for common tasks. Cross-references arcadia-search's README.md, MULTI_TENANT.md, and ARCADIA_INTEGRATION.md so a reader landing here can navigate to the right rabbit hole. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 14:15:09 +10:00
parent 628691d2df
commit 444516e900
1 changed files with 173 additions and 0 deletions
--- a/docs/RAG.md
+++ b/docs/RAG.md
@@ -0,0 +1,173 @@
 # Retrieval-Augmented Generation in arcadia-admin
 This app exposes **two** lexical RAG surfaces to the assistant. They
 share a contract (`search` + `read`) but live at different layers and
 serve different content. The agent picks between them based on tool
 descriptions; the operator chooses which to deploy based on corpus
 shape.
 ---
 ## At a glance
 | | Browser RAG | Server RAG |
 |---|---|---|
 | Lib / service | `@crema/lexical-rag-ui` | `arcadia-search` (Rust) |
 | Engine | MiniSearch (BM25, JS) | Tantivy (BM25, Rust) |
 | Where it runs | In the user's browser | Sibling of arcadia-app |
 | Index storage | Static JSON, fetched once | mmap'd disk, ~30–80MB resident |
 | Practical corpus size | ~5–10MB / ~50–100k chunks | GB-scale, no hard cap |
 | Update cadence | Static — rebuilt at app build time | Live — cron, webhook, or admin trigger |
 | Auth | None (bundled with the app) | JWT (via arcadia's Guardian) |
 | Tool the agent calls | `search_docs(query, limit)` | `search_kb(query, corpus, limit?, tags?)` + `read_chunk(chunk_id, corpus)` |
 | Source content lives in | `arcadia-admin/public/docs-index.json` | `arcadia-search`'s data dir, ingested from disk or arcadia API |
 | What's it best for | Static reference docs that ship with the app | Tenant-uploaded files, audit log search, anything that grows |
 ---
 ## When the agent picks each
 The system prompt in `app/routes/assistant.tsx::buildAdminPreface` tells
 the model:
 > Two retrieval surfaces exist for documentation/knowledge:
 > `search_docs` (browser-side, BM25 over the bundled arcadia docs —
 > fast, always available, small corpus) and `search_kb` (server-side,
 > BM25 over arcadia-search — same docs as `corpus=docs` for parity,
 > plus larger and additional corpora as the operator adds them). For
 > questions about the bundled arcadia docs either is fine; prefer
 > `search_kb` when you want richer hits or when the user is asking
 > about content that wouldn't be in the bundled docs (uploaded files,
 > tenant-specific knowledge). When `search_kb` returns a `chunk_id`
 > you want to expand, call `read_chunk(chunk_id, corpus)`.
 In practice DeepSeek + V3 picks `search_kb` for anything that mentions
 "the kb" or sounds dynamic, and `search_docs` for quick lookups against
 the bundled docs. Neither pick is wrong for content that exists in
 both.
 ---
 ## Browser RAG (`@crema/lexical-rag-ui`)
 **What it is.** A small React + MiniSearch wrapper. The lib provides
 `RAGProvider`, `useRAG`, and a headless `createRAGClient(indexUrl)`.
 The index is a single JSON file built offline by
 `scripts/build-docs-index.mjs` and shipped in the app's `public/`.
 **How it's wired here.**
 - Build script: `arcadia-admin/scripts/build-docs-index.mjs` reads
  markdown from `../reference/arcadia-app/`, chunks at H1–H3,
  produces `public/docs-index.json`. Runs on `npm run build:docs`
  (and as the `prebuild` step before `npm run build`).
 - Tool wrapper: `app/lib/admin-tools.ts` constructs a singleton
  `createRAGClient("/docs-index.json")` and exposes it as the
  `search_docs` tool. The tool returns hits with the legacy
  `category` field collapsed back from `tags[0]` so the agent's
  prior expectations stay stable.
 - Storage: just the static JSON. No state, no auth, no indexer
  process.
 **Limits.** Practical ceiling is ~5–10MB index. Past that, first-load
 parse and browser memory get painful (200MB+ heap on a 50MB index;
 mobile breaks). Updates require a build step + redeploy.
 **Why it exists.** Static reference content that ships with the app —
 arcadia's own docs, in this case. Always available even if the search
 service is down. Zero infrastructure.
 For the lib itself see `lib-lexical-rag-ui/README.md`.
 ---
 ## Server RAG (`arcadia-search`)
 **What it is.** A standalone Rust HTTP service (Tantivy + axum). Single
 static binary, ~30–80MB resident. Per-tenant per-corpus indexes on
 disk. JWT auth, HMAC webhook intake, atomic rebuild swap, systemd
 timer cron.
 **How it's wired here.**
 - Tools: `app/lib/admin-tools.ts` exposes `search_kb` and
  `read_chunk`. The fetch URL is `KB_BASE_URL` (default
  `http://127.0.0.1:7800`, override via `window.__ARCADIA_SEARCH_URL`
  or `VITE_ARCADIA_SEARCH_URL`). The bearer token is the user's
  arcadia JWT from `sessionStorage["arcadia_access_token"]`, with a
  `"dev"` fallback when no login.
 - Reindex button: `app/routes/ai.tsx::reindexKB` calls
  `POST /index/:corpus/build` and toasts the result. Lives in the
  AI page's empty state next to the block-preview button.
 - System prompt: see the snippet in `assistant.tsx::buildAdminPreface`
  above.
 **Storage.** `<INDEX_DIR>/<tenant>/<corpus>/current/` per index;
 `previous-<stamp>/` for the last few rebuilds (rollback). Sources can
 be on-disk markdown, or pulled from arcadia's `/api/v1/digital_objects`
 API (see `arcadia-search/MULTI_TENANT.md` and `ARCADIA_INTEGRATION.md`).
 **Update cadence.** Three triggers, layered so each compensates for
 the others' failure modes:
 - **Cron** (systemd timer, hourly default) — always-on safety net.
 - **Admin button** — one-click rebuild from the AI page.
 - **Webhook** — arcadia POSTs `/events/changed` on file create/delete;
  search debounces (2-min default) and rebuilds.
 **Why it exists.** Anything that doesn't fit the browser ceiling:
 tenant-uploaded files, audit-log-ish content, multi-tenant knowledge
 bases, anything that grows over time.
 For the service see `arcadia-search/README.md`.
 For multi-tenant config see `arcadia-search/MULTI_TENANT.md`.
 For the upstream arcadia integration story (file content fetch,
 text extraction, webhook signature, service tokens) see
 `arcadia-search/ARCADIA_INTEGRATION.md`.
 ---
 ## How they coexist
 The default deploy runs **both**:
 - `search_docs` indexes the same arcadia-app docs the parity corpus
  on `arcadia-search` indexes. Same content, two engines.
 - This is intentional — it means the assistant always has *something*
  to search, even if `arcadia-search` is down or unreachable. The
  failure mode is "no `search_kb`, but `search_docs` still works."
 - It also gives a permanent A/B regression test: query both, compare
  hits, catch relevance regressions in either engine.
 ---
 ## Picking ONE for a new corpus
 Use this checklist when adding new content:
 | Question | Answer → use |
 |---|---|
 | Is the corpus < 5MB and basically static? | Browser |
 | Does it need to update without a redeploy? | Server |
 | Is it per-tenant content (uploaded files, tenant-specific KB)? | Server |
 | Are you OK shipping it in the JS bundle? | Browser |
 | Does it need agentic `read_chunk` follow-up? | Server (browser doesn't expose `read` over the tool surface) |
 | Does it need to work offline / with no backend? | Browser |
 | Is it growing > 10MB? | Server |
 Most "knowledge base" content lives in the server side. The browser
 side is reserved for the always-bundled reference material that ships
 with the app.
 ---
 ## What lives where (cheat sheet)
 | Want to | Look at |
 |---|---|
 | Add a doc to the bundled browser RAG | `arcadia-admin/scripts/build-docs-index.mjs` (extend `SOURCES`) |
 | Add a tool to the agent | `arcadia-admin/app/lib/admin-tools.ts` |
 | Change the LLM's tool-picking guidance | `arcadia-admin/app/routes/assistant.tsx::buildAdminPreface` |
 | Add a corpus to arcadia-search | `arcadia-search/deploy/<tenant>/<corpus>.config.json` + new systemd timer |
 | Add a tenant to arcadia-search | `arcadia-search/MULTI_TENANT.md` |
 | Wire an arcadia file → search ingest | `arcadia-search/ARCADIA_INTEGRATION.md` (needs upstream changes first) |
 | Reindex the server-side corpus right now | "reindex kb (docs)" button on `/ai` empty state, or `POST /index/docs/build` |