arcadia-admin/docs/RAG.md

# Retrieval-Augmented Generation in arcadia-admin

This app exposes **two** lexical RAG surfaces to the assistant. They
share a contract (`search` + `read`) but live at different layers and
serve different content. The agent picks between them based on tool
descriptions; the operator chooses which to deploy based on corpus
shape.

---

## At a glance

| | Browser RAG | Server RAG |
|---|---|---|
| Lib / service | `@crema/lexical-rag-ui` | `arcadia-search` (Rust) |
| Engine | MiniSearch (BM25, JS) | Tantivy (BM25, Rust) |
| Where it runs | In the user's browser | Sibling of arcadia-core |
| Index storage | Static JSON, fetched once | mmap'd disk, ~30–80MB resident |
| Practical corpus size | ~5–10MB / ~50–100k chunks | GB-scale, no hard cap |
| Update cadence | Static — rebuilt at app build time | Live — cron, webhook, or admin trigger |
| Auth | None (bundled with the app) | JWT (via arcadia's Guardian) |
| Tool the agent calls | `search_docs(query, limit)` | `search_kb(query, corpus, limit?, tags?)` + `read_chunk(chunk_id, corpus)` |
| Source content lives in | `arcadia-admin/public/docs-index.json` | `arcadia-search`'s data dir, ingested from disk or arcadia API |
| What's it best for | Static reference docs that ship with the app | Tenant-uploaded files, audit log search, anything that grows |

---

## When the agent picks each

The system prompt in `app/routes/assistant.tsx::buildAdminPreface` tells
the model:

> Two retrieval surfaces exist for documentation/knowledge:
> `search_docs` (browser-side, BM25 over the bundled arcadia docs —
> fast, always available, small corpus) and `search_kb` (server-side,
> BM25 over arcadia-search — same docs as `corpus=docs` for parity,
> plus larger and additional corpora as the operator adds them). For
> questions about the bundled arcadia docs either is fine; prefer
> `search_kb` when you want richer hits or when the user is asking
> about content that wouldn't be in the bundled docs (uploaded files,
> tenant-specific knowledge). When `search_kb` returns a `chunk_id`
> you want to expand, call `read_chunk(chunk_id, corpus)`.

In practice DeepSeek-V3 picks `search_kb` for anything that mentions
"the kb" or sounds dynamic, and `search_docs` for quick lookups against
the bundled docs. Neither pick is wrong for content that exists in
both.
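
The `search_kb` then `read_chunk` hand-off described in the preface can be sketched as a small pure function. This is an illustration only: the `KbHit` shape (beyond `chunk_id`), the `score` field, the `minScore` threshold, and the `nextReadCall` helper are all hypothetical and are not taken from `admin-tools.ts`.

```typescript
// Hypothetical hit shape: only `chunk_id` is named in the preface above.
interface KbHit {
  chunk_id: string;
  score: number; // assumed relevance score, higher is better
}

interface ReadChunkCall {
  tool: "read_chunk";
  args: { chunk_id: string; corpus: string };
}

// Given the hits from one search_kb call, build the follow-up read_chunk
// call for the highest-scoring hit. Returns null when there are no hits
// or none reaches the (assumed) score threshold.
function nextReadCall(
  hits: KbHit[],
  corpus: string,
  minScore: number = 0,
): ReadChunkCall | null {
  let best: KbHit | null = null;
  for (const hit of hits) {
    if (hit.score >= minScore && (best === null || hit.score > best.score)) {
      best = hit;
    }
  }
  if (best === null) return null;
  // Assumption: read_chunk is called with the corpus that was searched.
  return { tool: "read_chunk", args: { chunk_id: best.chunk_id, corpus } };
}
```

The point of the sketch is that a `chunk_id` is only meaningful together with its `corpus`, so the two travel as a pair from the search result into the follow-up call.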

---

## Browser RAG (`@crema/lexical-rag-ui`)

**What it is.** A small React + MiniSearch wrapper. The lib provides
`RAGProvider`, `useRAG`, and a headless `createRAGClient(indexUrl)`.
The index is a single JSON file built offline by
`scripts/build-docs-index.mjs` and shipped in the app's `public/`.

**How it's wired here.**

- Build script: `arcadia-admin/scripts/build-docs-index.mjs` reads
  markdown from `../reference/arcadia-core/`, chunks at H1–H3,
  produces `public/docs-index.json`. Runs on `npm run build:docs`
  (and as the `prebuild` step before `npm run build`).
- Tool wrapper: `app/lib/admin-tools.ts` constructs a singleton
  `createRAGClient("/docs-index.json")` and exposes it as the
  `search_docs` tool. Each returned hit carries the legacy `category`
  field, derived from `tags[0]`, so the agent's prior expectations
  stay stable.
- Storage: just the static JSON. No state, no auth, no indexer
  process.
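
To make "chunks at H1–H3" concrete, here is a minimal sketch of heading-based chunking. It is not the code in `build-docs-index.mjs`: the `Chunk` shape and the `chunkMarkdown` name are hypothetical, and the real script may treat front matter, fences, or nested headings differently.

```typescript
// Hypothetical chunk shape; the real docs-index.json schema may differ.
interface Chunk {
  title: string; // text of the heading that opened the chunk
  level: number; // 1-3 for H1-H3, 0 for text before the first heading
  body: string; // everything up to the next H1-H3 heading
}

// Split markdown into chunks at H1-H3 headings. H4 and deeper headings
// stay inside the body of the enclosing chunk. Fenced code blocks are
// tracked so a "# comment" inside a fence is not treated as a heading.
function chunkMarkdown(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { title: "", level: 0, body: "" };
  let inFence = false;

  const flush = (): void => {
    current.body = current.body.trim();
    // Skip the empty preamble chunk when the file starts with a heading.
    if (current.title !== "" || current.body !== "") chunks.push(current);
  };

  for (const line of markdown.split(/\r?\n/)) {
    if (/^\s*(```|~~~)/.test(line)) inFence = !inFence;
    const m = inFence ? null : /^(#{1,3})\s+(.*\S)\s*$/.exec(line);
    if (m) {
      flush();
      current = { title: m[2], level: m[1].length, body: "" };
    } else {
      current.body += line + "\n";
    }
  }
  flush();
  return chunks;
}
```

Cutting at H3 rather than at every heading keeps chunks large enough to carry context while still letting BM25 rank individual sections.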

**Limits.** Practical ceiling is ~5–10MB index. Past that, first-load
parse and browser memory get painful (200MB+ heap on a 50MB index;
mobile breaks). Updates require a build step + redeploy.

**Why it exists.** Static reference content that ships with the app —
arcadia's own docs, in this case. Always available even if the search
service is down. Zero infrastructure.

For the lib itself see `lib-lexical-rag-ui/README.md`.

---

## Server RAG (`arcadia-search`)

**What it is.** A standalone Rust HTTP service (Tantivy + axum). Single
static binary, ~30–80MB resident. Per-tenant per-corpus indexes on
disk. JWT auth, HMAC webhook intake, atomic rebuild swap, systemd
timer cron.

**How it's wired here.**

- Tools: `app/lib/admin-tools.ts` exposes `search_kb` and
  `read_chunk`. The fetch URL is `KB_BASE_URL` (default
  `http://127.0.0.1:7800`, override via `window.__ARCADIA_SEARCH_URL`
  or `VITE_ARCADIA_SEARCH_URL`). The bearer token is the user's
  arcadia JWT from `sessionStorage["arcadia_access_token"]`, falling
  back to the literal `"dev"` when no user is logged in.
- Reindex button: `app/routes/ai.tsx::reindexKB` calls
  `POST /index/:corpus/build` and toasts the result. Lives in the
  AI page's empty state next to the block-preview button.
- System prompt: see the snippet in `assistant.tsx::buildAdminPreface`
  above.
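
The URL and token resolution described in the list above can be sketched with the browser lookups passed in as plain values. The `KbEnv` container and the helper names are hypothetical. The precedence between the two overrides is an assumption, since the text names both but does not order them, and sending the bearer token on the reindex call is likewise assumed from the service's JWT auth.

```typescript
// Hypothetical container for the lookups named above. In the app these
// would come from window.__ARCADIA_SEARCH_URL, VITE_ARCADIA_SEARCH_URL,
// and sessionStorage["arcadia_access_token"].
interface KbEnv {
  windowOverride?: string;
  viteOverride?: string;
  accessToken?: string | null;
}

const DEFAULT_KB_BASE_URL = "http://127.0.0.1:7800";

// Assumption: the runtime window override wins over the build-time Vite
// variable. The source does not state the precedence.
function resolveKbBaseUrl(env: KbEnv): string {
  const url = env.windowOverride || env.viteOverride || DEFAULT_KB_BASE_URL;
  return url.replace(/\/+$/, ""); // drop trailing slashes before joining paths
}

// Use the user's JWT when present, otherwise the "dev" fallback.
function kbAuthHeaders(env: KbEnv): Record<string, string> {
  const token = env.accessToken || "dev";
  return { Authorization: `Bearer ${token}` };
}

// Describe the POST /index/:corpus/build call without performing it.
function buildReindexRequest(env: KbEnv, corpus: string) {
  return {
    method: "POST" as const,
    url: `${resolveKbBaseUrl(env)}/index/${encodeURIComponent(corpus)}/build`,
    headers: kbAuthHeaders(env),
  };
}
```

Keeping these helpers free of `window` and `sessionStorage` access makes the resolution rules testable outside a browser; the caller would pass the result to `fetch`.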

**Storage.** `<INDEX_DIR>/<tenant>/<corpus>/current/` per index;
`previous-<stamp>/` for the last few rebuilds (rollback). Sources can
be on-disk markdown, or pulled from arcadia's `/api/v1/digital_objects`
API (see `arcadia-search/MULTI_TENANT.md` and `ARCADIA_INTEGRATION.md`).

**Update cadence.** Three triggers, layered so each compensates for
the others' failure modes:
- **Cron** (systemd timer, hourly default) — always-on safety net.
- **Admin button** — one-click rebuild from the AI page.
- **Webhook** — arcadia POSTs `/events/changed` on file create/delete;
  `arcadia-search` debounces (2-min default) and rebuilds.
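
The webhook debounce can be illustrated with a small trailing-edge debouncer. This is a TypeScript sketch of the idea, not the Rust implementation in `arcadia-search`: the `RebuildDebouncer` class is hypothetical, and trailing-edge behaviour (rebuild once the event stream has been quiet for the full window) is an assumption about how the service debounces.

```typescript
// Trailing-edge debouncer with an injected clock (milliseconds), so the
// behaviour can be checked without real timers.
class RebuildDebouncer {
  private lastEventAt: number | null = null;
  private readonly quietMs: number;

  constructor(quietMs: number = 2 * 60 * 1000) {
    this.quietMs = quietMs; // 2-minute default, matching the text above
  }

  // Record a change event (for example, a file create or delete).
  notify(now: number): void {
    this.lastEventAt = now;
  }

  // True exactly once per burst: when at least quietMs has passed since
  // the most recent event. Resets so the next burst starts fresh.
  shouldRebuild(now: number): boolean {
    if (this.lastEventAt === null) return false;
    if (now - this.lastEventAt < this.quietMs) return false;
    this.lastEventAt = null;
    return true;
  }
}
```

Under this assumption a bulk upload of many files produces a single rebuild after the last event, and the hourly cron covers any event that was dropped entirely.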

**Why it exists.** Anything that doesn't fit the browser ceiling:
tenant-uploaded files, audit-log-ish content, multi-tenant knowledge
bases, anything that grows over time.

For the service see `arcadia-search/README.md`.
For multi-tenant config see `arcadia-search/MULTI_TENANT.md`.
For the upstream arcadia integration story (file content fetch,
text extraction, webhook signature, service tokens) see
`arcadia-search/ARCADIA_INTEGRATION.md`.

---

## How they coexist

The default deploy runs **both**:

- `search_docs` and the parity corpus (`corpus=docs`) on
  `arcadia-search` index the same arcadia-core docs. Same content, two
  engines.
- This is intentional — it means the assistant always has *something*
  to search, even if `arcadia-search` is down or unreachable. The
  failure mode is "no `search_kb`, but `search_docs` still works."
- It also gives a permanent A/B regression test: query both, compare
  hits, catch relevance regressions in either engine.
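
One way to turn the A/B comparison into a number is the Jaccard overlap of the top-k results from each engine. The sketch below is an assumption about how such a check could look, not an existing test in this repo. It presumes each hit can be reduced to an identifier both engines share (for example a source path), which may not hold for raw chunk ids.

```typescript
// Jaccard overlap of the top-k identifiers from two ranked result lists.
// Returns 1 when both lists are empty (the engines agree on "no hits").
function topKOverlap(a: string[], b: string[], k: number): number {
  const setA = new Set<string>(a.slice(0, k));
  const setB = new Set<string>(b.slice(0, k));
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  setA.forEach((id) => {
    if (setB.has(id)) shared++;
  });
  // |A ∩ B| / |A ∪ B|
  return shared / (setA.size + setB.size - shared);
}
```

A fixed set of queries with a minimum overlap threshold would flag the case where one engine's ranking drifts after a tokenizer or chunking change.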

---

## Picking ONE for a new corpus

Use this checklist when adding new content:

| Question | Answer → use |
|---|---|
| Is the corpus < 5MB and basically static? | Browser |
| Does it need to update without a redeploy? | Server |
| Is it per-tenant content (uploaded files, tenant-specific KB)? | Server |
| Are you OK shipping it as a static asset with the app? | Browser |
| Does it need agentic `read_chunk` follow-up? | Server (browser doesn't expose `read` over the tool surface) |
| Does it need to work offline / with no backend? | Browser |
| Is it growing > 10MB? | Server |

Most "knowledge base" content lives on the server side. The browser
side is reserved for the always-bundled reference material that ships
with the app.
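
The checklist can also be read as a small decision function. The sketch below encodes one possible reading, under the assumption that any server-only requirement overrides the browser default. The `CorpusProfile` fields and the `pickSurface` name are hypothetical, and the offline question is left out because it can conflict with the server-only ones and needs a human call.

```typescript
type Surface = "browser" | "server";

// Hypothetical description of a new corpus, one field per checklist row.
interface CorpusProfile {
  sizeMB: number;
  updatesWithoutRedeploy: boolean;
  perTenant: boolean;
  needsReadChunk: boolean;
}

// Assumption: a single server-only requirement is enough to pick the
// server surface; otherwise the corpus is small and static enough to ship
// with the app.
function pickSurface(p: CorpusProfile): { surface: Surface; reasons: string[] } {
  const reasons: string[] = [];
  if (p.sizeMB > 10) reasons.push("larger than ~10MB");
  if (p.updatesWithoutRedeploy) reasons.push("must update without a redeploy");
  if (p.perTenant) reasons.push("per-tenant content");
  if (p.needsReadChunk) reasons.push("needs read_chunk follow-up");
  if (reasons.length > 0) return { surface: "server", reasons };
  return { surface: "browser", reasons: ["small, static, ships with the app"] };
}
```

Returning the reasons alongside the choice keeps the decision auditable when a corpus later outgrows the browser surface.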

---

## What lives where (cheat sheet)

| Want to | Look at |
|---|---|
| Add a doc to the bundled browser RAG | `arcadia-admin/scripts/build-docs-index.mjs` (extend `SOURCES`) |
| Add a tool to the agent | `arcadia-admin/app/lib/admin-tools.ts` |
| Change the LLM's tool-picking guidance | `arcadia-admin/app/routes/assistant.tsx::buildAdminPreface` |
| Add a corpus to arcadia-search | `arcadia-search/deploy/<tenant>/<corpus>.config.json` + new systemd timer |
| Add a tenant to arcadia-search | `arcadia-search/MULTI_TENANT.md` |
| Wire an arcadia file → search ingest | `arcadia-search/ARCADIA_INTEGRATION.md` (needs upstream changes first) |
| Reindex the server-side corpus right now | "reindex kb (docs)" button on `/ai` empty state, or `POST /index/docs/build` |