Files

jules 444516e900 docs: RAG.md — combined story for the two RAG surfaces

The browser RAG (@crema/lexical-rag-ui) and the server RAG
(arcadia-search) coexist in this app, and the agent picks between
them via tool descriptions. The two libs each have their own README,
but neither covers the picking decision or how they relate. This doc
fills that gap.

- At-a-glance table: lib, engine, runtime location, corpus size,
  update cadence, auth, agent tool, what it's best for.
- The system-prompt snippet that drives tool picking, with notes on
  observed DeepSeek V3 behavior.
- Per-surface deep dive (build path, tools, storage, limits, why it
  exists).
- Why both run by default (always-on fallback, A/B regression).
- Decision checklist for adding new corpora.
- "What lives where" cheat sheet pointing at the right file in the
  right repo for common tasks.

Cross-references arcadia-search's README.md, MULTI_TENANT.md, and
ARCADIA_INTEGRATION.md so a reader landing here can navigate to the
right rabbit hole.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-04 14:15:09 +10:00

7.5 KiB

Raw Blame History

Retrieval-Augmented Generation in arcadia-admin

This app exposes two lexical RAG surfaces to the assistant. They share a contract (search + read) but live at different layers and serve different content. The agent picks between them based on tool descriptions; the operator chooses which to deploy based on corpus shape.

At a glance

	Browser RAG	Server RAG
Lib / service	`@crema/lexical-rag-ui`	`arcadia-search` (Rust)
Engine	MiniSearch (BM25, JS)	Tantivy (BM25, Rust)
Where it runs	In the user's browser	Sibling of arcadia-app
Index storage	Static JSON, fetched once	mmap'd disk, ~30–80MB resident
Practical corpus size	~5–10MB / ~50–100k chunks	GB-scale, no hard cap
Update cadence	Static — rebuilt at app build time	Live — cron, webhook, or admin trigger
Auth	None (bundled with the app)	JWT (via arcadia's Guardian)
Tool the agent calls	`search_docs(query, limit)`	`search_kb(query, corpus, limit?, tags?)` + `read_chunk(chunk_id, corpus)`
Source content lives in	`arcadia-admin/public/docs-index.json`	`arcadia-search`'s data dir, ingested from disk or arcadia API
What's it best for	Static reference docs that ship with the app	Tenant-uploaded files, audit log search, anything that grows

When the agent picks each

The system prompt in app/routes/assistant.tsx::buildAdminPreface tells the model:

Two retrieval surfaces exist for documentation/knowledge: search_docs (browser-side, BM25 over the bundled arcadia docs — fast, always available, small corpus) and search_kb (server-side, BM25 over arcadia-search — same docs as corpus=docs for parity, plus larger and additional corpora as the operator adds them). For questions about the bundled arcadia docs either is fine; prefer search_kb when you want richer hits or when the user is asking about content that wouldn't be in the bundled docs (uploaded files, tenant-specific knowledge). When search_kb returns a chunk_id you want to expand, call read_chunk(chunk_id, corpus).

In practice DeepSeek + V3 picks search_kb for anything that mentions "the kb" or sounds dynamic, and search_docs for quick lookups against the bundled docs. Neither pick is wrong for content that exists in both.

Browser RAG (`@crema/lexical-rag-ui`)

What it is. A small React + MiniSearch wrapper. The lib provides RAGProvider, useRAG, and a headless createRAGClient(indexUrl). The index is a single JSON file built offline by scripts/build-docs-index.mjs and shipped in the app's public/.

How it's wired here.

Build script: arcadia-admin/scripts/build-docs-index.mjs reads markdown from ../reference/arcadia-app/, chunks at H1–H3, produces public/docs-index.json. Runs on npm run build:docs (and as the prebuild step before npm run build).
Tool wrapper: app/lib/admin-tools.ts constructs a singleton createRAGClient("/docs-index.json") and exposes it as the search_docs tool. The tool returns hits with the legacy category field collapsed back from tags[0] so the agent's prior expectations stay stable.
Storage: just the static JSON. No state, no auth, no indexer process.

Limits. Practical ceiling is ~5–10MB index. Past that, first-load parse and browser memory get painful (200MB+ heap on a 50MB index; mobile breaks). Updates require a build step + redeploy.

Why it exists. Static reference content that ships with the app — arcadia's own docs, in this case. Always available even if the search service is down. Zero infrastructure.

For the lib itself see lib-lexical-rag-ui/README.md.

Server RAG (`arcadia-search`)

What it is. A standalone Rust HTTP service (Tantivy + axum). Single static binary, ~30–80MB resident. Per-tenant per-corpus indexes on disk. JWT auth, HMAC webhook intake, atomic rebuild swap, systemd timer cron.

How it's wired here.

Tools: app/lib/admin-tools.ts exposes search_kb and read_chunk. The fetch URL is KB_BASE_URL (default http://127.0.0.1:7800, override via window.__ARCADIA_SEARCH_URL or VITE_ARCADIA_SEARCH_URL). The bearer token is the user's arcadia JWT from sessionStorage["arcadia_access_token"], with a "dev" fallback when no login.
Reindex button: app/routes/ai.tsx::reindexKB calls POST /index/:corpus/build and toasts the result. Lives in the AI page's empty state next to the block-preview button.
System prompt: see the snippet in assistant.tsx::buildAdminPreface above.

Storage. <INDEX_DIR>/<tenant>/<corpus>/current/ per index; previous-<stamp>/ for the last few rebuilds (rollback). Sources can be on-disk markdown, or pulled from arcadia's /api/v1/digital_objects API (see arcadia-search/MULTI_TENANT.md and ARCADIA_INTEGRATION.md).

Update cadence. Three triggers, layered so each compensates for the others' failure modes:

Cron (systemd timer, hourly default) — always-on safety net.
Admin button — one-click rebuild from the AI page.
Webhook — arcadia POSTs /events/changed on file create/delete; search debounces (2-min default) and rebuilds.

Why it exists. Anything that doesn't fit the browser ceiling: tenant-uploaded files, audit-log-ish content, multi-tenant knowledge bases, anything that grows over time.

For the service see arcadia-search/README.md. For multi-tenant config see arcadia-search/MULTI_TENANT.md. For the upstream arcadia integration story (file content fetch, text extraction, webhook signature, service tokens) see arcadia-search/ARCADIA_INTEGRATION.md.

How they coexist

The default deploy runs both:

search_docs indexes the same arcadia-app docs the parity corpus on arcadia-search indexes. Same content, two engines.
This is intentional — it means the assistant always has something to search, even if arcadia-search is down or unreachable. The failure mode is "no search_kb, but search_docs still works."
It also gives a permanent A/B regression test: query both, compare hits, catch relevance regressions in either engine.

Picking ONE for a new corpus

Use this checklist when adding new content:

Question	Answer → use
Is the corpus < 5MB and basically static?	Browser
Does it need to update without a redeploy?	Server
Is it per-tenant content (uploaded files, tenant-specific KB)?	Server
Are you OK shipping it in the JS bundle?	Browser
Does it need agentic `read_chunk` follow-up?	Server (browser doesn't expose `read` over the tool surface)
Does it need to work offline / with no backend?	Browser
Is it growing > 10MB?	Server

Most "knowledge base" content lives in the server side. The browser side is reserved for the always-bundled reference material that ships with the app.

What lives where (cheat sheet)

Want to	Look at
Add a doc to the bundled browser RAG	`arcadia-admin/scripts/build-docs-index.mjs` (extend `SOURCES`)
Add a tool to the agent	`arcadia-admin/app/lib/admin-tools.ts`
Change the LLM's tool-picking guidance	`arcadia-admin/app/routes/assistant.tsx::buildAdminPreface`
Add a corpus to arcadia-search	`arcadia-search/deploy/<tenant>/<corpus>.config.json` + new systemd timer
Add a tenant to arcadia-search	`arcadia-search/MULTI_TENANT.md`
Wire an arcadia file → search ingest	`arcadia-search/ARCADIA_INTEGRATION.md` (needs upstream changes first)
Reindex the server-side corpus right now	"reindex kb (docs)" button on `/ai` empty state, or `POST /index/docs/build`

7.5 KiB Raw Blame History Unescape Escape