Architecture

Figure: docforge architecture. Confluence spaces and local git repos flow through docforge ingest into Postgres with pgvector; docforge serve then exposes an MCP server consumed by Claude Code, Cursor, and Copilot.

The data flow

1. Sources

docforge ingests from two source types:

Confluence spaces via the REST API v2. Pages are fetched by ID (configured in sources.yml), authenticated with an email + API token. Content is pulled as Confluence storage-format HTML.
Local git repositories on disk. The crawler matches configured glob patterns (default: README.md, CLAUDE.md, docs/**/*.md). It does not clone remote URLs — clone first, then point docforge at the checkout.
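As a rough illustration of the glob matching described above, the sketch below walks a local checkout with the default patterns. The function name find_docs and its signature are hypothetical, not docforge's actual crawler API.

```python
from pathlib import Path

# Default patterns quoted in the text above.
DEFAULT_PATTERNS = ("README.md", "CLAUDE.md", "docs/**/*.md")


def find_docs(repo_root: Path, patterns: tuple = DEFAULT_PATTERNS) -> list:
    """Return files under repo_root that match any pattern (hypothetical helper)."""
    matches = set()
    for pattern in patterns:
        # Path.glob treats "**" as "this directory and all subdirectories".
        matches.update(p for p in repo_root.glob(pattern) if p.is_file())
    # A set removes files matched by more than one pattern; sort for stable output.
    return sorted(matches)
```

Only files inside an existing checkout are considered, which mirrors the "clone first" requirement.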

Each source gets a stable identifier (confluence_page_id or file path) and a SHA-256 content_hash computed from the raw content.
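A minimal sketch of the hashing step, assuming the raw content is text encoded as UTF-8 before hashing (the encoding and the function name are assumptions; the source only states that SHA-256 is computed from the raw content):

```python
import hashlib


def content_hash(raw: str) -> str:
    """SHA-256 hex digest of the raw content (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Any change to the raw content, including whitespace, yields a different digest, which is what makes the hash usable as a change detector during ingest.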

2. Ingest — `docforge ingest`

Deduplicate. Compare each source's content_hash against the stored value. Sources whose hash is unchanged are skipped.
Parse. BeautifulSoup splits HTML into semantic sections (<h1>, <h2>, paragraphs, code blocks). Confluence macros are handled where meaningful.
Chunk. Token-aware splitter (default 500 tokens). Respects section boundaries; splits paragraphs only when a section exceeds the limit. Section titles are prepended to each chunk for context.
Embed. Sentence-transformers loads Qwen3-Embedding-4B (Apache 2.0, 1024-dim). Falls back to all-MiniLM-L6-v2 (384-dim) if the primary load fails.
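Store. Writes to two Postgres tables: sources (metadata + hash) and chunks (text + embedding, covered by an HNSW index). ON DELETE CASCADE on the chunks foreign key removes a source's chunks when the source is deleted, which keeps the two tables consistent.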
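The chunking rule can be sketched as follows. This is an illustration under stated assumptions, not docforge's splitter: whitespace-separated words stand in for the real tokenizer, the title is not counted against the budget, and a single paragraph longer than the limit is kept whole. The names count_tokens and chunk_section are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): whitespace-separated words.
    return len(text.split())


def chunk_section(title: str, paragraphs: list, max_tokens: int = 500) -> list:
    """Split one section into chunks, each prefixed with the section title."""
    body = "\n\n".join(paragraphs)
    # A section that fits the budget stays in one piece.
    if count_tokens(body) <= max_tokens:
        return [f"{title}\n\n{body}"]

    chunks = []
    current = []
    current_tokens = 0
    for para in paragraphs:
        n = count_tokens(para)
        # Close the current chunk when this paragraph would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append(f"{title}\n\n" + "\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += n
    if current:
        chunks.append(f"{title}\n\n" + "\n\n".join(current))
    return chunks
```

Splitting only at paragraph boundaries keeps each chunk readable on its own, and the repeated title gives the embedding model the section context.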

Per-source errors are isolated: one bad Confluence page does not abort the run; a summary lists failures at the end.
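One way to picture hash-based skipping combined with per-source error isolation is the loop below. It is a sketch only: run_ingest, the stored_hashes dict (standing in for the sources table), and the process callback (standing in for parse, chunk, embed, and store) are all hypothetical.

```python
from typing import Callable, Iterable


def run_ingest(
    sources: Iterable,            # (source_id, content_hash) pairs
    stored_hashes: dict,          # stand-in for hashes kept in the sources table
    process: Callable[[str], None],
) -> dict:
    """Process changed sources; collect failures instead of aborting the run."""
    summary = {"processed": [], "skipped": [], "failed": []}
    for source_id, new_hash in sources:
        # Unchanged content: skip re-processing.
        if stored_hashes.get(source_id) == new_hash:
            summary["skipped"].append(source_id)
            continue
        try:
            process(source_id)
        except Exception as exc:
            # One bad source is recorded and the loop moves on.
            summary["failed"].append((source_id, str(exc)))
            continue
        # Record the new hash only after processing succeeded.
        stored_hashes[source_id] = new_hash
        summary["processed"].append(source_id)
    return summary
```

Updating the stored hash only on success means a failed source is retried on the next run rather than silently skipped.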

3. Storage — Postgres + pgvector

sources table: metadata (type, URL, title, tags, content_hash, last_crawled_at, status).
chunks table: text, section title, 1024-dim embedding, foreign key to source.
HNSW index on embedding for cosine-similarity search (vector_cosine_ops).
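The layout above might correspond to DDL along these lines. The SQL is held in Python constants so it can be passed to a Postgres driver; every table, column, and index name here is an assumption derived from the prose, not docforge's actual schema. The `vector` type, the `hnsw` index method, `vector_cosine_ops`, and the `<=>` cosine-distance operator are pgvector features.

```python
# Hypothetical schema; names and types are inferred from the description above.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS sources (
    id              BIGSERIAL PRIMARY KEY,
    source_type     TEXT NOT NULL,
    url             TEXT,
    title           TEXT,
    tags            TEXT[],
    content_hash    TEXT NOT NULL,
    last_crawled_at TIMESTAMPTZ,
    status          TEXT
);

CREATE TABLE IF NOT EXISTS chunks (
    id            BIGSERIAL PRIMARY KEY,
    source_id     BIGINT NOT NULL REFERENCES sources(id) ON DELETE CASCADE,
    section_title TEXT,
    chunk_text    TEXT NOT NULL,
    embedding     vector(1024)
);

CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# Nearest-neighbour query; <=> returns cosine distance, so similarity is 1 - distance.
SEARCH_SQL = """
SELECT s.url, s.title, c.section_title, c.chunk_text,
       1 - (c.embedding <=> %(query)s::vector) AS cosine_similarity
FROM chunks c
JOIN sources s ON s.id = c.source_id
ORDER BY c.embedding <=> %(query)s::vector
LIMIT %(limit)s;
"""


def to_vector_literal(values: list) -> str:
    """Format floats in pgvector's text form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"
```

The named placeholders follow the psycopg parameter style, which is also an assumption about the driver in use.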

For a corpus under ~50K chunks, the whole index fits on a Standard_B1ms Postgres Flexible Server.

4. Serve

Two surfaces, one in-process (CLI) and one hosted (multi-user team deployment):

docforge serve: FastMCP server over stdio. For local single-user use (Claude Code, Cursor with MCP). Loads the embedding model in-process.
docforge serve --api: FastAPI over HTTP. Hosted deployment for multiple users, authenticated via Entra ID. Since v0.3 Phase 4b, setting EMBEDDER_URL makes the API offload embedding to a separate embedder Container App. With offloading, search API replicas drop from ~2 GB RSS to ~400 MB and cold-start in ~30 s (container spin-up only; no model load). The embedder serves the model behind a shared-secret bearer token (EMBEDDER_TOKEN). The GPU-backed Qwen3-Embedding-4B embedder takes 2-3 minutes to load the ~10 GB model into VRAM, so run it with minReplicas: 2 in production to avoid cold starts.
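To make the offload path concrete, here is a sketch of how an API replica could build a request to the embedder using the two environment variables named above. The /embed path and the {"texts": [...]} payload shape are assumptions for illustration; the real embedder contract is not described here. The request is only constructed, not sent.

```python
import json
import os
import urllib.request


def build_embed_request(texts: list) -> urllib.request.Request:
    """Build (but do not send) a POST to the embedder service."""
    base_url = os.environ["EMBEDDER_URL"].rstrip("/")
    token = os.environ["EMBEDDER_TOKEN"]
    # Payload shape is hypothetical.
    payload = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embed",  # hypothetical endpoint path
        data=payload,
        method="POST",
        headers={
            # Shared-secret bearer token, as described in the text.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending would then be a call to urllib.request.urlopen(request, timeout=...) or the equivalent in whichever HTTP client the service uses.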

Both surfaces expose a single primary tool: search_documentation(query, user_name, team_name, area_name?, limit?). Results include source URL + title + section attribution.
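The ranking behind the tool is cosine similarity over chunk embeddings. The pure-Python sketch below shows that ranking and a result shape carrying URL, title, and section attribution. It is illustrative only: in docforge the search runs in Postgres, and the Chunk class, rank_chunks function, and result keys are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str
    title: str
    section: str
    text: str
    embedding: list


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    return dot / norm if norm else 0.0


def rank_chunks(query_embedding: list, chunks: list, limit: int = 5) -> list:
    """Return the top `limit` chunks by cosine similarity, with attribution."""
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"url": c.url, "title": c.title, "section": c.section,
         "text": c.text, "score": score}
        for score, c in scored[:limit]
    ]
```

An HNSW index returns approximately the same ordering without scanning every chunk, which is why the brute-force loop here is only suitable for illustration.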

What docforge is not

A chat UI. docforge has no frontend; it hands context to whatever assistant calls it.
A multi-tenant SaaS. docforge assumes a single-company trust boundary — authenticated users can query any indexed source.
A hybrid retrieval engine. Retrieval is dense-only today (cosine similarity on embeddings). BM25 fusion is on the roadmap.
A permission-aware RAG. There are no per-document ACLs at query time.

These are conscious scope decisions. If you need any of them, Onyx is likely a better fit.