Four pieces matter.
The source is the documentation. Here are the four files that carry the pattern end to end.
Your MDX, with chunk markers.
Every section becomes a chunk. The `<Chunk id>` becomes the chunk's key in the index; the heading and body become the embedded text.
```mdx
---
title: "Authentication"
category: "core-concepts"
tags: ["auth", "api-keys"]
vector_metadata:
  importance: 0.8
---

<Chunk id="auth-api-keys">
## API keys

Issue a key from the dashboard and pass it as `Authorization: Bearer <key>` on every request.
</Chunk>
```
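Extraction can be a single pass over each MDX file. A minimal sketch, assuming a regex-based `extractChunks` helper and a `DocChunk` shape (not the site's actual code, which may parse MDX properly):

```typescript
// Hypothetical sketch: pull every <Chunk id="…">…</Chunk> block out of an
// MDX string. A regex is enough to show the shape of the extraction step.
interface DocChunk {
  id: string;
  text: string;
}

function extractChunks(mdx: string): DocChunk[] {
  const re = /<Chunk id="([^"]+)">([\s\S]*?)<\/Chunk>/g;
  const chunks: DocChunk[] = [];
  for (const m of mdx.matchAll(re)) {
    chunks.push({ id: m[1], text: m[2].trim() });
  }
  return chunks;
}
```

Run against the MDX above, this yields one chunk with id `auth-api-keys` whose text starts at the `## API keys` heading.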
One script, run at build time.
Walks `content/docs/`, extracts the chunks, embeds them with Voyage AI (or a deterministic hash fallback for offline runs), and writes the results to `data/docs.db`.
```typescript
const chunks = extractAllChunks();
const db = getDb();
resetTables(db);

// Embed in batches to stay under the provider's request limits.
for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
  const batch = chunks.slice(i, i + BATCH_SIZE);
  const texts = batch.map(chunkText);
  const embeddings = await embed(texts, 'document');
  insertChunkBatch(db, batch, embeddings);
}
```
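The deterministic hash fallback mentioned above can be as simple as hashing tokens into a fixed-size vector. A sketch, where `hashEmbed` and `DIM` are assumptions: the output is stable and unit-length, not semantically meaningful, but enough to exercise the pipeline without an API key:

```typescript
// Hypothetical offline fallback: hash each token into one of DIM buckets,
// then L2-normalize. The same text always yields the same vector.
const DIM = 256;

function hashEmbed(text: string): number[] {
  const vec = new Array<number>(DIM).fill(0);
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 0;
    for (let i = 0; i < token.length; i++) {
      h = (h * 31 + token.charCodeAt(i)) >>> 0;
    }
    vec[h % DIM] += 1;
  }
  const norm = Math.hypot(...vec) || 1;
  return vec.map((v) => v / norm);
}
```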
One route handler.
Embeds the query, runs a vec0 ANN lookup against `chunks_vec`, joins back to the chunk metadata, ranks, and returns. Used by both Spotlight (⌘K) and the chat panel.
```typescript
export async function POST(req: Request) {
  const { q, topK = 5 } = await req.json();
  const queryVec = await embedOne(q, 'query'); // query-side embedding
  const hits = searchSimilar(queryVec, topK);
  return Response.json({ hits });
}
```
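`searchSimilar` is where vec0 does the work inside SQLite. Its ranking semantics can be sketched as a brute-force dot-product pass over an in-memory index — a stand-in under stated assumptions (normalized vectors, a hypothetical `{ id, vec }` index shape), not the actual SQL:

```typescript
// Hypothetical stand-in for searchSimilar: vec0 performs this nearest-
// neighbour ranking inside SQLite; the semantics are the same.
interface Hit {
  id: string;
  score: number;
}

function searchSimilar(
  queryVec: number[],
  topK: number,
  index: { id: string; vec: number[] }[],
): Hit[] {
  const dot = (a: number[], b: number[]) =>
    a.reduce((s, v, i) => s + v * b[i], 0);
  return index
    .map(({ id, vec }) => ({ id, score: dot(queryVec, vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The real version replaces the map/sort with a single ANN query against `chunks_vec` and a join back to the metadata table.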
Multi-turn. One streamed call.
Retrieved chunks become context. History is appended for follow-up turns. The model streams the answer; the panel renders tokens as they arrive. Retrieval stays local to the SQLite file, so the only external network round-trip is the model call.
```typescript
const history: { question: string; answer: string }[] = [];

async function send(question: string) {
  // Retrieve the most relevant chunks for this turn.
  const { hits } = await fetch('/api/docs/search', {
    method: 'POST',
    body: JSON.stringify({ q: question, topK: 4 }),
  }).then((r) => r.json());

  const system = buildPrompt(hits);
  const stream = await callModel({ system, question, history });

  // Render tokens as they arrive, then record the turn for follow-ups.
  let answer = '';
  for await (const tok of stream) {
    answer += tok;
    render(tok);
  }
  history.push({ question, answer });
}
```
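`buildPrompt` is just string assembly. A hedged sketch of one plausible shape — the `heading` and `body` fields on each hit are assumptions, not the site's actual schema:

```typescript
// Hypothetical buildPrompt: fold retrieved chunks into a system prompt.
// `heading` and `body` are assumed hit fields for illustration.
interface RetrievedHit {
  heading: string;
  body: string;
}

function buildPrompt(hits: RetrievedHit[]): string {
  const context = hits
    .map((h, i) => `[${i + 1}] ${h.heading}\n${h.body}`)
    .join("\n\n");
  return [
    "Answer using only the documentation excerpts below.",
    "If the excerpts do not cover the question, say so.",
    "",
    context,
  ].join("\n");
}
```

Numbering the excerpts lets the model cite `[1]`, `[2]` in its answer, which the panel can turn into links back to the source sections.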