Reference system

RAG Pipeline

Ingestion to hybrid retrieval to reranking to cited answers.

The problem

Naive retrieval returns context that looks plausible but is wrong, and answers cite nothing, so users can't tell a grounded answer from a hallucination. To be trustworthy, a pipeline needs hybrid retrieval, reranking, and enforced citations.

System architecture

Documents are chunked, and each chunk is both embedded into a vector index and added to a keyword index. Queries hit both indexes; the results are fused and reranked, and the top passages are passed to the model with a citation contract that ties every claim back to a source.
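The text above does not say how the dense and keyword results are fused. One common option is reciprocal rank fusion (RRF), which needs only the rank positions from each index, not comparable scores. The sketch below assumes RRF; the function name, the passage ids, and the constant k=60 (a conventional default) are illustrative, not taken from this system.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


dense_hits = ["p3", "p1", "p7"]    # hypothetical ids from the vector index
keyword_hits = ["p1", "p9", "p3"]  # hypothetical ids from the keyword index
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)  # passages found by both indexes rise to the top
```

Passages that appear in both lists accumulate two terms and outrank passages found by only one index. The fused list is what would then be handed to the reranker.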

Workflow

  • Source documents are ingested, normalized, and split into overlapping chunks.

  • Chunks are embedded and written to a vector index with metadata.

  • A query runs hybrid retrieval — dense vectors plus keyword search.

  • Candidate passages are fused and passed through a reranker.

  • The top passages are assembled into a grounded prompt with source ids.

  • The model answers with inline citations resolved back to the source passages.
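Two of the steps above, overlapping chunks and a grounded prompt with source ids, can be sketched in a few lines. This is a minimal illustration under stated assumptions: it splits on characters (a real pipeline may split on tokens or sentences), the sizes are arbitrary, and chunk_text, build_grounded_prompt, and the prompt wording are hypothetical names and text, not this system's.

```python
import hashlib


def chunk_text(doc_id, text, size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        body = text[start:start + size]
        # Deterministic id: the same document, offset, and content
        # always produce the same id.
        raw = f"{doc_id}:{start}:{body}".encode("utf-8")
        chunk_id = hashlib.sha256(raw).hexdigest()[:12]
        chunks.append({"id": chunk_id, "doc_id": doc_id, "start": start, "text": body})
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def build_grounded_prompt(question, passages):
    """Label each passage with its source id so the answer can cite it."""
    sources = "\n\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the sources below. Cite the source id in square "
        "brackets after each claim. If the sources do not contain the answer, "
        "say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


sample_text = "".join(str(i % 10) for i in range(500))  # placeholder document body
chunks = chunk_text("doc1", sample_text, size=200, overlap=50)
prompt = build_grounded_prompt("What does the document say?", chunks[:2])
```

Because each id is derived from the document id, offset, and content, re-running the chunker on unchanged input yields the same ids, which is one way to keep re-ingestion repeatable.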

Stack

  • An embedding model for dense vectors

  • A vector index for similarity search

  • A keyword index for lexical recall

  • A reranking model over fused candidates

  • A generation model with a citation contract

What gets logged

  • The query and the retrieval mode used

  • Candidate passages with their dense and lexical scores

  • Reranker scores and the final passage order

  • The passages actually passed to the model

  • The answer and the source ids it cited

  • Retrieval latency and token cost per query
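One way to capture the fields listed above is a single structured record per query, written as a JSON line. The sketch below is an assumption about shape only: the class names, field names, and sample values are made up for illustration, and token counts stand in for cost, which would be derived from them.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Candidate:
    passage_id: str
    dense_score: float
    lexical_score: float
    rerank_score: Optional[float] = None  # None if the passage was never reranked


@dataclass
class QueryLog:
    query: str
    retrieval_mode: str          # e.g. "hybrid", "dense", "keyword"
    candidates: List[Candidate]  # every candidate with its per-index scores
    final_order: List[str]       # passage ids after reranking
    passages_sent: List[str]     # ids actually placed in the prompt
    answer: str
    cited_ids: List[str]
    retrieval_latency_ms: float
    prompt_tokens: int
    completion_tokens: int

    def to_json_line(self):
        # asdict() converts nested dataclasses to plain dicts as well.
        return json.dumps(asdict(self), sort_keys=True)


# Made-up values, for illustration only.
record = QueryLog(
    query="What is the refund window?",
    retrieval_mode="hybrid",
    candidates=[Candidate("p1", dense_score=0.82, lexical_score=11.4, rerank_score=0.93)],
    final_order=["p1"],
    passages_sent=["p1"],
    answer="Refunds are accepted within 30 days [p1].",
    cited_ids=["p1"],
    retrieval_latency_ms=41.7,
    prompt_tokens=512,
    completion_tokens=38,
)
line = record.to_json_line()
```

Keeping candidates, final order, and passages sent as separate fields makes it possible to see afterwards whether a bad answer came from retrieval, reranking, or generation.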

Where evals run

Evals run offline against a fixed question set and score retrieval recall, citation faithfulness, and answer relevance. Live queries flagged as low-confidence are also spot-checked.
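Of the three scores, retrieval recall is the simplest to compute mechanically. The sketch below assumes each question in the fixed set is labelled with the ids of the passages that should be retrieved. That labelling, the eval_set layout, and the stub retriever are assumptions for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the labelled relevant passages found in the top k results."""
    if not relevant_ids:
        raise ValueError("each eval question needs at least one relevant id")
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def mean_recall_at_k(eval_set, retrieve, k=5):
    """Average recall@k over a fixed question set, given a retrieve(question) function."""
    scores = [
        recall_at_k(retrieve(item["question"]), item["relevant_ids"], k)
        for item in eval_set
    ]
    return sum(scores) / len(scores)


# Hypothetical fixed question set and a stub retriever standing in for the pipeline.
eval_set = [
    {"question": "q1", "relevant_ids": ["p1"]},
    {"question": "q2", "relevant_ids": ["p5", "p6"]},
]
stub_results = {"q1": ["p1", "p2"], "q2": ["p4", "p5"]}
mean_recall = mean_recall_at_k(eval_set, lambda q: stub_results[q], k=2)
```

Citation faithfulness and answer relevance usually need a human or model judge rather than a set comparison, so they are left out of this sketch.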

Failure modes

  • Retrieval returns no relevant passage: the system answers that it doesn't know rather than fabricating.

  • Chunks are too large or too small: chunk size is tuned and overlap is preserved so context isn't cut mid-thought.

  • The model cites a source id that was never retrieved: the citation contract validates that every cited id appears in the retrieved set.

  • The index drifts from the source after updates: ingestion is idempotent and re-runnable, so the index can be rebuilt.
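The citation check in the list above is a set-membership test. The sketch below assumes citations appear in the answer as ids in square brackets, such as [p1]; that format, the regular expression, and the function name are assumptions, not the system's actual contract.

```python
import re

# Assumed citation format: an id in square brackets, e.g. "[p1]".
CITATION_PATTERN = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def validate_citations(answer, retrieved_ids):
    """Check that every id cited in the answer was in the retrieved set."""
    cited = CITATION_PATTERN.findall(answer)
    allowed = set(retrieved_ids)
    unknown = [cid for cid in cited if cid not in allowed]
    return {
        "cited": cited,
        "unknown": unknown,
        # An answer with no citations at all also fails the contract.
        "ok": bool(cited) and not unknown,
    }


retrieved = ["p1", "p2"]  # hypothetical ids passed to the model
good = validate_citations("The limit is 10 requests per second [p1].", retrieved)
bad = validate_citations("The limit is 10 [p1] and resets hourly [p9].", retrieved)
uncited = validate_citations("The limit is 10 requests per second.", retrieved)
```

This check only confirms that cited ids exist in the retrieved set. Whether a cited passage actually supports the claim is a separate question, which is what a citation-faithfulness eval would measure.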

What this demo proves

That retrieval can be made grounded and auditable — hybrid recall, reranking, and enforced citations instead of a single embedding lookup feeding an unverified answer.
