Glossary
RAG (Retrieval-Augmented Generation)
RAG means the model reads your sources before answering.
A good RAG setup usually improves factual accuracy and citation quality.
Definition
RAG (Retrieval-Augmented Generation) is a setup where a model retrieves relevant documents first, then generates an answer using that context. For brand teams, this explains why updating docs, schema, and crawlable FAQs can change answer quality without retraining the base model.
Before/after content cleanup
| Metric | Before cleanup | After cleanup | Why it moved |
|---|---|---|---|
| Retrieval recall@3 | 63% | 81% | Better chunking + fewer duplicate pages |
| Answer accuracy | 55% | 74% | Fresh docs + explicit version dates |
| Contradiction rate | 22% | 11% | Canonicals and one source of truth |
Mini chart (accuracy trend):
Before 55% ▇▇▇▇▇▇
After 74% ▇▇▇▇▇▇▇▇▇
This is what healthy RAG impact looks like: better retrieval first, then better answers.
How it's computed
Typical pipeline: chunk documents, index embeddings, retrieve top passages at query time, then ask the model to answer from those passages. Quality depends on source freshness, clean chunking, and guardrails for low-confidence retrieval.
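The chunk, index, and retrieve steps above can be sketched in a few lines. This is a toy illustration, not a production design: the word-count `embed` function stands in for a real embedding model, the `min_score` cutoff of 0.2 is an arbitrary example of a low-confidence guardrail, and the sample document text is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(document: str) -> list[str]:
    """Split a document into paragraph-level chunks on blank lines."""
    return [part.strip() for part in re.split(r"\n\s*\n", document) if part.strip()]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Chunk every document and store each chunk next to its vector."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(query: str, index, k: int = 3, min_score: float = 0.2):
    """Return up to k (score, chunk) pairs, dropping low-confidence matches."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), text) for text, vec in index), reverse=True)
    return [(score, text) for score, text in scored[:k] if score >= min_score]


# Hypothetical corpus: one page with two paragraph-level facts.
docs = ["Pricing: the Pro plan costs 49 USD per month.\n\n"
        "Regions: the service is available in the EU and US."]
index = build_index(docs)
hits = retrieve("How much does the Pro plan cost per month?", index)
for score, text in hits:
    print(f"{score:.2f}  {text}")
```

An empty result from `retrieve` is the signal to fall back (for example, to say no source was found) rather than pass weak passages to the model.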
Retrieval quality example
On 100 benchmark prompts:
- Top-3 retrieval contained at least one correct passage in 81 prompts.
- Generated answer was factually correct in 74 prompts.
Retrieval recall@3 = 81 / 100 = 81%
Answer accuracy = 74 / 100 = 74%
The gap between the two (81% vs 74%) suggests that generation still introduces errors even when retrieval succeeds.
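The two ratios above can be computed from per-prompt records. The sketch below uses a synthetic benchmark built only to mirror the counts in this example (81 retrieval hits and 74 correct answers out of 100); the passage IDs and the function names `recall_at_k` and `answer_accuracy` are illustrative assumptions.

```python
def recall_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int = 3) -> float:
    """Share of prompts whose top-k passages include at least one correct passage."""
    hits = sum(1 for got, gold in zip(retrieved, relevant) if gold.intersection(got[:k]))
    return hits / len(retrieved)


def answer_accuracy(judgements: list[bool]) -> float:
    """Share of prompts whose generated answer was judged factually correct."""
    return sum(judgements) / len(judgements)


# Synthetic data (assumption): prompt i has exactly one correct passage, "doc-i".
relevant = [{f"doc-{i}"} for i in range(100)]
# The first 81 prompts retrieve their correct passage in the top 3; the rest miss.
retrieved = [
    [f"doc-{i}", "other-a", "other-b"] if i < 81 else ["other-a", "other-b", "other-c"]
    for i in range(100)
]
# The first 74 answers are marked correct by a human or automated judge.
judgements = [i < 74 for i in range(100)]

print(f"Retrieval recall@3 = {recall_at_k(retrieved, relevant):.0%}")
print(f"Answer accuracy    = {answer_accuracy(judgements):.0%}")
```

Keeping the records per prompt, rather than only the totals, makes it possible to check which failed answers had a correct passage available.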
How it works in practice
What to optimize
- Source of truth pages — clear H1/H2, dated facts, canonical URLs the retriever can fetch.
- llms.txt + robots — make sure AI crawlers you care about can reach the corpus you want retrieved.
- Measurement — compare LLM-Score and quote-level citations before/after you publish or restructure content.
Pipeline sketch
User query
-> Retriever (top-k passages)
-> Context assembly
-> LLM answer
-> Citation output
The weakest block defines your ceiling. Great generation cannot rescue poor retrieval.
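The stages in the sketch above can be wired together as plain functions. This is a minimal illustration under stated assumptions: `Passage`, `Answer`, `assemble_context`, and `answer_query` are hypothetical names, the retriever and LLM are stubbed callables, and the example.com URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Passage:
    url: str
    text: str


@dataclass
class Answer:
    text: str
    citations: list[str]


def assemble_context(passages: list[Passage]) -> str:
    """Number each passage so the model can refer to sources as [1], [2], ..."""
    return "\n".join(f"[{i}] {p.text} (source: {p.url})" for i, p in enumerate(passages, 1))


def answer_query(
    query: str,
    retriever: Callable[[str], list[Passage]],
    llm: Callable[[str], str],
    k: int = 3,
) -> Answer:
    """User query -> retriever (top-k) -> context assembly -> LLM answer -> citations."""
    passages = retriever(query)[:k]
    if not passages:
        # Nothing retrieved: skip generation instead of forcing an ungrounded answer.
        return Answer("I could not find a supporting source for this question.", [])
    prompt = (
        "Answer using only the numbered sources.\n"
        f"{assemble_context(passages)}\n\nQuestion: {query}"
    )
    return Answer(llm(prompt), [p.url for p in passages])


# Stubs (assumptions) standing in for a real retriever and a real model call.
def stub_retriever(query: str) -> list[Passage]:
    return [Passage("https://example.com/pricing", "The Pro plan costs 49 USD per month.")]


def stub_llm(prompt: str) -> str:
    return "The Pro plan costs 49 USD per month [1]."


result = answer_query("How much is the Pro plan?", stub_retriever, stub_llm)
print(result.text, result.citations)
```

Because the generator only sees what the retriever returns, an empty or wrong passage list caps the quality of everything downstream.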
How to read it
RAG reduces hallucinations but does not remove them. If retrieval pulls a bad source, generation can still amplify it. Use fanout queries to test whether fixes hold across prompt variations.
RAG cleanup sprint (14 days)
- Remove duplicate and stale pages from the retrieval corpus.
- Re-chunk long pages by semantic sections, not fixed character windows.
- Add explicit facts in bullet form for retriever-friendly passages.
- Track recall@k and answer accuracy on the same benchmark prompts.
- Add low-confidence fallback instructions to avoid forced wrong answers.
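Two of the sprint items, re-chunking by semantic sections and removing duplicates, can be sketched as follows. The example assumes pages are available as text with markdown-style `#` headings, which may not match your corpus; `chunk_by_headings` and `drop_duplicates` are hypothetical helper names, and the sample page is invented.

```python
import re


def chunk_by_headings(page: str) -> list[dict]:
    """Split a page into one chunk per heading instead of fixed character windows."""
    chunks: list[dict] = []
    heading, lines = "Introduction", []

    def flush() -> None:
        # Store the section collected so far, skipping empty sections.
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"heading": heading, "text": text})

    for line in page.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            flush()
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


def drop_duplicates(chunks: list[dict]) -> list[dict]:
    """Keep the first chunk for each distinct text, ignoring case and whitespace."""
    seen: set[str] = set()
    unique = []
    for c in chunks:
        key = " ".join(c["text"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique


# Hypothetical page with a stale section that repeats an existing fact.
page = (
    "# Pricing\nThe Pro plan costs 49 USD per month.\n\n"
    "## Regions\nAvailable in the EU and US.\n\n"
    "## Regions (old)\nAvailable in the  EU and US.\n"
)
sections = drop_duplicates(chunk_by_headings(page))
print([s["heading"] for s in sections])
```

This only catches exact duplicates after normalization; near-duplicate or contradictory pages still need manual review or a similarity check.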
Retrieval-to-answer conversion benchmark
| Pattern | Meaning |
|---|---|
| recall@3 75%+, accuracy 70%+ | Mature grounding layer |
| recall@3 high, accuracy low | Generation or prompt policy issue |
| recall@3 low, accuracy low | Corpus and indexing issue |
Track both numbers together; either one alone can mislead decisions.
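The table above can be turned into a small triage helper. The table only gives numeric thresholds for the first row, so this sketch assumes the same cutoffs (75% recall@3, 70% accuracy) separate "high" from "low" in the other rows; the function name `diagnose` is hypothetical, and the low-recall, high-accuracy case is left unclassified because the table does not cover it.

```python
def diagnose(
    recall_at_3: float,
    accuracy: float,
    recall_cutoff: float = 0.75,    # assumption: reused from the first table row
    accuracy_cutoff: float = 0.70,  # assumption: reused from the first table row
) -> str:
    """Map a (recall@3, accuracy) pair to the matching pattern in the table."""
    high_recall = recall_at_3 >= recall_cutoff
    high_accuracy = accuracy >= accuracy_cutoff
    if high_recall and high_accuracy:
        return "Mature grounding layer"
    if high_recall:
        return "Generation or prompt policy issue"
    if not high_accuracy:
        return "Corpus and indexing issue"
    return "Not covered by the table"


# The before/after figures from the cleanup example earlier in this entry.
print(diagnose(0.63, 0.55))
print(diagnose(0.81, 0.74))
```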
When to use
- When product facts change weekly (pricing, regions, integrations).
- When support and marketing disagree on wording — unify the retrieved layer.
- When you need auditability: which URL backed this answer?