Series: How AI Search Works · Part 3 of 10
What Is RAG: The Only Thing That Matters for AI Search.
Retrieval-Augmented Generation is the architecture that makes Perplexity, Google AI Overviews, and Bing Copilot work. Understanding how it fetches and uses live content is the key to understanding why some pages get cited and others don't.
Published: 1 June 2026
Read time: 11 minutes
Filed under: Series · RAG · Part 3 of 10
RAG (retrieval-augmented generation) is how AI search reads the live web: the model retrieves relevant documents at query time and uses them to write its answer, instead of relying only on what it learned in training. It's the mechanism behind RAG search in Perplexity, Google AI Overviews, and Bing Copilot, and it's what decides whether your content gets cited or ignored.
RAG is the topic I get asked about more than any other in this series. And I understand why: once you see how Retrieval-Augmented Generation works, the rest of AI search visibility clicks into place. It's the mechanism that connects your content to every answer these systems produce.
Every trained language model has a structural problem: it doesn't know what happened after its training ended. Ask GPT-4 about an event that occurred after its knowledge cutoff and it'll either decline to answer or (more problematically) generate a confident, plausible-sounding response with no basis in reality. This isn't a failure of intelligence. Training fixes the model's weights, and fixed weights mean fixed knowledge. No amount of clever prompting can make a model recall something it was never trained on; the missing information has to be supplied from outside. RAG is the architectural solution. Every major AI search product uses it. If you want to understand why your content gets cited or ignored, this is where you start.
Before RAG, AI systems faced a binary choice: either the model knew something from training, or it didn't. For static knowledge (scientific principles, historical facts, stable reference material) training data was sufficient. For anything that changes (recent events, current pricing, new research, live market data) it wasn't.
Without RAG, the model answers "closed book": purely from its learned parameters. With RAG, it answers "open book": it can read and cite retrieved documents before answering. It's the difference between a closed-book exam and an open-book one.
For AI search specifically, RAG solves three distinct problems.
- Recency. Training data has a cutoff. RAG retrieves current content at query time, so responses can reflect information published after training ended.
- Hallucination. By grounding responses in retrieved documents, RAG gives the model factual anchors that constrain generation toward accurate claims. A model that hallucinates a fact when answering from memory is far less likely to hallucinate when it has a retrieved document containing the correct fact in front of it.
- Attribution. RAG-grounded responses can cite their sources in a way that closed-book responses can't. This is why AI search products display citations alongside answers.
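The attribution point can be made concrete with a minimal sketch. It assumes the retrieved passages were numbered in the prompt and the model was asked to cite them with bracketed markers such as [1]; the marker convention, the `Passage` type, the `extract_citations` helper and the sample answer are all hypothetical, and real products use their own formats.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Passage:
    """A retrieved passage and the URL it came from (hypothetical shape)."""
    url: str
    text: str


def extract_citations(answer: str, sources: list[Passage]) -> list[str]:
    """Map bracketed markers such as [2] in a generated answer back to source URLs.

    Markers are 1-based positions in `sources`; out-of-range markers are ignored,
    and each URL is returned once, in order of first appearance.
    """
    cited: list[str] = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        index = int(match.group(1)) - 1
        if 0 <= index < len(sources):
            url = sources[index].url
            if url not in cited:
                cited.append(url)
    return cited


sources = [
    Passage("https://example.com/rag-basics", "RAG retrieves documents at query time."),
    Passage("https://example.com/grounding", "Grounding anchors answers to retrieved sources."),
]
answer = "RAG fetches documents when the query arrives [1] and grounds the answer in them [2][1]."
print(extract_citations(answer, sources))
# ['https://example.com/rag-basics', 'https://example.com/grounding']
```

A closed-book answer has no numbered sources to point at, which is why this step only exists in a retrieval-backed system.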
The four-step RAG pipeline.
RAG systems typically follow a four-step pipeline. Different content signals matter at each step, and understanding them is what separates a content strategy that works from one that doesn't.
1. Chunking & indexing. The system splits your content into passages and converts each into a vector embedding (a numerical representation of meaning), stored in a vector database before any queries arrive.
2. Retrieval. At query time, the query is converted to an embedding. A similarity search finds the stored passages whose meaning is closest to the query's intent: semantic similarity rather than exact keyword matches.
3. Prompt augmentation. The retrieved passages are injected into the LLM's context window alongside the original query. The model now has real-world content to ground its response against.
4. Generation. The model generates a response grounded in the retrieved passages. Citations point back to the passages that contributed: these are the source links shown in AI Overviews, Perplexity, and ChatGPT Search.
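Steps 2 and 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular product works: it assumes the passage embeddings were already computed in step 1, and the three-dimensional vectors below are made-up stand-ins for the output of a real embedding model (which would have hundreds or thousands of dimensions). `retrieve` and `build_prompt` are hypothetical names.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Step 1 output (assumed): passage text mapped to a pre-computed, made-up embedding.
index = {
    "RAG retrieves documents at query time.": [0.9, 0.1, 0.0],
    "Link authority is a traditional ranking signal.": [0.1, 0.9, 0.1],
    "Chunking splits pages into passages.": [0.6, 0.0, 0.8],
}


def retrieve(query_vec: list[float], k: int = 2) -> list[str]:
    """Step 2: return the k passages whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda text: cosine(query_vec, index[text]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Step 3: inject numbered passages into the context alongside the question."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# The query embedding is also made up; a real system would embed the query text.
query_vec = [0.8, 0.2, 0.1]
top = retrieve(query_vec)
print(build_prompt("How does AI search use live content?", top))
```

With these toy vectors the first and third passages are retrieved, and the link-authority passage is left out because its vector points in a different direction. Production systems often replace the full sort with approximate nearest-neighbour search in a vector database, but the ranking idea is the same.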
Vector embedding: a high-dimensional numerical representation of a text passage's semantic meaning. Similar meanings produce similar vectors. Similarity between vectors, typically cosine similarity, is the basis for RAG retrieval, not keyword matching.
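For readers who want the formula: cosine similarity between two embedding vectors a and b is their dot product divided by the product of their lengths, so it measures direction rather than magnitude. A worked example with two made-up three-dimensional vectors:

```latex
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}

\text{Example: } \mathbf{a} = (1, 2, 0),\ \mathbf{b} = (2, 1, 0):\quad
\cos(\mathbf{a}, \mathbf{b}) = \frac{1 \cdot 2 + 2 \cdot 1 + 0 \cdot 0}{\sqrt{5}\,\sqrt{5}} = \frac{4}{5} = 0.8
```

A score of 1 means the vectors point in exactly the same direction. Real embeddings have far more dimensions, but the calculation is identical.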
The chunking step is where content structure has its first significant effect. A well-structured document (clear headings, self-contained paragraphs, each passage making a coherent claim) chunks into retrievable, usable units. A document with long paragraphs that mix multiple ideas, or key claims buried after preamble, produces chunks that are harder to retrieve accurately and harder to use once retrieved.
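A rough sketch of heading-aware chunking shows why structure matters. It assumes Markdown input with headings and paragraphs separated by blank lines; real systems use their own (unpublished) chunking rules, and `chunk` and `max_words` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import re


def _with_heading(heading: str, sentences: list[str]) -> str:
    """Prefix a passage with its section heading so it carries its context."""
    body = " ".join(sentences)
    return f"{heading}: {body}" if heading else body


def chunk(markdown: str, max_words: int = 60) -> list[str]:
    """Split a Markdown document into passages, roughly one per paragraph.

    Each passage is prefixed with its nearest preceding heading. Paragraphs
    longer than max_words are split at sentence boundaries; a single sentence
    longer than max_words is kept whole.
    """
    heading = ""
    passages: list[str] = []
    for block in re.split(r"\n\s*\n", markdown.strip()):
        block = " ".join(block.split())  # collapse internal line breaks and spaces
        if block.startswith("#"):
            heading = block.lstrip("#").strip()
            continue
        sentences = re.split(r"(?<=[.!?])\s+", block)
        current: list[str] = []
        for sentence in sentences:
            if current and len(" ".join(current + [sentence]).split()) > max_words:
                passages.append(_with_heading(heading, current))
                current = []
            current.append(sentence)
        if current:
            passages.append(_with_heading(heading, current))
    return passages


doc = """# What is RAG

RAG retrieves documents at query time. It then grounds the answer in them.

# Chunking

Pages are split into passages before any query arrives."""

for passage in chunk(doc, max_words=7):
    print(passage)
```

With the deliberately small limit, the second passage starts with "It", exactly the kind of chunk that reads poorly out of context; the heading prefix only partly rescues it. Self-contained sentences survive chunking far better.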
Semantic retrieval is the critical technical distinction from traditional search. Traditional search ranking is driven by keyword matching (does the page contain the query terms?), link authority (how many authoritative sites link to this page?), and behavioural signals (do users click and stay?). RAG retrieval is driven by semantic similarity: how close is the meaning of this passage to the meaning of this query, in a high-dimensional vector space?
These are genuinely different things. A page that ranks well in traditional search because it's accumulated links over years may not retrieve well in RAG if its content isn't semantically precise about what the query is asking. A newer page with lower link authority may retrieve well if it directly and precisely addresses the query's intent.
Google's term for RAG: grounding.
Google officially confirmed in its May 2026 AI search documentation that AI Overviews and AI Mode both use Retrieval-Augmented Generation, which Google calls "grounding."
Grounding, in Google's usage, means anchoring an AI response to retrieved real-world sources rather than relying on training data alone. The retrieval source for Google's AI features is Google's own Search index: the same index used for traditional blue-link results.
This has a direct and important implication: there's no separate AI index to optimise for. Getting into AI Overviews starts with being indexed and crawlable in Google's standard Search index. All the baseline technical requirements of traditional SEO (indexation, crawlability, snippet eligibility, technical health) still apply before any AI-specific considerations. Yes, even now.
Google's documentation is explicit on this point: foundational SEO is the prerequisite for AI search visibility, not an alternative to it.
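A quick self-check for the two most common crawl and index blockers can be sketched with Python's standard library. This is a minimal sketch, not a substitute for Search Console: it only reads robots.txt rules and the robots meta tag, and ignores X-Robots-Tag headers, rendering, and whether a page is actually indexed. `allowed`, `has_noindex` and the `ExampleAIBot` user agent are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks a made-up AI crawler entirely, and /private/ for everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _MetaRobots(HTMLParser):
    """Collects the directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag containing noindex."""
    parser = _MetaRobots()
    parser.feed(html)
    return "noindex" in parser.directives


print(allowed("Googlebot", "https://example.com/blog/post"))      # True
print(allowed("Googlebot", "https://example.com/private/page"))   # False
print(allowed("ExampleAIBot", "https://example.com/blog/post"))   # False
print(has_noindex('<meta name="robots" content="noindex, follow">'))  # True
```

The two checks answer different questions: robots.txt governs whether a crawler may fetch the page at all, while noindex governs whether a fetched page may be indexed. A crawler that is blocked by robots.txt never sees the page's noindex tag.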
How each platform implements RAG differently.
The four-step pipeline is universal. The implementation varies significantly by platform, and those differences have direct implications for your GEO (generative engine optimisation) strategy.
What semantic similarity means for your content.
In traditional keyword-based retrieval, the question was: does this document contain the query terms? Ranking then determined which keyword-matching documents to surface first. SEO accordingly optimised for keyword presence, keyword density, and ranking signals.
In semantic vector retrieval, the question is different: is the meaning of this passage similar to the meaning of this query? The passage doesn't need to contain the exact query terms. It needs to express ideas that are semantically close to the query's intent.
The practical consequences are real. A passage that answers "how do transformers handle long sequences?" can be retrieved for a query like "Transformer attention mechanism long context" even though the two share almost no exact terms. A passage stuffed with exact-match keywords that doesn't clearly address the underlying question retrieves poorly despite its keyword density. Write for the question the reader is trying to answer, not for the keywords you want to rank for. Semantic retrieval brings those two goals closer together than keyword retrieval did, but the framing still matters.
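To see how little an exact-match view credits that pair, here is a toy keyword-overlap measure (Jaccard similarity over lowercased words, with no stemming) applied to the transformers example. It's an illustrative stand-in for keyword retrieval, not how any search engine scores documents, and `terms` and `jaccard` are hypothetical helpers. On exact terms the two texts share a single word, whereas a dense embedding model would be expected to place them close together because they're about the same thing.

```python
from __future__ import annotations

import re


def terms(text: str) -> set[str]:
    """Lowercase word tokens; no stemming, so 'transformers' != 'transformer'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct terms the two texts have in common (exact-match overlap)."""
    ta, tb = terms(a), terms(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


passage_question = "How do transformers handle long sequences?"
query = "Transformer attention mechanism long context"
print(sorted(terms(passage_question) & terms(query)))  # ['long']
print(jaccard(passage_question, query))  # 0.1
```

Only "long" matches, giving an overlap of 1 in 10 distinct terms: a keyword scorer sees two nearly unrelated texts.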
The universal requirements.
The RAG architecture drives the same content requirements regardless of which platform you're targeting.
Crawlability. If the platform's retrieval system can't access your content, it can't retrieve it. Technical barriers to crawling (blocked resources, JavaScript-only rendering, noindex directives on canonical pages) prevent entry into the pipeline entirely.
Passage-level structure. Because retrieval operates at the passage level, not the page level, each passage should be a coherent, self-contained unit that makes a clear claim.
Semantic precision. Your content needs to precisely address the questions it's intended to answer. Vague, hedged, or densely jargon-laden content retrieves poorly.
Factual attribution. Well-designed RAG systems perform grounding checks. Content that makes unattributed assertions is harder to use as a citation anchor than content with clear, attributable claims.
Non-commodity value. If the model's training data already contains a reliable answer to a query, the RAG system has less incentive to retrieve external content. Content that provides information the model can't generate from training data alone (proprietary data, original research, expert analysis, first-hand experience) is structurally more valuable to RAG systems.
Article 6 in this series covers what each platform has actually published about these requirements where the guidance exists, and what the architecture implies where it doesn't. The requirements above aren't speculation. They follow directly from the pipeline I've described here.
The universal requirements
Five things that apply regardless of which platform you're targeting.
- Crawlability: if the platform's retrieval system can't access your content, it can't retrieve it
- Passage-level structure: each paragraph should be a coherent, self-contained unit making a clear claim
- Semantic precision: your content needs to directly and precisely address the questions it's intended to answer
- Factual attribution: content with clear, attributable claims is easier for RAG systems to use as a citation anchor
- Non-commodity value: content that offers only what the model could generate from training data alone gives a RAG system little reason to retrieve it
Frequently asked
What is RAG search?
RAG search is AI search that retrieves live documents at query time and uses them to write its answer, rather than relying only on the model's training. RAG stands for retrieval-augmented generation. It's what lets Perplexity, Google AI Overviews, and Bing Copilot cite current web pages in their responses.
How does RAG work in AI search?
At query time it comes down to three steps (the content has already been chunked and indexed in advance): the system retrieves documents relevant to the query, passes them to the model as context, and the model generates an answer grounded in those documents, usually with citations. It's the difference between a closed-book exam and an open-book one.
Which AI platforms use RAG?
All the major AI search products. Perplexity runs RAG on nearly every query, Google AI Overviews and AI Mode use it for grounding, and Bing Copilot retrieves from Bing's index. The implementations differ, but the retrieve-then-generate pattern is shared.
What's the difference between RAG and training data?
Training data is what the model learned before it was deployed: fixed knowledge with a cutoff date. RAG is live retrieval at query time. A page can be cited via RAG the day it's published, long before it could ever influence a future training run. The two are separate routes to being used by AI.
How do I get my content retrieved by RAG?
Be crawlable, be current, and be clearly the best answer to a specific question. RAG retrieval leans on semantic relevance and freshness more than on link authority, so structurally clear, genuinely informative content tends to win over keyword-optimised pages.
Does RAG replace SEO?
No, it sits alongside it. The technical foundations overlap heavily (crawlability, clear structure, authority), but RAG retrieval rewards different signals than blue-link ranking. You optimise for both, and measure AI visibility with citation share rather than position alone.
Next in the series
Article 4 is the definitive technical timeline of the AI search race: every significant model release and architectural shift from November 2022 to May 2026.