Writing  ·  Series: How AI Search Works  ·  Part 3 of 10

What Is RAG: the Only Thing That Matters for AI Search.

Retrieval-Augmented Generation is the architecture that makes Perplexity, Google AI Overviews, and Bing Copilot work. Understanding how it fetches and uses live content is the key to understanding why some pages get cited and others don't.

RAG (retrieval-augmented generation) is how AI search reads the live web: the model retrieves relevant documents at query time and uses them to write its answer, instead of relying only on what it learned in training. It's the mechanism behind RAG search in Perplexity, Google AI Overviews, and Bing Copilot, and it's what decides whether your content gets cited or ignored.

RAG is the topic I get asked about more than any other in this series. And I understand why: once you see how Retrieval-Augmented Generation works, the rest of AI search visibility clicks into place. It's the mechanism that connects your content to every answer these systems produce.

Every trained language model has a structural problem: it doesn't know what happened after its training ended. Ask GPT-4 about an event that occurred after its knowledge cutoff and it'll either decline to answer or (more problematically) generate a confident, plausible-sounding response with no basis in reality. This isn't a failure of intelligence. Training fixes the model's weights, and fixed weights mean fixed knowledge. No amount of prompting can make a model know something it wasn't trained on. RAG is the architectural solution. Every major AI search product uses it. If you want to understand why your content gets cited or ignored, this is where you start.

The problem RAG solves.

Before RAG, AI systems faced a binary choice: either the model knew something from training, or it didn't. For static knowledge (scientific principles, historical facts, stable reference material) training data was sufficient. For anything that changes (recent events, current pricing, new research, live market data) it wasn't.

The response without RAG is a "closed book" inference: the model answers purely from its learned parameters. The response with RAG is an "open book" inference: the model has access to retrieved documents it can read and cite before answering. Think of it as the difference between a closed-book exam and an open-book exam.

For AI search specifically, RAG solves three distinct problems.

The four-step RAG pipeline.

RAG systems typically follow a four-step pipeline. Each step is where different content signals matter, and understanding them is what separates a content strategy that works from one that doesn't.

  1. Chunking & indexing. The system splits your content into passages and converts each into a vector embedding (a numerical representation of meaning), stored in a vector database before any queries arrive.
  2. Retrieval. At query time, the query is converted to an embedding. A similarity search finds stored passages whose meaning is closest to the query's intent: not keyword matches, semantic similarity.
  3. Prompt augmentation. The retrieved passages are injected into the LLM's context window alongside the original query. The model now has real-world content to ground its response against.
  4. Generation. The model generates a response grounded in retrieved passages. Citations are extracted from the passages that contributed: the links under AI Overviews, in Perplexity, in ChatGPT Search.
Vector embedding: a high-dimensional numerical representation of a text passage's semantic meaning. Similar meanings produce similar vectors. Cosine similarity between vectors is the basis for RAG retrieval, not keyword matching.
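To make the four steps concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any platform implements its pipeline. The `embed` function is a toy bag-of-words counter standing in for a learned embedding model (so it behaves more like keyword matching than real semantic retrieval), and the function names, prompt wording, and example.com pages are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    ASSUMPTION: a real system would call a learned embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pages: dict[str, str]) -> list[dict]:
    """Step 1: split each page into passages and store one vector per passage."""
    index = []
    for url, text in pages.items():
        for passage in text.split("\n\n"):
            passage = passage.strip()
            if passage:
                index.append({"url": url, "text": passage, "vector": embed(passage)})
    return index


def retrieve(index: list[dict], query: str, k: int = 2) -> list[dict]:
    """Step 2: embed the query and return the k most similar passages."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda e: cosine(query_vector, e["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[dict]) -> str:
    """Step 3: inject the retrieved passages into the prompt as numbered sources."""
    sources = "\n".join(
        f"[{i}] {p['text']} (source: {p['url']})" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the numbered sources and cite them.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Hypothetical pages used only for this example.
pages = {
    "https://example.com/rag": "RAG retrieves documents at query time.\n\n"
                               "The model cites the passages it used.",
    "https://example.com/seo": "Link authority drives traditional search ranking.",
}

index = build_index(pages)
query = "how does RAG retrieve documents"
top = retrieve(index, query, k=2)
prompt = build_prompt(query, top)
# Step 4 (generation) would send `prompt` to a language model; omitted here.
```

Note that the unit being ranked and cited is the passage, not the page: the index holds three entries for two pages, and each carries its own URL for attribution.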

The chunking step is where content structure has its first significant effect. A well-structured document (clear headings, self-contained paragraphs, each passage making a coherent claim) chunks into retrievable, usable units. A document with long paragraphs that mix multiple ideas, or key claims buried after preamble, produces chunks that are harder to retrieve accurately and harder to use once retrieved.
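A rough sketch of why structure matters at the chunking step, assuming a simple heading-aware chunker. Real systems use their own undisclosed chunking rules; the `#` heading convention, the blank-line separation, the `max_words` limit, and the function name here are assumptions made for illustration.

```python
def chunk_by_heading(document: str, max_words: int = 120) -> list[dict]:
    """Split a document into passages, keeping each passage's nearest heading.

    ASSUMPTIONS: headings start with '#' and blocks are separated by blank
    lines. Paragraphs longer than max_words are cut into several chunks.
    """
    chunks = []
    heading = ""
    for block in document.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # Remember the heading so later passages carry their context.
            heading = block.lstrip("#").strip()
            continue
        words = block.split()
        for start in range(0, len(words), max_words):
            chunks.append({
                "heading": heading,
                "text": " ".join(words[start:start + max_words]),
            })
    return chunks


sample = """# What is RAG

RAG retrieves documents at query time and passes them to the model.

# Why structure matters

Each paragraph should make one clear claim so it can stand alone once retrieved."""

chunks = chunk_by_heading(sample)
long_chunks = chunk_by_heading("# Long section\n\n" + "word " * 250, max_words=120)
```

In this sketch a short, single-idea paragraph survives as one chunk with its heading attached, while the 250-word paragraph is cut at arbitrary word boundaries into three pieces. That second case is the failure mode described above: the cut can separate a claim from the context that makes it usable.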

This is the critical technical distinction from traditional search. Traditional search ranking is driven by keyword matching (does the page contain the query terms?), link authority (how many authoritative sites link to this page?), and behavioural signals (do users click and stay?). RAG retrieval is driven by semantic similarity: how close is the meaning of this passage to the meaning of this query, in a high-dimensional vector space?

These are genuinely different things. A page that ranks well in traditional search because it's accumulated links over years may not retrieve well in RAG if its content isn't semantically precise about what you're asking. A newer page with lower link authority may retrieve well if it directly and precisely addresses the query's intent.

Google's term for RAG: grounding.

Google officially confirmed in its May 2026 AI search documentation that AI Overviews and AI Mode both use Retrieval-Augmented Generation, which Google calls "grounding."

Grounding, in Google's usage, means anchoring an AI response to retrieved real-world sources rather than relying on training data alone. The retrieval source for Google's AI features is Google's own Search index: the same index used for traditional blue-link results.

This has a direct and important implication: there's no separate AI index to optimise for. Getting into AI Overviews starts with being indexed and crawlable in Google's standard Search index. All the baseline technical requirements of traditional SEO (indexation, crawlability, snippet eligibility, technical health) still apply before any AI-specific considerations. Yes, even now.

Google's documentation is explicit on this point: foundational SEO is the prerequisite for AI search visibility, not an alternative to it.

How each platform implements RAG differently.

The four-step pipeline is universal. The implementation varies significantly by platform, and those differences have direct implications for your GEO strategy.

  • Google AI Overviews / Single-pass RAG: Retrieves passages from the standard Search index using ranking signals as a first-pass filter, then injects into Gemini. Passage-level: individual paragraphs can be cited independently of overall page ranking.
  • Google AI Mode / Multi-stage RAG + query fan-out: A single query is decomposed into multiple sub-queries, each retrieving independently. Only 14% of cited URLs rank in the traditional top 10. The other 86% come via fan-out sub-queries.
  • Microsoft Copilot / Dual-layer retrieval: For web queries: generates distilled search terms, sends to Bing, retrieves, performs grounding checks, then synthesises. The query Copilot sends to Bing is not your original prompt.
  • ChatGPT Search / Hybrid retrieval: Rewrites queries into targeted sub-queries sent to search partners. Deep Research mode is fully agentic: it plans and executes a multi-step research process, refining queries iteratively.
  • Perplexity / Real-time RAG: No pre-existing ranked index. Retrieval happens live at query time against the public web via the Sonar model. Link authority has no structural advantage: crawlability, freshness, and factual precision do.
  • Claude / Web search when enabled: Generates targeted search queries, retrieves results, analyses for key information, provides cited responses. Can conduct multiple progressive searches. Without web search: training data only.
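Query fan-out is easier to reason about with a small sketch. The one below assumes the sub-queries have already been generated (real systems use a language model for that step) and uses a hard-coded dictionary as a stand-in search backend. The sub-queries, URLs, and the merge rule (rank by how many sub-queries surfaced a URL) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical results for sub-queries a system might derive from
# a prompt like "best CRM for a small business".
FAKE_RESULTS = {
    "crm pricing small business": ["example.com/pricing", "example.com/review"],
    "crm features comparison": ["example.com/review", "example.com/compare"],
    "crm ease of setup": ["example.com/setup", "example.com/review"],
}


def search(sub_query: str) -> list[str]:
    """Stand-in for a search backend; returns URLs for one sub-query."""
    return FAKE_RESULTS.get(sub_query, [])


def fan_out_retrieve(sub_queries: list[str]) -> list[str]:
    """Run every sub-query independently, then merge and de-duplicate.

    ASSUMPTION: URLs surfaced by more sub-queries are ranked first,
    with ties broken alphabetically. Real merge rules are not public.
    """
    hits = defaultdict(int)
    for sub_query in sub_queries:
        for url in search(sub_query):
            hits[url] += 1
    return sorted(hits, key=lambda url: (-hits[url], url))


candidates = fan_out_retrieve(list(FAKE_RESULTS))
```

The design point is that each sub-query retrieves on its own terms. A page only needs to be a strong match for one narrow sub-query to enter the candidate pool, whether or not it ranks for the original prompt.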

What semantic similarity means for your content.

In traditional keyword-based retrieval, the question was: does this document contain the query terms? Ranking then determined which keyword-matching documents to surface first. SEO accordingly optimised for keyword presence, keyword density, and ranking signals.

In semantic vector retrieval, the question is different: is the meaning of this passage similar to the meaning of this query? The passage doesn't need to contain the exact query terms. It needs to express ideas that are semantically close to the query's intent.

The practical consequences are real. A passage that answers "how do transformers handle long sequences?" can be retrieved for a query like "Transformer attention mechanism long context" even though the phrasing differs. A passage stuffed with exact-match keywords that doesn't clearly address the underlying question retrieves poorly despite its keyword density. Write for the question the reader is trying to answer, not for the keywords you want to rank for. In a semantic retrieval world, those two things converge more than in a keyword world, but the framing still matters.
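The gap between keyword matching and semantic similarity can be shown with a toy comparison. The three-dimensional vectors below are made up by hand for illustration (roughly "attention", "long inputs", "cooking"). They are not the output of any real embedding model, which would produce hundreds or thousands of dimensions, and the passages are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_overlap(query: str, passage: str) -> float:
    """Share of query words that appear verbatim in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words)


query = "transformer attention mechanism long context"
query_vec = [0.9, 0.8, 0.0]  # hand-assigned, illustrative only

paraphrase = "Self-attention scales quadratically, so very lengthy inputs get expensive for these models."
paraphrase_vec = [0.8, 0.9, 0.1]  # same topic, different words

off_topic = "How long to roast a chicken in the oven."
off_topic_vec = [0.0, 0.1, 0.9]  # shares the word "long", different topic

paraphrase_scores = (keyword_overlap(query, paraphrase), cosine(query_vec, paraphrase_vec))
off_topic_scores = (keyword_overlap(query, off_topic), cosine(query_vec, off_topic_vec))
```

With these assumed vectors, the paraphrase shares no query words yet sits very close to the query in vector space, while the off-topic passage shares a keyword and sits far away. The two scoring methods measure different things and can rank the same passages in opposite orders.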

The universal requirements.

The RAG architecture drives the same content requirements regardless of which platform you're targeting.

Crawlability. If the platform's retrieval system can't access your content, it can't retrieve it. Technical barriers to crawling (blocked resources, JavaScript-only rendering, noindex directives on canonical pages) prevent entry into the pipeline entirely.

Passage-level structure. Because retrieval operates at the passage level, not the page level, each passage should be a coherent, self-contained unit that makes a clear claim.

Semantic precision. Your content needs to precisely address the questions it's intended to answer. Vague, hedged, or densely jargon-laden content retrieves poorly.

Factual attribution. Well-designed RAG systems perform grounding checks. Content that makes unattributed assertions is harder to use as a citation anchor than content with clear, attributable claims.

Non-commodity value. If the model's training data already contains a reliable answer to a query, the RAG system has less incentive to retrieve external content. Content that provides information the model can't generate from training data alone (proprietary data, original research, expert analysis, first-hand experience) is structurally more valuable to RAG systems.
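The crawlability requirement is the easiest of the five to check mechanically. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt file against a few crawler user agents. The robots.txt content and URL are hypothetical, and the agent names are examples only: check each platform's documentation for the user agents it actually uses. It covers robots.txt rules alone, not JavaScript rendering or noindex directives.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler entirely,
# and blocks /private/ for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def crawl_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user agent, whether robots.txt allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Example agent names; treat this list as an assumption, not a registry.
AGENTS = ["GPTBot", "PerplexityBot", "Googlebot"]

guide_report = crawl_access(ROBOTS_TXT, "https://example.com/guide", AGENTS)
private_report = crawl_access(ROBOTS_TXT, "https://example.com/private/notes", AGENTS)
```

Running the same check across every crawler you care about, for a sample of important URLs, shows quickly whether a page can enter the pipeline at all before any of the other four requirements come into play.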

Article 6 in this series covers what each platform has actually published about these requirements where the guidance exists, and what the architecture implies where it doesn't. The requirements above aren't speculation. They follow directly from the pipeline I've described here.

The universal requirements
Five things that apply regardless of which platform you're targeting.
  • Crawlability: if the platform's retrieval system can't access your content, it can't retrieve it
  • Passage-level structure: each paragraph should be a coherent, self-contained unit making a clear claim
  • Semantic precision: your content needs to directly and precisely address the questions it's intended to answer
  • Factual attribution: content with clear, attributable claims is easier for RAG systems to use as a citation anchor
  • Non-commodity value: content the model can generate from training data alone provides no grounding incentive

Frequently asked

What is RAG search?

RAG search is AI search that retrieves live documents at query time and uses them to write its answer, rather than relying only on the model's training. RAG stands for retrieval-augmented generation. It's what lets Perplexity, Google AI Overviews, and Bing Copilot cite current web pages in their responses.

How does RAG work in AI search?

At query time, in three steps: the system retrieves passages relevant to the query, passes them to the model as context, and the model generates an answer grounded in those passages (usually with citations). A fourth step, chunking and indexing your content, happens before any query arrives. It's the difference between a closed-book exam and an open-book one.

Which AI platforms use RAG?

All the major AI search products. Perplexity runs RAG on nearly every query, Google AI Overviews and AI Mode use it for grounding, and Bing Copilot retrieves from Bing's index. The implementations differ, but the retrieve-then-generate pattern is shared.

What's the difference between RAG and training data?

Training data is what the model learned before it was deployed: fixed knowledge with a cutoff date. RAG is live retrieval at query time. A page can be cited via RAG the day it's published, long before it could ever influence a future training run. The two are separate routes to being used by AI.

How do I get my content retrieved by RAG?

Be crawlable, be current, and be clearly the best answer to a specific question. RAG retrieval turns mainly on semantic relevance and freshness rather than on link authority alone, so structurally clear, genuinely informative content tends to win over keyword-optimised pages.

Does RAG replace SEO?

No, it sits alongside it. The technical foundations overlap heavily (crawlability, clear structure, authority), but RAG retrieval rewards different signals than blue-link ranking. You optimise for both, and measure AI visibility with citation share rather than position alone.

Next in the series

Article 4 is the definitive technical timeline of the AI search race: every significant model release and architectural shift from November 2022 to May 2026.

SOURCES
  1. Google Cloud, "What is Retrieval-Augmented Generation?" - cloud.google.com/use-cases/retrieval-augmented-generation
  2. Google Cloud, "RAG and Grounding on Vertex AI," 2024 - cloud.google.com/blog/...
  3. Google Search Central, "Optimizing for Generative AI Features on Google Search," 2026 - developers.google.com/search/docs/...
  4. IBM, "What is Retrieval-Augmented Generation?" - ibm.com/think/topics/retrieval-augmented-generation
  5. AWS, "What is Retrieval-Augmented Generation?" - aws.amazon.com/what-is/retrieval-augmented-generation
  6. Anthropic, "Introducing Web Search on the Anthropic API," 2025 - anthropic.com/news/web-search-api
  7. Anthropic, "Introducing Citations on the Anthropic API," 2025 - anthropic.com/news/introducing-citations-api
  8. OpenAI, "Introducing ChatGPT Search," 2024 - openai.com/index/introducing-chatgpt-search
  9. OpenAI API Documentation, "Web Search" - platform.openai.com/docs/guides/tools-web-search
  10. OpenAI Help Centre, "ChatGPT Search" - help.openai.com/en/articles/9237897-chatgpt-search
  11. Microsoft Learn, "Microsoft 365 Copilot Architecture" - learn.microsoft.com/en-us/microsoft-365/copilot/...
  12. Microsoft Learn, "Semantic Indexing for Microsoft 365 Copilot" - learn.microsoft.com/en-us/microsoftsearch/...
  13. Microsoft Learn, "Use Public Websites to Improve Generative Answers" - learn.microsoft.com/en-us/microsoft-copilot-studio/...
  14. Perplexity, "Sonar Models Documentation" - docs.perplexity.ai/docs/sonar/models
  15. Perplexity, "Meet New Sonar," 2025 - perplexity.ai/hub/blog/meet-new-sonar
  16. Wikipedia: Retrieval-augmented generation - en.wikipedia.org/wiki/Retrieval-augmented_generation
  17. Wikipedia: Large language model - en.wikipedia.org/wiki/Large_language_model