How AI Search Works
1
Attention Is All You Need: The Google Paper That Accidentally Ended Google's Search Monopoly.
On June 12, 2017, eight researchers at Google Brain uploaded a 15-page paper to arXiv. It accidentally made Google's own search business obsolete.
2
How Large Language Models Actually Work: Tokens, Context Windows, and Why Your Content Gets Ignored.
The answer to "how do I get cited by AI search?" is technical, not strategic — rooted in how large language models actually process information.
3
What Is RAG: the Only Thing That Matters for AI Search.
Once you see how Retrieval-Augmented Generation works, the rest of AI search visibility clicks into place. It's the mechanism that connects your content to every answer these systems produce.
4
The AI Search Race: ChatGPT, Gemini, Claude, Copilot, and Perplexity. A Technical Timeline.
The models, the milestones, and the architectural decisions that shaped every AI search product in market — a reference for anyone trying to make sense of where this came from.
5
6
How to Be Surfaced in AI Search: What the Platforms' Own Documentation Actually Says.
Most GEO content produces a list of tactics and implies they work across all platforms. This article goes directly to what each platform has actually published about how its own system works.
7
Why the Bing AI Performance Report Is the Most Underused Tool in GEO.
Microsoft's Bing AI Performance report is the only first-party citation dashboard from any AI search platform. Most practitioners aren't using it.
8
The GEO Documentation Gap: Why Three Platforms Haven't Told You How to Rank in Their Systems.
Google published a guide. Microsoft published measurement tooling. Anthropic, OpenAI, and Perplexity have published nothing equivalent. Understanding why is more useful than waiting.
9
Constitutional AI vs RLHF: Why the Alignment Method Affects What Gets Cited.
ChatGPT is aligned with RLHF. Claude is aligned with Constitutional AI. These aren't interchangeable — and the difference affects what each model treats as trustworthy and citable.
Writing  ·  Series: How AI Search Works  ·  Part 5 of 6

How Google AI Overviews and AI Mode Actually Work: Why Your SEO Rankings Don't Guarantee AI Visibility.

Ranking position 1 in traditional Google doesn't mean you appear in the AI Overview for that query. The two systems use different retrieval mechanisms, different source selection criteria, and different notions of what earns a citation. This article explains the architecture behind that gap.

I've been watching this shift since the Search Generative Experience days. The default assumption still seems to be that AI Overviews are a bonus ranking factor, layered on top of existing positions. They aren't. In May 2024, Google restructured the way most people receive information: AI Overviews launched in US Search, and for the queries that trigger them, users see a synthesised answer before any organic results. For many queries, that answer is sufficient. The blue links are supplementary.

In May 2025, Google went further. AI Mode launched: a search experience in which the entire response is AI-generated, built on a multi-stage retrieval architecture that differs fundamentally from how traditional Search works. The teams most at risk are those who assume their existing SEO performance transfers directly to AI search visibility. It doesn't.

What AI Overviews are (and are not).

AI Overviews aren't a new ranking layer on top of traditional Search. They're a different retrieval and generation system that happens to draw from the same underlying index as traditional Search.

When a query triggers an AI Overview, Google's systems perform a retrieval operation: relevant passages are fetched from the Search index, injected into Gemini's context window alongside the query, and Gemini generates a synthesised response. That response is the AI Overview. The sources cited below it are the passages that were retrieved and used.

Google confirmed this explicitly in its May 2026 AI search documentation: AI Overviews use Retrieval-Augmented Generation (RAG), which Google calls "grounding." The retrieval source is Google's Search index, not a separate AI index.
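The retrieve, inject, generate sequence can be sketched in a few lines of Python. This is a toy illustration, not Google's implementation: the three-document index, the word-overlap scoring, and the names `retrieve` and `build_grounded_prompt` are all assumptions made for the example, and the generation step is left out because it would need a model call.

```python
import re

# Hypothetical mini "index". A real system retrieves from a search index, not a list.
INDEX = [
    {"url": "https://example.com/a",
     "text": "Core Web Vitals measure loading, interactivity, and visual stability."},
    {"url": "https://example.com/b",
     "text": "Our company was founded in 2009 and has offices in three cities."},
    {"url": "https://example.com/c",
     "text": "Compressing images reduces page weight and improves loading speed."},
]


def tokens(text):
    """Lowercase word set; a crude stand-in for a semantic representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, index, k=2):
    """Return up to k passages that share the most words with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(p["text"])), p) for p in index]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_grounded_prompt(query, passages):
    """Inject the retrieved passages into the prompt alongside the query."""
    numbered = "\n".join(
        f"[{i}] {p['text']} (source: {p['url']})"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered sources.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {query}"
    )


query = "what improves page loading speed"
passages = retrieve(query, INDEX)
prompt = build_grounded_prompt(query, passages)
print(prompt)
```

The passage about company history never reaches the prompt, so the model can't cite it. That is the whole point of grounding: what isn't retrieved doesn't exist as far as the answer is concerned.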

The baseline requirements for AI Overview eligibility are identical to what you already need for traditional Search: the page must be indexed, crawlable, and eligible to appear with a snippet. Technical barriers (blocked crawling, noindex directives on canonical pages, JavaScript-only rendering of key content) prevent entry into the pipeline entirely. But eligibility is a floor, not a ceiling. Ranking in the top 10 doesn't guarantee citation in AI Overviews, and ranking outside the top 10 doesn't preclude it. The retrieval mechanism selects passages based on semantic similarity to the query, with standard ranking signals as one input among several, not as a strict gate.
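Two of those technical barriers can be checked with nothing but Python's standard library. The sketch below is a rough pre-flight check under my own assumptions: the function name `eligibility_issues` and the sample robots.txt and HTML are hypothetical, it only looks at robots.txt rules and robots meta tags, and it doesn't detect JavaScript-only rendering or `X-Robots-Tag` headers.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects noindex / nosnippet directives from robots meta tags."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nosnippet = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() not in ("robots", "googlebot"):
            return
        content = (attr.get("content") or "").lower()
        directives = {d.strip() for d in content.split(",")}
        self.noindex = self.noindex or "noindex" in directives
        self.nosnippet = self.nosnippet or "nosnippet" in directives


def eligibility_issues(url, robots_txt, html, user_agent="Googlebot"):
    """Return a list of barriers that would keep the page out of the pipeline."""
    issues = []
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        issues.append("blocked by robots.txt")
    meta = RobotsMetaParser()
    meta.feed(html)
    if meta.noindex:
        issues.append("noindex directive")
    if meta.nosnippet:
        issues.append("nosnippet directive")
    return issues


ROBOTS = "User-agent: *\nDisallow: /private/\n"
BLOCKED_HTML = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Hi</body></html>'
CLEAN_HTML = "<html><head><title>Post</title></head><body>Hi</body></html>"

print(eligibility_issues("https://example.com/private/page", ROBOTS, BLOCKED_HTML))
print(eligibility_issues("https://example.com/blog/post", ROBOTS, CLEAN_HTML))
```

An empty list only means the floor is met. It says nothing about whether any passage on the page is worth retrieving.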

Passage-level retrieval: the key difference.

Traditional SEO optimises at the page level. The unit of relevance is a URL. You build a page, you optimise that page's signals, and the page either ranks or it doesn't.

AI Overview retrieval operates at the passage level. The unit of relevance is a paragraph, a section, or sometimes a single sentence. Individual passages from a page are evaluated independently: the passage's semantic relevance to the query, the clarity of its claim, the specificity of its information.

A page can be authoritative at the domain level, technically healthy, and well-linked, and still produce passages that are too vague, too hedged, or too poorly structured to be retrieved and cited. Conversely, a page with modest domain authority can produce a passage that precisely and clearly answers a query's intent, and that passage may be cited while stronger-authority pages are not.
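To see what "evaluated independently" means, here is a minimal sketch that splits a page into paragraphs and scores each one against a query. It assumes word-count cosine similarity as a stand-in for the embedding-based similarity a production system would use; the function names and the sample page are mine, not Google's.

```python
import math
import re
from collections import Counter


def vectorise(text):
    """Bag-of-words vector: word -> count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_passages(query, page_text):
    """Score every paragraph on its own, ignoring the rest of the page."""
    passages = [p.strip() for p in page_text.split("\n\n") if p.strip()]
    q = vectorise(query)
    scored = [(cosine(q, vectorise(p)), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


PAGE = """In this section we will discuss a number of things about performance.

Server response time under 200 ms keeps time to first byte low.

We have been trusted by many clients for many years."""

ranked = rank_passages("what is a good server response time", PAGE)
for score, passage in ranked:
    print(f"{score:.2f}  {passage}")
```

All three paragraphs sit on the same URL with the same domain authority, yet only one of them gives the retriever anything to hold on to.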

The practical implication: your granularity of optimisation needs to shift from the page to the paragraph. Every paragraph should be a coherent, self-contained unit that makes a specific, attributable claim. Preamble ("in this section we will discuss...") consumes tokens without providing retrievable information. Key claims buried after several sentences of context are less likely to survive passage-level retrieval intact.
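A paragraph-level audit along these lines is easy to script. The sketch below is a heuristic of my own, not a documented scoring rule: the preamble patterns are a hypothetical starter list, and "contains a digit" is only a crude proxy for a specific, attributable claim.

```python
import re

# Hypothetical starter list of openers that carry no retrievable information.
PREAMBLE_PATTERNS = [
    r"^in this (section|article|post)\b",
    r"^before we (begin|dive)\b",
    r"^as (mentioned|discussed) (above|earlier)\b",
]


def audit_paragraph(paragraph):
    """Inspect the first sentence, since that is what a front-loaded claim relies on."""
    first_sentence = re.split(r"(?<=[.!?])\s+", paragraph.strip())[0]
    lowered = first_sentence.lower()
    return {
        "first_sentence": first_sentence,
        "opens_with_preamble": any(re.search(p, lowered) for p in PREAMBLE_PATTERNS),
        # Crude proxy: a figure in the opening sentence suggests a specific claim.
        "first_sentence_has_figure": bool(re.search(r"\d", first_sentence)),
    }


weak = "In this section we will discuss caching. Caching cuts repeat load time."
strong = "Enabling caching cut median load time from 3 seconds to 1 second. Results vary."

print(audit_paragraph(weak))
print(audit_paragraph(strong))
```

Run it over every paragraph on a page and the ones that open with throat-clearing show up quickly.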

AI Mode and query fan-out: multi-stage RAG.

AI Mode goes substantially further than AI Overviews. Where AI Overviews perform a single RAG retrieval pass, AI Mode uses a technique Google calls query fan-out: a multi-stage retrieval architecture that's the most significant structural change to Search since BERT in 2019.

Google's official description: query fan-out is "an information retrieval technique that expands a single user query into multiple sub-queries to capture different possible user intents, retrieving more diverse, broader results from different sources, including the live web, knowledge graph, and specialised data like Google Shopping."

Here's how it works in practice. You submit a query to AI Mode. Rather than performing a single retrieval operation against that query, the system uses an LLM to decompose it into multiple alternative sub-queries, each targeting a different facet of the original intent. Each sub-query retrieves independently. The results are collected and synthesised into a single response.
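The decompose, retrieve, collect flow can be sketched as follows. Everything here is a placeholder under stated assumptions: `expand_query` returns hard-coded facets where a real system would call an LLM, `retrieve` is a dictionary lookup against a hypothetical `CORPUS`, and the final synthesis step is omitted.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical retrieval results keyed by sub-query.
CORPUS = {
    "site audit": ["a.example/audit", "b.example/audit-guide"],
    "site audit checklist": ["b.example/audit-guide", "c.example/checklist"],
    "site audit tools": ["d.example/tools"],
}


def expand_query(query):
    """Placeholder for the LLM decomposition step."""
    return [f"{query} {facet}" for facet in ("checklist", "tools", "common mistakes")]


def retrieve(sub_query):
    """Placeholder retrieval: each sub-query is looked up independently."""
    return CORPUS.get(sub_query, [])


def fan_out(query):
    """Run the original query plus its sub-queries, then merge the results."""
    sub_queries = [query] + expand_query(query)
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(retrieve, sub_queries))  # order is preserved
    seen, merged = set(), []
    for sub_query, urls in zip(sub_queries, result_lists):
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append((url, sub_query))  # remember which sub-query found it
    return merged


for url, found_by in fan_out("site audit"):
    print(f"{url}  <- {found_by}")
```

Two of the four URLs in the merged set were never returned for the original query. They got in through a sub-query.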

The practical consequence is that AI Mode's retrieval surface is far wider than traditional Search's top 10. A query that traditionally surfaces 10 results now generates multiple sub-queries, each with its own retrieval results. Content that doesn't rank in the top 10 for the primary query may rank for one of the sub-queries and get cited in the AI Mode response. The numbers bear this out: only 14% of URLs cited by AI Mode rank in the traditional top 10 for the same query. The other 86% come from outside that top 10, which is what you'd expect if most citations are retrieved via fan-out sub-queries rather than the primary query.

Google's patent: how query fan-out actually works.

Google's patent application US20240289407A1 provides technical documentation of the query fan-out mechanism. The patent describes a system that uses large language models to generate multiple alternate queries from the original search.

The process begins with "prompted expansion": an LLM receives structured instructions to create queries that explore different angles of the original intent. These expanded queries are sent to the retrieval system in parallel. The results are then collected and passed to a synthesis model that generates a unified response.

The sub-queries generated by fan-out aren't simply synonyms of the original. They explore different facets of the intent: related questions, narrower specifications, adjacent topics, different framings of the same underlying information need. A query about "how to improve website speed" might fan out to sub-queries about Core Web Vitals, server response time, image optimisation, caching strategy, and JavaScript performance, each retrieving independently from different sources.
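A rough sketch of what prompted expansion might look like from the outside. The prompt wording, the function names, and the canned model output are all my assumptions (the patent's actual instructions aren't reproduced here); the facets are the website-speed ones from the example above.

```python
import re

# Hypothetical instruction template; not the wording used in the patent.
EXPANSION_TEMPLATE = (
    "Original query: {query}\n"
    "Write {n} alternative queries, one per line. Each should cover a different "
    "facet of the original intent: related questions, narrower specifications, "
    "adjacent topics, or a different framing."
)


def build_expansion_prompt(query, n=5):
    """Fill the structured instructions that would be sent to the LLM."""
    return EXPANSION_TEMPLATE.format(query=query, n=n)


def parse_expansions(model_output, original):
    """Turn line-separated model output into clean, de-duplicated sub-queries."""
    seen = {original.lower()}
    sub_queries = []
    for line in model_output.splitlines():
        # Strip list markers such as "1. ", "2) " or "- ".
        candidate = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line).strip()
        if candidate and candidate.lower() not in seen:
            seen.add(candidate.lower())
            sub_queries.append(candidate)
    return sub_queries


original = "how to improve website speed"
prompt = build_expansion_prompt(original)

# Canned stand-in for the model's reply, so the example runs without an API call.
canned_output = (
    "1. Core Web Vitals thresholds\n"
    "2. reduce server response time\n"
    "- image optimisation for web\n"
    "\n"
    "3. how to improve website speed\n"
    "4. browser caching strategy\n"
    "5. JavaScript performance tuning"
)

sub_queries = parse_expansions(canned_output, original)
print(sub_queries)
```

The parser drops the line that merely repeats the original query, which mirrors the point above: a useful expansion adds a facet, not a synonym.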

This has a significant structural implication for content strategy: topical depth within a subject area creates multiple retrieval entry points. A site with comprehensive coverage of a topic (not just a single article, but interconnected content covering multiple facets) is more likely to be retrieved across multiple fan-out sub-queries than a site with a single strong piece on the primary query.
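You can get a feel for "retrieval entry points" by counting how many sub-queries a site has at least one plausible page for. The sketch below assumes a deliberately crude match (two or more shared words between a sub-query and a page title); the sub-queries, page titles, and the `entry_points` helper are hypothetical.

```python
import re


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def entry_points(sub_queries, page_titles, min_overlap=2):
    """Map each sub-query to the page titles that share enough words with it."""
    covered = {}
    for sub_query in sub_queries:
        matches = [
            title for title in page_titles
            if len(words(sub_query) & words(title)) >= min_overlap
        ]
        if matches:
            covered[sub_query] = matches
    return covered


SUB_QUERIES = [
    "core web vitals thresholds",
    "reduce server response time",
    "image optimisation for web",
    "browser caching strategy",
    "javascript performance tuning",
]

single_article = ["How to Improve Website Speed"]
topic_cluster = [
    "How to Improve Website Speed",
    "Core Web Vitals Explained",
    "Cutting Server Response Time",
    "A Practical Browser Caching Strategy",
    "Image Optimisation Guide",
]

print(len(entry_points(SUB_QUERIES, single_article)), "entry points for the single article")
print(len(entry_points(SUB_QUERIES, topic_cluster)), "entry points for the topic cluster")
```

The single article may well be the strongest page for the primary query, but under this toy matching it offers nothing to any of the sub-queries. The cluster covers four of the five.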

Deep Search: the agentic endpoint.

At the far end of the AI Mode architecture is Deep Search: a variant where Google's systems can issue dozens or even hundreds of background queries and may take several minutes to complete. Where AI Mode fan-out is measured in a handful to a dozen sub-queries, Deep Search is effectively unbounded multi-stage RAG.

Deep Search is triggered for queries that Google's systems determine require deep reasoning: complex research questions, multi-faceted comparative analysis, questions where the answer requires synthesising information from many disparate sources.

The retrieval threshold in Deep Search differs from AI Overviews. A piece that directly addresses a specific sub-topic (even a niche one that isn't widely linked) can be retrieved and cited if it's the best available source for that facet of the query.

How Perplexity and Copilot compare.

If you're looking beyond Google, the architectures of Perplexity and Microsoft Copilot offer useful contrast.

Perplexity performs real-time RAG on every single query via its Sonar model. There's no pre-existing ranked index that Perplexity retrieves from: retrieval happens live at query time against the public web. Traditional link-based SEO signals have no structural advantage in Perplexity. What matters is crawlability, content freshness, and factual authority. A page published and indexed today can be cited by Perplexity today. A page with years of accumulated links but outdated content may not be.

Microsoft Copilot uses a dual-layer retrieval architecture. For web-grounded responses, Copilot generates targeted search queries from your prompt (not the prompt itself, but a distilled set of terms the system determines will retrieve relevant information) and sends those queries to Bing. The retrieval is from Bing's index, which means Bing indexation is the baseline requirement for Copilot web citation. Content that's not in Bing's index can't be cited in Copilot web-grounded responses regardless of Google ranking.
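The distillation step is worth picturing, because it means your content is matched against the distilled terms rather than the user's full sentence. The sketch below is only an analogy: Copilot uses a model for this, whereas `distill` is a hypothetical stopword filter I've written to show the shape of the transformation.

```python
import re

# Hypothetical stopword list, just enough for the example.
STOPWORDS = {
    "a", "an", "the", "i", "we", "my", "our", "for", "to", "of", "in", "on",
    "and", "or", "is", "are", "can", "you", "please", "what", "which", "how",
    "do", "does", "me", "tell", "about", "should", "that", "with",
}


def distill(prompt, max_terms=6):
    """Reduce a conversational prompt to a short keyword query."""
    terms = []
    for word in re.findall(r"[a-z0-9]+", prompt.lower()):
        if word not in STOPWORDS and word not in terms:
            terms.append(word)
    return " ".join(terms[:max_terms])


prompt = "Can you tell me which CRM tools are best for a small B2B sales team?"
print(distill(prompt))  # prints: crm tools best small b2b sales
```

Whatever the real system produces, it is that shorter query that goes to Bing, and Bing's index decides what comes back.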

What Google's own documentation says works, and what explicitly does not.

Google's May 2026 AI search documentation is the most authoritative first-party statement available on AI search optimisation.

What the documentation says works: foundational SEO first. Technical health, crawlability, snippet eligibility, and E-E-A-T are the baseline requirements. These are the same signals that have governed Google Search quality for years.

Non-commodity content is the next requirement. Google's language is specific: "content that provides value beyond common knowledge." If the information in a piece could be generated by an AI model without consulting the page, it's commodity content. Commodity content provides no grounding value to a RAG system that can generate equivalent content from training data.

Extractable structure matters too. Content must be structured clearly enough that an AI system can lift and cite specific claims: clear headings, unambiguous paragraph structure, front-loaded key claims, attributable data points.

What the documentation says explicitly does not work:

The sharpest line in the documentation: if your content could be generated by an AI without consulting your site, it probably won't be cited by one.

The decoupling of AI visibility from organic rankings.

The central practical implication of everything in this article is that AI search visibility and traditional organic search ranking are substantially decoupled.

They share a foundation (Google's Search index, baseline quality signals, technical health requirements) but the retrieval mechanisms above that foundation operate differently. Passage-level retrieval rather than page-level ranking. Semantic similarity as the primary retrieval signal rather than link authority. Query fan-out creating multiple retrieval entry points rather than a single SERP to rank within.

This decoupling creates both a threat and an opportunity. The threat: strong traditional rankings don't guarantee AI visibility. The opportunity: sites with lower traditional authority can achieve AI visibility if their content is precisely and clearly responsive to specific query intents, well-structured for passage retrieval, and factually authoritative in ways the model's training data can't replicate.

If you understand the architecture, not just the tactics, you can navigate the changes. And they will come.

/ Next in the series

Article 6 goes directly to what each platform has published about how its own system works: Google, Microsoft, Anthropic, OpenAI, and Perplexity. First-party sources only.

SOURCES
  1. Google Search Central, "Optimizing for Generative AI Features on Google Search," 2026 — developers.google.com/search/docs/...
  2. Google, "How Google AI Visual Search Works" (Query Fan-Out), 2026 — blog.google/...
  3. Google Cloud, "RAG and Grounding on Vertex AI," 2024 — cloud.google.com/blog/...
  4. Google Cloud, "What is Retrieval-Augmented Generation?" — cloud.google.com/use-cases/retrieval-augmented-generation
  5. Google, "How Search Works" — google.com/search/howsearchworks
  6. Semrush, "What Is Query Fan-Out?" — semrush.com/blog/query-fan-out
  7. Wikipedia: Google Gemini — en.wikipedia.org/wiki/Google_Gemini
  8. Wikipedia: Retrieval-augmented generation — en.wikipedia.org/wiki/Retrieval-augmented_generation
  9. Microsoft Learn, "Use Public Websites to Improve Generative Answers" — learn.microsoft.com/...
  10. Perplexity, "Sonar Models Documentation" — docs.perplexity.ai/docs/sonar/models
  11. Wikipedia: Microsoft Copilot — en.wikipedia.org/wiki/Microsoft_Copilot
  12. Wikipedia: Perplexity AI — en.wikipedia.org/wiki/Perplexity_AI
/ Written by

Thomas Cox

Twelve years in B2B SEO, most recently at VP level. Now independent — helping companies stay discoverable as buyer search moves into ChatGPT, Perplexity, and Google AI Overviews. Remote · UK.
