Writing · Series: How AI Search Works · Part 5 of 6
How Google AI Overviews and AI Mode Actually Work: Why Your SEO Rankings Don't Guarantee AI Visibility.
Ranking position 1 in traditional Google doesn't mean you appear in the AI Overview for that query. The two systems use different retrieval mechanisms, different source selection criteria, and different notions of what earns a citation. This article explains the architecture behind that gap.
Published 1 June 2026
By Thomas Cox
Read time 13 minutes
Filed under Series · Google AI Search · Part 5 of 6
I've been watching this shift since the Search Generative Experience days. The default assumption still seems to be that AI Overviews are a bonus ranking factor, layered on top of existing positions. They aren't. In May 2024, Google restructured the way most people receive information: AI Overviews launched in US Search, and for the queries that trigger them, users see a synthesised answer before any organic results. For many queries, that answer is sufficient. The blue links are supplementary.
In May 2025, Google went further. AI Mode launched as a separate experience in which the entire response is AI-generated, drawing on a multi-stage retrieval architecture that's fundamentally different from how traditional Search works. The teams most at risk are those who assume their existing SEO performance transfers directly to AI search visibility. It doesn't.
What AI Overviews are (and are not).
AI Overviews aren't a new ranking layer on top of traditional Search. They're a different retrieval and generation system that happens to draw from the same underlying index as traditional Search.
When a query triggers an AI Overview, Google's systems perform a retrieval operation: relevant passages are fetched from the Search index, injected into Gemini's context window alongside the query, and Gemini generates a synthesised response. That response is the AI Overview. The sources cited below it are the passages that were retrieved and used.
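To make the retrieve-inject-generate sequence concrete, here's a minimal sketch in Python. It is not Google's implementation: the three-passage `INDEX`, the function names, and the prompt wording are all hypothetical, and plain term overlap stands in for whatever similarity measure the real retrieval uses. The generation step is left out; the sketch stops at the prompt a model would receive.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a semantic embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, index: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return the k (url, passage) pairs sharing the most terms with the query."""
    q = tokens(query)
    ranked = sorted(index.items(), key=lambda item: len(q & tokens(item[1])), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query: str, passages: list[tuple[str, str]]) -> str:
    """Place the retrieved passages in the prompt ahead of the question."""
    sources = "\n".join(
        f"[{n}] {url}: {text}" for n, (url, text) in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered sources and cite them by number.\n"
        f"{sources}\n\nQuestion: {query}"
    )


# Hypothetical index: one passage per URL (assumption for illustration only).
INDEX = {
    "https://example.com/ttfb": "Server response time drops when you cache rendered pages at the edge.",
    "https://example.com/images": "Compress images and serve modern formats such as AVIF.",
    "https://example.com/about": "Our team has worked in publishing for many years.",
}

query = "how to reduce server response time"
prompt = build_grounded_prompt(query, retrieve(query, INDEX))
print(prompt)
```

The point of the sketch is the ordering: the sources a reader sees cited are whatever made it into the prompt, so a passage that isn't retrieved can't be cited no matter how well its page ranks.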
Google confirmed this explicitly in its May 2026 AI search documentation: AI Overviews use Retrieval-Augmented Generation (RAG), which Google calls "grounding." The retrieval source is Google's Search index, not a separate AI index.
The baseline requirements for AI Overview eligibility are identical to what you already need for traditional Search: the page must be indexed, crawlable, and eligible to appear with a snippet. Technical barriers (blocked crawling, noindex directives on canonical pages, JavaScript-only rendering of key content) prevent entry into the pipeline entirely. But eligibility is a floor, not a ceiling. Ranking in the top 10 doesn't guarantee citation in AI Overviews, and ranking outside the top 10 doesn't preclude it. The retrieval mechanism selects passages based on semantic similarity to the query, with standard ranking signals as one input among several, not as a strict gate.
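The technical floor is the part you can check mechanically. The sketch below uses only Python's standard library and assumes you already have the page's HTML and the site's robots.txt as strings; `eligibility_blockers` and `RobotsMetaParser` are names made up for this example. It covers robots.txt blocking, `noindex`, and `nosnippet`. It does not detect JavaScript-only rendering or `X-Robots-Tag` response headers, and it can't tell you whether a page is actually indexed.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            content = attr.get("content", "")
            self.directives.update(d.strip().lower() for d in content.split(","))


def eligibility_blockers(url: str, robots_txt: str, html: str, agent: str = "Googlebot") -> list[str]:
    """Return the blockers found; an empty list means none of these checks failed."""
    blockers = []

    # Crawlability: is this URL disallowed for the given user agent?
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        blockers.append("blocked by robots.txt")

    # Indexability and snippet eligibility: look for restrictive meta directives.
    meta = RobotsMetaParser()
    meta.feed(html)
    if meta.directives & {"noindex", "none"}:
        blockers.append("noindex")
    if "nosnippet" in meta.directives:
        blockers.append("nosnippet")
    return blockers
```

An empty result only means the page clears the floor. Everything about whether a passage is then selected happens above it.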
Passage-level retrieval: the key difference.
Traditional SEO optimises at the page level. The unit of relevance is a URL. You build a page, you optimise that page's signals, and the page either ranks or it doesn't.
AI Overview retrieval operates at the passage level. The unit of relevance is a paragraph, a section, or sometimes a single sentence. Individual passages from a page are evaluated independently: the passage's semantic relevance to the query, the clarity of its claim, the specificity of its information.
A page can be authoritative at the domain level, technically healthy, and well-linked, and still produce passages that are too vague, too hedged, or too poorly structured to be retrieved and cited. Conversely, a page with modest domain authority can produce a passage that precisely and clearly answers a query's intent, and that passage may be cited while stronger-authority pages are not.
The practical implication: your granularity of optimisation needs to shift from the page to the paragraph. Every paragraph should be a coherent, self-contained unit that makes a specific, attributable claim. Preamble ("in this section we will discuss...") consumes tokens without providing retrievable information. Key claims buried after several sentences of context are less likely to survive passage-level retrieval intact.
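A small Python sketch shows why preamble loses under passage-level scoring. The assumptions are mine: each blank-line-separated paragraph is treated as one passage, and cosine similarity over raw term counts stands in for semantic similarity (a production system would presumably use learned embeddings). The sample page and query are invented.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_passages(page_text: str) -> list[str]:
    """Treat each blank-line-separated paragraph as one passage."""
    return [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]


def rank_passages(query: str, page_text: str) -> list[tuple[float, str]]:
    """Score every passage against the query independently, best first."""
    q = term_counts(query)
    scored = [(cosine(q, term_counts(p)), p) for p in split_passages(page_text)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented sample page: one preamble, one specific claim, one filler paragraph.
PAGE = """In this section we will discuss several things that matter.

Largest Contentful Paint should occur within 2.5 seconds of page load.

Our agency has won many awards."""

for score, passage in rank_passages("largest contentful paint threshold seconds", PAGE):
    print(f"{score:.2f}  {passage}")
```

Only the middle paragraph shares any terms with the query, so it is the only one with a non-zero score. The page's other paragraphs contribute nothing to its retrieval, which is the sense in which each paragraph has to stand on its own.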
AI Mode and query fan-out: multi-stage RAG.
AI Mode goes substantially further than AI Overviews. Where AI Overviews perform a single RAG retrieval pass, AI Mode uses a technique Google calls query fan-out: a multi-stage retrieval architecture that's the most significant structural change to Search since BERT in 2019.
Google's official description: query fan-out is "an information retrieval technique that expands a single user query into multiple sub-queries to capture different possible user intents, retrieving more diverse, broader results from different sources, including the live web, knowledge graph, and specialised data like Google Shopping."
Here's how it works in practice. You submit a query to AI Mode. Rather than performing a single retrieval operation against that query, the system uses an LLM to decompose it into multiple alternative sub-queries, each targeting a different facet of the original intent. Each sub-query retrieves independently. The results are collected and synthesised into a single response.
The practical consequence is that AI Mode's retrieval surface is far wider than traditional Search's top 10. A query that traditionally surfaces 10 results now generates multiple sub-queries, each with its own retrieval results. Content that doesn't rank in the top 10 for the primary query may rank for one of the sub-queries and get cited in the AI Mode response. This is confirmed empirically: only 14% of URLs cited by AI Mode rank in the traditional top 10 for the same query. The other 86% come from outside that top 10, retrieved via the fan-out sub-queries.
Google's patent: how query fan-out actually works.
Google's patent application US20240289407A1 provides technical documentation of the query fan-out mechanism. The patent describes a system that uses large language models to generate multiple alternate queries from the original search.
The process begins with "prompted expansion": an LLM receives structured instructions to create queries that explore different angles of the original intent. These expanded queries are sent to the retrieval system in parallel. The results are then collected and passed to a synthesis model that generates a unified response.
The sub-queries generated by fan-out aren't simply synonyms of the original. They explore different facets of the intent: related questions, narrower specifications, adjacent topics, different framings of the same underlying information need. A query about "how to improve website speed" might fan out to sub-queries about Core Web Vitals, server response time, image optimisation, caching strategy, and JavaScript performance, each retrieving independently from different sources.
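The expand, retrieve, and collect steps can be sketched in a few lines of Python. This is a toy under stated assumptions, not the patented system: the `EXPANSIONS` lookup table replaces the LLM expansion step, `search` is a naive every-term-must-match filter over an invented three-page corpus, and including the original query alongside its sub-queries is my choice for the example. The synthesis step is omitted.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for LLM "prompted expansion": a fixed lookup table
# instead of a model call. The facets are the ones named in the text above.
EXPANSIONS = {
    "how to improve website speed": [
        "core web vitals",
        "server response time",
        "image optimisation",
        "caching strategy",
        "javascript performance",
    ]
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand(query: str) -> list[str]:
    """Original query plus its sub-queries (none if the query is unknown)."""
    return [query] + EXPANSIONS.get(query.lower(), [])


def search(sub_query: str, corpus: dict[str, str]) -> list[str]:
    """Toy retrieval: URLs whose text contains every term of the sub-query."""
    wanted = tokens(sub_query)
    return [url for url, text in corpus.items() if wanted <= tokens(text)]


def fan_out(query: str, corpus: dict[str, str]) -> dict[str, list[str]]:
    """Run every sub-query independently and in parallel, then group hits by URL."""
    sub_queries = expand(query)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda sq: search(sq, corpus), sub_queries))
    hits: dict[str, list[str]] = {}
    for sub_query, urls in zip(sub_queries, results):
        for url in urls:
            hits.setdefault(url, []).append(sub_query)
    return hits


# Invented corpus for illustration.
CORPUS = {
    "https://example.com/speed": "How to improve website speed: an overview.",
    "https://example.com/cwv": "A guide to Core Web Vitals thresholds.",
    "https://example.com/ttfb": "Reduce server response time with a caching strategy at the edge.",
}

for url, matched in fan_out("how to improve website speed", CORPUS).items():
    print(url, "<-", matched)
```

In this corpus only one page matches the original query, yet three pages end up as candidates, and the page that covers two facets is reached by two different sub-queries.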
This has a significant structural implication for content strategy: topical depth within a subject area creates multiple retrieval entry points. A site with comprehensive coverage of a topic (not just a single article, but interconnected content covering multiple facets) is more likely to be retrieved across multiple fan-out sub-queries than a site with a single strong piece on the primary query.
Deep Search: the agentic endpoint.
At the far end of the AI Mode architecture is Deep Search: a variant where Google's systems can issue dozens or even hundreds of background queries and may take several minutes to complete. Where AI Mode fan-out is measured in a handful to a dozen sub-queries, Deep Search is multi-stage RAG with no comparably tight cap on the number of retrieval passes.
Deep Search is triggered for queries that Google's systems determine require deep reasoning: complex research questions, multi-faceted comparative analysis, questions where the answer requires synthesising information from many disparate sources.
For content appearing in Deep Search results, the retrieval threshold differs from that of AI Overviews. A piece that directly addresses a specific sub-topic (even a niche one that isn't broadly linked) can be retrieved and cited if it's the best available source for that specific facet of the query.
How Perplexity and Copilot compare.
If you're looking beyond Google, the architectures of Perplexity and Microsoft Copilot offer useful contrast.
Perplexity performs real-time RAG on every single query via its Sonar model. There's no pre-existing ranked index that Perplexity retrieves from: retrieval happens live at query time against the public web. Traditional link-based SEO signals therefore carry no structural advantage in Perplexity. What matters is crawlability, content freshness, and factual authority. A page published today can be cited by Perplexity today, provided it's crawlable. A page with years of accumulated links but outdated content may not be.
Microsoft Copilot uses a dual-layer retrieval architecture. For web-grounded responses, Copilot generates targeted search queries from your prompt (not the prompt itself, but a distilled set of terms the system determines will retrieve relevant information) and sends those queries to Bing. The retrieval is from Bing's index, which means Bing indexation is the baseline requirement for Copilot web citation. Content that's not in Bing's index can't be cited in Copilot web-grounded responses regardless of Google ranking.
What Google's own documentation says works, and what explicitly does not.
Google's May 2026 AI search documentation is the most authoritative first-party statement available on AI search optimisation.
What the documentation says works: foundational SEO first. Technical health, crawlability, snippet eligibility, and E-E-A-T are the baseline requirements. These are the same signals that have governed Google Search quality for years.
Non-commodity content is the next requirement. Google's language is specific: "content that provides value beyond common knowledge." If the information in a piece could be generated by an AI model without consulting the page, it's commodity content. Commodity content provides no grounding value to a RAG system that can generate equivalent content from training data.
Extractable structure matters too. Content must be structured clearly enough that an AI system can lift and cite specific claims: clear headings, unambiguous paragraph structure, front-loaded key claims, attributable data points.
What the documentation says explicitly does not work:
- llms.txt files (no evidence they affect AI search visibility)
- Content "chunking" specifically for AI (the model does its own chunking; manual chunking isn't a meaningful signal)
- AI-specific phrasing rewrites (optimising language to sound more "AI-friendly" isn't a signal)
- Inauthentic mention campaigns (coordinated efforts to get a brand mentioned across AI-generated or low-quality content)
The sharpest line in the documentation: if your content could be generated by an AI without consulting your site, it probably won't be cited by one.
The decoupling of AI visibility from organic rankings.
The central practical implication of everything in this article is that AI search visibility and traditional organic search ranking are substantially decoupled.
They share a foundation (Google's Search index, baseline quality signals, technical health requirements) but the retrieval mechanisms above that foundation operate differently. Passage-level retrieval rather than page-level ranking. Semantic similarity as the primary retrieval signal rather than link authority. Query fan-out creating multiple retrieval entry points rather than a single SERP to rank within.
This decoupling creates both a threat and an opportunity. The threat: strong traditional rankings don't guarantee AI visibility. The opportunity: sites with lower traditional authority can achieve AI visibility if their content is precisely and clearly responsive to specific query intents, well-structured for passage retrieval, and factually authoritative in ways the model's training data can't replicate.
If you understand the architecture, not just the tactics, you can navigate the changes. And they will come.
Next in the series
Article 6 goes directly to what each platform has published about how its own system works: Google, Microsoft, Anthropic, OpenAI, and Perplexity. First-party sources only.