Writing  ·  Series: How AI Search Works  ·  Part 5 of 10

How Google AI Overviews and AI Mode Actually Work: Why Your SEO Rankings Don't Guarantee AI Visibility.

Ranking in position 1 in traditional Google Search doesn't mean you appear in the AI Overview for that query. The two systems use different retrieval mechanisms, different source selection criteria, and different notions of what earns a citation. This article explains the architecture behind that gap.

I've been watching this shift since the Search Generative Experience days. The default assumption still seems to be that AI Overviews are a bonus ranking factor, layered on top of existing positions. They aren't. In May 2024, Google restructured the way most people receive information: AI Overviews launched in US Search, and for the queries that trigger them, users see a synthesised answer before any organic results. For many queries, that answer is sufficient. The blue links are supplementary.

In May 2025, Google went further. AI Mode launched as a search experience in which the entire response is AI-generated, drawing on a multi-stage retrieval architecture that's fundamentally different from how traditional Search works. The teams most at risk are those who assume their existing SEO performance transfers directly to AI search visibility. It doesn't.

What AI Overviews are (and are not).

AI Overviews aren't a new ranking layer on top of traditional Search. They're a different retrieval and generation system that happens to draw from the same underlying index as traditional Search.

When a query triggers an AI Overview, Google's systems perform a retrieval operation: relevant passages are fetched from the Search index, injected into Gemini's context window alongside the query, and Gemini generates a synthesised response. That response is the AI Overview. The sources cited below it are the passages that were retrieved and used.

Google confirmed this explicitly in its May 2026 AI search documentation: AI Overviews use Retrieval-Augmented Generation (RAG), which Google calls "grounding." The retrieval source is Google's Search index, not a separate AI index.
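Here is a rough sketch of that retrieve, inject, and generate loop. It is not Google's implementation, and every component is hypothetical. The in-memory `INDEX` stands in for the Search index, word overlap stands in for real relevance scoring, and `generate` is a stub where a Gemini call would go. The sketch shows one structural point: the citations are exactly the passages that were retrieved and placed in the context.

```python
import re
from dataclasses import dataclass

@dataclass
class Passage:
    url: str
    text: str

# Hypothetical stand-in for the Search index: three passages from three pages.
INDEX = [
    Passage("https://example.com/cache",
            "Browser caching stores static assets so repeat visits load faster."),
    Passage("https://example.com/images",
            "Compressing images reduces page weight so the page can load faster."),
    Passage("https://example.com/history",
            "The company was founded in 1998 in a garage."),
]

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[Passage]:
    """Return up to k passages ranked by words shared with the query (a crude relevance proxy)."""
    q = words(query)
    scored = [(len(q & words(p.text)), p) for p in INDEX]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep index order
    return [p for _, p in scored[:k]]

def build_prompt(query: str, passages: list[Passage]) -> str:
    """Inject the retrieved passages into the context window alongside the query."""
    sources = "\n".join(f"[{i}] {p.text}" for i, p in enumerate(passages, 1))
    return f"Answer using only the sources below.\n{sources}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    # Stub for the model call; a real system would send `prompt` to an LLM.
    return "Synthesised answer grounded in the numbered sources."

def answer(query: str) -> tuple[str, list[str]]:
    passages = retrieve(query)
    text = generate(build_prompt(query, passages))
    return text, [p.url for p in passages]  # citations = the passages actually used

text, citations = answer("how to make pages load faster")
print(citations)
```

In this design, a passage that never reaches the context window cannot appear as a citation, however strong its page is.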

The baseline requirements for AI Overview eligibility are identical to what you already need for traditional Search. Eligibility is a floor, not a ceiling.

Passage-level retrieval: the key difference.

Traditional SEO optimises at the page level. The unit of relevance is a URL. You build a page, you optimise that page's signals, and the page either ranks or it doesn't.

AI Overview retrieval operates at the passage level. The unit of relevance is a paragraph, a section, or sometimes a single sentence. Individual passages from a page are evaluated independently on their semantic relevance to the query, the clarity of their claims, and the specificity of their information.

A page can be authoritative at the domain level, technically healthy, and well-linked, and still produce passages that are too vague, too hedged, or too poorly structured to be retrieved and cited. Conversely, a page with modest domain authority can produce a passage that precisely and clearly answers a query's intent, and that passage may be cited while stronger-authority pages are not.
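A toy version of passage-level scoring makes the difference concrete. The sketch below splits a page on blank lines and scores each paragraph on its own against the query. It assumes bag-of-words cosine similarity as a stand-in for the embedding-based semantic similarity a production system would more plausibly use. Google hasn't published its passage scoring function, and the sample page is invented.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> Counter:
    """Lower-cased word counts (a crude stand-in for a semantic embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_passages(page: str, query: str) -> list[tuple[float, str]]:
    """Split a page into paragraphs and score each one independently, best first."""
    q = tokens(query)
    passages = [p.strip() for p in page.split("\n\n") if p.strip()]
    return sorted(((cosine(q, tokens(p)), p) for p in passages), reverse=True)

# Invented example page: two generic paragraphs and one specific answer.
page = """Our agency has helped hundreds of brands grow since 2010.

To reduce Largest Contentful Paint, compress the hero image and preload it.

Contact us today for a free consultation."""

for score, passage in score_passages(page, "how to reduce largest contentful paint"):
    print(f"{score:.2f}  {passage}")
```

The generic opening and closing paragraphs score zero. The one specific paragraph carries the whole page's retrievability for this query.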

The practical implication: your optimisation granularity needs to shift from the page to the paragraph.

AI Mode and query fan-out: multi-stage RAG.

AI Mode goes substantially further than AI Overviews. Google's documentation says both features may use a technique it calls query fan-out, but AI Mode is built around it: a multi-stage retrieval architecture that's arguably the most significant structural change to Search since BERT in 2019.

Google's official description: query fan-out is "an information retrieval technique that expands a single user query into multiple sub-queries to capture different possible user intents, retrieving more diverse, broader results from different sources, including the live web, knowledge graph, and specialised data like Google Shopping."

Here's how it works in practice. You submit a query to AI Mode. Rather than performing a single retrieval operation against that query, the system uses an LLM to decompose it into multiple alternative sub-queries, each targeting a different facet of the original intent. Each sub-query retrieves independently. The results are collected and synthesised into a single response.

The practical consequence is that AI Mode's retrieval surface is far wider than traditional Search's top 10. A query that traditionally surfaces 10 results now generates multiple sub-queries, each with its own retrieval results. Content that doesn't rank in the top 10 for the primary query may rank for one of the sub-queries and get cited in the AI Mode response.
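The decompose, retrieve, and merge steps can be sketched as follows. `decompose` is a hypothetical stand-in for the LLM that writes sub-queries; here it just fills fixed templates. `RESULTS` is invented per-query ranking data, and the synthesis step is left out. The sketch only illustrates the structural effect: the merged pool contains URLs that never appear in the primary query's own results.

```python
from collections import defaultdict

# Invented per-query rankings, standing in for independent retrieval runs.
RESULTS = {
    "running shoes": ["brand-a.com", "brand-b.com", "retailer.com"],
    "running shoes for flat feet": ["podiatry-blog.com", "brand-a.com"],
    "running shoes durability test": ["lab-reviews.com", "retailer.com"],
    "running shoes vs trail shoes": ["trail-guide.com"],
}

# Hypothetical facet templates; a real system would have an LLM write the sub-queries.
FACET_TEMPLATES = ["{q}", "{q} for flat feet", "{q} durability test", "{q} vs trail shoes"]

def decompose(query: str) -> list[str]:
    return [t.format(q=query) for t in FACET_TEMPLATES]

def search(sub_query: str, k: int = 10) -> list[str]:
    return RESULTS.get(sub_query, [])[:k]

def fan_out(query: str) -> dict[str, list[str]]:
    """Retrieve for every sub-query and record which sub-queries surfaced each URL."""
    surfaced_by = defaultdict(list)
    for sub_query in decompose(query):
        for url in search(sub_query):
            surfaced_by[url].append(sub_query)
    return dict(surfaced_by)

pool = fan_out("running shoes")
primary = set(search("running shoes"))
extra = sorted(set(pool) - primary)  # candidates only reachable via sub-queries
print(extra)
```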

14% from the traditional top 10: of URLs cited by AI Mode, only 14% ranked in the traditional top 10 for the same query. Your existing SEO rankings are a poor predictor of AI Mode visibility.

86% from outside the top 10: the majority of cited URLs come from outside the traditional top 10, retrieved via fan-out sub-queries that explore different facets of the original intent.
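You can run the same kind of comparison on your own query set. For each query, take the URLs cited in the AI Mode response and check what share of them also appear in the traditional top 10. The sketch below assumes you've already collected both lists, and how you collect them is out of scope here. The sample URLs are invented for illustration and are not the data behind the 14% figure.

```python
def top10_overlap(cited: dict[str, list[str]], top10: dict[str, list[str]]) -> float:
    """Share of cited URLs, pooled across queries, that also rank in that query's top 10."""
    total = in_top10 = 0
    for query, urls in cited.items():
        ranked = set(top10.get(query, [])[:10])
        total += len(urls)
        in_top10 += sum(1 for url in urls if url in ranked)
    return in_top10 / total if total else 0.0

# Invented sample: 7 cited URLs across two queries, 1 of which is in its query's top 10.
cited = {"q1": ["a.com/1", "b.com/2", "c.com/3", "d.com/4"], "q2": ["e.com/5", "f.com/6", "g.com/7"]}
top10 = {"q1": ["a.com/1", "x.com"], "q2": ["y.com"]}
print(f"{top10_overlap(cited, top10):.1%}")
```

This pools URLs across queries, which is a micro-average. Averaging each query's share instead can give a different number, so decide which one you're comparing against. The 14% figure doesn't state its method.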


If that's the gap, the obvious question is what to do about it. I've written the step-by-step separately: how to rank in Google AI Mode. This article stays on the mechanism; that one is the playbook.

Google's patent: how query fan-out actually works.

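Google's patent application US20240289407A1 describes the query fan-out mechanism in technical detail: a system that uses large language models to generate multiple alternate queries from the original search. A patent application records what Google has claimed, which isn't necessarily what runs in production. Treat it as a description of the design rather than confirmation of current behaviour.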

The process begins with "prompted expansion": an LLM receives structured instructions to create queries that explore different angles of the original intent. These expanded queries are sent to the retrieval system in parallel. The results are then collected and passed to a synthesis model that generates a unified response.
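The three stages the patent describes (prompted expansion, parallel retrieval, synthesis) map onto a simple concurrent pipeline. In the sketch below, the prompt wording and the `call_llm`, `retrieve_docs`, and `synthesise` functions are hypothetical placeholders. None of them are text or interfaces taken from the patent. `call_llm` returns canned sub-queries so the example runs offline, and `ThreadPoolExecutor` is there only to show sub-queries being dispatched in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical expansion prompt; the patent's actual wording is not reproduced here.
EXPANSION_PROMPT = (
    "Rewrite the search query below into {n} distinct queries, each exploring "
    "a different angle of the user's intent. Return one query per line.\n"
    "Query: {query}"
)

def call_llm(prompt: str) -> str:
    # Placeholder for a model call: returns canned sub-queries so the sketch runs offline.
    return ("how to compress images for the web\n"
            "how browser caching works\n"
            "what are good Core Web Vitals scores")

def expand(query: str, n: int = 3) -> list[str]:
    """Prompted expansion: ask the model for n alternative queries, one per line."""
    raw = call_llm(EXPANSION_PROMPT.format(n=n, query=query))
    return [line.strip() for line in raw.splitlines() if line.strip()][:n]

def retrieve_docs(sub_query: str) -> list[str]:
    # Placeholder retriever: pretend each sub-query returns one document.
    return [f"doc for: {sub_query}"]

def synthesise(query: str, batches: list[list[str]]) -> str:
    # Placeholder synthesis: a real system would prompt a model with every document.
    docs = [doc for batch in batches for doc in batch]
    return f"{len(docs)} documents retrieved for: {query}"

def fan_out_answer(query: str) -> str:
    sub_queries = expand(query)
    # Dispatch all sub-queries concurrently; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(sub_queries))) as pool:
        batches = list(pool.map(retrieve_docs, sub_queries))
    return synthesise(query, batches)

print(fan_out_answer("how to improve website speed"))
```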

The sub-queries generated by fan-out aren't simply synonyms of the original. They explore different facets of the intent: related questions, narrower specifications, adjacent topics, different framings of the same underlying information need. A query about "how to improve website speed", for example, might fan out into separate sub-queries for individual facets of that goal rather than rephrasings of the whole question.

Each sub-query retrieves independently from different sources.

This has a significant structural implication for content strategy: topical depth within a subject area creates multiple retrieval entry points. A site with comprehensive coverage of a topic (not just a single article, but interconnected content covering multiple facets) is more likely to be retrieved across multiple fan-out sub-queries than a site with a single strong piece on the primary query.
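The entry-point argument reduces to counting: for a given set of fan-out sub-queries, how many does each site have a matching page for? The sketch below uses a crude keyword-overlap threshold as the retrieval test, with invented sub-queries and sites. It only shows why a topic cluster touches more sub-queries than a single page. It does not model real retrieval.

```python
def words(text: str) -> set[str]:
    return set(text.lower().split())

def entry_points(site_pages: list[str], sub_queries: list[str], min_overlap: int = 2) -> int:
    """Count sub-queries that at least one page on the site plausibly matches."""
    return sum(
        1 for sq in sub_queries
        if any(len(words(sq) & words(page)) >= min_overlap for page in site_pages)
    )

# Invented fan-out sub-queries for a website-speed query.
sub_queries = [
    "improve website speed",
    "compress images for web",
    "browser caching headers",
    "reduce javascript bundle size",
    "choose fast web hosting",
]
single_page_site = ["complete guide to website speed"]
cluster_site = [
    "website speed guide",
    "how to compress images for the web",
    "setting browser caching headers",
    "reduce javascript bundle size with code splitting",
]
print(entry_points(single_page_site, sub_queries), entry_points(cluster_site, sub_queries))
```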

Deep Search: the agentic endpoint.

At the far end of the AI Mode architecture is Deep Search: a variant where Google's systems can issue dozens or even hundreds of background queries and may take several minutes to complete. Where AI Mode fan-out is measured in a handful to a dozen sub-queries, Deep Search applies the same multi-stage RAG pattern with a far larger query budget.

Deep Search is designed for queries that require deep reasoning: complex research questions, multi-faceted comparative analysis, questions where the answer requires synthesising information from many disparate sources.
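One way to picture multi-stage RAG with a large query budget is as an iterative loop. The system retrieves, looks for facets the evidence doesn't cover yet, and issues follow-up queries for them. It stops when nothing is missing or the budget runs out. Google hasn't published how Deep Search plans its queries. The `find_gaps` function and the toy corpus below are hypothetical.

```python
from collections import deque

# Hypothetical corpus: query -> (documents it returns, follow-up facets they raise).
CORPUS = {
    "compare cloud providers for startups": (["overview.example"], ["pricing", "free tiers"]),
    "pricing": (["pricing.example"], ["egress fees"]),
    "free tiers": (["free-tier.example"], []),
    "egress fees": (["egress.example"], []),
}

def search(query: str) -> list[str]:
    return CORPUS.get(query, ([], []))[0]

def find_gaps(query: str) -> list[str]:
    # Stand-in for a model reading the results and naming sub-topics still uncovered.
    return CORPUS.get(query, ([], []))[1]

def deep_search(question: str, budget: int = 10) -> tuple[list[str], int]:
    """Breadth-first multi-stage retrieval until no gaps remain or the query budget is spent."""
    queue, seen, evidence = deque([question]), set(), []
    while queue and len(seen) < budget:
        query = queue.popleft()
        if query in seen:
            continue
        seen.add(query)
        evidence.extend(search(query))
        queue.extend(q for q in find_gaps(query) if q not in seen)
    return evidence, len(seen)

evidence, queries_used = deep_search("compare cloud providers for startups")
print(queries_used, evidence)
```

In this toy run, `egress.example` is only reached through a second-round follow-up query. That is the sense in which a niche page can surface in Deep Search when it's the best source for one facet.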

For content appearing in Deep Search results, the retrieval threshold differs from AI Overviews. A piece that directly addresses a specific sub-topic (even a niche one that isn't broadly linked) can be retrieved and cited if it's the best available source for that specific facet of the query.

How Perplexity and Copilot compare.

If you're looking beyond Google, the architectures of Perplexity and Microsoft Copilot offer useful contrast.

Perplexity performs real-time RAG on every query via its Sonar models: retrieval happens live at query time against the web, rather than serving a pre-ranked results page. Traditional link-based SEO signals have no structural advantage in Perplexity. What matters is crawlability, content freshness, and factual authority. A page published and indexed today can be cited by Perplexity today. A page with years of accumulated links but outdated content may not be.

Microsoft Copilot uses a dual-layer retrieval architecture. For web-grounded responses, Copilot generates targeted search queries from your prompt (not the prompt itself, but a distilled set of terms the system determines will retrieve relevant information) and sends those queries to Bing. The retrieval is from Bing's index, which means Bing indexation is the baseline requirement for Copilot web citation. Content that's not in Bing's index can't be cited in Copilot web-grounded responses regardless of Google ranking.
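The paragraph above describes two constraints, and they can be sketched together. Copilot searches with generated queries rather than the raw prompt, and only Bing-indexed pages are candidates for citation. The stop-word filter standing in for query generation, the `BING_INDEX` set, and the URLs are all hypothetical. Microsoft doesn't publish how Copilot writes its queries.

```python
import re

# Hypothetical filler words; real query generation is model-driven, not a word list.
STOP_WORDS = {"can", "you", "tell", "me", "what", "the", "is", "a", "for", "in", "please", "of", "my"}

def distil_query(prompt: str) -> str:
    """Crude stand-in for query generation: keep content words, drop conversational filler."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)

# Hypothetical: the set of URLs Bing has indexed, and pages that would be relevant.
BING_INDEX = {"https://a.example/vat-rates", "https://c.example/vat-guide"}
RELEVANT = ["https://a.example/vat-rates", "https://b.example/vat-2025", "https://c.example/vat-guide"]

def citable(candidates: list[str], index: set[str] = BING_INDEX) -> list[str]:
    """Only pages in Bing's index can ground a web answer, however well they rank elsewhere."""
    return [url for url in candidates if url in index]

print(distil_query("Can you tell me what the VAT rate is for ebooks in Ireland?"))
print(citable(RELEVANT))
```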

What Google's own documentation says works, and what explicitly does not.

Google's May 2026 AI search documentation is the most authoritative first-party statement available on AI search optimisation.

What the documentation says works:

What Google's documentation explicitly says does not work:

  • llms.txt files: no evidence they affect AI search visibility
  • Content "chunking" specifically for AI: the model does its own chunking; manual chunking isn't a signal
  • AI-specific phrasing rewrites: optimising language to sound more "AI-friendly" isn't a documented signal
  • Inauthentic mention campaigns: coordinated brand mentions across AI-generated or low-quality content
The sharpest line in Google's documentation
If your content could be generated by an AI without consulting your site, it probably won't be cited by one.

This isn't a writing tip. It's the structural logic of a RAG system: if the model can generate equivalent content from training data alone, your page provides no grounding value. Non-commodity content (original data, expert analysis, first-hand experience) is structurally more citable.

The decoupling of AI visibility from organic rankings.

The central practical implication of everything in this article is that AI search visibility and traditional organic search ranking are substantially decoupled.

They share a foundation (Google's Search index, baseline quality signals, technical health requirements) but the retrieval mechanisms above that foundation operate differently. Passage-level retrieval rather than page-level ranking. Semantic similarity as the primary retrieval signal rather than link authority. Query fan-out creating multiple retrieval entry points rather than a single SERP to rank within.

This decoupling creates both a threat and an opportunity.

The opportunity
Lower-authority sites can achieve AI visibility their SEO rankings wouldn't predict.
Content that is precisely and clearly responsive to specific query intents, well-structured for passage retrieval, and factually authoritative in ways the model's training data can't replicate: that content can be cited regardless of domain authority.
The threat
Strong traditional rankings don't guarantee AI visibility.
A page that ranks #1 for a query may not be cited in the AI Overview for that same query. The retrieval mechanism is different. Link authority is one signal among several, not the primary gate.

If you understand the architecture, not just the tactics, you can navigate the changes. And they will come. Once the mechanism makes sense, the practical next steps are in the AI Mode ranking playbook, and if you only want the difference between the two features, I've broken it down in AI Overviews vs AI Mode.

Next in the series

Article 6 goes directly to what each platform has published about how its own system works: Google, Microsoft, Anthropic, OpenAI, and Perplexity. First-party sources only. Read Article 6 →

SOURCES
  1. Google Search Central, "Optimizing for Generative AI Features on Google Search," 2026 - developers.google.com/search/docs/...
  2. Google, "How Google AI Visual Search Works" (Query Fan-Out), 2026 - blog.google/...
  3. Google Cloud, "RAG and Grounding on Vertex AI," 2024 - cloud.google.com/blog/...
  4. Google Cloud, "What is Retrieval-Augmented Generation?" - cloud.google.com/use-cases/retrieval-augmented-generation
  5. Google, "How Search Works" - google.com/search/howsearchworks
  6. Semrush, "What Is Query Fan-Out?" - semrush.com/blog/query-fan-out
  7. Wikipedia: Google Gemini - en.wikipedia.org/wiki/Google_Gemini
  8. Wikipedia: Retrieval-augmented generation - en.wikipedia.org/wiki/Retrieval-augmented_generation
  9. Microsoft Learn, "Use Public Websites to Improve Generative Answers" - learn.microsoft.com/...
  10. Perplexity, "Sonar Models Documentation" - docs.perplexity.ai/docs/sonar/models
  11. Wikipedia: Microsoft Copilot - en.wikipedia.org/wiki/Microsoft_Copilot
  12. Wikipedia: Perplexity AI - en.wikipedia.org/wiki/Perplexity_AI