How AI Search Works
1
Attention Is All You Need: The Google Paper That Accidentally Ended Google's Search Monopoly.
On June 12, 2017, eight researchers at Google Brain published a 15-page paper. It accidentally made Google's own search business obsolete.
2
How Large Language Models Actually Work: Tokens, Context Windows, and Why Your Content Gets Ignored.
The answer to "how do I get cited by AI search?" is technical, not strategic. It's rooted in how large language models actually process information.
3
What Is RAG: the Only Thing That Matters for AI Search.
Once you see how Retrieval-Augmented Generation works, the rest of AI search visibility clicks into place. It's the mechanism that connects your content to every answer these systems produce.
4
The AI Search Race: ChatGPT, Gemini, Claude, Copilot, and Perplexity. A Technical Timeline.
The models, the milestones, and the architectural decisions that shaped every AI search product in market. A reference for anyone trying to make sense of where this came from.
5
How Google AI Overviews and AI Mode Actually Work: Why Your SEO Rankings Don't Guarantee AI Visibility.
AI Overviews aren't a ranking bonus layered on top of existing positions. In May 2024, Google restructured how most people receive information. The blue links became supplementary.
6
7
Why the Bing AI Performance Report Is the Most Underused Tool in GEO.
Microsoft's Bing AI Performance report is the only first-party citation dashboard from any AI search platform. Most people working in GEO aren't using it.
8
The GEO Documentation Gap: Why Three Platforms Haven't Told You How to Rank in Their Systems.
Google published a guide. Microsoft published measurement tooling. Anthropic, OpenAI, and Perplexity have published nothing equivalent. Understanding why is more useful than waiting.
9
Constitutional AI vs RLHF: Why the Alignment Method Affects What Gets Cited.
ChatGPT is aligned with RLHF. Claude is aligned with Constitutional AI. These aren't interchangeable, and the difference affects what each model treats as trustworthy and citable.
10
Training Data Is the New Ranking Factor: The Common Crawl AI Visibility Audit Explained.
Before RAG, before indexing, a site has to be reachable by the crawlers that feed AI training data. The Common Crawl AI Visibility Audit is the framework for checking that upstream layer.
Writing  ·  Series: How AI Search Works  ·  Part 6 of 10

How to Be Surfaced in AI Search: What the Providers' Own Documentation Actually Says.

Five platforms, five retrieval architectures, one honest assessment of what each company has published about how to be cited, and what the gaps in their documentation tell you.

I've read all five platforms' first-party documentation so you don't have to. Most GEO content treats all AI search platforms as equivalent, produces a list of tactics (write clearly, use headings, add statistics, demonstrate expertise), and implies that applying those tactics will improve visibility across the board.

That's partially correct and almost entirely unsourced. This article takes a different approach: it goes directly to what each platform has published about how its own system works and how content gets selected for citation. Where platforms have published explicit guidance, I've summarised and sourced it. Where they've published nothing, I've named that gap directly and articulated what the architecture implies in its absence.

Google: the most explicit platform guidance available.

Google is the only platform among the five that has published an explicit, dedicated guide to optimising for AI search features. The Google Search Central documentation published in May 2026 is the most authoritative first-party statement on GEO currently available from any platform.

The confirmed architecture: AI Overviews and AI Mode both use Retrieval-Augmented Generation (grounding). The retrieval source is Google's Search index, not a separate AI index.

The baseline requirement: your content must be indexed, crawlable, and eligible to appear with a snippet. Without that, no AI-specific optimisation is relevant.

What Google says works:

What Google explicitly says doesn't work:

  • llms.txt files: not a meaningful signal for AI search visibility
  • Content "chunking" for AI: the model performs its own chunking; manual chunking isn't a signal
  • AI-specific phrasing rewrites: optimising language to sound "AI-friendly" isn't documented
  • Inauthentic mention campaigns: don't improve AI search visibility
Google's defining statement
If your content could be generated by an AI without consulting your site, it probably won't be cited by one.

This is the sharpest line in Google's documentation. It follows directly from the RAG architecture: if the model can generate equivalent content from training data, your page adds no grounding value. Non-commodity content (original research, expert analysis, proprietary data) is structurally more citable.

If your content could be generated by an AI without consulting your site, it probably won't be cited by one.

Microsoft: the best measurement tooling, thinner content guidance.

Microsoft hasn't published a GEO optimisation guide equivalent to Google's May 2026 document. But it has published significant technical documentation and one piece of tooling that leads every other platform.

What Microsoft has published on retrieval: for web-grounded responses, Copilot generates targeted search queries from the user's prompt and sends them to Bing. These queries aren't the user's original prompt. Microsoft's semantic indexing documentation reveals that Copilot retrieves based on semantic meaning, not keyword matching.

What Microsoft has published in tooling: the Bing AI Performance report, launched in February 2026, is the only first-party citation measurement tool available from any AI search platform. It shows:

The grounding queries data is particularly valuable. It reveals how Copilot interprets user questions and reformulates them for retrieval, showing you what the model actually searches for rather than what users typed.

From Microsoft's official documentation: "Strengthen depth and expertise: pages cited for specific grounding query phrases often reflect clear subject focus and domain expertise. Deepening coverage in related areas can reinforce authority." And: "Improve structure and clarity: clear headings, tables, and FAQ sections help surface key information and make content easier for AI systems to reference accurately."

The baseline for Copilot web citation: because Copilot grounds web responses via Bing's index, the technical baseline is Bing indexation. Content that's not in Bing's index can't be cited in Copilot web-grounded responses, regardless of Google ranking.

Anthropic: architecture published, no optimisation guide.

Anthropic has published no GEO optimisation guide. There's no first-party equivalent of Google's May 2026 document.

What Anthropic has published on retrieval: when Claude receives a request that would benefit from current information, it uses its reasoning capabilities to determine whether web search would help. If so, Claude generates a targeted search query, retrieves relevant results, analyses them for key information, and provides a response with citations. Claude can also conduct multiple progressive searches, using earlier results to inform subsequent queries.

Anthropic also launched Citations: an API feature that lets Claude ground its answers in source documents, automatically citing the specific sentences and passages it uses to generate responses.

What Constitutional AI implies for your content: Claude is trained against Constitutional AI principles that prioritise helpfulness, honesty, and harmlessness. Content that's factually precise, clearly sourced, and unambiguous about what it claims aligns with what Claude is trained to prefer as citation material. Content that's vague, hedged without cause, or makes unattributed assertions is structurally less useful as a grounding source.

The honest gap: Anthropic has published its retrieval architecture and alignment principles but hasn't translated these into content optimisation guidance for publishers.

OpenAI: retrieval documented, publisher guidance absent.

OpenAI has published no GEO optimisation guide.

What OpenAI has published: OpenAI's Help Centre documentation states: "ChatGPT search typically rewrites your query into one or more targeted queries that it sends those providers." That's the most direct first-party statement about ChatGPT's query reformulation. The mechanism is analogous to Google's query fan-out and Copilot's query distillation.

For Deep Research, OpenAI's documentation describes the mode as agentic: it actively plans and carries out a multi-step research process, searching, evaluating sources, refining queries, and synthesising findings. The mode is explicitly designed to find "niche, non-intuitive information that would otherwise require reviewing many sources."

What the API documentation reveals: GPT-5 and later models support dynamic filtering, where the model can write and execute code to filter search results before they reach the context window, keeping only relevant information. Precise, focused content that clearly addresses a specific query is more likely to survive this filtering step than broad, unfocused content.

The honest gap: the architecture implies the same universal requirements (crawlability, factual precision, clear structure, non-commodity value) derived from the RAG architecture and query reformulation mechanics, not from an explicit OpenAI statement.

Perplexity: the most transparent publisher relationship.

Perplexity hasn't published a formal GEO optimisation guide, but it's been more transparent than any other platform about the publisher side of its citation economics.

What Perplexity has published on its retrieval model: Perplexity's Sonar documentation describes optimisation across two dimensions: answer factuality ("how well a model can answer questions using facts that are grounded in search results") and readability ("a model's ability to provide a concise and detailed answer with the appropriate use of markdown formatting").

Factuality as a retrieval criterion means content making precise, specific, verifiable claims retrieves better than content making vague or unattributed assertions. Readability means content structured for clear passage-level extraction performs better in Perplexity's synthesis.

Sonar uses low-latency hybrid search combining semantic similarity methods, LLM ranking, and human feedback signals. The inclusion of human feedback signals means the model has learned from user satisfaction signals about which retrieved content produces good answers.

The publisher programme: in July 2024, Perplexity launched a publisher revenue-sharing programme, the first formal attempt by any AI search platform to create an economic alignment between citation and publisher compensation.

Architecture implications for Perplexity: traditional link-based authority has no structural advantage. What matters is crawlability, freshness, and factual authority.

The honest synthesis: what the documentation gap tells you.

Reading across all five platforms, a pattern emerges.

Google
Explicit guidance published
May 2026 AI search optimisation guide. The most authoritative first-party statement on GEO from any platform. Confirms RAG/grounding, outlines signals, names what doesn't work.
Microsoft
Measurement tooling + technical docs
No dedicated GEO guide, but significant technical documentation and the Bing AI Performance report, the only first-party citation dashboard from any platform.
Anthropic
Architecture only, no publisher guide
Has published retrieval architecture and Constitutional AI principles. No publisher-facing optimisation guidance. Constitutional AI alignment implies preference for precise, honest, well-attributed content.
OpenAI
Developer docs only, no publisher guide
API documentation on how web search works technically. No publisher-facing optimisation guidance.
Perplexity
Sonar docs with implicit signals
Sonar documentation explicitly frames factuality and readability as optimisation targets. Publisher revenue-sharing programme. No dedicated GEO guide, likely resource constraints rather than strategy.

Anthropic, OpenAI, and Perplexity have published nothing on content optimisation, and this is meaningful. The absence of guidance isn't a signal that optimisation is irrelevant. It's a signal that you, if you understand the architecture, have a structural advantage over those waiting for guidance that may not arrive.

The universal requirements that emerge from all five architectures:

Now go and apply it. The architecture is the guide.

One platform has published explicit guidance. One has published measurement tooling. Three have published nothing equivalent. The architecture is the guide. Build for the retrieval mechanism (crawlable, factually precise, clearly structured, non-commodity) and it works across all five without waiting for documentation that may never arrive.

SOURCES
  1. Google Search Central, "Optimizing for Generative AI Features on Google Search," 2026 - developers.google.com/search/docs/fundamentals/ai-optimization-guide
  2. Google Cloud, "RAG and Grounding on Vertex AI," 2024 - cloud.google.com/blog/products/ai-machine-learning/rag-and-grounding-on-vertex-ai
  3. Google, "How Search Works" - google.com/search/howsearchworks
  4. Microsoft Bing, "Introducing AI Performance in Bing Webmaster Tools," February 2026 - blogs.bing.com/webmaster/February-2026/Introducing-AI-Performance-in-Bing-Webmaster-Tools-Public-Preview
  5. Microsoft Learn, "Microsoft 365 Copilot Architecture" - learn.microsoft.com/en-us/microsoft-365/copilot/microsoft-365-copilot-architecture
  6. Microsoft Learn, "Semantic Indexing for Microsoft 365 Copilot" - learn.microsoft.com/en-us/microsoftsearch/semantic-index-for-copilot
  7. Microsoft Learn, "Use Public Websites to Improve Generative Answers" - learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/generative-ai-public-websites
  8. Microsoft, "15 Milestones," 2025 - news.microsoft.com/source/features/ai/15-milestones-that-shaped-microsofts-vision-for-ai
  9. Anthropic, "Introducing Web Search on the Anthropic API," 2025 - anthropic.com/news/web-search-api
  10. Anthropic, "Introducing Citations on the Anthropic API," 2025 - anthropic.com/news/introducing-citations-api
  11. Anthropic, "Claude's Constitution," 2022 - anthropic.com/news/claudes-constitution
  12. Anthropic, Research - anthropic.com/research
  13. OpenAI, "Introducing ChatGPT Search," 2024 - openai.com/index/introducing-chatgpt-search
  14. OpenAI Help Centre, "ChatGPT Search" - help.openai.com/en/articles/9237897-chatgpt-search
  15. OpenAI API Documentation, "Web Search" - platform.openai.com/docs/guides/tools-web-search
  16. OpenAI Academy, "Research with ChatGPT" - openai.com/academy/search-and-deep-research
  17. Perplexity, "Sonar Models Documentation" - docs.perplexity.ai/docs/sonar/models
  18. Perplexity, "Meet New Sonar" - perplexity.ai/hub/blog/meet-new-sonar
  19. Perplexity, "Introducing the Perplexity Publishers' Program," 2024 - perplexity.ai/hub/blog/introducing-the-perplexity-publishers-program
  20. Perplexity, "Why We're Experimenting with Advertising," 2024 - perplexity.ai/hub/blog/why-we-re-experimenting-with-advertising
  21. Wikipedia: Retrieval-augmented generation - en.wikipedia.org/wiki/Retrieval-augmented_generation

Let's talk about your visibility.

Invisible in AI answers, slipping in search, or both. Thirty minutes is enough to work out where to start.

Book a call