Writing · Series: How AI Search Works · Part 6 of 6
How to Be Surfaced in AI Search: What the Providers' Own Documentation Actually Says.
Five platforms, five retrieval architectures, one honest assessment of what each company has published about how to be cited, and what the gaps in their documentation tell you.
Published1 June 2026
ByThomas Cox
Read time15 minutes
Filed underSeries · First-Party GEO · Part 6 of 6
I've read all five platforms' first-party documentation so you don't have to. Most GEO content treats all AI search platforms as equivalent, produces a list of tactics (write clearly, use headings, add statistics, demonstrate expertise), and implies that applying those tactics will improve visibility across the board.
That's partially correct and almost entirely unsourced. This article takes a different approach: it goes directly to what each platform has published about how its own system works and how content gets selected for citation. Where platforms have published explicit guidance, I've summarised and sourced it. Where they've published nothing, I've named that gap directly and articulated what the architecture implies in its absence.
Google: the most explicit platform guidance available.
Google is the only platform among the five that has published an explicit, dedicated guide to optimising for AI search features. The Google Search Central documentation published in May 2026 is the most authoritative first-party statement on GEO currently available from any platform.
The confirmed architecture: AI Overviews and AI Mode both use Retrieval-Augmented Generation (grounding). The retrieval source is Google's Search index, not a separate AI index.
The baseline requirement: your content must be indexed, crawlable, and eligible to appear with a snippet. Without that, no AI-specific optimisation is relevant.
What Google says works: foundational SEO first. Technical health, crawlability, snippet eligibility, and E-E-A-T are the baseline signals. Google's position is that AI search is built on top of traditional search infrastructure, not as a replacement for it.
Non-commodity content is the next requirement. Google's documentation is specific: content should provide "unique, expert-led" value that goes "beyond common knowledge." If an AI model could generate equivalent content from its training data without consulting your page, your page provides no grounding value.
Clear extractable structure matters too. Content must be structured so that AI systems can lift and cite specific claims: clear headings, front-loaded key claims, unambiguous paragraph structure, attributable data points.
What Google explicitly says doesn't work:
- llms.txt files (not a meaningful signal)
- Content "chunking" for AI (the model performs its own chunking)
- AI-specific phrasing rewrites (not a documented signal)
- Inauthentic mention campaigns (don't improve AI search visibility)
The defining statement: if your content could be generated by an AI without consulting your site, it probably won't be cited by one.
Microsoft: the best measurement tooling, thinner content guidance.
Microsoft hasn't published a GEO optimisation guide equivalent to Google's May 2026 document. But it has published significant technical documentation and one piece of tooling that leads every other platform.
What Microsoft has published on retrieval: for web-grounded responses, Copilot generates targeted search queries from the user's prompt and sends them to Bing. These queries aren't the user's original prompt. Microsoft's semantic indexing documentation reveals that Copilot retrieves based on semantic meaning, not keyword matching.
What Microsoft has published in tooling: the Bing AI Performance report, launched in February 2026, is the only first-party citation measurement tool available from any AI search platform. It shows which URLs from a verified site are cited in Microsoft Copilot and Bing AI-generated answers, how often each URL is cited, grounding queries (the internal search queries Copilot generates from user prompts), and visibility trends over time.
The grounding queries data is particularly valuable. It reveals how Copilot interprets user questions and reformulates them for retrieval, showing you what the model actually searches for rather than what users typed.
From Microsoft's official documentation: "Strengthen depth and expertise — pages cited for specific grounding query phrases often reflect clear subject focus and domain expertise. Deepening coverage in related areas can reinforce authority." And: "Improve structure and clarity — clear headings, tables, and FAQ sections help surface key information and make content easier for AI systems to reference accurately."
The baseline for Copilot web citation: because Copilot grounds web responses via Bing's index, the technical baseline is Bing indexation. Content that's not in Bing's index can't be cited in Copilot web-grounded responses, regardless of Google ranking.
Anthropic: architecture published, no optimisation guide.
Anthropic has published no GEO optimisation guide. There's no first-party equivalent of Google's May 2026 document.
What Anthropic has published on retrieval: when Claude receives a request that would benefit from current information, it uses its reasoning capabilities to determine whether web search would help. If so, Claude generates a targeted search query, retrieves relevant results, analyses them for key information, and provides a response with citations. Claude can also conduct multiple progressive searches, using earlier results to inform subsequent queries.
Anthropic also launched Citations: an API feature that lets Claude ground its answers in source documents, automatically citing the specific sentences and passages it uses to generate responses.
What Constitutional AI implies for your content: Claude is trained against Constitutional AI principles that prioritise helpfulness, honesty, and harmlessness. Content that's factually precise, clearly sourced, and unambiguous about what it claims aligns with what Claude is trained to prefer as citation material. Content that's vague, hedged without cause, or makes unattributed assertions is structurally less useful as a grounding source.
The honest gap: Anthropic has published its retrieval architecture and alignment principles but hasn't translated these into content optimisation guidance for publishers.
OpenAI: retrieval documented, publisher guidance absent.
OpenAI has published no GEO optimisation guide.
What OpenAI has published: OpenAI's Help Centre documentation states: "ChatGPT search typically rewrites your query into one or more targeted queries that it sends those providers." That's the most direct first-party statement about ChatGPT's query reformulation. The mechanism is analogous to Google's query fan-out and Copilot's query distillation.
For Deep Research, OpenAI's documentation describes the mode as agentic: it actively plans and carries out a multi-step research process, searching, evaluating sources, refining queries, and synthesising findings. The mode is explicitly designed to find "niche, non-intuitive information that would otherwise require reviewing many sources."
What the API documentation reveals: GPT-5 and later models support dynamic filtering, where the model can write and execute code to filter search results before they reach the context window, keeping only relevant information. Precise, focused content that clearly addresses a specific query is more likely to survive this filtering step than broad, unfocused content.
The honest gap: the architecture implies the same universal requirements (crawlability, factual precision, clear structure, non-commodity value) derived from the RAG architecture and query reformulation mechanics, not from an explicit OpenAI statement.
Perplexity: the most transparent publisher relationship.
Perplexity hasn't published a formal GEO optimisation guide, but it's been more transparent than any other platform about the publisher side of its citation economics.
What Perplexity has published on its retrieval model: Perplexity's Sonar documentation describes optimisation across two dimensions: answer factuality ("how well a model can answer questions using facts that are grounded in search results") and readability ("a model's ability to provide a concise and detailed answer with the appropriate use of markdown formatting").
Factuality as a retrieval criterion means content making precise, specific, verifiable claims retrieves better than content making vague or unattributed assertions. Readability means content structured for clear passage-level extraction performs better in Perplexity's synthesis.
Sonar uses low-latency hybrid search combining semantic similarity methods, LLM ranking, and human feedback signals. The inclusion of human feedback signals means the model has learned from user satisfaction signals about which retrieved content produces good answers.
The publisher programme: in July 2024, Perplexity launched a publisher revenue-sharing programme, the first formal attempt by any AI search platform to create an economic alignment between citation and publisher compensation.
Architecture implications for Perplexity: traditional link-based authority has no structural advantage. What matters is crawlability, freshness, and factual authority.
The honest synthesis: what the documentation gap tells you.
Reading across all five platforms, a pattern emerges.
Google has the most explicit guidance, and that guidance is consistent with what the architecture predicts. The recommendations (foundational SEO, non-commodity content, extractable structure) are the logical requirements of a RAG system applied to a quality-ranked index.
Microsoft has the best measurement tooling. The Bing AI Performance report is the only first-party citation dashboard available from any platform. Even if Bing/Copilot isn't your primary target, the grounding query data it provides is the only empirical window currently available into how AI systems are actually using your content.
Anthropic, OpenAI, and Perplexity have published nothing on content optimisation, and this is meaningful. The absence of guidance isn't a signal that optimisation is irrelevant. It's a signal that you, if you understand the architecture, have a structural advantage over those waiting for guidance that may not arrive.
The universal requirements that emerge from all five architectures:
- Content must be crawlable and indexable by the platform's retrieval system
- Content must be factually precise and clearly attributed
- Content must be structured for extraction: key claims front-loaded, headings clear, paragraphs unambiguous
- Content must provide information an AI system couldn't generate without consulting that specific source
None of these are new principles. All of them have a technical explanation that most GEO content doesn't provide. The series you've just read is that explanation. Now go and apply it.