How AI Search Works
1
Attention Is All You Need: The Google Paper That Accidentally Ended Google's Search Monopoly.
On June 12, 2017, eight researchers, most of them at Google Brain and Google Research, uploaded a 15-page paper to arXiv. It accidentally made Google's own search business obsolete.
2
How Large Language Models Actually Work: Tokens, Context Windows, and Why Your Content Gets Ignored.
The answer to "how do I get cited by AI search?" is technical, not strategic — rooted in how large language models actually process information.
3
What Is RAG: the Only Thing That Matters for AI Search.
Once you see how Retrieval-Augmented Generation works, the rest of AI search visibility clicks into place. It's the mechanism that connects your content to every answer these systems produce.
4
The AI Search Race: ChatGPT, Gemini, Claude, Copilot, and Perplexity. A Technical Timeline.
The models, the milestones, and the architectural decisions that shaped every AI search product in market — a reference for anyone trying to make sense of where this came from.
5
How Google AI Overviews and AI Mode Actually Work: Why Your SEO Rankings Don't Guarantee AI Visibility.
AI Overviews aren't a ranking bonus layered on top of existing positions. In May 2024, Google restructured how most people receive information — the blue links became supplementary.
6
How to Be Surfaced in AI Search: What the Platforms' Own Documentation Actually Says.
Most GEO content produces a list of tactics and implies they work across all platforms. This article goes directly to what each platform has actually published about how its own system works.
7
Why the Bing AI Performance Report Is the Most Underused Tool in GEO.
Microsoft's Bing AI Performance report is the only first-party citation dashboard from any AI search platform. Most practitioners aren't using it.
8
The GEO Documentation Gap: Why Three Platforms Haven't Told You How to Rank in Their Systems.
Google published a guide. Microsoft published measurement tooling. Anthropic, OpenAI, and Perplexity have published nothing equivalent. Understanding why is more useful than waiting.
9
Writing  ·  Series: How AI Search Works  ·  Follow-on 3

Constitutional AI vs RLHF: Why the Alignment Method Affects What Gets Cited.

ChatGPT is aligned with RLHF. Claude is aligned with Constitutional AI. These are not interchangeable approaches — and the difference affects what each model treats as trustworthy and citable at the margin.

When a practitioner asks how to get their content cited by Claude versus how to get it cited by ChatGPT, they are typically assuming the answer is the same. Write clearly, demonstrate expertise, provide value. Fine.

But the models are aligned differently. ChatGPT is trained with Reinforcement Learning from Human Feedback. Claude is trained with Constitutional AI. These are not interchangeable approaches to the same problem. They encode different values into the model's behaviour — and those different values affect, at the margin, what each model treats as trustworthy, citable, and well-sourced. This is the most technically ambitious article in this series, and the one most likely to be cited by practitioners who want to demonstrate genuine depth of understanding.

The alignment problem.

Training a large language model produces a model that is very good at predicting the next token. It does not produce a model that is reliably helpful, honest, or safe. The statistical patterns learned from internet-scale text include harmful content, misinformation, biased reasoning, and ways of being maximally engaging that are not the same as being maximally accurate.

The alignment problem is the challenge of making a powerful, capable model behave in ways that are beneficial rather than harmful — reliable, honest, appropriately uncertain, and respectful of the user's interests. No training technique has solved this completely. But different approaches encode different values with different emphases, and those differences are meaningful for anyone trying to understand what each model favours when it cites sources.

RLHF: learning from human preferences.

Reinforcement Learning from Human Feedback is the alignment technique used to produce ChatGPT from GPT-3.5, and the primary alignment technique underlying the GPT model family. The process has three stages.

Stage 1: Supervised Fine-Tuning. Human trainers write examples of ideal responses to prompts. The model is fine-tuned on these examples, learning what good responses look like from human-authored demonstrations.

Stage 2: Reward Model Training. The model generates multiple responses to the same prompt. Human raters rank these responses by quality. A separate reward model is trained to predict which responses humans will prefer — essentially learning a human preference function from the ranking data.

Stage 3: RL Optimisation. The language model is further trained using reinforcement learning, with the reward model providing the reward signal. The model learns to generate responses that the reward model predicts humans will prefer.
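
As a rough illustration of Stage 2, the sketch below fits a tiny reward model to pairwise rater choices using the Bradley-Terry loss that is commonly used for this step. Everything in it is an assumption made for the example: the two features (clarity, hedging), the four preference pairs, and the linear model standing in for a neural network. It shows the shape of the technique, not OpenAI's implementation.

```python
import math

# Hypothetical toy data. Each response is reduced to two invented features,
# (clarity, hedging). Each pair is (preferred, rejected) as picked by a rater.
PAIRS = [
    ((0.9, 0.1), (0.4, 0.6)),
    ((0.8, 0.2), (0.5, 0.1)),
    ((0.7, 0.3), (0.2, 0.8)),
    ((0.6, 0.1), (0.3, 0.4)),
]


def reward(w, feats):
    """Linear reward model: a stand-in for a network with a scalar output."""
    return sum(wi * fi for wi, fi in zip(w, feats))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_loss(w, pairs):
    """Bradley-Terry loss: mean of -log sigmoid(r(preferred) - r(rejected))."""
    total = 0.0
    for chosen, rejected in pairs:
        total += -math.log(sigmoid(reward(w, chosen) - reward(w, rejected)))
    return total / len(pairs)


def train(pairs, steps=500, lr=0.5):
    """Plain gradient descent on the pairwise loss, starting from zero weights."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for chosen, rejected in pairs:
            p = sigmoid(reward(w, chosen) - reward(w, rejected))
            for i in range(len(w)):
                # Gradient of -log sigmoid(margin) with respect to w[i].
                grad[i] += -(1.0 - p) * (chosen[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


weights = train(PAIRS)
print("learned weights:", weights)
print("loss:", pairwise_loss(weights, PAIRS))
```

The learned weights encode only what these particular raters chose. In this toy data the preferred response is always the clearer one, so the weight on clarity comes out positive; a different rater pool would produce a different reward function.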

The result is a model that has learned to produce responses satisfying a human preference function. That preference function reflects what a particular group of human raters — typically English-speaking, from a specific demographic and cultural context — found preferable in training data gathered at a specific point in time.
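
For readers who want the optimisation target written out, one common form of the Stage 3 objective is the one given in the InstructGPT paper (source 3), shown here with its pretraining-mix term left out. The symbols are notation chosen for this sketch: $\pi_\theta$ is the model being trained, $\pi_{\mathrm{SFT}}$ is the Stage 1 model, $r_\phi$ is the Stage 2 reward model, $D$ is the prompt set, and $\beta$ is a penalty coefficient.

$$
\max_{\theta} \; \mathbb{E}_{x \sim D,\; y \sim \pi_\theta(\cdot \mid x)} \left[ r_\phi(x, y) \;-\; \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{SFT}}(y \mid x)} \right]
$$

The first term pushes the model toward responses the reward model scores highly. The second term penalises drifting far from the supervised model, which limits how much the policy can exploit quirks in the learned reward.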

RLHF is empirical: it encodes what humans prefer, as measured. It does not encode what humans should prefer, as reasoned. The distinction matters for edge cases — situations where raters' preferences are inconsistent, culturally specific, or reflect biases in the rater pool.

For content citation specifically: a model trained with RLHF prefers responses that are satisfying to the human reader — responses that feel helpful, authoritative, and clear. Content that is well-written, confident, and accessible tends to be cited by RLHF-aligned models because those are the properties that produce responses the reward model scores highly. Content that is heavily hedged, technically complex but inaccessible, or uncertain in ways that reduce response quality is less likely to be cited.

Constitutional AI: learning from principles.

Constitutional AI is the alignment technique developed by Anthropic for Claude. It was introduced in Anthropic's 2022 research paper and has been refined across subsequent model generations. The approach differs from RLHF in its starting point: rather than relying only on human preference ratings, Constitutional AI trains the model against an explicit set of principles, a constitution, that articulates how the model should behave, what values it should uphold, and what tradeoffs it should make when values conflict. In the original paper, human preference data was still used for helpfulness, while AI feedback guided by the constitution replaced human labels for harmlessness.

The process has two key stages.

Stage 1: Supervised Learning from Self-Critique. The model is given a prompt and generates an initial response. It is then given the constitution and asked to critique its own response against the constitutional principles — does the response violate any principles? Is it harmful, deceptive, or unhelpful in ways the constitution identifies? The model then revises its response based on its own critique. This revised response is used as training data.

Stage 2: Reinforcement Learning from AI Feedback (RLAIF). Rather than using human raters to rank model outputs, an AI model compares pairs of responses against the constitutional principles and generates the preference labels. Those labels train a preference model, which then supplies the reward signal for reinforcement learning, much as the reward model does in RLHF. The result is alignment based on explicit, articulable principles rather than only on implicit human preferences.
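
The two stages can be sketched as control flow. The code below is a minimal illustration under loud assumptions: `StubModel` is a hypothetical, rule-based stand-in for a language model, the two-line constitution is invented for the example, and the preference step counts rule violations instead of asking a model which response better satisfies a principle. It is not Anthropic's implementation.

```python
# Hypothetical two-principle constitution. Real constitutions are longer prose.
CONSTITUTION = [
    "Do not overstate certainty.",
    "Attribute factual claims to a source.",
]


class StubModel:
    """Rule-based stand-in so the loop runs without any API. All hypothetical."""

    def generate(self, prompt):
        return "This is definitely true."

    def critique(self, response, principle):
        # Returns a critique string, or "" when the principle is satisfied.
        if "certainty" in principle and "definitely" in response:
            return "The response overstates certainty."
        if "Attribute" in principle and "according to" not in response.lower():
            return "The response cites no source."
        return ""

    def revise(self, response, critique):
        if "certainty" in critique:
            return response.replace("definitely", "probably")
        if "source" in critique:
            return "According to the cited report, " + response[0].lower() + response[1:]
        return response


def critique_and_revise(model, prompt, constitution):
    """Stage 1: draft, critique against each principle, revise, keep the result."""
    response = model.generate(prompt)
    for principle in constitution:
        critique = model.critique(response, principle)
        if critique:
            response = model.revise(response, critique)
    return {"prompt": prompt, "response": response}


def ai_preference_label(model, prompt, a, b, constitution):
    """Stage 2: an AI judge, not a human rater, picks the preferred response."""
    def violations(resp):
        return sum(1 for p in constitution if model.critique(resp, p))

    chosen, rejected = (a, b) if violations(a) <= violations(b) else (b, a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


model = StubModel()
prompt = "Is the claim true?"
draft = model.generate(prompt)
sl_example = critique_and_revise(model, prompt, CONSTITUTION)
label = ai_preference_label(model, prompt, draft, sl_example["response"], CONSTITUTION)
print(sl_example["response"])
print(label["chosen"])
```

The revised responses from the first function become supervised training data. The chosen/rejected records from the second play the role that human rankings play in RLHF.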

The constitution itself articulates values: the model should be helpful, honest, and harmless. It should avoid deception. It should acknowledge uncertainty. It should be direct rather than sycophantic. It should treat users as intelligent adults capable of handling honest answers.

For content citation specifically: a model trained with Constitutional AI places explicit weight on honesty and accuracy. Claude is specifically trained to acknowledge uncertainty, to avoid claiming knowledge it does not have, and to be honest about the limitations of its information. Content that is factually precise, appropriately attributed, and honest about what it does and does not claim is more aligned with Claude's training values than content that is overconfident, poorly sourced, or that overstates certainty.

The practical difference for content.

The difference between RLHF and Constitutional AI is most visible in how each model handles uncertainty and attribution.

An RLHF-aligned model is trained to produce responses that human raters prefer. Human raters often prefer confident, clear responses to hedged, uncertain ones — even when uncertainty is the epistemically correct stance. This can produce a slight systematic preference for confident-sounding sources, even when the confidence is not fully warranted.

A Constitutional AI-aligned model is trained against explicit principles that include appropriate expression of uncertainty. Claude is trained to say "I don't know" when it doesn't know, to express uncertainty when it is genuinely uncertain, and to prefer sources that are honest about the limits of their claims. This produces a slight systematic preference for content that is epistemically honest — content that says "this data is from Q1 2025 and may have been updated" rather than presenting potentially outdated information with unwarranted confidence.

For practitioners, the implication is small but real. Content written for Claude citation should be especially precise about the provenance and currency of its claims. Attributing statistics to specific sources with specific dates, acknowledging the limits of the data, and being honest about what is known versus inferred — these are not just good content principles. They are properties that Constitutional AI-aligned models are specifically trained to value.
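
One way to act on this is a simple editorial check that flags statistics published without a source or a date. The sketch below is a hypothetical heuristic, not a model of how Claude or any other system scores sources. The regular expressions, the attribution phrases, and the sample text (including "Example Corp" and its figures) are all invented for illustration.

```python
import re

# Assumed patterns for this sketch. Tune them to your own house style.
STATISTIC = re.compile(r"\d+(?:\.\d+)?\s?%")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
ATTRIBUTION = re.compile(
    r"\b(?:according to|reported by|data from|survey by)\b", re.IGNORECASE
)


def split_sentences(text):
    """Naive splitter: breaks after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def unattributed_statistics(text):
    """Return sentences that contain a percentage but lack a source or a year."""
    flagged = []
    for sentence in split_sentences(text):
        if not STATISTIC.search(sentence):
            continue
        has_source = bool(ATTRIBUTION.search(sentence))
        has_year = bool(YEAR.search(sentence))
        if not (has_source and has_year):
            flagged.append(sentence)
    return flagged


# Invented sample copy. The company and the numbers are placeholders.
SAMPLE = (
    "According to Example Corp's 2025 survey, 12% of respondents used AI search weekly. "
    "Around 40% of buyers start in a chatbot. "
    "Clear structure helps retrieval."
)

flagged = unattributed_statistics(SAMPLE)
for sentence in flagged:
    print("Needs source and date:", sentence)
```

On the sample text, only the second sentence is flagged: it states a percentage with no source and no year. A check like this catches omissions; it cannot tell whether the attribution itself is accurate.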

Neither approach is perfect — and both converge.

It would be misleading to conclude that one alignment approach is straightforwardly superior. RLHF has produced models that track human preferences closely across a vast range of tasks. The human preference data, gathered across many prompts and contexts, captures much of what makes responses good in practice, within the limits of the rater pool described earlier. Models trained with RLHF are often better at matching the register and tone of responses to context, because human raters naturally prefer tonally appropriate responses.

Constitutional AI produces models that are more transparent about their values — you can read Claude's constitution and understand what principles the model was trained against. The explicit principle set means Constitutional AI alignment is more interpretable than RLHF, where the preference function is learned rather than stated. But constitutions are written by humans and reflect the values of their authors. The claim that Constitutional AI is more "objective" than RLHF overstates the case.

In practice, both approaches are continually refined. GPT-5 incorporates alignment techniques that have evolved significantly beyond the original InstructGPT RLHF implementation. Claude's constitution is updated periodically. The models converge in many respects even as their fundamental alignment approaches differ.

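The honest answer for practitioners is that the alignment method is a secondary signal compared to the universal RAG requirements: crawlability, factual precision, clear structure, non-commodity value. A site that is not indexed cannot be cited by either model. A site with vague, poorly structured content will not be cited well by either. But at the margin, where two roughly equivalent pieces of content compete to be cited, the tendencies described above come into play: an RLHF-aligned model leans slightly toward the clearer, more confident source, and a Constitutional AI-aligned model leans slightly toward the source that is more precise about attribution and uncertainty.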

The deepest practical implication is this: the best content for both models is the same. Content that is factually precise, clearly structured, appropriately attributed, honest about uncertainty, and genuinely valuable to the reader satisfies the requirements of both alignment approaches. They converge on the same content properties because they are both attempts to solve the same underlying problem — making AI systems reliably helpful and honest.

Write content that is true. Write content that is clear. Write content that acknowledges what it does not know. Attribute your claims. Provide value that the model cannot generate without you. That is the content that gets cited. The alignment method tells you why.

/ End of series

This is the final article in the How AI Search Works series.

SOURCES
  1. Anthropic, "Claude's Constitution," 2023 — anthropic.com/news/claudes-constitution
  2. Anthropic, Research — anthropic.com/research
  3. OpenAI, "Training Language Models to Follow Instructions with Human Feedback (InstructGPT)," 2022 — openai.com/index/instruction-following
  4. Wikipedia: Reinforcement learning from human feedback — en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback
  5. Wikipedia: Claude (language model) — en.wikipedia.org/wiki/Claude_(language_model)
  6. Wikipedia: Anthropic — en.wikipedia.org/wiki/Anthropic
  7. Wikipedia: ChatGPT — en.wikipedia.org/wiki/ChatGPT
  8. Wikipedia: Large language model — en.wikipedia.org/wiki/Large_language_model
  9. Wikipedia: Generative pre-trained transformer — en.wikipedia.org/wiki/Generative_pre-trained_transformer
  10. Vaswani et al., "Attention Is All You Need," Google Brain / Google Research, 2017 — research.google/pubs/attention-is-all-you-need
/ Written by

Thomas Cox

Twelve years in B2B SEO, most recently at VP level. Now independent — helping companies stay discoverable as buyer search moves into ChatGPT, Perplexity, and Google AI Overviews. Remote · UK.
