Attention Is All You Need: The Google Paper That Accidentally Ended Google's Search Monopoly

On June 12, 2017, eight researchers at Google Brain and Google Research uploaded a 15-page paper to arXiv. Its title was "Attention Is All You Need." Its abstract promised an improvement to machine translation. Its authors expected it to spark a discussion among NLP researchers. What it actually did was lay the groundwork for the first serious threat to Google's own search business. It just took five years for anyone to notice.

Every AI product that has dominated headlines since 2022 traces its architecture directly to that paper. ChatGPT. Gemini. Claude. Microsoft Copilot. Perplexity. Google's own AI Overviews. All of it built on the Transformer architecture introduced in a paper written by the company it would come to threaten. I've spent a lot of time tracing how that happened, and the story is worth telling in full: the technical decisions, the competitive miscalculations, and the product moments that reshaped the information economy.

Jun 2017

Attention Is All You Need

Eight Google Brain researchers publish the Transformer architecture paper.

Every AI product built since traces directly to this paper.

Oct 2018

BERT

Google publishes a bidirectional Transformer: a model that conditions on the context to the left and right of every token at once.

A step change in modelling what a word means in context, not just where it sits.

Oct 2019

BERT deployed in Google Search

Google rolls BERT into live search results. Language understanding at search scale, quietly.

AI in search starts here — not 2023. The 2022 wave was the visible surface of a shift already underway.

Jan 2020

Scaling Laws

OpenAI publishes empirical evidence that model performance follows predictable power-law relationships with size, data, and compute.

Justified hundreds of billions in compute investment. Seeded the founding of Anthropic.

May 2021

LaMDA unveiled

Google announces a 137B parameter dialog model at I/O — capable of the kind of open-ended conversation ChatGPT would make famous 18 months later.

Google had the technology. They chose not to ship it. That decision changed everything.

2021

Anthropic founded

Dario Amodei, Daniela Amodei, and former OpenAI researchers leave over disagreements about the pace of deployment and adequacy of RLHF as a safety approach.

The safety-vs-speed fault line that now defines the industry opens here.

Jun 2022

LaMDA "Sentient"

Google engineer Blake Lemoine claims LaMDA is sentient and has a soul. Google dismisses the claim, places Lemoine on leave, and fires him the following month. The story runs globally.

Exposed that Google had extraordinary capability sitting in a lab, months before ChatGPT launched.

30 Nov 2022

ChatGPT launches

OpenAI ships GPT-3.5 with RLHF fine-tuning behind a simple chat interface. One million users in five days. One hundred million in two months.

The fastest consumer application adoption in recorded history. The race officially begins.

Jan 2023

Microsoft $10B OpenAI

Microsoft announces a multi-year investment in OpenAI, widely reported at $10 billion, with clear intent to integrate OpenAI's models into Bing.

Signalled to the entire industry that search was existential — and that Google's moat was at risk.

7 Feb 2023

New Bing launches

Microsoft ships an OpenAI model, later confirmed to be GPT-4, inside Bing Search and Edge. For the first time, you could have a conversation with a search engine.

The first direct commercial threat to Google's 25-year search dominance.

8 Feb 2023

Bard error — $100B gone

Google rushes a Bard demo. Bard states the James Webb telescope took the first images of an exoplanet — factually wrong. Alphabet loses ~$100B in market cap by close.

The company that invented the Transformer, beaten by its own technology, then lost $100B in one day by panicking publicly.

Mar 2023

Claude launches

Anthropic ships Claude publicly, trained with Constitutional AI — a normative alignment approach rather than RLHF's empirical one.

Proved a third viable alignment path existed beyond Google and OpenAI's approaches.

Dec 2023

Gemini 1.0

Google ships a natively multimodal model — trained simultaneously on text, image, audio, video, and code. State of the art on 30 of 32 benchmarks at launch.

Google's ground-up rebuild worked. Technical parity regained — but product dominance is still unsettled.

Today

Five platforms compete

Google, OpenAI, Microsoft, Anthropic, and Perplexity — all built on the architecture from a 2017 Google paper.

The race that started in a lab in 2017 is still open. The platforms you're optimising for today didn't exist six years ago.

The world before Transformers.

Before 2017, the dominant approach to natural language processing was the Recurrent Neural Network, or RNN. RNNs processed text sequentially, word by word, left to right, feeding each step's output into the next step's input. The architecture made intuitive sense: language is sequential, so process it sequentially.

The problem was fundamental. Sequential processing meant sequential training. You couldn't parallelise an RNN across the steps of a sequence: each step depended on the previous one, so the work inside a sequence couldn't be spread across thousands of GPU cores at once. This wasn't just slow. It put a practical ceiling on how large RNNs could scale.

There was a second problem: memory. As an RNN processed longer sequences, earlier context became progressively diluted. Ask an RNN to answer a question about a document's first paragraph after reading the entire document, and the first paragraph was effectively gone. The model struggled to retain long-range dependencies (the relationships between concepts separated by many tokens).
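To make those two constraints concrete, here is a minimal sketch of a toy recurrent network in Python. It isn't any production architecture: the single scalar hidden state and the weights `w_in` and `w_rec` are assumptions picked for illustration. The loop shows why the steps of one sequence can't run side by side, and the printed states show an early input fading as the sequence goes on.

```python
import math


def rnn_forward(inputs, w_in=0.5, w_rec=0.9):
    """Toy scalar RNN. The weights are illustrative, not learned."""
    hidden = 0.0
    states = []
    for x in inputs:
        # Each step needs the previous hidden state, so the steps
        # of one sequence have to run one after another.
        hidden = math.tanh(w_in * x + w_rec * hidden)
        states.append(hidden)
    return states


# A single signal at the first step, followed by silence.
states = rnn_forward([1.0, 0.0, 0.0, 0.0, 0.0])
print([round(s, 3) for s in states])
```

Each printed value is smaller than the one before it: the trace of the first input shrinks at every step, which is the dilution problem in miniature.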

These two problems (step-by-step training and weak long-range memory) meant that as of 2016, even the best NLP models were genuinely poor at understanding language. They could do narrow tasks. They couldn't do general-purpose language reasoning.

The Transformer: what changed and why.

The 2017 paper proposed eliminating recurrence entirely. Instead of processing tokens sequentially, the Transformer processes the entire sequence simultaneously using a mechanism called self-attention.

Here's the core idea. For every token in a sequence, the model asks three questions: what am I looking for (Query), what do I have to offer other tokens (Key), and what information should I pass along (Value)? The dot product of one token's Query with another token's Key, scaled and passed through a softmax, determines how much the first token attends to the second. Those attention weights are then used to blend the Value vectors, and the whole calculation runs for all tokens in parallel as a handful of matrix multiplications.

The result is a model that can, in a single forward pass, establish relationships between any two tokens regardless of their distance in the sequence. The word "it" in a sentence can attend directly to "the animal" fourteen words earlier. The model doesn't have to carry that relationship forward through fourteen sequential steps. It computes it directly.
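The Query, Key, and Value mechanics described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the paper's implementation: the toy vectors are hand-picked assumptions, whereas a real Transformer produces Q, K, and V by multiplying token embeddings with learned weight matrices.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    weights, outputs = [], []
    for q in queries:
        # Compare this token's Query with every token's Key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        row = softmax(scores)
        weights.append(row)
        # Blend the Value vectors according to the attention weights.
        outputs.append([
            sum(w * v[i] for w, v in zip(row, values))
            for i in range(len(values[0]))
        ])
    return outputs, weights


# Hand-picked toy vectors for three tokens (assumptions, not learned).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weights = self_attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

The loop over queries is written out for readability. No row depends on any other row, which is why real implementations compute the whole thing as matrix multiplications on a GPU.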

Multi-head attention extends this further: multiple attention operations run simultaneously, each learning different kinds of linguistic relationships. One head might specialise in syntactic dependencies. Another in coreference. Another in semantic similarity. The outputs are concatenated and projected, giving the model a rich, multi-dimensional representation of each token's context.

Because all of these operations are parallelisable, Transformers could be trained on hardware at a scale that RNNs simply couldn't reach. And because scale, as the Scaling Laws paper would confirm three years later, predicts performance with remarkable consistency, this architectural shift opened the door to every large language model that followed.
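Here is a rough sketch of the head-splitting idea in plain Python. It makes two loud simplifications that are my assumptions, not the paper's design: Q, K, and V are all the raw input `x`, and there are no learned projections before or after the heads. What it does show is the mechanical part, with each head attending over its own slice of every vector and the results joined back together.

```python
import math


def attend(q_rows, k_rows, v_rows):
    """Scaled dot-product attention for a single head."""
    scale = math.sqrt(len(k_rows[0]))
    out = []
    for q in q_rows:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in k_rows]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([
            sum((e / total) * v[i] for e, v in zip(exps, v_rows))
            for i in range(len(v_rows[0]))
        ])
    return out


def split_heads(rows, num_heads):
    """Slice every vector into num_heads equal chunks, one list per head."""
    size = len(rows[0]) // num_heads
    return [[row[h * size:(h + 1) * size] for row in rows]
            for h in range(num_heads)]


def multi_head_attention(x, num_heads):
    # Simplification: Q = K = V = x, with no learned projections.
    heads = [attend(h, h, h) for h in split_heads(x, num_heads)]
    # Concatenate each token's per-head outputs back to full width.
    return [sum((head[t] for head in heads), []) for t in range(len(x))]


# Three tokens, width 4, split across two heads (toy values).
x = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0, 0.0]]
result = multi_head_attention(x, num_heads=2)
print([[round(v, 3) for v in row] for row in result])
```

In a trained model the per-head projections are what let one head track syntax while another tracks coreference. Without them, as here, the heads differ only in which slice of the vector they see.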

BERT (2018): the Transformer enters Google Search.

Google published BERT (Bidirectional Encoder Representations from Transformers) in October 2018. Where the original Transformer had both an encoder and a decoder, BERT used only the encoder, making it optimised for understanding rather than generation.

The "bidirectional" in the name is the key innovation. Most previous language models read text in one direction, left to right or right to left, or stitched together two separately trained one-directional passes. BERT reads in both directions simultaneously, conditioning on the full left and right context of every token at the same time. This made BERT far better at understanding the meaning of a word in context: "bank" near "river" versus "bank" near "deposit."
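One way to picture the difference is to list which positions each token is allowed to look at. The sketch below is an illustration of the masking idea only, with a hypothetical helper name (`visible_positions`) and a made-up example sentence. It is not BERT's code.

```python
def visible_positions(seq_len, bidirectional):
    """For each token index, list the indices it may attend to."""
    return [
        [j for j in range(seq_len) if bidirectional or j <= i]
        for i in range(seq_len)
    ]


tokens = ["the", "river", "bank", "flooded"]
causal = visible_positions(len(tokens), bidirectional=False)
full = visible_positions(len(tokens), bidirectional=True)

bank = tokens.index("bank")
print("left-to-right:", [tokens[j] for j in causal[bank]])
print("bidirectional:", [tokens[j] for j in full[bank]])
```

In the left-to-right case "bank" never sees "flooded". In the bidirectional case it sees the whole sentence. BERT's training makes use of this by hiding some tokens and asking the model to predict them from the words on both sides.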

In October 2019, Google deployed BERT directly into live Search to process queries. This was the first time a Transformer architecture powered real search results at scale. With little attention outside the search industry, Search moved from matching keywords towards modelling what a query actually means.

Most people who talk about "AI search" as a 2023 phenomenon are missing this. The transformation of Search started in 2019, with a model Google published in 2018, based on a paper from 2017. The 2022-2023 wave was the consumer-facing surface of a technical shift that had been underway for years.

Scaling Laws (2020): the scientific case for the race.

In January 2020, a team at OpenAI published "Scaling Laws for Neural Language Models", a paper that would shape billions of dollars of compute investment and seed the founding of at least one major AI company.

The finding: language model performance, measured as loss on held-out text, follows predictable power-law relationships with three variables: model size (number of parameters), dataset size (number of training tokens), and the amount of compute used for training. These relationships held with remarkable consistency, with some trends spanning more than seven orders of magnitude. Bigger models, trained on more data, with more compute, reliably reached lower loss, and the rate of improvement was predictable.
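A power law has one property that makes it so useful for planning: the same multiplicative increase in scale always buys the same multiplicative drop in loss. The sketch below shows that for parameter count alone. The functional form follows the description above, but the constants `n_c` and `alpha` are illustrative assumptions of roughly the order discussed in the scaling literature, not figures to rely on.

```python
def power_law_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Loss predicted from parameter count alone: (n_c / n_params) ** alpha."""
    return (n_c / n_params) ** alpha


sizes = [1e8, 1e9, 1e10, 1e11]
losses = [power_law_loss(n) for n in sizes]

# Ratio of each loss to the previous one, for every 10x jump in size.
ratios = [b / a for a, b in zip(losses, losses[1:])]
print([round(r, 4) for r in ratios])
```

Every ratio comes out the same, which is the whole point: if the relationship holds, you can estimate what the next tenfold increase in model size will buy before you spend the money on it.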

This wasn't obvious at the time. The assumption had been that scaling would hit diminishing returns, that some ceiling existed beyond which throwing more compute at a model wouldn't help. The Scaling Laws paper showed that ceiling, if it existed, was far beyond what anyone had yet reached.

Critically, Dario Amodei, now Anthropic's CEO, is a co-author on this paper. The insights it contained directly shaped GPT-3 later that year, and directly informed the disagreements at OpenAI that would lead Amodei and others to leave and found Anthropic in 2021. For Google, the Scaling Laws paper validated investing in ever-larger models. It set the conditions for LaMDA.

LaMDA (2021-22): the model Google wouldn't ship.

Google unveiled LaMDA (Language Models for Dialog Applications) at Google I/O in May 2021. The research paper was published in January 2022. LaMDA was a decoder-only Transformer with up to 137 billion parameters, pre-trained on 1.56 trillion words of public dialog data and web text.

It was, by any reasonable measure, capable of the kind of open-ended conversation that ChatGPT would make famous eighteen months later. Google knew this. Its own researchers had built the thing. And Google chose not to ship it.

The LaMDA paper is explicit about why. It identifies two unsolved challenges: safety and factual grounding. The model could say things that were harmful. The model could confidently assert things that were false. Google (with more to lose from a public AI failure than OpenAI) decided the risks outweighed the competitive urgency.

In June 2022, the decision became public in an unexpected way. Google engineer Blake Lemoine, who had been working with LaMDA, claimed the model was sentient: that it had a soul, that it was a person deserving of rights. Google dismissed the claim, placed Lemoine on leave, and fired him the following month. The story ran globally, drawing worldwide attention to the fact that Google had a conversational AI of extraordinary capability sitting in a lab, not available to anyone. The irony would become clear less than six months later.

November 30, 2022: ChatGPT.

OpenAI launched ChatGPT on November 30, 2022. It was built on GPT-3.5, refined with Reinforcement Learning from Human Feedback (RLHF), a technique where human raters compare model outputs, a reward model learns those preferences, and the language model is then tuned to produce responses the reward model scores highly. The product wrapper was a simple chat interface. Nothing technically groundbreaking. An alignment technique (RLHF) applied to an existing model, presented through a box where you type and something answers.

One million users in five days. One hundred million users in two months. The fastest consumer application adoption in recorded history at that point.

The alignment technique, not the model size, is what made it usable. OpenAI's InstructGPT paper, published in January 2022, had shown that RLHF-trained outputs were preferred over raw GPT-3 outputs by human evaluators even when the RLHF model had 100 times fewer parameters. ChatGPT was the consumer product of that finding.
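The "learn to prefer what humans prefer" step is easier to see as a formula. A common way to train the reward model in RLHF is a pairwise loss: given the scores for a response a human chose and one they rejected, the loss is small when the chosen response scores higher. The sketch below shows that loss on its own, with made-up scores. The reward model itself and the reinforcement learning step that follows are not shown.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(chosen - rejected))."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# Made-up reward scores for three comparisons.
agrees = preference_loss(2.0, 0.0)      # model ranks the pair like the human
undecided = preference_loss(0.0, 0.0)   # model can't tell them apart
disagrees = preference_loss(0.0, 2.0)   # model ranks the pair the wrong way

print(round(agrees, 3), round(undecided, 3), round(disagrees, 3))
```

Minimising this loss over many human comparisons pushes the reward model to score responses the way raters would, and that learned score is what the language model is then optimised against.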

Google had a comparable model. Google had the research. What Google didn't have was the willingness to ship an imperfect product to a hundred million people and iterate in public.

February 7-8, 2023: the most important 48 hours in search history.

In January 2023, Microsoft announced a multi-year investment in OpenAI, widely reported at $10 billion. The strategic intent was clear: integrate OpenAI's most capable models into Bing and make the case that Microsoft could win in search.

On February 7, 2023, Microsoft launched the new Bing: a next-generation OpenAI model, later confirmed to be GPT-4, integrated directly into Bing Search and the Edge browser. For the first time, you could have a conversation with a search engine.

Reports circulated that Google had accelerated its Bard announcement specifically to preempt Microsoft's event.

On February 6, 2023 (one day before the new Bing) Google announced Bard, its own conversational AI product, powered by a lightweight version of LaMDA. In the promotional demonstration, Bard was asked about new discoveries from the James Webb Space Telescope. Bard stated that the telescope had taken the first pictures of a planet outside our solar system. This was factually incorrect: that distinction belongs to the Very Large Telescope, in 2004. The error was widely reported on February 8, the day after the new Bing launched. By market close that day, Alphabet had lost approximately $100 billion in market capitalisation.

The company that invented the Transformer, that had built LaMDA years before ChatGPT existed, that had deployed BERT into Search in 2019: it had been beaten to market by a company using its own technology, and had panicked publicly in the process.

The response: Gemini and the ground-up rebuild.

Google merged Google Brain and DeepMind into Google DeepMind in April 2023, consolidating its main AI research groups under one structure. The goal was to build a new foundation model that could compete not just with GPT-4 but with whatever came next.

Gemini 1.0 launched in December 2023. Where previous Google models had been built primarily on text and had vision capabilities added afterwards, Gemini was natively multimodal, trained simultaneously on text, image, audio, video, and code from the ground up. Three tiers: Ultra (complex reasoning), Pro (broad deployment), Nano (on-device).

According to Google, Gemini Ultra became the first model to outperform human experts on MMLU, a benchmark covering 57 academic subjects, and reached state of the art on 30 of 32 benchmarks tested at launch.

The ground-up rebuild had worked. Google had a genuinely frontier model. The question now was whether it could translate technical capability into product dominance. As the preceding two years had demonstrated, those aren't the same thing.

Anthropic and Claude: a third path.

While Google rebuilt and OpenAI shipped, a third player entered with a different philosophy. Anthropic was founded in 2021 by Dario Amodei, Daniela Amodei, and other former OpenAI researchers, following disagreements about the pace of AI development and whether RLHF was sufficient as a safety approach.

Anthropic's response to the alignment problem was Constitutional AI: rather than relying purely on human raters, the model is trained against an explicit set of principles (a "constitution") and learns to self-critique and revise its own outputs accordingly. This is a normative approach to alignment (what should the model do?) rather than the empirical approach of RLHF (what do humans prefer?).
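The self-critique loop can be sketched as follows. Treat this as a loose illustration of the idea rather than Anthropic's pipeline: `generate` stands for any function that sends text to a language model and returns its reply, `stand_in_model` is a hypothetical placeholder so the example runs without one, and the prompt wording and principles are invented for the example.

```python
def critique_and_revise(prompt, principles, generate):
    """Draft a response, then critique and revise it once per principle."""
    draft = generate(prompt)
    for principle in principles:
        critique = generate(
            f"Critique the response against this principle: {principle}\n"
            f"Response: {draft}"
        )
        draft = generate(
            f"Rewrite the response to address the critique.\n"
            f"Response: {draft}\nCritique: {critique}"
        )
    return draft


def stand_in_model(text):
    """Hypothetical placeholder for a real language model call."""
    return f"[model output for {len(text)} characters of input]"


final = critique_and_revise(
    "Explain how vaccines work.",
    ["Be honest about uncertainty.", "Avoid harmful advice."],
    stand_in_model,
)
print(final)
```

As I understand the published method, revised responses like these become training data, and a later stage replaces human preference comparisons with comparisons made by a model that is itself guided by the constitution.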

Claude launched publicly in March 2023. The name, the caution, the explicit safety focus: all of it reflected a different calculation about what mattered in deploying AI systems at scale.

What this history means for AI search.

The arc of the story is clear in retrospect: the technical foundations were built at Google, the commercial urgency came from OpenAI, the competitive pressure came from Microsoft, and the safety philosophy came from Anthropic's dissent.

Every AI search product you're dealing with today (Google's AI Overviews, ChatGPT Search, Microsoft Copilot, Perplexity, Claude) is a product of this history. The Transformer architecture explains their shared mechanics. The race of 2022-2023 explains why they all arrived at roughly the same time. And the different alignment approaches (RLHF vs Constitutional AI) explain why they behave differently when you use them.

If there's one thing to take from this history, it's that the platforms you're optimising for today weren't designed with your content in mind. They were built to win a competitive race between trillion-dollar companies. Understanding the technology they're built on is how you stay ahead of the next shift.

Next in the series

Article 2 covers how LLMs actually generate responses, what a context window is, and the "lost in the middle" problem that determines whether AI systems use your content or ignore it.

Sources

Vaswani et al., "Attention Is All You Need," Google Brain / Google Research, 2017 — research.google/pubs/attention-is-all-you-need
Google Research Blog, "Transformer: A Novel Neural Network Architecture for Language Understanding," 2017 — research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding
Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," 2018 — research.google/pubs/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding
Google Research Blog, "Open Sourcing BERT," 2018 — research.google/blog/open-sourcing-bert-state-of-the-art-pre-training-for-natural-language-processing
Thoppilan et al., "LaMDA: Language Models for Dialog Applications," Google, 2022 — research.google/pubs/lamda-language-models-for-dialog-applications
Google Research Blog, "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models," 2022 — research.google/blog/lamda-towards-safe-grounded-and-high-quality-dialog-models-for-everything
Kaplan et al., "Scaling Laws for Neural Language Models," OpenAI, 2020 — openai.com/index/scaling-laws-for-neural-language-models
Ouyang et al., "Training Language Models to Follow Instructions with Human Feedback (InstructGPT)," OpenAI, 2022 — openai.com/index/instruction-following
OpenAI, "Introducing ChatGPT," 2022 — openai.com/index/chatgpt
Microsoft, "Reinventing Search with a New AI-Powered Microsoft Bing and Edge," 2023 — blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web
Anthropic, "Introducing Claude," 2023 — anthropic.com/news/introducing-claude
Anthropic, "Claude's Constitution," 2023 — anthropic.com/news/claudes-constitution
Wikipedia: Attention Is All You Need — en.wikipedia.org/wiki/Attention_Is_All_You_Need
Wikipedia: BERT (language model) — en.wikipedia.org/wiki/BERT_(language_model)
Wikipedia: LaMDA — en.wikipedia.org/wiki/LaMDA
Wikipedia: ChatGPT — en.wikipedia.org/wiki/ChatGPT
Wikipedia: Google Gemini — en.wikipedia.org/wiki/Google_Gemini
Wikipedia: OpenAI — en.wikipedia.org/wiki/OpenAI
Wikipedia: Anthropic — en.wikipedia.org/wiki/Anthropic
Wikipedia: Microsoft Copilot — en.wikipedia.org/wiki/Microsoft_Copilot
Wikipedia: Google Brain — en.wikipedia.org/wiki/Google_Brain
Wikipedia: Reinforcement Learning from Human Feedback — en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback

Thomas Cox
