Source-worthiness, explained for SEO teams

Most B2B SEO teams are doing the wrong thing when they try to improve their AI search visibility. They're making their own content better (more structured, better formatted, richer schema) and finding that it doesn't move the citation rate much. The reason is that they're optimising the wrong surface.

LLMs don't primarily cite a brand because that brand's website is well-organised. They cite a brand because other sources (sources they weight as reliable) have already said something coherent and repeatable about it. Your own site confirms. It rarely establishes.

What source-worthiness actually means.

Source-worthiness is the property of being the kind of source a model is likely to draw from when constructing an answer. It is distinct from domain authority (a link-graph metric), from content quality (a relevance metric), and from E-E-A-T (an editorial signal). It is the combination of: being present on surfaces models weight highly, saying consistent and specific things on those surfaces, and being corroborated by other sources that are themselves source-worthy.

The distinction matters because the interventions are different. Improving domain authority means acquiring links. Improving content quality means editing your pages. Improving source-worthiness means changing where your brand appears and what is said there, and most of that happens off your own site.

For a concrete introduction to how this fits into a broader GEO strategy, the post on the B2B GEO roadmap covers the high-level pattern. This post goes deeper on the off-site layer specifically.

The five source categories.

Across the audits I've run, brand citations correlate most strongly with presence across five distinct source categories. The five are not equally weighted by all models, and the weighting shifts depending on the query type, but the brands that get cited consistently tend to have a meaningful presence in all five.

1. Reference sources

Wikipedia, Wikidata, and to a lesser extent Crunchbase. These are the surfaces where models look first when constructing an entity profile. A Wikipedia page that clearly describes your category, your value proposition, and your customer type (and links to credible sources) is the highest-leverage single asset in GEO. Most B2B SaaS companies either don't have one or have one that contradicts their current positioning. Wikidata is the structured-data layer underneath Wikipedia: founder, founding date, industry classification, headquarters, official URL. It is editable, often incomplete, and widely ignored by marketing teams.
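To make the Wikidata fields above checkable, here is a minimal sketch that reports which of them are missing from an entity. The property IDs are real Wikidata properties, but treating exactly these five as "required" is an assumption drawn from the list above, and the example entity is hand-written, not real data. The function expects the per-entity JSON that Wikidata serves (for example from `Special:EntityData/<QID>.json`, under the `entities` key) and fetching it is left out.

```python
# Wikidata property IDs for the fields named in the text.
# Which properties count as "required" is an assumption for this sketch.
REQUIRED_PROPERTIES = {
    "P112": "founder (founded by)",
    "P571": "founding date (inception)",
    "P452": "industry",
    "P159": "headquarters location",
    "P856": "official website",
}


def missing_wikidata_fields(entity: dict) -> list[str]:
    """Return labels of required properties that have no statements.

    `entity` is assumed to be a single Wikidata entity dict with a
    "claims" mapping of property ID -> list of statements.
    """
    claims = entity.get("claims", {})
    return [
        label
        for prop_id, label in REQUIRED_PROPERTIES.items()
        if not claims.get(prop_id)
    ]


# Hypothetical entity with only inception and official website filled in.
example_entity = {
    "claims": {
        "P571": [{"mainsnak": {"datatype": "time"}}],
        "P856": [{"mainsnak": {"datatype": "url"}}],
    }
}

print(missing_wikidata_fields(example_entity))
# ['founder (founded by)', 'industry', 'headquarters location']
```

This only tests for presence. Whether the values agree with your current positioning still needs a human read.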

2. Editorial sources

Coverage in publications with editorial independence and a track record of accuracy: TechCrunch, The Information, Stratechery, vertical trade publications in your specific category. The key word is editorial. Paid placements and syndicated press releases do not behave the same way as genuinely earned coverage, and there is evidence models have learned to discount the former. A single feature in a credible trade publication where a journalist has independently characterised your company is worth more to your citation profile than twenty syndicated announcements.

3. Community sources

Hacker News threads, Reddit discussions (particularly r/devtools, r/sysadmin, category-specific subreddits), and specialist communities where your brand name appears in the body of organic posts and discussions, not in ads, not in sponsored content. Community mentions are a signal that real practitioners know about and have opinions about your product. Models weight this because community sources are hard to manufacture at scale and represent genuine market knowledge.

4. Document sources

Public transcripts of conference talks, webinar recordings indexed by the open web, analyst reports that name you, academic or research papers that reference your work, and SEC filings if you're publicly traded. These tend to be longer documents with specific, attributable claims. Models draw from them for factual detail (dates, numbers, direct quotes from executives) in a way they don't draw from blog posts.

5. Owned sources

Your own site, specifically the pages that behave like external sources: pages with named human authors, clear publication dates, structured data markup, and content that makes specific and verifiable claims. A generic "solutions" page is not an owned source in this sense. A published research report with a named author, a methodology section, and a date is. The distinction is whether a model would treat the page as a citable source or as marketing copy.
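As a rough illustration of the markup side of that distinction, the sketch below inspects a page's JSON-LD block for an article-like type, a named Person author, and a publication date. The function name, the list of accepted types, and the sample markup are assumptions for illustration. It also assumes the JSON-LD is a single object with a string `@type`, which real pages do not always follow.

```python
import json

# schema.org types treated as "article-like" here; the selection is an assumption.
ARTICLE_TYPES = ("Article", "NewsArticle", "BlogPosting", "Report")


def owned_source_gaps(jsonld_text: str) -> list[str]:
    """List the markup signals missing from one JSON-LD object."""
    data = json.loads(jsonld_text)
    gaps = []

    if data.get("@type") not in ARTICLE_TYPES:
        gaps.append("not marked up as an article-like type")

    # "author" may be a single object or a list of objects.
    author = data.get("author")
    authors = author if isinstance(author, list) else [author]
    has_person = any(
        isinstance(a, dict) and a.get("@type") == "Person" and a.get("name")
        for a in authors
    )
    if not has_person:
        gaps.append("no named Person author")

    if not data.get("datePublished"):
        gaps.append("no datePublished")

    return gaps


# Hypothetical markup: dated article, but authored by an organisation.
example = """
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Hypothetical research report",
  "author": {"@type": "Organization", "name": "Example Co"},
  "datePublished": "2024-01-15"
}
"""

print(owned_source_gaps(example))  # ['no named Person author']
```

Markup is only one of the signals listed above. A page can pass this check and still read as marketing copy if it makes no specific, verifiable claims.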

/ The pattern

Brands cited consistently across answer engines tend to have strong presence in at least four of these five categories. The most common gap is reference sources (specifically Wikidata), followed by community. Most B2B SaaS brands over-index on owned and under-invest in everything else.

How to audit your own source-worthiness.

A basic source-worthiness audit takes two to three hours and gives you a clear picture of where your gaps are. Here is the process I use:

Reference check. Search your brand name on Wikipedia. Does a page exist? Does it describe your current product and positioning accurately? Check Wikidata: is your entry complete (industry, founder, founding date, headquarters, website)? Are the descriptions in both consistent with how you describe yourself on your homepage?
Editorial check. Pull every editorial mention of your brand in the last 24 months from credible trade and tech publications. Not press releases. Not guest posts on low-authority sites. Genuine editorial coverage. How many pieces? What category do they put you in? Is that consistent with how you describe yourself?
Community check. Search Hacker News and relevant subreddits for your brand name. Are there organic discussions? What do practitioners say? This is also useful competitive intelligence. If your competitors have active HN threads and you don't, that's a gap.
Document check. Are there publicly accessible transcripts, research papers, or analyst notes that name your company? Are conference talks from your team indexed with transcripts?
Owned source check. Of your highest-traffic content pages, how many have named human authors? How many have structured Article schema with a Person author block? How many make specific, verifiable claims rather than general value-proposition statements?
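For the community check, one possible way to compare your Hacker News footprint with competitors is the public Algolia-backed HN search API. The sketch below only builds the search URLs and compares hit counts that you have already collected (for example from the `nbHits` field of each response). The brand names and counts are made up, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

HN_SEARCH_ENDPOINT = "https://hn.algolia.com/api/v1/search"


def hn_search_url(name: str, tag: str = "comment") -> str:
    """Build a search URL for HN items of type `tag` that mention `name`."""
    return f"{HN_SEARCH_ENDPOINT}?{urlencode({'query': name, 'tags': tag})}"


def community_gaps(hit_counts: dict[str, int], brand: str) -> list[str]:
    """Return competitors with more mentions than `brand`, largest first."""
    own = hit_counts.get(brand, 0)
    ahead = [
        (name, count)
        for name, count in hit_counts.items()
        if name != brand and count > own
    ]
    ahead.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ahead]


# Made-up counts for a hypothetical brand and two hypothetical competitors.
counts = {"Acme Cloud": 4, "Rival One": 37, "Rival Two": 2}

print(hn_search_url("Acme Cloud"))
print(community_gaps(counts, "Acme Cloud"))  # ['Rival One']
```

Raw counts say nothing about whether the mentions are organic or what practitioners actually think, so this narrows down where to read rather than replacing the reading.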

Score each category 0–2: 0 = absent or actively contradictory, 1 = present but thin or inconsistent, 2 = strong and coherent. A total score of 8–10 is the baseline for consistent AI citations. Below 6 and you have structural gaps that no amount of owned-content optimisation will compensate for.
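The scoring rule is simple enough to write down directly. This is a minimal sketch of the arithmetic, assuming the five category names used in this post. The example scores are invented, and the label for totals of 6 or 7 is a placeholder, since the rubric above only characterises 8–10 and below 6.

```python
CATEGORIES = ("reference", "editorial", "community", "document", "owned")


def audit_total(scores: dict[str, int]) -> int:
    """Validate one 0-2 score per category and return the total (0-10)."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for category in CATEGORIES:
        if scores[category] not in (0, 1, 2):
            raise ValueError(f"{category} must be scored 0, 1, or 2")
    return sum(scores[c] for c in CATEGORIES)


def interpret(total: int) -> str:
    """Map a total onto the thresholds stated in the text."""
    if total >= 8:
        return "baseline for consistent citations"
    if total < 6:
        return "structural gaps"
    return "between thresholds"  # 6-7 is not characterised in the rubric


# Invented example: strong owned content, thin everywhere else.
scores = {"reference": 0, "editorial": 1, "community": 1, "document": 1, "owned": 2}

total = audit_total(scores)
weakest = [c for c in CATEGORIES if scores[c] == min(scores.values())]
print(total, interpret(total), weakest)  # 5 structural gaps ['reference']
```

Listing the weakest category alongside the total keeps the focus on which gap to close first, which matters more than the number itself.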

For a worked example of how this audit feeds into a broader entity coherence review, the post on entity coverage maps as a working template picks up where this one leaves off.

The off-site work that compounds.

Building source-worthiness is slower than optimising a page. It requires genuine investment in editorial relationships, in community presence, and in the kind of original thinking that gets picked up by journalists and practitioners organically. There is no shortcut that consistently works. Paid placements don't behave the same as earned coverage. Manufactured community presence gets filtered. The only durable path is doing work that is genuinely worth citing.

What that means in practice: original research with a clear methodology, published under a named author. Executive commentary in the trade press that takes a specific, non-obvious position. Genuine participation in practitioner communities, not "content seeding" but actual answers to actual questions. These are the inputs that build source-worthiness over 12–18 months. They are also, not coincidentally, the inputs that build a durable brand independent of any single search channel.

The teams that will have strong AI search visibility in 2027 are the ones doing this work now, when most of their competitors are still asking whether GEO is worth the investment. The answer to that question is already obvious. The more interesting question is what specifically to build first.

/ Source-worthiness checklist

Reference sources: Wikipedia page accurate, current, linked; Wikidata entry complete, consistent · Editorial coverage: earned, categorised correctly · Community presence: organic, practitioner-level · Document sources: public transcripts, papers, or analyst notes that name you · Owned sources: named-author content, schema-marked, specific claims. Score each category 0–2. Target: 8+.

NOTES

The five-category framework is my own working model, not a published standard. It derives from pattern-matching across multiple GEO audits rather than from any single controlled study.
"Behave like external sources" is deliberately loose language. The underlying mechanism (why models weight certain pages differently) involves training data composition, reinforcement signals, and retrieval architecture. The practical implication (named authors, specific claims, structured markup) holds regardless of the mechanism.

/ Frequently asked

Does this apply to companies that aren't yet well-known?

Yes, and it's arguably more urgent for them. A brand without established reference-source coverage is effectively invisible to models constructing category overviews. The first priority for an early-stage B2B company is a complete Wikidata entry, followed by a Wikipedia stub once there is enough independent coverage to meet Wikipedia's notability guidelines.

Is guest posting on third-party sites worth doing?

It depends entirely on the site. A genuine byline piece in a respected trade publication (where the editorial team exercised independent judgment about whether to publish it) contributes to editorial source-worthiness. A guest post on a low-authority blog that accepts anything is not editorial coverage and is likely weighted accordingly. The distinction is real editorial independence, not just off-site placement.

How quickly does improving source-worthiness affect citation rates?

Reference source fixes (Wikidata, Wikipedia) can show effects within weeks as models update their entity profiles. Editorial and community presence builds over months, not weeks. Expect a 3–6 month lag between systematic off-site investment and measurable improvement in citation rates across the main answer engines.

Does this work the same across all AI engines?

The weighting differs. Perplexity is more real-time and weights recent editorial coverage more heavily. ChatGPT and Claude draw more from their training data, which means Wikipedia and Wikidata matter more. Google AI Overviews adds its own web index into the mix. The five-category framework is useful across all of them because it builds presence on the surfaces each engine draws from, at different weights.

Thomas Cox
