Research Report · 2026-04-30 · 8 min read

Four LLMs, Four Different Internets: A 700-Citation Audit of How AI Actually Ranks Websites

We sampled ChatGPT, Gemini, Grok, and Perplexity on the same 5 prompts × 4 runs and pulled 700 cited URLs. The intersection of all four top-10 lists is exactly 1 domain. Half the brands they recommend, they don't cite. Single-model GEO leaves 75% of the surface area on the table.

By AIAttention Research

Quick answer: We sampled the same 5 prompts in the "AI brand visibility tooling" category through ChatGPT, Gemini, Grok, and Perplexity — four times each over 18 hours, 700 cited URLs total. The intersection of all four top-10 source lists is exactly one domain (visible.seranking.com). Gemini's citations change ~90% between identical runs. Only 33-47% of the brands the LLMs recommend are also cited — the rest are recommended from training data alone. If you're optimizing for one LLM, you're invisible across 75% of the AI surface area. Below is the data and what to do with it.

What We Sampled

For one of the projects monitored on AIAttention — tracking the "AI brand visibility tooling" category — we ran 5 prompts × 4 LLMs × 4 separate runs over 18 hours (2026-04-29 to 2026-04-30 UTC).

The prompts vary the category question without changing intent:

  • Which platforms help me track my brand mentions in ChatGPT, Gemini, and Perplexity?
  • Which AI visibility analytics tools offer competitor detection?
  • What are the best tools to monitor AI visibility for brands in 2026?
  • Best GEO (Generative Engine Optimization) analytics platforms?
  • What are the best AEO (Answer Engine Optimization) analytics tools?

The four LLM endpoints are the public web interfaces (chatgpt.com, gemini.google.com, grok.com, perplexity.ai) — not the API. Web UIs use grounding by default; APIs typically don't. Real users interact with the web UI, so that's what we measure.

Total: 69 successful (model × prompt × run) records out of 80 attempted, 700 cited URLs.

Finding 1: Gemini Is the Most Unstable

We measured Jaccard similarity of cited-domain sets across paired runs of the same prompt. Jaccard 0 = totally disjoint, 1 = identical:

| Model | Avg Jaccard | What it means |
| --- | --- | --- |
| grok-web | 0.49 | Most stable (small sample, n=4) |
| chatgpt-web | 0.32 | About 1/3 overlap between runs |
| perplexity-web | 0.32 | Same as ChatGPT |
| gemini-web | 0.10 | Same prompt, mostly different sources next time |

A Jaccard of 0.10 means roughly 90% of cited domains change between runs of the same prompt. Gemini's read on a given prompt at any moment is genuinely unstable.
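The per-model numbers above are averages of pairwise Jaccard scores over repeated runs of the same prompt. A minimal sketch (the domain sets below are illustrative, not taken from the dataset):

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two cited-domain sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # two empty citation lists count as identical
    return len(a & b) / len(a | b)

def run_stability(runs: list) -> float:
    """Average pairwise Jaccard across all runs of the same prompt."""
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Illustrative cited-domain sets from two runs of one prompt:
run_a = {"techradar.com", "frase.io", "visible.seranking.com"}
run_b = {"techradar.com", "otterly.ai", "llmclicks.ai"}
print(jaccard(run_a, run_b))  # 1 shared domain / 5 total = 0.2
```

Averaging `run_stability` over every prompt for a given model yields the per-model figures in the table.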

Practical implication: a single Gemini measurement is uninformative. If you check your category once a week and Gemini didn't cite you, that tells you almost nothing — re-run an hour later and the cited set will be mostly different. To get a stable picture of Gemini, sample 3-4 times per prompt, then aggregate.

Finding 2: Each LLM Cites a Different Internet

The top 10 most-cited domains per LLM tell four very different stories.

ChatGPT — tier-1 publications + recognizable SaaS blogs:

| Cites | Domain | Type |
| --- | --- | --- |
| 12 | techradar.com | Tier-1 tech publication |
| 11 | frase.io | SEO/content tool blog |
| 9 | visible.seranking.com | SEO suite |
| 7 | llmclicks.ai | Direct competitor |
| 6 | riffanalytics.ai | Niche AEO blog |
| 5 | otterly.ai | Direct competitor |
| 4 | sitepoint.com | Dev publication |

Gemini — SEO agencies and indie creators:

| Cites | Domain | Type |
| --- | --- | --- |
| 18 | siftly.ai | Direct competitor |
| 15 | nightwatch.io | SEO agency blog |
| 10 | kime.ai | AI niche tool |
| 8 | digitalapplied.com | Agency blog |
| 5 | nicklafferty.com | Personal blog |
| 5 | genixly.io | AI tool |

Grok — universal sources + topic-blind weirdness:

| Cites | Domain | Type |
| --- | --- | --- |
| 9 | cookiepedia.co.uk | Cookie compliance |
| 9 | onetrust.com | Cookie compliance |
| 8 | visible.seranking.com | Universal |
| 6 | reddit.com | Universal |
| 4 | ziptie.dev | Dev blog |

Cookiepedia and OneTrust as the top two cited sources for "best AI brand monitoring tool" prompts is plainly off-topic. Our read: Grok's web search appears to grab pages it visited during the search session — including cookie consent pages — and treat them as content. Of the four, Grok's grounding has the lowest signal-to-noise ratio.

Perplexity — competitor product pages, directly:

| Cites | Domain | Type |
| --- | --- | --- |
| 12 | visible.seranking.com | Universal |
| 7 | reddit.com | Universal |
| 7 | llmclicks.ai | Competitor |
| 7 | nicklafferty.com | Indie blog |
| 6 | evertune.ai | Competitor |
| 6 | superlines.io | Competitor |
| 5 | aiclicks.io | Competitor |

Perplexity disproportionately cites the homepages of competitors directly. Useful as competitive intel: Perplexity surfaces whoever has discoverable, well-structured product pages.

The intersection of all four top-10 lists is exactly one domain — visible.seranking.com. For top-10 territory, each LLM is reading a fundamentally different internet.
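The top-10 overlap is a straightforward set computation. A sketch, with made-up citation lists standing in for the real per-model data:

```python
from collections import Counter

def top_n_domains(cited: list, n: int = 10) -> set:
    """The n most-cited domains for one model (ties broken arbitrarily)."""
    return {domain for domain, _ in Counter(cited).most_common(n)}

# Illustrative per-model citation lists (one entry per citation event):
citations = {
    "chatgpt-web":    ["techradar.com", "techradar.com", "visible.seranking.com", "frase.io"],
    "gemini-web":     ["siftly.ai", "nightwatch.io", "visible.seranking.com"],
    "grok-web":       ["cookiepedia.co.uk", "visible.seranking.com", "reddit.com"],
    "perplexity-web": ["visible.seranking.com", "reddit.com", "llmclicks.ai"],
}

top10 = {model: top_n_domains(cited) for model, cited in citations.items()}
shared = set.intersection(*top10.values())
print(shared)  # {'visible.seranking.com'} in this toy data, as in the real sample
```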

Finding 3: Ten Domains Every LLM Cites

Drop the "top 10" filter and look at the long tail. Exactly 10 domains were cited by all four LLMs in the sample:

| Domain | Type |
| --- | --- |
| reddit.com | Forum |
| visible.seranking.com | SEO suite |
| amplitude.com | Analytics |
| conductor.com | Enterprise SEO |
| zapier.com | Automation |
| siftly.ai | Direct competitor |
| scrunch.com | Direct competitor |
| aiclicks.io | Direct competitor |
| bluefishai.com | AI niche |
| ziptie.dev | Dev blog |

A single content placement on any one of these reaches all 4 LLMs simultaneously. If you're bandwidth-constrained on GEO outreach, this is the list to attack first.

Reddit alone produced 21 distinct citation events across the 80-attempt sample — roughly 25% of all unique citations come from one domain. For most categories, Reddit is the highest-leverage single channel for AI visibility right now.

Finding 4: Half of Recommended Brands Aren't Cited

Two distinct events happen in any LLM answer:

  • Mention — the brand name appears in the recommendation text
  • Citation — the brand's domain appears in the cited sources

These can occur independently. We measured the conditional probabilities for 25+ named brands.

P(brand cited | brand mentioned) — if the LLM recommended the brand, did it also cite the brand's site?

| Model | P(cited \| mentioned) |
| --- | --- |
| chatgpt-web | 47% |
| perplexity-web | 40% |
| grok-web | 37% |
| gemini-web | 33% |

Across all four models, half or more of recommended brands are not cited. The LLM remembered the brand from training data and recommended it without grounding.
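Both conditional rates (this one and the inverse in Finding 5) fall out of the same counts. A sketch, where each record holds the brands mentioned and the domains cited in one answer (the field names and brand map are illustrative, not the actual schema):

```python
def conditional_rates(records, brand_domains):
    """P(cited | mentioned) and P(mentioned | cited), pooled over all
    (record, brand) pairs. brand_domains maps brand name -> its domain."""
    mentioned = cited = both = 0
    for rec in records:
        for brand, domain in brand_domains.items():
            m = brand in rec["mentions"]     # brand named in the answer text
            c = domain in rec["citations"]   # brand's domain in cited sources
            mentioned += m
            cited += c
            both += m and c
    return (both / mentioned if mentioned else 0.0,
            both / cited if cited else 0.0)

# Illustrative records (one per model x prompt x run answer):
brand_domains = {"Peec": "peec.com", "Otterly": "otterly.ai"}
records = [
    {"mentions": {"Peec", "Otterly"}, "citations": {"otterly.ai"}},
    {"mentions": {"Peec"},            "citations": set()},
]
p_cited, p_mentioned = conditional_rates(records, brand_domains)
print(round(p_cited, 2))  # 1 cited out of 3 mention events: the Peec pattern
```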

The most striking single example: a brand named Peec was mentioned 8 times by ChatGPT, 7 times by Gemini, 9 times by Grok, 9 times by Perplexity — across the same 5 prompts. Citations of peec.com? Zero, across all four LLMs.

Peec wasn't outreaching. Peec wasn't getting backlinks. The LLMs just know Peec exists in this category and recommend it accordingly.

This is the GEO ideal: brand recall without citation dependence. And it implies a budget rule most teams haven't internalized: long-running brand presence in LLM training data carries as much weight for recommendations as fresh outreach. Wikipedia entries, Crunchbase profiles, Hacker News threads from 2023-2024, conference talk recordings — these don't show up as citations but they materially shape recommendations.

If 100% of your GEO budget goes to fresh outreach, you're missing the half of recommendation events that come from training-data presence alone.

Finding 5: Perplexity Cites Broadly, Recommends Narrowly

The inverse direction — if a domain is cited, is the brand also recommended? — splits the four LLMs into two camps:

| Model | P(mentioned \| cited) |
| --- | --- |
| grok-web | 82% |
| chatgpt-web | 75% |
| gemini-web | 63% |
| perplexity-web | 45% |

For Grok and ChatGPT, citing ≈ endorsing. For Perplexity, citation is more neutral — it cites broadly to support specific sentences, but only recommends a narrow subset of those sources as actual brand endorsements.

A single Perplexity citation is a much weaker brand signal than a single ChatGPT citation. To move Perplexity's recommendation needle, you need citations plus explicit "best in category" positioning in the cited content.

What This Means for Your GEO Strategy

The headline framing: GEO isn't one strategy; it's four. The path to "get recommended by an LLM" differs sharply by LLM:

| Goal | Strategy |
| --- | --- |
| Get ChatGPT to recommend you | Get cited. Tier-1 publications and SaaS company blogs. Citation ≈ endorsement. |
| Get Gemini to recommend you | Pitch indie creators and SEO agency blogs. Sample 3-4× because Gemini's variance is high. |
| Get Grok to recommend you | Get cited (broadly — Grok's signal is noisy but generous). Citation ≈ endorsement. |
| Get Perplexity to recommend you | Get cited and be positioned as "best in category" in the cited content. Make sure your own product page is canonical and discoverable. |
| Stop chasing the citation race | Build durable training-data presence (Wikipedia, Crunchbase, Hacker News, talks). Aim for the Peec pattern. |

A defensible GEO budget allocation, given the data:

  • 30-50% to durable training-data presence — Wikipedia, Crunchbase, Hacker News, Substack archives, conference talks.
  • 50-70% to the citation channel — current outreach, content, and backlinks that win citations on grounded queries.

If you're spending 100% on the citation channel — as most teams do — you're missing the half of recommendation events that come from training-data presence alone.

Methodology

  • Data collection window: 2026-04-29 12:53 UTC → 2026-04-30 07:27 UTC (4 sampling runs over 18 hours)
  • Sample: 1 project, 5 prompts × 4 LLMs × 4 runs = 80 attempts; 69 successful records, 700 cited URLs total
  • LLM endpoints: ChatGPT (web, GPT-5.4), Gemini (web, Gemini 3 Flash), Grok (web), Perplexity (web, Sonar)
  • Sampling mechanism: headed Playwright per model, fresh ephemeral userDataDir per request — no API fallback, no logged-in session contamination
  • Citation extraction: model-specific DOM scrapers run server-side, citations parsed from the cited-sources panel each LLM exposes
  • Aggregation: post-hoc Python script over the raw S3 responses
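The aggregation step is a small reduction over the raw records. A simplified sketch of it — the JSON field names (`model`, `cited_urls`) are assumptions about the stored format, not the actual schema:

```python
import json
from collections import Counter, defaultdict
from urllib.parse import urlparse

def aggregate_citations(raw_lines):
    """Fold raw per-run JSON records into per-model domain citation counts."""
    per_model = defaultdict(Counter)
    for line in raw_lines:
        rec = json.loads(line)
        for url in rec["cited_urls"]:
            # Normalize to a bare domain so www/non-www count as one source
            domain = urlparse(url).netloc.removeprefix("www.")
            per_model[rec["model"]][domain] += 1
    return per_model

# Illustrative raw records standing in for the S3 responses:
raw = [
    '{"model": "chatgpt-web", "cited_urls": ["https://www.techradar.com/a", "https://frase.io/b"]}',
    '{"model": "chatgpt-web", "cited_urls": ["https://techradar.com/c"]}',
]
counts = aggregate_citations(raw)
print(counts["chatgpt-web"]["techradar.com"])  # 2
```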

Limitations:

  1. Single category, single project. Universal-N domain list and per-LLM personalities likely differ across verticals.
  2. Small Grok sample (n=9 vs 20 for others). Grok scraper had reliability issues during the window.
  3. Citation extraction depends on DOM scrapers that move with each LLM's UI changes.
  4. No domain-authority weighting in this analysis — a 4-of-4 cite from a small blog is genuinely lower-leverage than from a tier-1 publication. Future passes should weight by DR.

Key Takeaways

  • The intersection of all four top-10 source lists is one domain. Each LLM reads a fundamentally different internet for the same query.
  • Gemini's citations are nearly random run-to-run (Jaccard 0.10). Single-sample Gemini data is uninformative.
  • Ten universal domains are cited by all 4 LLMs. Single placement on any one reaches all 4 LLMs simultaneously. Reddit is responsible for ~25% of all unique citations.
  • Half or more of recommended brands aren't cited — P(cited | mentioned) runs 33-47% across models. Training-data presence does as much work as fresh outreach.
  • Perplexity cites broadly but recommends narrowly (45% P-recommended-given-cited). ChatGPT and Grok cite-as-endorsement (75-82%). Single Perplexity citation is a weaker brand signal.

Track Your Own Per-LLM Citation Patterns

If you want this kind of breakdown for your category — which domains every LLM cites, where competitors are cited but you aren't, the per-LLM strategy that fits your brand — start a free AIAttention project. Free tier covers 1 active project, 5 prompts, 2 models per project, weekly sampling.

For full per-LLM citation tracking with daily sampling, competitor detection, and the kind of universal-N list this post is built on, see the Pro tier ($79/mo).

Free GEO audit available for verticals we haven't measured yet — email yibo@aiattention.ai with your category and key competitors, and we'll run 5 prompts × 4 LLMs × 4 runs and walk you through the result. No charge — we want the data.

Data from AIAttention production Postgres + S3 raw responses, project bd22453a. Citation counts will evolve as AI models update their grounding pipelines — we publish refresh cycles when the data changes meaningfully.

Start measuring your AI visibility today. Get Started Free →
