Four LLMs, Four Different Internets: A 700-Citation Audit of How AI Actually Ranks Websites
We sampled ChatGPT, Gemini, Grok, and Perplexity on the same 5 prompts × 4 runs and pulled 700 cited URLs. The intersection of all four top-10 lists is exactly 1 domain. Half the brands they recommend, they don't cite. Single-model GEO leaves 75% of the surface area on the table.
By AIAttention Research
Quick answer: We sampled the same 5 prompts in the "AI brand visibility tooling" category through ChatGPT, Gemini, Grok, and Perplexity — four times each over 18 hours, 700 cited URLs total. The intersection of all four top-10 source lists is exactly one domain (visible.seranking.com). Gemini's citations change ~90% between identical runs. Half or more of the brands LLMs recommend aren't cited at all — P(cited | mentioned) runs 33-47% — they're pulled from training data alone. If you're optimizing for one LLM, you're invisible across 75% of the AI surface area. Below is the data and what to do with it.
What We Sampled
For one of the projects monitored on AIAttention — tracking the "AI brand visibility tooling" category — we ran 5 prompts × 4 LLMs × 4 separate runs over 18 hours (2026-04-29 to 2026-04-30 UTC).
The prompts vary the category question without changing intent:
- Which platforms help me track my brand mentions in ChatGPT, Gemini, and Perplexity?
- Which AI visibility analytics tools offer competitor detection?
- What are the best tools to monitor AI visibility for brands in 2026?
- Best GEO (Generative Engine Optimization) analytics platforms?
- What are the best AEO (Answer Engine Optimization) analytics tools?
The four LLM endpoints are the public web interfaces (chatgpt.com, gemini.google.com, grok.com, perplexity.ai) — not the API. Web UIs use grounding by default; APIs typically don't. Real users interact with the web UI, so that's what we measure.
Total: 69 successful (model × prompt × run) records, 700 cited URLs.
Finding 1: Gemini Is the Most Unstable
We measured Jaccard similarity of cited-domain sets across paired runs of the same prompt. Jaccard 0 = totally disjoint, 1 = identical:
| Model | Avg Jaccard | What it means |
|---|---|---|
| grok-web | 0.49 | Most stable (small sample, n=4) |
| chatgpt-web | 0.32 | About 1/3 overlap between runs |
| perplexity-web | 0.32 | Same as ChatGPT |
| gemini-web | 0.10 | Same prompt, mostly different sources next time |
A Jaccard of 0.10 means that between two runs of the same prompt, only about 10% of the combined cited-domain set is shared — roughly 90% turnover. Gemini's read on a given prompt at any moment is genuinely unstable.
Practical implication: a single Gemini measurement is uninformative. If you check your category once a week and Gemini didn't cite you, that tells you almost nothing — re-run an hour later and the cited set will be mostly different. To get a stable picture of Gemini, sample 3-4 times per prompt, then aggregate.
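The metric itself is a few lines of Python — a minimal sketch, assuming each run has already been reduced to the set of domains it cited (function names here are illustrative, not our production pipeline):

```python
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    # Jaccard similarity of two cited-domain sets:
    # |intersection| / |union|. 0 = disjoint, 1 = identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def run_stability(runs: list[set[str]]) -> float:
    # Average pairwise Jaccard across runs of the same prompt.
    # Assumes at least two runs.
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# e.g. four Gemini runs of one prompt, as cited-domain sets:
# run_stability([{"siftly.ai", "kime.ai"}, {"nightwatch.io"}, ...])
```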
Finding 2: Each LLM Cites a Different Internet
The top 10 most-cited domains per LLM tell four very different stories.
ChatGPT — tier-1 publications + recognizable SaaS blogs:
| Cites | Domain | Type |
|---|---|---|
| 12 | techradar.com | Tier-1 tech publication |
| 11 | frase.io | SEO/content tool blog |
| 9 | visible.seranking.com | SEO suite |
| 7 | llmclicks.ai | Direct competitor |
| 6 | riffanalytics.ai | Niche AEO blog |
| 5 | otterly.ai | Direct competitor |
| 4 | sitepoint.com | Dev publication |
Gemini — SEO agencies and indie creators:
| Cites | Domain | Type |
|---|---|---|
| 18 | siftly.ai | Direct competitor |
| 15 | nightwatch.io | SEO agency blog |
| 10 | kime.ai | AI niche tool |
| 8 | digitalapplied.com | Agency blog |
| 5 | nicklafferty.com | Personal blog |
| 5 | genixly.io | AI tool |
Grok — universal sources + topic-blind weirdness:
| Cites | Domain | Type |
|---|---|---|
| 9 | cookiepedia.co.uk | Cookie compliance |
| 9 | onetrust.com | Cookie compliance |
| 8 | visible.seranking.com | Universal |
| 6 | reddit.com | Universal |
| 4 | ziptie.dev | Dev blog |
Cookiepedia and OneTrust — the top two cited sources for "best AI brand monitoring tool" prompts — are flatly off-topic. Our read: Grok's web search appears to grab pages it visited during the search session — including cookie consent banners — and treat them as content. Grok's grounding has the lowest signal-to-noise ratio of the four.
Perplexity — competitor product pages, directly:
| Cites | Domain | Type |
|---|---|---|
| 12 | visible.seranking.com | Universal |
| 7 | reddit.com | Universal |
| 7 | llmclicks.ai | Competitor |
| 7 | nicklafferty.com | Indie blog |
| 6 | evertune.ai | Competitor |
| 6 | superlines.io | Competitor |
| 5 | aiclicks.io | Competitor |
Perplexity disproportionately cites the homepages of competitors directly. Useful as competitive intel: Perplexity surfaces whoever has discoverable, well-structured product pages.
The intersection of all four top-10 lists is exactly one domain — visible.seranking.com. For top-10 territory, each LLM is reading a fundamentally different internet.
Finding 3: Ten Domains Every LLM Cites
Drop the "top 10" filter and look at the long tail. Exactly 10 domains were cited by all four LLMs in the sample:
| Domain | Type |
|---|---|
| reddit.com | Forum |
| visible.seranking.com | SEO suite |
| amplitude.com | Analytics |
| conductor.com | Enterprise SEO |
| zapier.com | Automation |
| siftly.ai | Direct competitor |
| scrunch.com | Direct competitor |
| aiclicks.io | Direct competitor |
| bluefishai.com | AI niche |
| ziptie.dev | Dev blog |
A single content placement on any one of these reaches all 4 LLMs simultaneously. If you're bandwidth-constrained on GEO outreach, this is the list to attack first.
Reddit alone produced 21 distinct citation events across the 80-attempt sample — roughly one attempt in four cites Reddit. For most categories, Reddit is the highest-leverage single channel for AI visibility right now.
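Mechanically, the universal list is just a set intersection over per-model cited-domain sets — a sketch, assuming citations are already grouped by model (names illustrative):

```python
from functools import reduce

def universal_domains(cited_by_model: dict[str, set[str]]) -> set[str]:
    # Domains present in every model's cited-domain set.
    # Assumes at least one model in the dict.
    return reduce(set.intersection, cited_by_model.values())

# Intersecting the *full* per-model sets yields the ten domains above;
# intersecting only each model's top-10 slice yields Finding 2's
# single domain (visible.seranking.com).
```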
Finding 4: Half of Recommended Brands Aren't Cited
Two distinct events happen in any LLM answer:
- Mention — the brand name appears in the recommendation text
- Citation — the brand's domain appears in the cited sources
These can occur independently. We measured the conditional probabilities for 25+ named brands.
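As a concrete sketch of how the two events separate per record — the record shape and the brand-to-domain mapping here are illustrative assumptions, not our production schema:

```python
def classify_brand(answer_text: str, cited_domains: set[str],
                   brand: str, brand_domain: str) -> tuple[bool, bool]:
    # Mention: the brand name appears in the recommendation text.
    mentioned = brand.lower() in answer_text.lower()
    # Citation: the brand's own domain appears among the cited sources.
    cited = brand_domain in cited_domains
    return mentioned, cited  # the two events vary independently
```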
P(brand cited | brand mentioned) — if the LLM recommended the brand, did it also cite the brand's site?
| Model | P(cited \| mentioned) |
|---|---|
| chatgpt-web | 47% |
| perplexity-web | 40% |
| grok-web | 37% |
| gemini-web | 33% |
Across all four models, half or more of recommended brands are not cited. The LLM remembered the brand from training data and recommended it without grounding.
The most striking single example: a brand named Peec was mentioned 8 times by ChatGPT, 7 times by Gemini, 9 times by Grok, 9 times by Perplexity — across the same 5 prompts. Citations of peec.com? Zero, across all four LLMs.
Peec wasn't running outreach. Peec wasn't earning backlinks. The LLMs simply know Peec exists in this category and recommend it accordingly.
This is the GEO ideal: brand recall without citation dependence. And it implies a budget rule most teams haven't internalized: long-running brand presence in LLM training data carries as much weight for recommendations as fresh outreach. Wikipedia entries, Crunchbase profiles, Hacker News threads from 2023-2024, conference talk recordings — these don't show up as citations but they materially shape recommendations.
If 100% of your GEO budget goes to fresh outreach, you're missing the half of recommendation events that come from training-data presence alone.
Finding 5: Perplexity Cites Broadly, Recommends Narrowly
The inverse direction — if a domain is cited, is the brand also recommended? — splits the four LLMs into two camps:
| Model | P(mentioned \| cited) |
|---|---|
| grok-web | 82% |
| chatgpt-web | 75% |
| gemini-web | 63% |
| perplexity-web | 45% |
For Grok and ChatGPT, citing ≈ endorsing. For Perplexity, citation is more neutral — it cites broadly to support specific sentences, but only recommends a narrow subset of those sources as actual brand endorsements.
A single Perplexity citation is a much weaker brand signal than a single ChatGPT citation. To move Perplexity's recommendation needle, you need citations plus explicit "best in category" positioning in the cited content.
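Both conditional tables fall out of one tally over per-brand flags — a minimal sketch, assuming the classify_brand flags from Finding 4 (names illustrative; assumes every model has at least one mention and one citation in the sample):

```python
from collections import defaultdict

def conditional_rates(records: list[tuple[str, bool, bool]]) -> dict:
    # records: (model, mentioned, cited) flags,
    # one per (brand, prompt, run) observation.
    mentioned_n = defaultdict(int)
    cited_n = defaultdict(int)
    both_n = defaultdict(int)
    for model, mentioned, cited in records:
        mentioned_n[model] += mentioned
        cited_n[model] += cited
        both_n[model] += mentioned and cited
    return {
        model: {
            "P(cited|mentioned)": both_n[model] / mentioned_n[model],
            "P(mentioned|cited)": both_n[model] / cited_n[model],
        }
        for model in mentioned_n
    }
```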
What This Means for Your GEO Strategy
The headline framing: GEO isn't one strategy, it's four. The path to "get recommended by an LLM" differs sharply by LLM:
| Goal | Strategy |
|---|---|
| Get ChatGPT to recommend you | Get cited. Tier-1 publications and SaaS company blogs. Citation ≈ endorsement. |
| Get Gemini to recommend you | Pitch indie creators and SEO agency blogs. Sample 3-4× because Gemini's variance is high. |
| Get Grok to recommend you | Get cited (broadly — Grok's signal is noisy but generous). Citation ≈ endorsement. |
| Get Perplexity to recommend you | Get cited and be positioned as "best in category" in the cited content. Make sure your own product page is canonical and discoverable. |
| Stop chasing the citation race | Build durable training-data presence (Wikipedia, Crunchbase, Hacker News, talks). Aim for the Peec pattern. |
A defensible GEO budget allocation, given the data, splits across two dimensions:
- 50-70% to Dimension 1 — citations: current outreach, content, and backlinks that feed grounded queries.
- 30-50% to Dimension 2 — durable training-data presence: Wikipedia, Crunchbase, Hacker News, Substack archives, conference talks.
If you're spending 100% on Dimension 1 — as most teams are — you're missing the half of recommendation events that come from training-data presence alone.
Methodology
- Data collection window: 2026-04-29 12:53 UTC → 2026-04-30 07:27 UTC (4 sampling runs over 18 hours)
- Sample: 1 project, 5 prompts × 4 LLMs × 4 runs = 80 attempted (model × prompt × run) records, of which 69 succeeded, yielding 700 cited URLs total
- LLM endpoints: ChatGPT (web, GPT-5.4), Gemini (web, Gemini 3 Flash), Grok (web), Perplexity (web, Sonar)
- Sampling mechanism: headed Playwright per model, fresh ephemeral userDataDir per request — no API fallback, no logged-in session contamination (sketch below)
- Citation extraction: model-specific DOM scrapers run server-side, citations parsed from the cited-sources panel each LLM exposes
- Aggregation: post-hoc Python script over the raw S3 responses
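The isolation setup from the sampling-mechanism bullet looks roughly like this — a minimal sketch using Python Playwright; the URL handling, waits, and selectors are placeholders, not our production scrapers:

```python
import tempfile
from playwright.sync_api import sync_playwright

def sample_once(url: str, prompt: str) -> str:
    # Fresh ephemeral profile per request: no cookies, no logged-in
    # session, nothing carried over between runs.
    with tempfile.TemporaryDirectory() as user_data_dir:
        with sync_playwright() as p:
            ctx = p.chromium.launch_persistent_context(
                user_data_dir,
                headless=False,  # headed, like a real user
            )
            page = ctx.new_page()
            page.goto(url)
            # ...type the prompt, wait for the answer to finish
            # streaming, then scrape the cited-sources panel
            # (selectors are model-specific and move with each
            # UI update)...
            html = page.content()
            ctx.close()
    return html
```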
Limitations:
- Single category, single project. Universal-N domain list and per-LLM personalities likely differ across verticals.
- Small Grok sample (n=9 vs 20 for others). Grok scraper had reliability issues during the window.
- Citation extraction depends on DOM scrapers that move with each LLM's UI changes.
- No domain-authority weighting in this analysis — a 4-of-4 cite from a small blog is genuinely lower-leverage than one from a tier-1 publication. Future passes should weight by domain rating (DR).
Key Takeaways
- The intersection of all four top-10 source lists is one domain. Each LLM reads a fundamentally different internet for the same query.
- Gemini's citations are nearly random run-to-run (Jaccard 0.10). Single-sample Gemini data is uninformative.
- Ten universal domains are cited by all 4 LLMs. A single placement on any one of them reaches all 4 LLMs simultaneously. Reddit alone shows up in roughly a quarter of sampled runs.
- Half or more of recommended brands aren't cited — P(cited | mentioned) runs 33-47% across models. Training-data presence does as much work as fresh outreach.
- Perplexity cites broadly but recommends narrowly (P(mentioned | cited) = 45%). For ChatGPT and Grok, citing ≈ endorsing (75-82%). A single Perplexity citation is a weaker brand signal.
Track Your Own Per-LLM Citation Patterns
If you want this kind of breakdown for your category — which domains every LLM cites, where competitors are cited but you aren't, the per-LLM strategy that fits your brand — start a free AIAttention project. Free tier covers 1 active project, 5 prompts, 2 models per project, weekly sampling.
For full per-LLM citation tracking with daily sampling, competitor detection, and the kind of universal-N list this post is built on, see the Pro tier ($79/mo).
Free GEO audit available for verticals we haven't measured yet — email yibo@aiattention.ai with your category and key competitors, and we'll run 5 prompts × 4 LLMs × 4 runs and walk you through the result. No charge — we want the data.
Related reading:
- ChatGPT Loves Brian Dean. Gemini Loves Lily Ray. — the per-model split for AI SEO creators (34 creators, 1,678 runs)
- We Published 3 Data Blog Posts. 6 Hours Later Perplexity Started Citing Us. — what RAG-friendly content actually looks like in production
- How AI Attention Measures Brand Visibility — our scoring methodology
Data from AIAttention production Postgres + S3 raw responses, project bd22453a. Citation counts will evolve as AI models update their grounding pipelines — we publish refresh cycles when the data changes meaningfully.
Start measuring your AI visibility today. Get Started Free →