GEO in 2026: how AI engines decide what to cite, and how to win

Generative Engine Optimization is the new discipline of getting cited by ChatGPT, Claude, Perplexity, and Gemini. Here is what we have learned from running it in production for clients.

Generative Engine Optimization (GEO) is the discipline of getting your content cited by AI engines — ChatGPT, Claude, Perplexity, Gemini, You.com — when they answer a user’s question. In 2026 it captures a meaningful share of high-intent commercial queries that used to land on Google’s blue links. Many B2B sales pipelines now have AI-engine citations as a top-five inbound channel.

This post is a working summary of what we have learned running GEO programs for clients, including the technical patterns, the content patterns, and the false promises.

How AI engines pick citations

In broad strokes, the major engines use similar machinery:

  1. Retrieval — when a user query comes in, the engine runs a search against a corpus (often Bing or Google’s web index, sometimes their own crawl, sometimes a third-party retrieval API like You.com or Perplexity’s). The top ~20-50 results are pulled.
  2. Re-ranking — the candidates are scored by an embedding-based model for relevance to the user query. Roughly the top 5-15 survive.
  3. Generation with citations — the LLM is given the retrieved context and asked to answer the user’s question while citing the sources it used. The model picks which sources to cite based on which ones it actually drew language from.
  4. Display — the final answer surfaces source citations as inline footnotes or a “Sources” list.

The leverage points are at steps 1-2 (your content has to be retrieved and re-ranked highly) and step 3 (your content has to be the kind the model picks to cite).
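
To make step 2 concrete, here is a minimal sketch of a re-ranker. It is an illustration only: real engines use learned dense embeddings, whereas this toy version uses bag-of-words count vectors, and the function names (embed, cosine, rerank) are our own, not any engine's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query: str, candidates: list[str], keep: int) -> list[str]:
    """Score every retrieved candidate against the query, keep the best few."""
    q = embed(query)
    ranked = sorted(candidates, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:keep]


# Hypothetical retrieved snippets for one query.
candidates = [
    "high-risk payment processing explained for merchants",
    "our team delivers world-class outcomes",
    "what is payment processing and how fees work",
]
top = rerank("what is high-risk payment processing", candidates, keep=2)
print(top)
```

The promotional snippet shares no terms with the query and drops out. A real embedding model would also reward synonyms and related concepts, which this word-overlap version cannot see.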

What gets retrieved and re-ranked

The retrieval/re-rank step is mostly traditional SEO with embedding-based modernisations:

  • Authority signals: backlinks, domain age, brand mentions, schema.org markup. The same things that worked for Google in 2020 still matter
  • Topical depth: does your site have multiple pieces of content on the topic, or just one shallow page
  • Freshness: dateModified, dateModified, dateModified. AI engines aggressively prefer recent content for time-sensitive queries
  • Semantic match quality: not just keyword presence but whether the document’s embeddings sit near the query’s embedding in vector space. Synonyms and concept-coverage matter more than exact phrases

Most established SEO playbooks transfer here. If you rank well on Google, you will mostly rank well in AI-engine retrieval.

What gets cited (the GEO-specific layer)

This is where GEO diverges from SEO. Among the documents that get retrieved, which ones get cited in the final answer? We analysed 200+ Perplexity and ChatGPT answers across queries our clients care about, and the formats below came up most and least often.

The most-cited formats:

  1. FAQ blocks — direct question/answer pairs. AI engines extract these almost verbatim. This is why we ship FAQ JSON-LD on every page that has one
  2. Comparison tables — “X vs Y vs Z” with explicit columns and rows. Trivial for an AI engine to convert into a comparison answer
  3. Lists of named entities — “the 5 best providers of X in MENA” with specific names. Easy to cite
  4. Definitions — “What is high-risk payment processing?” followed by a clear 2-3 sentence answer
  5. Numbered procedures — “How to choose between Claude and GPT” with explicit steps
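
As an illustration of the FAQ JSON-LD mentioned in item 1, the sketch below builds a schema.org FAQPage object from question/answer pairs. The helper name faq_jsonld and the sample pair are hypothetical. The property names (mainEntity, Question, acceptedAnswer, Answer) are the standard schema.org ones.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialise (question, answer) pairs as a schema.org FAQPage JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Hypothetical FAQ content; use the same text that is visible on the page.
markup = faq_jsonld([
    ("What is GEO?", "The discipline of getting your content cited by AI engines."),
])
print(markup)
```

The output goes inside a script element with type "application/ld+json" on the page that shows the same questions and answers.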

The least-cited formats:

  • Long narrative paragraphs without clear extraction points
  • Marketing-speak (“we deliver world-class outcomes”) — AI engines down-weight promotional language
  • Content behind paywalls or login walls (obviously)
  • Content with no citation-worthy claims (vague, opinion-light, no specific numbers)

Schema.org markup that actually moves the needle

In rough order of impact on GEO citations:

Schema                       | Why it matters                                     | Where to use
FAQPage                      | Most-cited format; structures Q&A clearly          | Every page with a FAQ section
Article / BlogPosting        | Identifies citable long-form content               | Every blog post and article
Service                      | Identifies what your business does at entity level | Every service page
Organization + LocalBusiness | Entity graph anchor                                | Once per site, on every page
Person                       | E-E-A-T author attribution                         | Author bylines, team pages
BreadcrumbList               | Helps engines understand site structure            | Every non-home page
Review + aggregateRating     | Trust signal; cited for “best X” queries           | Where you have authentic reviews
HowTo                        | Captures procedure content                         | Tutorial / methodology pages
SpeakableSpecification       | Indicates voice-readable content                   | Hero paragraphs, FAQ answers

Schema is necessary but not sufficient. The visible content has to match. AI engines explicitly cross-check schema against visible HTML; bait-and-switch (schema says one thing, page says another) gets penalised.
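
One way to catch schema-vs-page mismatches before an engine does is a small self-check. The sketch below uses only the Python standard library. It assumes the FAQPage markup is a single JSON object (not an array or @graph) and only checks that each question appears somewhere in the visible text. The names PageParser and faq_mismatches are ours, and this is not how any engine actually performs the cross-check.

```python
import json
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects JSON-LD script bodies and visible text separately."""

    def __init__(self) -> None:
        super().__init__()
        self.jsonld_blocks: list[str] = []
        self.visible_text: list[str] = []
        self._in_jsonld = False
        self._skip = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            if dict(attrs).get("type") == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []
            else:
                self._skip = True  # ordinary scripts are not visible content
        elif tag == "style":
            self._skip = True

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buffer))
            self._in_jsonld = False
        elif tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)
        elif not self._skip:
            self.visible_text.append(data)


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so comparisons ignore layout."""
    return " ".join(text.split()).lower()


def faq_mismatches(html: str) -> list[str]:
    """Return FAQ questions present in JSON-LD but absent from the visible page."""
    parser = PageParser()
    parser.feed(html)
    visible = normalise(" ".join(parser.visible_text))
    missing = []
    for block in parser.jsonld_blocks:
        data = json.loads(block)
        if not isinstance(data, dict) or data.get("@type") != "FAQPage":
            continue
        for item in data.get("mainEntity", []):
            question = item.get("name", "")
            if normalise(question) not in visible:
                missing.append(question)
    return missing


# Hypothetical page: the second question exists only in the markup.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": [
  {"@type": "Question", "name": "What is GEO?",
   "acceptedAnswer": {"@type": "Answer", "text": "Getting cited by AI engines."}},
  {"@type": "Question", "name": "Is GEO free?",
   "acceptedAnswer": {"@type": "Answer", "text": "No."}}
]}
</script></head>
<body><h2>What is GEO?</h2><p>Getting cited by AI engines.</p></body></html>
"""
print(faq_mismatches(page))
```

Any question the function returns is one to either add to the page or remove from the markup.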

The llms.txt standard

The llmstxt.org proposal is the de-facto 2025/2026 convention for guiding AI crawlers to a site’s most important resources in a token-efficient form. It is a markdown file at the root of your domain that lists key URLs with brief descriptions. It remains a proposal rather than a ratified standard: ChatGPT, Claude, and Perplexity appear to respect it for citation prioritisation, though none of the vendors documents this officially.

Two files matter:

  • llms.txt — a short index (typically 100-300 lines) listing canonical resources by section
  • llms-full.txt — full plaintext dump of all visible site content concatenated for efficient AI ingestion

Both should be reachable at the root domain. The cost is minimal; the benefit is meaningful. Every serious B2B site should ship them.
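
For reference, a minimal llms.txt can be generated from a list of pages. The sketch below follows the layout described in the llmstxt.org proposal: an H1 title, a blockquote summary, then H2 sections containing markdown link lists. The function name build_llms_txt, the site name, and the example.com URLs are placeholders.

```python
def build_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt index: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder site structure; replace with your canonical URLs.
llms_txt = build_llms_txt(
    "Example Co",
    "B2B consultancy covering GEO audits and AI adoption.",
    {
        "Services": [
            ("GEO audit", "https://example.com/services/geo-audit", "90-day GEO program"),
        ],
        "Blog": [
            ("GEO in 2026", "https://example.com/blog/geo-2026", "How AI engines pick citations"),
        ],
    },
)
print(llms_txt)
```

Write the result to /llms.txt at the domain root. llms-full.txt can be produced separately by concatenating the plain text of each listed page.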

What does NOT work

Things we have tried that did not deliver:

  • Keyword stuffing in invisible content — AI engines fingerprint and discount this. It is also a Google penalty
  • Cloaking (showing different content to crawlers) — AI engines crawl with multiple agents and cross-check. Cloaking gets you de-indexed
  • Sponsored content disguised as editorial — both Google and AI engines have become good at detecting it. The penalty is severe and persistent
  • Excessive listicles (“Top 50 of X”) — the diminishing-returns curve flattens fast. Better to have 5 deeply substantive pieces than 50 shallow ones
  • Pure AI-generated content with no editorial signature — AI engines specifically down-weight content that smells AI-written and unedited. Substance, specificity, named-entity references, and writer-voice all signal “this came from a human who knows something”

Building a GEO program

A pragmatic 90-day program for a B2B site:

Weeks 1-2: Audit

  • Crawl your site and identify the pages that should rank for your target queries
  • Run those queries through ChatGPT, Claude, and Perplexity manually; record which pages currently get cited (if any)
  • Audit schema.org markup against what we listed above
  • Check whether robots.txt allows GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and others
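
The robots.txt check can be scripted with Python's standard urllib.robotparser. The sketch below parses robots.txt content from a string and reports, per user agent, whether a given URL may be fetched. The agent list comes from the bullet above; the helper name, the sample rules, and the example.com URL are illustrative.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the audit checklist above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each AI user agent to whether robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


# Hypothetical robots.txt that blocks one AI crawler and allows everything else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
access = ai_crawler_access(sample, "https://example.com/blog/geo")
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

In an audit you would read the live file (for example with set_url and read on the same parser) and test a handful of representative URLs rather than one.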

Weeks 3-6: Foundation

  • Ship robots.txt explicitly allowing major AI crawlers
  • Ship llms.txt and llms-full.txt
  • Add FAQPage schema to every page with a FAQ
  • Add Service and BreadcrumbList schema where applicable
  • Fix any schema-vs-visible-content mismatches

Weeks 7-10: Content

  • For each high-value query, ensure you have one substantive page (1500+ words) that answers it
  • Add comparison tables, named-entity lists, and clear definitions to existing pages
  • Expand thin pages; cull or merge near-duplicate pages

Weeks 11-12: Measurement

  • Re-run the manual citation check from week 2
  • Set up a monthly tracker (a simple spreadsheet noting which pages get cited for which queries)
  • Plan the next 90-day cycle based on the gaps
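
The monthly tracker can stay a spreadsheet, and a few lines of code are enough to turn an export of it into a per-engine citation rate. The sketch below assumes a CSV with the columns month, engine, query, cited, and cited_url. That layout and the sample rows are our invention for illustration, not real client data.

```python
import csv
import io
from collections import defaultdict


def citation_rate(rows) -> dict[str, float]:
    """Share of tracked queries per engine for which one of our pages was cited."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["engine"]] += 1
        if row["cited"].strip().lower() == "yes":
            hits[row["engine"]] += 1
    return {engine: hits[engine] / totals[engine] for engine in totals}


# Made-up tracker export: one row per (engine, query) check.
log = """\
month,engine,query,cited,cited_url
2026-01,ChatGPT,best geo agency,yes,https://example.com/geo
2026-01,ChatGPT,what is geo,no,
2026-01,Perplexity,best geo agency,yes,https://example.com/geo
2026-01,Perplexity,what is geo,yes,https://example.com/blog/geo
"""
rates = citation_rate(csv.DictReader(io.StringIO(log)))
for engine, rate in rates.items():
    print(f"{engine}: {rate:.0%} of tracked queries cited")
```

Comparing these rates between the week-2 baseline and the week-12 re-run gives the lift figure for the cycle.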

What you cannot control

Two structural realities to absorb:

  1. You will not get citations on every query. AI engines try to cite from diverse sources; if a competitor’s content is comparable and they got there first, they keep the citation. The work is incremental
  2. The citation surface is shifting fast. Today’s “ChatGPT cites you” can become “Gemini cites you” or “Perplexity replaces ChatGPT in your buyer’s habit.” Build for the discipline, not for one specific engine

We expect GEO to consolidate over the next 12-18 months as the engines’ citation algorithms converge. The fundamentals — substantive content, clean schema, authoritative entity signals — work across all of them.

Get in touch

If you would like us to audit your site for GEO readiness and design the program, contact us at contact@kalastor.net. Typical engagement: 90 days to a measurable lift in AI-engine citation rate.
