May 11, 2026 · 16 min read

Schema Markup for AI Search - The Technical Guide to Getting Cited

The schema types ChatGPT, Perplexity, Gemini, and Claude actually use to decide who gets cited - JSON-LD examples, parsing rules, and the eight mistakes we see every audit.


By 10xSearch Editorial

Schema markup is the single highest-leverage technical asset for AI search citation in 2026. It is the layer that lets ChatGPT, Perplexity, Gemini, and Claude parse a page in milliseconds, extract the entities they need (business, author, product, location, claim), and attribute those entities back to your domain with confidence. Pages with clean, validated JSON-LD get cited at materially higher rates than otherwise identical pages without it - even when content quality, domain authority, and our [llms.txt](https://10xsearch.com/blog/what-is-llms-txt-file-and-why-every-business-needs-one/) coverage are held constant.

This is the technical playbook. Which schema types matter for AI citation, exactly how to write them in JSON-LD, how AI engines parse structured data versus raw HTML, and the eight failure modes we see in every audit we run.

Why AI Engines Care About Schema (More Than Google Ever Did)

Traditional search engines used schema as an optional enrichment layer. Google's Rich Results were a nice-to-have. Most pages ranked perfectly well without `Article` or `FAQPage` markup; schema bought you a star rating in the SERP, a recipe card, or a featured snippet, but it did not move organic ranking in any direct way.

AI search engines are different. When ChatGPT's browse pipeline or Perplexity's retrieval-augmented generation layer fetches your page, it has milliseconds - not minutes - to extract structured meaning. A 4,000-word blog post is a wall of text. The model has to decide which sentences are claims, which are filler, who the author is, what entity the page represents, and whether the page is trustworthy enough to cite by name. Schema markup short-circuits all of that. A correctly marked page hands the model a parsed JSON object with the answer already extracted.

The mechanical consequence: every major retrieval pipeline we have reverse-engineered - OpenAI's browse mode, Perplexity's sonar, Google Gemini's grounding layer, Anthropic's web search tool - preferentially cites pages with clean schema. The lift is not marginal. In our internal audits across 60+ client domains, pages with valid Article + Organization + FAQPage schema are cited 3 to 5x more often than otherwise comparable pages with no schema or broken schema.

The investment is small. The downside of skipping it is invisible exclusion from a fast-growing channel.

The Eight Schema Types That Matter for AI Citation

We deploy a deliberately narrow set of schema types in production. The full schema.org vocabulary contains more than 800 types, but in practice eight cover 95% of the AI citation surface for the businesses we work with.

1. Organization - The Identity Layer

`Organization` (or its specialized children `LocalBusiness`, `RealEstateAgent`, `LegalService`, `MedicalBusiness`, etc.) is the foundation. Every page on your site should reference a single canonical `Organization` entity, typically via `@id` so the markup forms a connected graph rather than 200 disconnected copies.

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://10xsearch.com/#organization",
  "name": "10xSearch",
  "url": "https://10xsearch.com",
  "logo": {
    "@type": "ImageObject",
    "url": "https://10xsearch.com/images/logo.png",
    "width": 512,
    "height": 512
  },
  "description": "AI search visibility agency specializing in GEO, AEO, and EEAT for high-stakes B2B and luxury real estate brands.",
  "founder": {
    "@type": "Person",
    "@id": "https://10xsearch.com/#todd-koch",
    "name": "Todd Koch"
  },
  "sameAs": [
    "https://x.com/10xsearch",
    "https://www.linkedin.com/company/10xsearch",
    "https://www.youtube.com/@10xsearch"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "telephone": "+1-555-555-5555",
    "contactType": "customer service",
    "email": "hello@10xsearch.com",
    "availableLanguage": ["English"]
  },
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Bend",
    "addressRegion": "OR",
    "addressCountry": "US"
  }
}
```

Three details that matter more than people realize:

- `@id` with a URL fragment. This makes the entity referenceable from other markup blocks (Article, Product, Review) without duplicating the data. AI engines walk the graph and consolidate.
- `sameAs` populated with verified profiles. This is the entity disambiguation layer. When ChatGPT decides whether "10xSearch" is a real company versus a generic phrase, it looks for cross-platform identity signals. Three to seven verified `sameAs` URLs is the sweet spot.
- `founder` or `employee` linking to a Person `@id`. Authorship and authority chain back through this link. We will return to it under EEAT.

2. Article (or BlogPosting, NewsArticle) - The Content Layer

Every editorial page needs `Article` schema. The variant matters less than the field coverage. `Article` is the generic; `BlogPosting` is appropriate for opinion and editorial; `NewsArticle` is appropriate for time-sensitive reporting with a journalist byline.

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "@id": "https://10xsearch.com/blog/schema-markup-for-ai-search/#article",
  "headline": "Schema Markup for AI Search - The Technical Guide to Getting Cited",
  "description": "The schema types ChatGPT, Perplexity, Gemini, and Claude actually use to decide who gets cited - JSON-LD examples, parsing rules, and the eight mistakes we see every audit.",
  "image": {
    "@type": "ImageObject",
    "url": "https://10xsearch.com/images/blog/schema-markup-for-ai-search.png",
    "width": 1200,
    "height": 630
  },
  "datePublished": "2026-05-11T08:00:00-07:00",
  "dateModified": "2026-05-11T08:00:00-07:00",
  "author": {
    "@type": "Organization",
    "@id": "https://10xsearch.com/#organization",
    "name": "10xSearch Editorial"
  },
  "publisher": {
    "@type": "Organization",
    "@id": "https://10xsearch.com/#organization"
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://10xsearch.com/blog/schema-markup-for-ai-search/"
  },
  "keywords": "schema markup, JSON-LD, AI search, ChatGPT citation, Perplexity, Gemini",
  "articleSection": "Technical SEO",
  "wordCount": 3200
}
```

The fields AI engines actually consume, in priority order: `headline`, `description`, `author`, `datePublished`, `dateModified`, `mainEntityOfPage`, `image`. Skip `wordCount` and `articleSection` if you want, but never skip any of those seven.
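To make that checklist enforceable, here is a minimal Python sketch you could drop into a CI step. The required-field list simply encodes the priority order above; it is our checklist, not any engine's published spec.

```python
import json

# The Article fields this guide treats as non-negotiable, in the
# priority order described above.
REQUIRED_ARTICLE_FIELDS = [
    "headline", "description", "author", "datePublished",
    "dateModified", "mainEntityOfPage", "image",
]

def missing_article_fields(jsonld_block: str) -> list[str]:
    """Return the required Article fields absent from a JSON-LD string."""
    data = json.loads(jsonld_block)
    return [f for f in REQUIRED_ARTICLE_FIELDS if f not in data]

# Example: an Article block that forgot dateModified and image.
block = json.dumps({
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Schema Markup for AI Search",
    "description": "JSON-LD examples and parsing rules.",
    "author": {"@type": "Person", "name": "Todd Koch"},
    "datePublished": "2026-05-11T08:00:00-07:00",
    "mainEntityOfPage": {"@type": "WebPage", "@id": "https://example.com/post/"},
})
print(missing_article_fields(block))  # ['dateModified', 'image']
```

Failing the build on a non-empty result is the cheapest way to keep every new post at full field coverage.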

A common mistake: writers set `author` to a plain string ("Todd Koch") instead of a `Person` object. The Person object with `@id`, `url`, and `sameAs` is what makes the author a citable entity. Plain strings get discarded.

3. Person - The Author Layer

`Person` schema is where most B2B and real estate sites leak EEAT signal. Every author bio page and every author byline must reference a `Person` entity with verifiable identity signals.

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://10xsearch.com/#todd-koch",
  "name": "Todd Koch",
  "jobTitle": "Founder",
  "worksFor": {
    "@type": "Organization",
    "@id": "https://10xsearch.com/#organization"
  },
  "url": "https://10xsearch.com/about/",
  "image": "https://10xsearch.com/images/team/todd-koch.jpg",
  "description": "Founder of 10xSearch and AI Search Insider. 18 years in technical SEO and AI search visibility.",
  "sameAs": [
    "https://www.linkedin.com/in/toddkoch",
    "https://x.com/toddkoch",
    "https://github.com/toddkoch"
  ],
  "knowsAbout": [
    "Generative Engine Optimization",
    "Answer Engine Optimization",
    "Schema.org",
    "Retrieval-Augmented Generation",
    "Technical SEO"
  ],
  "alumniOf": {
    "@type": "EducationalOrganization",
    "name": "University Name"
  }
}
```

`knowsAbout` is the lever almost nobody pulls. It is a controlled vocabulary of topics the person is authoritative on. AI engines use it to decide whether to surface this author when answering a query in that domain. A real estate agent's Person markup should include `knowsAbout: ["Luxury real estate", "Lake Lanier waterfront", "Forsyth County market", "Relocation"]` - not generic terms like "Real estate" alone.

4. FAQPage - The Direct-Answer Layer

`FAQPage` is the highest-conversion schema type for AI citation. The reason: AI engines optimize for direct-answer queries, and FAQ markup hands them a pre-parsed question-and-answer pair that maps cleanly to a citable response.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "@id": "https://10xsearch.com/blog/schema-markup-for-ai-search/#faq",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Does schema markup directly affect AI search rankings?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Schema markup does not change traditional Google organic ranking directly, but it materially increases the probability of citation in AI search surfaces - ChatGPT, Perplexity, Gemini, and Claude all preferentially cite pages with valid JSON-LD over otherwise identical pages without it. In our audits, the lift is 3 to 5x in citation frequency."
      }
    },
    {
      "@type": "Question",
      "name": "Should I use JSON-LD or Microdata for AI search?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Always JSON-LD. Microdata and RDFa still validate against schema.org, but modern AI parsing pipelines (OpenAI, Perplexity, Anthropic) preferentially extract from JSON-LD blocks because they are structurally cleaner and live outside the page's visible DOM. JSON-LD is also easier to maintain in modern stacks (Next.js, Astro, WordPress with Yoast)."
      }
    }
  ]
}
```

Two rules nobody follows:

- The questions in FAQ markup must match the questions visible on the page. Google penalizes (and AI engines silently ignore) FAQ schema with hidden questions that do not exist in the rendered HTML.
- One `FAQPage` per page maximum. Multiple FAQ blocks confuse parsers. Consolidate into a single object with multiple `mainEntity` Question entries.
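The first rule is easy to automate. The Python sketch below compares the questions declared in FAQ markup against the page's rendered headings; it assumes questions appear as `<h2>`/`<h3>` headings, which varies by template, so treat it as a starting point rather than a universal check.

```python
import json
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collects <h2>/<h3> text - a rough proxy for the questions
    actually visible on the rendered page."""
    def __init__(self):
        super().__init__()
        self.headings = []
        self._buf = None
    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._buf = []
    def handle_endtag(self, tag):
        if tag in ("h2", "h3") and self._buf is not None:
            self.headings.append("".join(self._buf).strip())
            self._buf = None
    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

def hidden_faq_questions(faq_jsonld: str, page_html: str) -> list[str]:
    """Questions declared in FAQPage markup but absent from visible headings."""
    parser = HeadingCollector()
    parser.feed(page_html)
    visible = {h.lower() for h in parser.headings}
    schema_qs = [q["name"] for q in json.loads(faq_jsonld)["mainEntity"]]
    return [q for q in schema_qs if q.lower() not in visible]

faq = json.dumps({
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": "Does schema markup directly affect AI search rankings?"},
        {"@type": "Question", "name": "Should I use JSON-LD or Microdata?"},
    ],
})
page = "<h3>Does schema markup directly affect AI search rankings?</h3><p>...</p>"
print(hidden_faq_questions(faq, page))  # ['Should I use JSON-LD or Microdata?']
```

A non-empty result means the schema declares questions the page never shows - exactly the mismatch that gets the block ignored.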

5. HowTo - The Step Layer

`HowTo` schema marks up procedural content - step-by-step instructions, setup guides, tactical playbooks. It is the second-strongest direct-citation type because it maps to "how do I…" queries that AI engines field at high volume.

```json
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Add Organization Schema to a Next.js Site",
  "description": "Step-by-step deployment of Organization JSON-LD markup in a Next.js App Router site.",
  "totalTime": "PT15M",
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Create the schema JSON file",
      "text": "Create /src/lib/schema/organization.ts exporting a typed JSON-LD object for your Organization."
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Inject into the root layout",
      "text": "Import the schema and render it inside app/layout.tsx using a <script type='application/ld+json'> tag with dangerouslySetInnerHTML."
    },
    {
      "@type": "HowToStep",
      "position": 3,
      "name": "Validate with the Rich Results Test",
      "text": "Deploy to staging, then run the page through search.google.com/test/rich-results to confirm no warnings or errors."
    }
  ]
}
```

`HowTo` is the schema type where Google rolled back rich result eligibility in 2023, but AI engines never followed Google's deprecation - Perplexity and ChatGPT still extract `HowTo` markup aggressively. Keep deploying it.

6. Product - The Commerce Layer

For SaaS, ecommerce, and any business that sells a discrete product or service tier, `Product` schema is the entity that ChatGPT and Perplexity cite when answering "what is the best X for Y" queries.

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "10xSearch AI Visibility Audit",
  "description": "Comprehensive AI search visibility audit covering ChatGPT, Perplexity, Gemini, and Claude. Includes mention rate baseline, competitor share-of-voice, schema coverage analysis, and 90-day remediation roadmap.",
  "brand": {
    "@type": "Organization",
    "@id": "https://10xsearch.com/#organization"
  },
  "image": "https://10xsearch.com/images/services/visibility-audit.png",
  "offers": {
    "@type": "Offer",
    "price": "2500",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "url": "https://10xsearch.com/services/visibility-audit/"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.9",
    "reviewCount": "34"
  }
}
```

If you publish `aggregateRating`, the underlying reviews must be public and verifiable on the same site, in matching `Review` markup. Fake or unverifiable ratings are the single fastest way to get your entire schema graph distrusted by an AI engine.

7. Review - The Social-Proof Layer

`Review` schema marks up individual customer reviews and is consumed by AI engines for sentiment, recommendation, and trust signals.

```json
{
  "@context": "https://schema.org",
  "@type": "Review",
  "itemReviewed": {
    "@type": "Product",
    "name": "10xSearch AI Visibility Audit"
  },
  "author": {
    "@type": "Person",
    "name": "Ashley Inglis"
  },
  "datePublished": "2026-03-15",
  "reviewBody": "10xSearch took us from 35% AI visibility to 49.2% in eight weeks. We now outrank Sotheby's and PureWest in our market.",
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "5",
    "bestRating": "5"
  }
}
```

A `Review` without an `author.name` and a real `datePublished` will be ignored. Anonymity destroys the trust signal.

8. LocalBusiness (and its children) - The Geo Layer

For any business with a physical address - real estate, legal, medical, retail, professional services - `LocalBusiness` (or its specialized child like `RealEstateAgent`, `Dentist`, `Plumber`) is what AI engines consume when answering "best X near me" queries.

```json
{
  "@context": "https://schema.org",
  "@type": "RealEstateAgent",
  "@id": "https://dreamsmithrealty.com/#organization",
  "name": "Ashley Smith - Dreamsmith Realty",
  "image": "https://dreamsmithrealty.com/images/ashley-smith.jpg",
  "url": "https://dreamsmithrealty.com",
  "telephone": "+1-678-485-8858",
  "priceRange": "$$",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "5485 Bethelview Rd",
    "addressLocality": "Cumming",
    "addressRegion": "GA",
    "postalCode": "30040",
    "addressCountry": "US"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 34.2073,
    "longitude": -84.1402
  },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "09:00",
      "closes": "18:00"
    }
  ],
  "areaServed": [
    { "@type": "City", "name": "Cumming, GA" },
    { "@type": "City", "name": "Buford, GA" },
    { "@type": "City", "name": "Forsyth County, GA" }
  ],
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.9",
    "reviewCount": "127"
  }
}
```

The `areaServed` array is the geo lever almost nobody pulls. AI engines fielding "best real estate agent in Cumming GA" walk the `areaServed` array to determine match. Listing 4 to 10 specific cities and neighborhoods is the right density. Listing entire states or regions ("Georgia") is too broad and dilutes the signal.

How AI Engines Actually Parse Schema vs. HTML

Understanding the parsing pipeline matters because it explains why some markup gets used and other markup gets ignored.

When an AI retrieval system fetches your page, three things happen in sequence:

  1. HTML download and DOM parse. The page is fetched as a raw HTML document. Modern AI crawlers do execute JavaScript (OpenAI's browser tool, Perplexity's sonar, Anthropic's web search) but JavaScript-rendered content adds latency and is sometimes skipped under load.
  2. JSON-LD extraction. Every `<script type="application/ld+json">` block is parsed into a JSON object. This happens before semantic content extraction. The schema graph becomes the canonical entity index for the page.
  3. Content extraction and chunking. The visible HTML is then chunked, scored for relevance against the query, and ranked. Chunks that align with entities in the JSON-LD graph get prioritized for citation.
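The first two steps can be sketched in a few lines of stdlib Python. This mirrors the behavior described above; the actual OpenAI, Perplexity, and Anthropic extractors are not public, so treat this as an illustration of the mechanism, not their implementation.

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Pulls every <script type="application/ld+json"> block out of raw
    HTML - the JSON-LD extraction pass that runs before content chunking."""
    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = None  # accumulates script text while inside a JSON-LD tag
    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buf = []
    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append(json.loads("".join(self._buf)))
            self._buf = None
    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "10xSearch"}
</script>
</head><body><p>Visible content...</p></body></html>"""

extractor = JSONLDExtractor()
extractor.feed(html)
print(extractor.blocks[0]["@type"])  # Organization
```

Note what this implies: the extractor never sees your visible prose at this stage. If the JSON-LD is absent from the raw HTML, this pass yields nothing.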

The order matters. Schema is read first. If your schema graph says the page is about `Article: "Schema Markup for AI Search"` authored by `Person: "10xSearch Editorial"` and the visible HTML matches, the model treats the page as a high-confidence source. If the schema says one thing and the HTML says another, both get downgraded.

Three operational consequences:

- Always inject schema via static HTML or server-rendered output. JavaScript-injected JSON-LD (added client-side via `useEffect` or a third-party tag manager) is fetched intermittently. Server-render it.
- Inline JSON-LD beats external schema references. Put the `<script>` block in the page HTML. Do not load schema from an external JSON file via `src=`. Most AI crawlers do not follow the secondary request.
- One JSON-LD block per entity type per page. Multiple `Article` blocks on a single page confuse parsers. Consolidate into one block with `@graph` if you need to express multiple connected entities.

The `@graph` pattern is the cleanest way to express the full entity model of a page in one block:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://10xsearch.com/#organization",
      "name": "10xSearch"
    },
    {
      "@type": "Person",
      "@id": "https://10xsearch.com/#todd-koch",
      "name": "Todd Koch",
      "worksFor": { "@id": "https://10xsearch.com/#organization" }
    },
    {
      "@type": "Article",
      "@id": "https://10xsearch.com/blog/schema-markup-for-ai-search/#article",
      "headline": "Schema Markup for AI Search - The Technical Guide to Getting Cited",
      "author": { "@id": "https://10xsearch.com/#todd-koch" },
      "publisher": { "@id": "https://10xsearch.com/#organization" }
    },
    {
      "@type": "WebPage",
      "@id": "https://10xsearch.com/blog/schema-markup-for-ai-search/",
      "isPartOf": { "@id": "https://10xsearch.com/#website" },
      "primaryImageOfPage": {
        "@type": "ImageObject",
        "url": "https://10xsearch.com/images/blog/schema-markup-for-ai-search.png"
      }
    }
  ]
}
```

A single `<script type="application/ld+json">` containing this `@graph` is parsed by AI engines as four connected entities. Cleaner than four separate script blocks and faster to parse.
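Consolidation is mechanically simple: index every node by its `@id`, then follow references between nodes. The Python sketch below shows the idea in simplified form; real pipelines also merge duplicate entities and validate types.

```python
import json

def entity_index(graph_jsonld: str) -> dict:
    """Index every node in an @graph block by its @id - roughly what a
    consumer does when it consolidates a page's entities into one graph."""
    doc = json.loads(graph_jsonld)
    return {node["@id"]: node for node in doc.get("@graph", []) if "@id" in node}

graph = json.dumps({
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": "https://10xsearch.com/#organization",
         "name": "10xSearch"},
        {"@type": "Person", "@id": "https://10xsearch.com/#todd-koch",
         "name": "Todd Koch",
         "worksFor": {"@id": "https://10xsearch.com/#organization"}},
    ],
})

index = entity_index(graph)
author = index["https://10xsearch.com/#todd-koch"]
# Follow the @id reference from Person.worksFor back to the Organization node.
employer = index[author["worksFor"]["@id"]]
print(employer["name"])  # 10xSearch
```

This is also why missing `@id` values are so costly: a node without one cannot be referenced, so it never joins the graph.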

Common Mistakes We See in Every Audit

These eight failure modes account for roughly 90% of the broken schema we find when auditing competitor and prospect sites.

1. Orphan schema. The page declares `Article` markup but no `Organization` or `Person`. The article has no provenance, no publisher, no author entity. AI engines can identify the topic but not the source - so they extract the content and cite a different page on a different domain that does have provenance markup.

2. Mismatched data between schema and HTML. The schema says the article was published on 2026-01-15. The visible byline says "Updated April 2026." The Organization name in schema is "10xSearch LLC"; the footer copyright says "10x Search, Inc." Models distrust the entire graph when any single field disagrees with what they extract from the HTML.

3. Schema in JavaScript-rendered content. A SPA renders the page via React, then injects JSON-LD into the DOM after hydration. The schema is technically present in the browser but never present in the crawler's first-pass fetch. Server-side rendering is non-negotiable for AI visibility.

4. Stale `dateModified`. The article was meaningfully updated last week but `dateModified` still reads from when the post was first published in 2023. AI engines preferentially cite recently modified content for time-sensitive queries; an outdated `dateModified` makes you look stale even when you are not.
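Staleness is another check worth automating. A minimal sketch - the 365-day threshold here is illustrative, not a cutoff any engine publishes:

```python
import json
from datetime import datetime, timedelta, timezone

def is_stale(article_jsonld: str, max_age_days: int = 365) -> bool:
    """Flag Article markup whose dateModified is older than max_age_days.
    Assumes dateModified is an ISO 8601 timestamp with a UTC offset."""
    data = json.loads(article_jsonld)
    modified = datetime.fromisoformat(data["dateModified"])
    return datetime.now(timezone.utc) - modified > timedelta(days=max_age_days)

stale_block = json.dumps({"dateModified": "2023-02-01T08:00:00-07:00"})
fresh_block = json.dumps({"dateModified": datetime.now(timezone.utc).isoformat()})
print(is_stale(stale_block), is_stale(fresh_block))  # True False
```

Wire it into the same CI pass as the field check and stale markup surfaces before the crawler sees it.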

5. Generic `author` strings instead of Person entities. Setting `"author": "Editorial Team"` instead of `"author": { "@type": "Person", "@id": "...", "name": "Todd Koch", "url": "...", "sameAs": [...] }` is the single most common EEAT-killing error. Plain-string authors are discarded as untraceable.

6. FAQ schema with hidden questions. The FAQPage block declares six questions; the rendered page only shows three. Google penalizes this with manual action; AI engines silently distrust the entire schema graph. FAQ schema must match visible content one-to-one.

7. Multiple conflicting Organization entities. The site declares one `Organization` in the homepage schema, a slightly different `LocalBusiness` in the contact-page schema, and a third `Corporation` referenced from a service page. Models cannot decide which is canonical. Pick one `@id` and reference it from every page.

8. Schema with no `@id` at all. Without `@id`, every block on every page is a fresh, disconnected entity. The site has 200 unconnected `Article` objects and 200 unconnected `Organization` objects instead of one Organization referenced 200 times. The entity graph never consolidates and the brand never accumulates AI authority.

How to Audit Your Current Schema in 30 Minutes

Our standard schema audit, simplified for self-service:

  1. Crawl your top 20 pages and inspect for `<script type="application/ld+json">` blocks. A free crawler like Screaming Frog (under the 500-URL free limit) does this in two minutes. Note which pages have schema and which do not.
  2. Run each page through Google's Rich Results Test at [search.google.com/test/rich-results](https://search.google.com/test/rich-results). It validates schema.org syntax, flags warnings, and shows you the parsed entity graph. Warnings are not fatal but they correlate with downstream parsing problems.
  3. Run each page through the Schema.org Validator at [validator.schema.org](https://validator.schema.org). This is stricter than Google's tool and catches structural issues Google's tool ignores.
  4. Check entity consistency across pages. Use a JSON diff tool to compare the `Organization` block on your homepage, contact page, about page, and a representative blog post. They should reference identical `@id` values and identical name/url/logo fields. If they drift, consolidate.
  5. Confirm server-side rendering. Open your page in a browser, view source (not "inspect"), and search for `application/ld+json`. The schema must be in the raw HTML source - not just visible after JavaScript executes.
  6. Test an AI citation query. Ask ChatGPT, Perplexity, and Gemini a question your page is optimized to answer. If you appear in the citation list with your correct entity name, your schema is doing its job. If you appear with a corrupted name (or do not appear at all), schema is the first place to look.
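The entity-consistency step is the one most worth scripting. A hypothetical helper - `organization_drift` is our name for illustration, and it assumes you have already extracted each page's Organization block as a JSON string:

```python
import json

def organization_drift(pages: dict[str, str]) -> dict[str, set]:
    """Compare the Organization JSON-LD block across pages and report which
    fields disagree. `pages` maps a page URL to that page's Organization
    block, serialized as a JSON string."""
    parsed = {url: json.loads(block) for url, block in pages.items()}
    all_fields = set().union(*(d.keys() for d in parsed.values()))
    drift = {}
    for field in sorted(all_fields):
        # Serialize values so nested objects compare reliably.
        values = {json.dumps(d.get(field), sort_keys=True) for d in parsed.values()}
        if len(values) > 1:
            drift[field] = values
    return drift

pages = {
    "https://example.com/": json.dumps(
        {"@id": "https://example.com/#organization", "name": "10xSearch"}),
    "https://example.com/contact/": json.dumps(
        {"@id": "https://example.com/#organization", "name": "10x Search, Inc."}),
}
drift = organization_drift(pages)
print(sorted(drift))  # ['name']
```

An empty report means every page asserts the same canonical entity; any listed field is exactly the mismatch described in failure mode 2.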

The Deployment Pattern We Use for Clients

For new client onboarding we ship a standard schema pack:

- One global `Organization` (or `LocalBusiness`/`RealEstateAgent` child) with stable `@id`, referenced from every page.
- One `Person` entity per active author with `knowsAbout`, `sameAs`, `worksFor`, and a bio page URL.
- `Article` schema on every blog post with full `headline`, `description`, `image`, `datePublished`, `dateModified`, `author` (referenced via `@id`), `publisher` (referenced via `@id`), `mainEntityOfPage`.
- `FAQPage` on any content page with a Q&A section, matching visible content.
- `HowTo` on any procedural guide, with `step` array matching the rendered numbered list.
- `Product` and `Service` markup on commercial pages with verifiable `offers` and (where legitimate) `aggregateRating`.
- `BreadcrumbList` on every page deeper than the homepage for entity-graph connectivity.
- `WebSite` with `SearchAction` on the homepage for sitelinks search box and entity-graph rooting.

This pack covers 95% of the schema surface that AI engines consume. We deploy it server-side using a typed schema library (we use a Next.js + TypeScript stack with schema modules per entity type), validate every page in CI against Google's structured data testing tool API, and re-validate any time content changes.

Frequently Asked Questions

Does schema markup directly affect AI search rankings? Schema markup does not change traditional Google organic ranking directly. It materially increases the probability of citation in AI search surfaces. In our audits, pages with valid JSON-LD are cited 3 to 5x more often than otherwise identical pages without it.

Should I use JSON-LD or Microdata? Always JSON-LD. Microdata still validates but modern AI parsing pipelines preferentially extract from JSON-LD blocks. JSON-LD is also easier to maintain in modern stacks (Next.js, Astro, WordPress with Yoast/Rank Math).

Can I have too much schema on a page? Yes. Pages with five or more disconnected JSON-LD blocks confuse parsers. Consolidate into a single `@graph` block expressing the connected entities. Maximum one block per page if you can structure it cleanly.

Do I need schema if I already have an llms.txt file? Yes. They serve different layers. `llms.txt` is a site-wide curated brief; schema is page-level entity markup. The strongest AI visibility profiles combine both. For a deeper look at the `llms.txt` half, see our [llms.txt deployment guide](https://10xsearch.com/blog/what-is-llms-txt-file-and-why-every-business-needs-one/).

How often should I update schema? Whenever the underlying data changes. New blog post - new Article schema. New service line - new Service schema. New team member - new Person schema. NAP change - every LocalBusiness block. Stale schema is worse than missing schema.

Will schema markup eventually become irrelevant as AI engines get smarter? Unlikely on any horizon that matters. Even as language models improve at extracting entities from raw HTML, schema is the canonical, machine-validated layer. Handing the model pre-validated structure is always cheaper and more reliable than forcing it to infer that structure from prose. Every major AI lab has shipped retrieval pipelines that preferentially consume structured data.

What if my CMS makes schema hard to deploy? Most modern CMSs ship plugin support. WordPress has Yoast, Rank Math, and Schema Pro. Shopify and Wix have built-in schema. For headless and static stacks (Next.js, Astro, Hugo) you author schema as typed JSON in the codebase. If you are stuck on a legacy CMS that cannot inject custom JSON-LD into the `<head>`, that is itself an EEAT signal AI engines pick up - and a reason to migrate.

What to Do Next

If your site has zero schema, deploy the eight-type pack above starting with `Organization`, `Person`, and `Article`. Those three cover the foundational layer that every subsequent type stacks on.

If your site has partial schema, run the 30-minute audit and fix the eight failure modes in order. Orphan schema and mismatched data are the highest-leverage fixes; ship those first.

If your site has comprehensive schema, validate the graph quarterly, monitor your AI citation rate in Perplexity and ChatGPT, and re-deploy any time you ship significant content or organizational changes.

10xSearch builds and maintains schema graphs as part of every retainer engagement. We deploy the pack, validate it in CI, and audit it quarterly against the live AI citation surface. If you want yours audited, deployed, or refreshed, [start a conversation here](https://10xsearch.com/contact/).

The 10xSearch editorial team builds search-visibility infrastructure for high-stakes businesses. We publish playbooks based on the audits and engineered assets we ship every week.

About 10xSearch

We build the discoverability engine.

10xSearch.com engineers websites to be found and cited by Google, Google Maps, ChatGPT, Perplexity, Gemini, and Google AI Overviews. 40 engineered assets per month, every page graded against the 40-point Perfect Page Formula.