How AI shopping agents pick products: the role of structured data
A marketing strategist with a background in philosophy, Ara is the Coordinator for Marketing at WriteText.ai and 1902 Software. With seven years of experience in B2B marketing, she is deeply interested in AI and its impact on marketing and product development. As part of the core team at WriteText.ai, she helps bridge the gap between technology and strategy, making AI-powered solutions more accessible to businesses.
Short answer: An AI shopping agent picks products by reading structured data, not by looking at your website the way a person does. When a shopper asks for something, the agent pulls product records from feeds and on-page markup, scores each candidate on how well its data matches the request, and recommends the ones it can describe with confidence.
The deciding factor is rarely which product is objectively best. It is which product has data complete and accurate enough for the agent to match it to the query and trust it. If a key field is missing, the agent generally does not guess. It moves on to a competitor whose data answers the question.
This guide explains how that selection process actually works, what structured data is in this context, which fields carry the most weight, and what merchants can do to be the product an agent chooses. It pairs with our broader guide to agentic commerce and reflects the state of the market as of this writing.
Key takeaways
- AI shopping agents read structured product data (feeds and on-page markup), not page layouts, images, or marketing copy.
- Agents score products against a shopper's request and rank by query relevance, data completeness, and freshness.
- Empty or vague fields are disqualifiers. Agents generally do not infer missing attributes, so a product with gaps can drop out of the candidate set.
- Structured data takes two main forms: product feeds (for example, a Google Merchant Center feed or a catalog exposed through one of the newer agentic protocols) and on-page Schema.org markup in JSON-LD. They need to agree with each other and with the page.
- The biggest lever you control is the quality, completeness, accuracy, and freshness of your product data.
How do AI shopping agents pick products?
A useful way to picture it is a short pipeline that runs every time a shopper makes a request.
- Interpretation. The agent turns a natural-language request ("a lightweight carry-on under $100 that fits a 15-inch laptop") into structured criteria: category, price ceiling, attributes, constraints.
- Retrieval. The agent gathers candidate products from the data it can access, which means structured feeds, on-page markup, and, increasingly, merchant catalogs exposed through agentic protocols.
- Scoring. The agent evaluates each candidate against the criteria, in effect scoring how completely and clearly each product's data matches the request. A record that leaves a criterion unanswered scores lower or drops out.
- Ranking and recommendation. The agent orders the candidates and surfaces the top matches, often with a short rationale, and in some cases moves toward checkout.
It follows that whether your product surfaces depends on three things the agent can assess from your data: how well that data matches the request, how complete it is, and how current it is. The first is shaped by the shopper's wording and by your descriptions. The other two are yours to control, and, as the rest of this guide shows, the platforms' own published requirements reward both completeness and freshness.
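The pipeline above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular agent is implemented: the criteria dictionary, the field names (`laptop_fit_in`, `weight_kg`), and the rule that an unanswered criterion removes a record are all hypothetical choices made to mirror the description.

```python
from typing import Optional

# Hypothetical criteria an agent might derive from the request
# "a lightweight carry-on under $100 that fits a 15-inch laptop".
criteria = {"category": "carry-on", "max_price": 100.0, "min_laptop_fit_in": 15}

# Hypothetical product records; field names are illustrative only.
catalog = [
    {"id": "A", "category": "carry-on", "price": 89.0, "laptop_fit_in": 15, "weight_kg": 2.1},
    {"id": "B", "category": "carry-on", "price": 79.0},  # laptop fit not stated
    {"id": "C", "category": "carry-on", "price": 129.0, "laptop_fit_in": 16, "weight_kg": 2.4},
    {"id": "D", "category": "carry-on", "price": 95.0, "laptop_fit_in": 15},
]

def score(record: dict, criteria: dict) -> Optional[float]:
    """Return a score, or None when the record fails or cannot answer a criterion."""
    needed = ("category", "price", "laptop_fit_in")
    if any(record.get(field) is None for field in needed):
        return None  # missing field: this sketch drops the record rather than guessing
    if record["category"] != criteria["category"]:
        return None
    if record["price"] > criteria["max_price"]:
        return None
    if record["laptop_fit_in"] < criteria["min_laptop_fit_in"]:
        return None
    # Crude completeness signal: the number of populated fields.
    return float(sum(1 for value in record.values() if value is not None))

def rank(catalog: list, criteria: dict) -> list:
    """Score every record, drop the disqualified ones, and order the rest."""
    scored = [(score(record, criteria), record["id"]) for record in catalog]
    kept = [(s, pid) for s, pid in scored if s is not None]
    return [pid for s, pid in sorted(kept, key=lambda pair: pair[0], reverse=True)]
```

Here product B is cheaper than A but never reaches the ranking, because its record does not say whether a laptop fits. Product C answers every question but exceeds the price ceiling.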
Why can't an agent just read my website?
Because an agent does not browse a page the way a human does. It does not appreciate your hero image, your photography, or your above-the-fold layout. A site can look flawless to a person and remain effectively invisible to an agent if the underlying data is thin.
Agents extract machine-readable data from listings and feeds and use language models to interpret the descriptions, then compare candidates against the shopper's stated requirements. The interpretation step is far more reliable when the answer sits in a structured field than when it has to be inferred from prose. Consider a query like "which of these is machine washable?" or "which is lighter?" Those are answered cleanly from structured attributes. Trying to extract them from a paragraph of description text is unreliable, so a product that only states such facts in prose, or not at all, is at a disadvantage in exactly the comparisons that decide a sale.
This is the core inversion of agentic commerce. In traditional ecommerce the page was the unit of competition. For agents, the data record is the unit of competition.
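To make the contrast concrete, here is a minimal sketch of how comparative questions resolve when the facts live in structured fields. The product names and the `weight_g` and `machine_washable` fields are invented for illustration.

```python
# Hypothetical records: two state their facts in fields, one only in prose.
products = [
    {"name": "Trail Jacket", "weight_g": 410, "machine_washable": True},
    {"name": "Summit Jacket", "weight_g": 365, "machine_washable": False},
    {"name": "Ridge Jacket", "description": "Feather-light and easy to care for."},
]

def lightest(products: list):
    """Return the name of the lightest product among those that state a weight."""
    with_weight = [p for p in products if p.get("weight_g") is not None]
    if not with_weight:
        return None
    return min(with_weight, key=lambda p: p["weight_g"])["name"]

def machine_washable(products: list) -> list:
    """Return names of products that explicitly declare they are machine washable."""
    return [p["name"] for p in products if p.get("machine_washable") is True]
```

The Ridge Jacket may well be the lightest of the three, but with its weight stated only as "feather-light" in prose, it never enters either comparison in this sketch.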
What counts as structured data here?
"Structured data" in this context means product information organized into defined, machine-readable fields rather than free-form text. It shows up in two main places, and merchants generally need both.
- Product feeds. A feed is a structured file or API that lists your catalog with consistent attributes. The most familiar example is the Google Merchant Center product feed, which uses attributes such as title, description, link, image link, price, availability, brand, GTIN or MPN, and condition. The new agentic standards extend this idea: the Agentic Commerce Protocol and the Universal Commerce Protocol each define how a merchant exposes catalog data so agents can read and act on it.
- On-page Schema.org markup. Schema.org is a shared vocabulary, originally created by Google, Microsoft, Yahoo, and Yandex, for describing things on web pages. For products, you add Product markup, and Google recommends doing it in JSON-LD, a block of code in the page's HTML that is separate from the visible content. It describes the same facts (name, price, availability, ratings, brand, and so on) in a format crawlers and agents can parse directly.
The two are related but not interchangeable. The feed is what you push to a platform. The on-page markup is what an agent or crawler reads from your live page. When they describe the same product, they need to agree, which brings us to trust.
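One way to keep the two in agreement is to generate the on-page markup from the same record that feeds the platform. The sketch below assumes a simplified feed record with hypothetical keys (`title`, `currency`, and so on) and placeholder values. The Schema.org types and properties it emits (Product, Brand, Offer, price, priceCurrency, availability, itemCondition) are real, but which ones your pages need depends on your catalog.

```python
import json

# Hypothetical feed record; every value here is a placeholder.
record = {
    "title": "Blue Backpack 20L",
    "description": "20-litre backpack with a padded sleeve for a 15-inch laptop.",
    "brand": "ExampleBrand",
    "gtin": "012345678905",
    "price": 49.99,
    "currency": "USD",
    "availability": "InStock",    # Schema.org ItemAvailability value
    "condition": "NewCondition",  # Schema.org OfferItemCondition value
    "link": "https://example.com/products/blue-backpack",
}

def product_jsonld(record: dict) -> str:
    """Build a JSON-LD script block for one product from a feed-style record."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": record["title"],
        "description": record["description"],
        "brand": {"@type": "Brand", "name": record["brand"]},
        "gtin": record["gtin"],
        "offers": {
            "@type": "Offer",
            "price": f'{record["price"]:.2f}',
            "priceCurrency": record["currency"],
            "availability": "https://schema.org/" + record["availability"],
            "itemCondition": "https://schema.org/" + record["condition"],
            "url": record["link"],
        },
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Because the feed row and the markup come from a single source, a price change only has to be made once. A production version would also need to escape any `</` sequences in the values before embedding them in HTML.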
How do agents decide which products to trust?
Completeness gets a product into the candidate set. Trust decides whether it stays there. A helpful framework is to think of three layers an agent evaluates:
- Structural completeness. Does the record contain enough machine-readable fields to establish what the product is, what it costs, and whether it is available?
- Semantic density. Is the description rich enough, in attributes and language, for the agent to match it to varied natural-language queries? A listing reduced to "Blue Backpack, $49.99" cannot compete with the same bag described by capacity, material, compartment dimensions, laptop fit, and use case.
- Trust signals. Are there reliable identifiers (GTIN or MPN), genuine ratings and reviews, accurate shipping and availability, and consistency across your sources?
That last point is concrete and enforceable. Google, for example, compares the price and availability in your structured data against your page and feed, and a mismatch can cost you eligibility. When you use automatic item updates, Google specifically relies on the Schema.org properties price, priceCurrency, availability, and condition to keep listings current. Agents inherit the same logic: data that contradicts itself is data they cannot trust, and untrusted products get dropped.
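A consistency check of this kind is simple to run on your own data before a platform runs it for you. The sketch below assumes you have already extracted price and availability from the visible page, the on-page markup, and the feed into three plain dictionaries. The function name and the dictionary shape are hypothetical.

```python
def find_mismatches(page: dict, markup: dict, feed: dict,
                    fields=("price", "availability")) -> dict:
    """Return {field: {source: value}} for every field where the sources disagree."""
    sources = {"page": page, "markup": markup, "feed": feed}
    mismatches = {}
    for field in fields:
        values = {name: source.get(field) for name, source in sources.items()}
        if len(set(values.values())) > 1:
            mismatches[field] = values
    return mismatches

# Example: the markup still carries an old price.
page = {"price": "39.99", "availability": "InStock"}
markup = {"price": "49.99", "availability": "InStock"}
feed = {"price": "39.99", "availability": "InStock"}
report = find_mismatches(page, markup, feed)
```

In this example `report` flags only the price, and shows which source holds which value so the stale one is easy to spot.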
Freshness matters too, because an agent acting in real time needs current price and availability. Stale data is a real liability, and not only in theory. The most visible early agentic checkout, OpenAI's Instant Checkout in ChatGPT, was scaled back in March 2026 in part because it leaned on scraped data that produced inaccurate prices, availability, and shipping estimates at the moment of purchase.
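A basic freshness check flags records that have not been updated recently. The 24-hour default below is an arbitrary assumption for illustration, not a threshold published by any platform, and the `updated_at` field is hypothetical. Fast-moving catalogs may need a much tighter window.

```python
from datetime import datetime, timedelta, timezone

def stale_records(records: list, now: datetime,
                  max_age: timedelta = timedelta(hours=24)) -> list:
    """Return the ids of records whose last update is older than max_age."""
    return [r["id"] for r in records if now - r["updated_at"] > max_age]

# Example with fixed, made-up timestamps so the result is reproducible.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": "A", "updated_at": datetime(2025, 1, 2, 9, 0, tzinfo=timezone.utc)},
    {"id": "B", "updated_at": datetime(2024, 12, 31, 9, 0, tzinfo=timezone.utc)},
]
stale = stale_records(records, now)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check testable.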
Which product data fields matter most?
Not all fields carry equal weight, and adding fields has diminishing returns past a point. A practical way to prioritize:
- Core fields (get these right first). Title, description, brand, GTIN or MPN, product category, price, sale price (where one applies), availability, condition, image URL, and product URL. These map closely to the attributes in Google's product data specification, and a gap in any of them reduces a product's chance of being surfaced. If a product genuinely has no GTIN, the manufacturer part number (MPN) is the standard fallback identifier.
- Enrichment fields (where competitive lift comes from). Material, dimensions, weight, color, size and variant data, compatibility, use cases, and relevant certifications. These answer the specific, comparative questions agents field, and they are usually where thin catalogs fall short.
- Diminishing returns. Past a point, accuracy and consistency matter more than sheer quantity. Stuffing in low-quality or inconsistent fields does not help and can hurt trust.
- Ratings, reviews, and images. Aggregate ratings and reviews are trust signals agents weigh, and images feed the recognition step. For Schema.org product snippets specifically, Google requires the name property plus at least one of review, aggregateRating, or offers.
There is no official completeness threshold to hit, so the practical target is straightforward: fill every field an agent uses, on every product. Because an agent can exclude a product when a field it needs is blank, the payoff favors closing gaps across the whole catalog rather than over-polishing a handful of listings.
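Two of the rules above are mechanical enough to check in code: the product snippet minimum (name plus at least one of review, aggregateRating, or offers) and the GTIN-then-MPN identifier fallback. The function names and the lowercase `gtin` and `mpn` keys are assumptions for this sketch.

```python
def meets_snippet_minimum(markup: dict) -> bool:
    """True when Product markup has a name plus one of review, aggregateRating, offers."""
    has_name = bool(markup.get("name"))
    has_detail = any(markup.get(key) for key in ("review", "aggregateRating", "offers"))
    return has_name and has_detail

def identifier(record: dict):
    """Prefer the GTIN, fall back to the MPN, and return None when neither is present."""
    if record.get("gtin"):
        return ("gtin", record["gtin"])
    if record.get("mpn"):
        return ("mpn", record["mpn"])
    return None
```

For example, `meets_snippet_minimum({"name": "Blue Backpack"})` is false because nothing accompanies the name, and `identifier({"mpn": "BP-200"})` falls back to the part number.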
How does on-page Schema.org markup help, specifically?
On-page markup is how an agent or crawler reads the current truth from your live page, so a few technical points matter.
- Use JSON-LD. Google recommends it because it lives in one block in the HTML, separate from visible content, and is easier to maintain than inline microdata.
- Render it server-side. Google recommends putting Product structured data in the initial HTML the server returns, and notes that markup generated by JavaScript after the page loads can make shopping crawls less frequent and less reliable, which is a problem for fast-changing fields like price and availability.
- Mark up one product per page. Product markup belongs on individual product pages, not category pages. Use ProductGroup for variants of a single product, not as a wrapper for a collection of different products.
- Keep it synchronized. The values in your JSON-LD must match what the page and the feed show. A schema price of $49.99 against a displayed $39.99 is the kind of mismatch that breaks trust and eligibility.
- Validate. Use Google's Rich Results Test and the Schema Markup Validator to confirm your markup parses cleanly before you rely on it.
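Alongside those validators, a quick local check can confirm that Product markup is present in the raw HTML, before any JavaScript runs. The sketch below uses only Python's standard library and takes the HTML as a string. Fetching the page is left out, and the class and function names are hypothetical. It handles only a single top-level Product object per script block, not `@graph` structures or arrays.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))

def products_in_html(html: str) -> list:
    """Return the Product objects found in the HTML as the server delivered it."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [b for b in parser.blocks if isinstance(b, dict) and b.get("@type") == "Product"]
```

If this returns an empty list for a page whose markup shows up in browser developer tools, the markup is probably being injected client-side.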
Is this just SEO with a new name?
No, though the two reinforce each other. Traditional SEO optimizes pages so humans can find and click them. Structuring data for agents optimizes records so machines can match, trust, and recommend them. When an agent is surfacing shopping results, it does not need editorial content so much as structured data: it evaluates feeds and executes against APIs.
The useful part is that the work compounds. Complete, valid Product schema also improves your eligibility for rich results in conventional search, so investment in structured data pays off across both the agent channel and the search channel. The mistake is assuming that a well-ranked blog post or a beautiful product page substitutes for a complete, accurate, current data record. In the agent channel, it does not.
Why do agents skip products? Common failure modes
If your products are not getting surfaced, the cause is usually one of these:
- Empty core fields. A missing identifier (GTIN or MPN), price, or availability value can remove the product from consideration.
- Vague titles and thin descriptions. Generic titles give the agent nothing to match against specific queries.
- Facts buried in prose. Attributes stated only in a description paragraph, rather than in structured fields, are unreliable for the agent to extract and compare.
- Mismatched data. Price or availability in the feed or schema that contradicts the page.
- Stale feeds. Once-a-day updates that leave the agent showing out-of-stock or mispriced items, which erodes trust and breaks the buying experience.
- JavaScript-only markup. Structured data that only appears after client-side rendering, which crawlers may not reliably capture.
How do I make my catalog AI-readable?
The work is concrete and mostly about data discipline.
- Audit completeness. Measure fill rates across the catalog and find the gaps in core and enrichment fields. Aim high on the fields agents actually use.
- Enrich the records. Fill in the descriptive attributes (material, dimensions, compatibility, use cases) that answer comparative questions, and write descriptions that are both accurate for humans and dense enough in attributes for machines.
- Add and validate Schema.org markup. Implement Product markup in JSON-LD, server-rendered, one product per page, and validate it.
- Keep feeds clean and current. No disapproved products, no policy issues, and updates frequent enough that price and availability are right at the moment an agent reads them.
- Keep your sources consistent. Make sure the page, the on-page markup, and the feed all tell the same story.
- Measure what gets surfaced. Because purchases and recommendations can happen off your site, build a way to see which products and which content are actually being picked up and converted, so you can keep improving the records that matter rather than guessing.
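The audit step is a natural place to start, and a fill-rate report takes only a few lines. The sketch below assumes the catalog is already loaded as a list of dictionaries. The field names and sample rows are placeholders, and a field counts as filled when it is neither missing nor an empty string.

```python
def fill_rates(catalog: list, fields: tuple) -> dict:
    """Return the share of records (0.0 to 1.0) that populate each field."""
    total = len(catalog)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for record in catalog if record.get(field) not in (None, "")) / total
        for field in fields
    }

def worst_gaps(rates: dict, threshold: float = 1.0) -> list:
    """List fields below the threshold, emptiest first."""
    below = [field for field, rate in rates.items() if rate < threshold]
    return sorted(below, key=lambda field: rates[field])

# Hypothetical four-product catalog.
sample = [
    {"title": "Blue Backpack", "price": 49.99, "material": "Nylon"},
    {"title": "Red Backpack", "price": 44.99, "material": ""},
    {"title": "Green Backpack", "price": 52.00},
    {"title": "Black Backpack"},
]
rates = fill_rates(sample, ("title", "price", "material"))
gaps = worst_gaps(rates)
```

In this sample, every record has a title, three of four have a price, and only one states a material, so `gaps` puts material first. That ordering gives a rough priority list for enrichment work.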
What this means for your product content
The throughline is simple. For an AI shopping agent, your product data is the product. The agent never sees your design, only your title, attributes, description, price, availability, ratings, and markup, and it decides from that record alone whether your product matches the shopper's intent and whether it can trust what it reads.
That makes product content a piece of commercial infrastructure rather than a marketing afterthought. The catalogs that win in the agent channel are the ones that are complete, richly attributed, internally consistent, kept current, and measured against what agents actually surface. This is the shift that platforms like WriteText.ai are built around: generating and enriching ecommerce product content at catalog scale, with native integrations for WooCommerce, Magento and Adobe Commerce, and Shopify, so that the data agents depend on is accurate, consistent, and optimized. As more discovery moves into AI assistants, the merchants who treat their product data as a strategic asset are the ones agents will pick.
FAQs
How do AI shopping agents choose which products to recommend?
They read structured product data, score each candidate on how well its data matches the shopper's request, and rank by relevance, completeness, and freshness. Products with incomplete or inconsistent data are scored lower or excluded.
What is structured data for ecommerce?
It is product information organized into defined, machine-readable fields rather than free-form text. It lives in product feeds (such as Google Merchant Center) and in on-page Schema.org markup, usually written in JSON-LD.
Do AI agents read my product descriptions or my structured data?
Both, but they rely on structured fields for the facts that decide comparisons. Attributes stated only in description prose are harder for an agent to extract reliably, so key facts should also live in structured fields.
Which product attributes matter most for AI visibility?
Start with core fields: title, description, brand, GTIN or MPN, category, price, availability, condition, and image and product URLs. Then add enrichment attributes like material, dimensions, compatibility, and use cases, which answer the specific questions agents field.
Does Schema.org markup help with AI shopping agents?
Yes. On-page Product markup in JSON-LD lets agents and crawlers read current, structured facts directly from your page. It should be server-rendered, validated, and consistent with your feed and visible page.
Why are my products not showing up in AI shopping results?
The most common reasons are missing core fields, vague titles, facts buried in prose, data that contradicts itself across feed and page, and stale feeds that leave prices or availability wrong.
Is structuring product data for agents the same as SEO?
They overlap and reinforce each other, but they are not the same. SEO optimizes pages for human discovery and clicks. Structured data optimizes records so machines can match, trust, and recommend them. Complete Product schema helps both.