Stop blocking AI bots in your Shopify robots.txt (2026 update)

Open any robots.txt guide written between 2023 and early 2025 and the advice is the same: “Block GPTBot. Block ClaudeBot. Block all the AI scrapers.” That advice was reasonable when AI bots fetched your content to train models you got nothing from. It is bad advice in 2026. AI search and agentic commerce now route real shoppers and real revenue to merchants who let the bots through. Blocking them today is roughly equivalent to blocking Googlebot in 2010. You disappear from the channel that is becoming the next acquisition surface.
This post is the corrected, ecommerce-specific version. We list every AI user-agent that matters in 2026, what each one does, which ones a Shopify store should allow, and the small set you should still keep out. We also walk through how to fix your robots.txt without breaking your store, including the Shopify-specific quirks (Shopify does not let you edit robots.txt directly the way WordPress does, so the fix is one screen deeper than most guides assume).
The merchants we talk to who blocked AI bots back in 2024 are now scrambling to undo the block as ChatGPT, Perplexity, and Google AI Mode start surfacing real referral traffic. The pattern is consistent enough that it is worth getting ahead of: open up the bots that route shoppers to you, keep cart and admin paths sealed, and stop leaving AI search visibility on the table.
In this post
- Why blocking AI bots was popular advice (and why it stopped making sense)
- The three types of AI bots, and why the difference matters
- The full 2026 AI user-agent reference (with what each one does)
- What to allow on a Shopify store
- What to still block (and why)
- How to actually edit Shopify’s robots.txt
- A working Shopify robots.txt for AI search visibility
- How to test your changes
- Common mistakes that quietly leak AI traffic
- Frequently asked questions
- Related reading
Why blocking AI bots was popular advice (and why it stopped making sense)
The “block all AI bots” line came from a real concern. In 2023 and 2024, GPTBot was scraping content to train ChatGPT, and content sites whose pages got summarized lost click-through. Publishers (especially media companies) responded by blocking. The advice spread to ecommerce stores by copy-paste, even though the situation was already different there.
Three things changed by 2026:
- AI search became a referral channel. ChatGPT, Perplexity, and Google AI Mode now drive shoppers to merchant sites. Blocked stores are invisible. Allowed stores get cited and clicked.
- Agentic commerce went live. Shopify’s Agentic Storefronts pushes your products into ChatGPT, Copilot, Perplexity, and Google AI Mode by default. Blocking the bots that read your storefront undermines the platform integration that is now driving sales.
- The bots split into specialized roles. Most AI vendors now run separate bots for training versus search-indexing versus on-demand user fetches. You can allow the search and user-fetch bots while blocking the training bot if you want. The old “all or nothing” framing was always wrong; it is now obviously wrong.
Translation: blocking GPTBot in 2026 does not just stop training. It also weakens your visibility in the assistants where shoppers are searching. The block is no longer free.
The three types of AI bots, and why the difference matters
Every AI vendor’s bot landscape now sorts into three buckets. Knowing which bucket each bot is in is the difference between making a smart blocking decision and accidentally cutting off a referral source.
- Training bots. Crawl the web to feed model training datasets. Examples: GPTBot, ClaudeBot. Blocking these stops your content from being used to train the next model. There is a reasonable case to block these for some stores. Most ecommerce content has little training value anyway, so the block is symbolic.
- Search-index bots. Build the AI’s real-time search index, the one used when a shopper asks “find me a black hoodie under $80.” Examples: OAI-SearchBot, Claude-SearchBot, PerplexityBot, Bingbot (which feeds Copilot). Blocking these makes your store invisible in AI search. Do not block these.
- User-fetch bots. Triggered when a specific shopper sends a query to the AI and the AI fetches your page in real time. Examples: ChatGPT-User, Claude-User, Perplexity-User, Google-Agent. Blocking these breaks the most commercial use case (an actual shopper trying to access your store through an AI assistant). Do not block these.
A reasonable middle position: allow search-index and user-fetch bots, optionally block training bots. We allow all three categories on our stores because we want to be cited in answers and we believe Shopify product pages have negligible training utility anyway. Your call on training; the other two should be open.
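The three-bucket split above can be expressed as a small lookup table. This is a purely illustrative sketch: the bot names mirror this article, and the `recommendation` helper is a hypothetical function, not any real API.

```python
# Illustrative classification of 2026 AI user-agents into the three buckets
# described above. Trim or extend the mapping to match your own policy.
AI_BOTS = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "Meta-ExternalAgent": "training",
    "OAI-SearchBot": "search-index",
    "Claude-SearchBot": "search-index",
    "PerplexityBot": "search-index",
    "Bingbot": "search-index",
    "ChatGPT-User": "user-fetch",
    "Claude-User": "user-fetch",
    "Perplexity-User": "user-fetch",
}

def recommendation(bot: str, block_training: bool = False) -> str:
    """The middle-ground policy: always allow search-index and user-fetch
    bots; training bots are a values call; unknown bots get audited."""
    bucket = AI_BOTS.get(bot)
    if bucket is None:
        return "review"  # undocumented or unlisted bot: check before deciding
    if bucket == "training" and block_training:
        return "block"
    return "allow"
```

Under this policy, `recommendation("OAI-SearchBot")` is always `"allow"`, while `recommendation("GPTBot", block_training=True)` returns `"block"`.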
The full 2026 AI user-agent reference (with what each one does)
Verified user-agent strings as of mid-2026. If a bot is missing here, it either (a) does not exist by that name, or (b) is undocumented (which is its own warning sign).
OpenAI
| Bot | Type | Purpose |
|---|---|---|
GPTBot | Training | Crawls for ChatGPT model training data |
OAI-SearchBot | Search index | Builds ChatGPT search index |
ChatGPT-User | User fetch | Real-time fetch when a user asks ChatGPT to read a URL |
Anthropic (Claude)
| Bot | Type | Purpose |
|---|---|---|
ClaudeBot | Training | Crawls for Claude model training data |
Claude-SearchBot | Search index | Builds Claude retrieval index |
Claude-User | User fetch | Real-time fetch when a Claude user requests a URL |
Perplexity
| Bot | Type | Purpose |
|---|---|---|
PerplexityBot | Search index | Builds Perplexity answer index. Notably, Perplexity does not train on crawled content. |
Perplexity-User | User fetch | Real-time query retrieval |
Google
| Token | Type | Purpose |
|---|---|---|
Googlebot | Search index | Standard Google search and Google AI Mode (the bot has not changed; the use has expanded) |
Google-Extended | Training control | This is a robots.txt token, not a separate crawler. Disallowing it tells Google not to use your content to train Gemini. |
Google-CloudVertexBot | Enterprise retrieval | Vertex AI enterprise customers querying for grounded retrieval |
Microsoft (Copilot)
| Bot | Type | Purpose |
|---|---|---|
Bingbot | Search index | Powers Bing Search and Microsoft Copilot. There is no separate Copilot bot. |
Other notable bots
| Bot | Company | Type |
|---|---|---|
Applebot | Apple | Search index, feeds Apple Intelligence |
Applebot-Extended | Apple | Training control token (like Google-Extended) |
Meta-ExternalAgent | Meta | Training and AI indexing |
Amazonbot | Amazon | Training and Alexa/Nova indexing |
CCBot | Common Crawl | Open web corpus, used by many third-party trainers |
DuckAssistBot | DuckDuckGo | AI-assisted answers |
MistralAI-User | Mistral | User-triggered retrieval |
Bots to be cautious about
- Bytespider (ByteDance): undocumented behavior, repeated reports of spoofing and ignoring robots.txt
- xAI Grok crawler: undocumented, uses residential IPs
- Cohere training crawler: no published vendor reference
These get a different treatment, covered below.
What to allow on a Shopify store
Default recommendation for any Shopify store that wants to be visible in 2026 AI search:
- Allow: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Bingbot, Applebot, MistralAI-User, DuckAssistBot, Googlebot (already allowed by default).
- Allow Google-Extended and Applebot-Extended. These are training-consent tokens; allowing them lets Google and Apple use your content to train Gemini and Apple Intelligence, and for most ecommerce stores there is no real downside.
Shopify’s default robots.txt already allows most of these. The mistake we see most often is that someone copy-pasted a 2023-era “block all AI” snippet into the custom robots.txt.liquid template and never revisited it.
What to still block (and why)
The list of bots and paths worth blocking is short:
- Bytespider, xAI Grok crawler, Cohere training crawler. They publish no documentation, ignore robots.txt, and provide no value back. We block all three.
- Optionally GPTBot, ClaudeBot, Meta-ExternalAgent, Amazonbot training bots if you have a strong principled stance against AI training on your content. This is a values call, not a traffic call. Search-index and user-fetch bots from the same vendors are separate; you can keep those allowed even if you block training bots.
- The cart, checkout, account, and admin paths for everyone. Already blocked by default in Shopify’s robots.txt. Do not undo this. AI agents do not need to crawl your checkout.
One thing to know: undocumented bots that ignore robots.txt do not actually get blocked by writing a Disallow rule. The block is a polite request that compliant crawlers respect and others ignore. To actually stop bad actors, you need server-side blocking (Cloudflare, Shopify’s edge filtering, or rate limits). Shopify’s edge protection catches a lot of this automatically.
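Because a Disallow rule is only a polite request, actual enforcement has to happen before the request reaches your pages. A minimal sketch of that check, assuming a user-agent blocklist: in practice this logic would live in a Cloudflare firewall rule or similar edge filter, and the `should_reject` helper here is hypothetical.

```python
# Hypothetical edge-filter check: robots.txt cannot stop non-compliant
# scrapers, so match the request's User-Agent header against a blocklist.
# The names below are the ones this article flags as non-compliant.
NON_COMPLIANT = ("bytespider", "grokbot", "cohere-ai")

def should_reject(user_agent: str) -> bool:
    """Return True if the request's User-Agent matches a known bad scraper."""
    ua = user_agent.lower()
    return any(bot in ua for bot in NON_COMPLIANT)
```

Substring matching on the User-Agent header is deliberately coarse; spoofed agents need IP-level or behavioral checks on top of it.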
How to actually edit Shopify’s robots.txt
Here is the part most generic robots.txt guides skip. Shopify does not let you upload a custom robots.txt file the way WordPress does. Instead, you customize through a Liquid template. Steps:
- In your Shopify admin, go to Online Store > Themes. On your live theme, click the three-dot menu and pick Edit code.
- Under Templates, click Add a new template. Choose robots, file type liquid. Shopify creates `robots.txt.liquid`.
- The default template renders Shopify’s standard robots.txt rules through Liquid. To add custom rules, edit the template carefully. The recommended pattern is to keep the default rules intact and append your custom rules at the bottom.
- To remove specific user-agent blocks (if you previously added them), find and delete the relevant `User-agent: GPTBot` / `Disallow: /` blocks.
- Save. The change is live in seconds.
If you have never customized robots.txt, no template exists yet and Shopify serves the default, which already allows the major AI bots. You probably do not need to do anything. The audit step below will confirm.
Our Shopify robots.txt guide covers the template syntax in more depth, including how to keep Shopify’s defaults while layering in custom rules.
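For reference, the keep-defaults-and-append pattern looks like the sketch below. This follows the default-groups loop from Shopify's robots.txt.liquid documentation; verify it against the template Shopify actually generates for your store before relying on the exact whitespace control.

```liquid
{%- comment -%}
Render Shopify's default robots.txt groups unchanged, then append
custom rules after the loop. Pattern per Shopify's developer docs;
confirm against your store's generated template.
{%- endcomment -%}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.sitemap != blank %}
    {{ group.sitemap }}
  {%- endif %}
{% endfor %}

# Custom rules below: defaults above stay intact
User-agent: Bytespider
Disallow: /
```

Deleting the loop replaces Shopify's defaults entirely, which is how stores accidentally unblock checkout paths; appending after it is the safe pattern.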
A working Shopify robots.txt for AI search visibility
Append this block at the bottom of your custom robots.txt.liquid template if you want explicit allow rules for the major AI bots. Most stores do not need to add anything; this is for the cases where you previously blocked AI bots and need to undo it explicitly.
```
# AI search and answer engines (allowed)
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Google-Extended
Allow: /

# Undocumented or non-compliant scrapers (blocked)
User-agent: Bytespider
Disallow: /

User-agent: GrokBot
Disallow: /

User-agent: cohere-ai
Disallow: /
```
Notes:
- `Allow: /` is technically not required where the default allows everything. We include it for explicit clarity, in case Shopify or a future plugin tightens the default.
- Bot names are case-insensitive in robots.txt parsing. `GPTBot` and `gptbot` mean the same thing.
- If you want to keep crawlers allowed but opt out of training where a vendor offers a control token, leave GPTBot and ClaudeBot allowed and add `User-agent: Google-Extended` with `Disallow: /` instead. Training control tokens are the surgical version of “no training, yes search.”
How to test your changes
- Fetch your robots.txt directly. Open `https://yourdomain.com/robots.txt` in incognito and read the rendered file. Confirm there is no `User-agent: GPTBot` followed by `Disallow: /`.
- Run the AI Bot Checker. Our AI Bot Checker reads your robots.txt and tells you exactly which AI user-agents are allowed or disallowed. Faster than reading raw text.
- Test with ChatGPT. Ask ChatGPT (browse mode) “Visit https://yourdomain.com/about and summarize the brand.” If it can fetch and respond, the path is open.
- Server logs (advanced). If you have access to logs, search for user-agent strings containing GPTBot, ClaudeBot, PerplexityBot. You should see hits within 24 to 48 hours of your robots.txt going live (or earlier if your store already has crawl history).
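The direct-fetch check can also be scripted with Python's standard-library `urllib.robotparser`, which answers the same question per user-agent. This sketch parses an inline robots.txt body (the blocked-GPTBot case this article tells you to look for); against a live store you would use `RobotFileParser.set_url()` plus `read()` instead, and `example.com` is a placeholder domain.

```python
import urllib.robotparser

# Audit sketch: parse a robots.txt body and ask whether each AI
# user-agent may fetch a representative product URL.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for bot in ("GPTBot", "OAI-SearchBot", "PerplexityBot"):
    ok = rp.can_fetch(bot, "https://example.com/products/black-hoodie")
    print(f"{bot}: {'allowed' if ok else 'BLOCKED'}")
```

Here GPTBot reports blocked while the search bots fall through to the `*` group and stay allowed, which is exactly the mismatch the audit is meant to surface.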
Common mistakes that quietly leak AI traffic
- Blocking `User-agent: *` globally except for Googlebot. The wildcard hits every AI bot too. Tighten your global rule or add explicit allows for the AI user-agents.
- Disallowing `/products/` or `/collections/`. We have seen this on stores that wanted to “consolidate to canonical URLs.” It blocks every bot from the actual catalog. Undo it.
- Using `noindex` meta tags on category pages. Different from robots.txt but similar effect. AI bots respect noindex. If you noindex collection pages, they will not appear in AI search either.
- Blocking `/blog/`. AI agents weight content pages heavily for brand context. A blocked blog reduces the signal that informs how the agent describes your brand.
- Forgetting to update after migrating themes. Theme migrations sometimes regenerate robots.txt.liquid from old templates. Re-check after any theme change.
- Blocking only ClaudeBot but not Claude-SearchBot or Claude-User. Anthropic split into three bots in early 2026. Sites that block only “ClaudeBot” by name still allow the search and user-fetch bots, which is fine if allowing search was the goal, but the mismatch often goes unnoticed. Make sure your block list reflects what you actually meant to do.
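Several of these mistakes are mechanically detectable. A hedged sketch of such an audit, assuming a naive line scanner (the `audit` function and its finding strings are illustrative; a real audit should use a proper robots.txt parser):

```python
# Heuristic audit for the common mistakes listed above, run against a
# robots.txt body. Illustrative checks only, not a full parser.
def audit(robots_txt: str) -> list[str]:
    findings = []
    current_agent = None
    for raw in robots_txt.lower().splitlines():
        line = raw.strip()
        if line.startswith("user-agent:"):
            current_agent = line.split(":", 1)[1].strip()
        elif line.startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if current_agent == "*" and path == "/":
                findings.append("global Disallow: / hits every AI bot")
            if path.startswith(("/products", "/collections")):
                findings.append(f"catalog path blocked for {current_agent}: {path}")
            if path.startswith("/blog"):
                findings.append(f"blog blocked for {current_agent}")
    return findings
```

Running it over a file containing `User-agent: *` followed by `Disallow: /` flags the global block; a file that only disallows `/cart` comes back clean.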
Frequently asked questions
Should I block GPTBot on my Shopify store in 2026?
For most ecommerce stores, no. GPTBot crawls for ChatGPT model training. Blocking it stops training but does not affect ChatGPT search visibility (that is OAI-SearchBot) or user-initiated fetches (ChatGPT-User). The training value of typical product page content is low, and the symbolic block has no commercial upside. We recommend allowing all three OpenAI bots unless you have a specific reason to block training.
Will allowing AI bots hurt my Google rankings?
No. AI bots and Googlebot are separate crawlers and do not interact. Allowing GPTBot, ClaudeBot, or PerplexityBot has no effect on your standard Google search ranking. Some stores worry that AI bots will hammer their server. In our logs, AI bot traffic is a fraction of Googlebot traffic for a typical Shopify store.
What is the difference between Google-Extended and Googlebot?
Googlebot is the search crawler. Google-Extended is a robots.txt control token (not a separate bot) that tells Google whether to use your content to train the Gemini AI models. You can allow Googlebot for search ranking while disallowing Google-Extended to opt out of Gemini training. The token does not affect search ranking or AI Overview visibility.
Does Shopify allow AI bots by default?
Yes. Shopify’s default robots.txt does not block GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Claude-SearchBot, or any other major AI search bot. The default disallows are limited to admin, cart, checkout, search results, and similar non-content paths. If your store blocks AI bots, someone added the rules manually through robots.txt.liquid.
Will AI bots scrape my checkout or admin?
No. Shopify’s default robots.txt disallows /admin/, /checkout/, /cart/, /account/, /search?, and other non-content paths for all bots. AI bots respect these rules. Compliance is high among the major vendors (OpenAI, Anthropic, Perplexity, Google, Microsoft). Undocumented scrapers ignore robots.txt entirely, but they do not reach checkout because the URLs are protected at the application level too.
If I previously blocked AI bots and now allow them, how long until I show up in AI search?
Re-indexing typically takes 1 to 4 weeks for the major AI search engines. ChatGPT search and Perplexity update faster than Google AI Mode. You will see referral traffic appear gradually as the indexes refresh. Submit your sitemap to OpenAI’s submit tool if you want to nudge it along.
Can I block training bots but allow search and user-fetch bots?
Yes. Disallow GPTBot, ClaudeBot, and Meta-ExternalAgent for training. Allow OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, and Perplexity-User for search and user-fetch. This is the principled middle ground for stores that have a stance on training but still want AI search visibility.
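As a literal robots.txt fragment, that middle-ground split looks like this (appended through robots.txt.liquid; adjust the bot list to your own stance):

```
# Training bots: blocked (a values call)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

# Search-index and user-fetch bots: allowed
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /
```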
Related reading
- Shopify robots.txt guide (template syntax and Shopify-specific quirks)
- llms.txt for Shopify: complete setup guide
- Shopify AI readiness: prepare for ChatGPT and AI search
- Shopify Agentic Storefronts
- How variant grouping affects AI shopping discovery
- Rubik Variant Images for AI-readable variant images
- Rubik Combined Listings for unified product entities AI agents prefer
One last thought: the cost of running an audit is 10 minutes with our AI Bot Checker tool. The cost of an undetected block is months of missed referrals from a channel that is doubling every quarter. Run the audit. Fix what you find.