Shopify robots txt guide: what to allow, what to block, and why AI bots matter

Your Shopify robots.txt file is the first thing every web crawler reads before crawling your store. Google, Bing, Ahrefs, GPTBot, ClaudeBot, PerplexityBot, and many more all “knock” on your site before indexing it, and your robots.txt file tells them whether they are let in. Make a small mistake in this file and you could be wasting your store’s crawl budget on filter pages, or unknowingly excluding yourself from the AI search results that are now generating significant traffic.
This guide covers what Shopify ships by default, what you actually need to change, how to customize the file through your robots.txt.liquid template, and why, in 2026, blocking AI crawlers is the equivalent of dropping your phone and wondering why nobody rings you.
In this post
- What robots.txt does
- Shopify default rules
- How to customize
- What to allow vs block
- AI bots and why they matter
- Testing and auditing
- Common mistakes
- FAQ
What robots.txt actually does
The robots.txt file is a plain text file at the root of your site (https://yourstore.com/robots.txt). Its purpose is to tell web crawlers which directories and files they may or may not fetch. It does NOT hide content from search engines. It does NOT control what gets indexed. It only tells crawlers what they are allowed to request in the first place. That distinction is widely misunderstood, and it is critical to how search engine crawling works.
If you want to remove a page from Google’s index, use the noindex meta tag or the URL removal tool in Search Console. robots.txt only tells the Google crawler not to visit a page. A page blocked this way can still show up in search results if other sites link to it.
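For example, a common Shopify pattern for keeping internal search pages out of the index is a conditional meta tag in your layout file. This is a sketch; the exact template check depends on your theme:

```liquid
{%- comment -%} In layout/theme.liquid, inside <head> {%- endcomment -%}
{%- if template contains 'search' -%}
  <meta name="robots" content="noindex">
{%- endif -%}
```

Unlike a robots.txt Disallow, this lets Google crawl the page, see the tag, and drop it from the index.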
What Shopify ships by default
By default, every Shopify store ships with a pre-filled robots.txt that blocks crawlers from checkout, cart, admin, policies, internal search results, gift card pages, and the URL parameters that cause faceted duplication. Product and collection pages remain crawlable, and the file points to your sitemap. For roughly 90% of Shopify stores, these defaults are optimal.
Check yours now. Open yourstore.com/robots.txt in a browser. See the defaults. Then decide whether you need to change anything. Most stores don’t need to change any of the defaults. Some stores absolutely do.
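If you want to sanity-check rules programmatically rather than by eye, Python’s built-in urllib.robotparser can evaluate a robots.txt against sample URLs. This is a sketch using simplified prefix-only rules, since robotparser does not understand the `*` wildcards Shopify’s real file uses:

```python
from urllib.robotparser import RobotFileParser

# Simplified Shopify-style rules (prefix matching only; the real default
# also uses * wildcards, which urllib.robotparser cannot evaluate).
sample = """\
User-agent: *
Disallow: /checkout
Disallow: /cart
Disallow: /search
Disallow: /admin
"""

rp = RobotFileParser()
rp.parse(sample.splitlines())

for path in ("/products/blue-widget", "/checkout", "/search"):
    url = "https://yourstore.com" + path
    print(path, "->", "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
# /products/blue-widget -> allowed
# /checkout -> blocked
# /search -> blocked
```

Swap in the contents of your live robots.txt and the URLs you care about to verify nothing important is blocked.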
How to customize the file on Shopify
As of 2021, Shopify lets you edit robots.txt directly through your store theme, via a template file called robots.txt.liquid. Here is how to do it.
- Online Store, Themes, Edit code.
- Under Templates, click Add a new template.
- Pick robots.txt from the dropdown.
- Edit using Liquid rules to add, remove, or replace directives.
- Save. Changes propagate within a few minutes.
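A minimal robots.txt.liquid that keeps Shopify’s defaults and appends one extra rule might look like this. It is a sketch based on Shopify’s documented `robots.default_groups` object; the extra `Disallow: /*?ref=*` line is a hypothetical example, not a recommendation for every store:

```liquid
{%- comment -%} templates/robots.txt.liquid {%- endcomment -%}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /*?ref=*' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}
```

Because the loop renders the default groups first, your store keeps receiving Shopify’s updates to the base rules.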
Do not set this whole template manually. Use Liquid to add functionality on top of the default template Shopify provides. Shopify updates that default over time; if you hardcode the entire file, you can leak pages that should be blocked while also missing every update Shopify makes to the default.
Not comfortable editing Liquid? Free robots.txt generators can draft the directives for you; you then copy the output into the Liquid template. The Shopify-aware ones understand paths like cart, checkout, policies, and search, and avoid the broken patterns that a general-purpose robots.txt generator can produce.
What to allow, what to block
| Path | Rule | Why |
|---|---|---|
| /products/* | Allow | Revenue pages, always crawlable |
| /collections/* | Allow | Category pages rank |
| /pages/* | Allow | About, contact, content pages |
| /blogs/* | Allow | Content marketing |
| /cart | Block | No SEO value |
| /checkout | Block | Private |
| /search | Block | Infinite URL variants |
| /*?sort_by=* | Block | Duplicate content |
| /*?filter=* | Block | Faceted duplication |
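Expressed as plain directives, the table above corresponds to a stanza roughly like the following. This is illustrative only; Shopify’s default already covers most of it, so extend the Liquid template rather than pasting this wholesale:

```text
User-agent: *
Allow: /products/
Allow: /collections/
Disallow: /cart
Disallow: /checkout
Disallow: /search
Disallow: /*?sort_by=*
Disallow: /*?filter=*

Sitemap: https://yourstore.com/sitemap.xml
```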
AI bots and why they matter now
In 2023 the standard advice was to block AI bots so your content wasn’t scraped for future model training. In 2026 that advice is passé, and it is actively costing stores money.
ChatGPT, Claude, Perplexity, Sidekick, and Gemini now answer shopping-related questions. When a user asks something like “what are the best combined listings apps for Shopify?”, they get a straightforward answer with citations that drive clicks. Some of those clicks go to ordinary stores, and those answers are built from pages fetched by crawlers like GPTBot, ClaudeBot, PerplexityBot, and Google-Extended.
If you block them in robots.txt, your store is invisible to them. You are not in the training data. You are not in the live retrieval index. You will not be cited. Meanwhile your competitor who left them allowed is getting quoted in every ChatGPT answer in your category. That is the GEO gap. It is widening fast.
Use this free AI bot checker to see what AI crawlers your robots.txt does and does not allow. GPTBot, ClaudeBot, PerplexityBot should not be blocked. More on AEO and AI readiness here.
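To check by hand, open your robots.txt and look for stanzas like the following. If they are present, those AI crawlers are blocked, and the lines should be removed from your robots.txt.liquid:

```text
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```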
Strong opinion: blocking AI bots is stupid, stupid, stupid. You are not losing any sales by allowing an LLM to read your product page to generate an answer. That LLM activity only adds value: the answers it produces give prospects free distribution of your sales pitch. Seize the ball and run with it.
Testing and auditing
Three checks to run monthly:
- Google Search Console robots.txt tester. Catches syntax errors.
- The AI bot checker. Confirms which LLM crawlers are allowed.
- Server log grep for user-agent. See which bots actually showed up.
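The log check can be as simple as counting bot user agents. A sketch, assuming combined-format access logs; the sample lines and log path are hypothetical, so adjust the file location and bot names to your setup:

```python
from collections import Counter

# Hypothetical sample lines; in practice, read your real access log,
# e.g. open("/var/log/nginx/access.log").
log_lines = [
    '1.2.3.4 - - [01/Feb/2026] "GET /products/x HTTP/1.1" 200 "-" "Mozilla/5.0 ... GPTBot/1.2"',
    '5.6.7.8 - - [01/Feb/2026] "GET /collections/all HTTP/1.1" 200 "-" "Mozilla/5.0 ... ClaudeBot/1.0"',
    '9.9.9.9 - - [01/Feb/2026] "GET /products/y HTTP/1.1" 200 "-" "Mozilla/5.0 ... GPTBot/1.2"',
]

BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot")
hits = Counter(bot for line in log_lines for bot in BOTS if bot in line)
print(hits)  # which crawlers actually showed up, and how often
```

If a bot you allow in robots.txt never appears in the logs, that is worth investigating too.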
Common mistakes
The classic disaster is a Disallow: / directive left over from a staging theme. In a matter of weeks it can deindex an entire store, and it usually only comes to your attention at 11pm the day traffic collapses. Check your production robots.txt the moment you push a new theme.
Other frequent problems: blocking /collections/ because someone confused it with /search. Blocking CSS and JS paths that Google needs to render the page. Forgetting the sitemap line.
Related tools on this site
- Robots.txt generator: draft Shopify-safe directives
- AI bot checker: see which crawlers are allowed
- All free Shopify tools
See the live demo store, watch the tutorial video, or read the getting started guide.
FAQ
Can I edit robots.txt on Shopify?
Yes. Shopify added a robots.txt template in 2021; you customize the directives by adding a robots.txt.liquid template under your theme code. The defaults Shopify provides are sensible for most stores.
Does Shopify block AI bots by default?
No. Shopify allows all standard user agents by default, including AI crawlers. If AI bots are blocked on your store, someone changed the file manually through robots.txt.liquid.
Should I block GPTBot and ClaudeBot?
No. Allow them. AI search engines now drive real traffic through citations. Blocking them will remove your store from ChatGPT, Claude, and Perplexity answers in your category.
What happens if I accidentally block everything?
Google will start deindexing your store within days, and organic traffic will collapse until it is fixed. Fix the file immediately and request recrawling in Search Console.
Should I block faceted filter URLs?
Yes. URL paths with sort_by, filter, and similar parameters generate effectively infinite duplicate URLs. Blocking them saves crawl budget and avoids sending duplicate content signals.
Does robots.txt remove pages from Google?
Not reliably. robots.txt only controls crawling, not indexing. To keep a page out of the index, use a noindex meta tag.
How do I check which AI bots are crawling my store?
Use the AI bot checker tool on your site to see which known LLM crawlers are currently allowed or blocked.