Generate a properly formatted robots.txt file for your Shopify store. Select which crawlers to allow or block, add your sitemap, and block AI training bots with a few clicks. Copy the finished file and paste it into your Shopify admin.
Shopify automatically generates a robots.txt file for every store, and since Online Store 2.0 you can customize it through a robots.txt.liquid template. The default file is sensible but does not block AI crawlers, and you may want finer control over which paths search engines can access. A misconfigured robots.txt can either expose pages you want hidden or accidentally block pages you need indexed.
Use this tool to build your robots.txt rules visually without memorizing the syntax. The generator includes Shopify-specific defaults that protect your admin, cart, and checkout pages while keeping your product and collection pages fully crawlable. Preview the output in real time, then copy it into your Shopify theme’s robots.txt.liquid file.
According to a 2024 Cloudflare Radar report, AI-related bot traffic now accounts for roughly 38% of all automated web requests, up from under 10% in 2022. For Shopify merchants who invest time creating unique product descriptions, blog content, and buying guides, uncontrolled AI scraping means your content can be ingested into training datasets and effectively reproduced by competitors using generative AI tools. A properly configured robots.txt is your first line of defense against this content harvesting while keeping your store fully visible to paying customers through search engines.
The Robots Exclusion Protocol has been the standard for controlling web crawler access since 1994. Google, Bing, and all major search engines respect it. In 2022, the protocol was formalized as RFC 9309, with Google among its authors, giving merchants a reliable, well-documented framework for managing crawler behavior. For Shopify stores, this means your robots.txt rules are honored consistently across the search engines that drive the majority of organic traffic.
Robots.txt at a Glance
| Detail | Value |
|---|---|
| Protocol standard | Robots Exclusion Protocol (RFC 9309) |
| Shopify default location | yourdomain.com/robots.txt |
| Customization method | robots.txt.liquid template (Online Store 2.0 themes) |
| Supported by | Google, Bing, Yandex, Baidu, DuckDuckGo, and all major crawlers |
| Common AI bot user-agents | GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, PerplexityBot |
| Time to take effect | On the crawler’s next fetch of the file (Google caches robots.txt for up to 24 hours) |
| File size limit | 500 KB (Google’s maximum parsed size) |
| Enforcement level | Advisory (well-behaved bots comply; malicious scrapers may ignore) |
How This Tool Works
This generator builds a standard robots.txt file based on your selections. It creates User-agent directives for each bot category and adds Disallow rules for the paths you want to block. The output follows the Robots Exclusion Protocol, which all major search engines and well-behaved crawlers respect.
For Shopify stores, the generated file is designed to be placed in your theme’s robots.txt.liquid template (found under Templates in the Shopify code editor). Shopify serves this file at yourdomain.com/robots.txt. The Shopify-specific defaults block admin, cart, checkout, search, and account pages, which are the same paths Shopify blocks by default.
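As an alternative to replacing the file wholesale, robots.txt.liquid can extend Shopify’s defaults with Liquid. The sketch below is based on Shopify’s documented `robots` Liquid object (`robots.default_groups`, `group.user_agent`, `group.rules`, `group.sitemap`); verify the object names against current Shopify theme documentation before deploying, and treat the appended Disallow rule as an illustrative example:

```liquid
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- comment -%} Append an extra rule to the catch-all group {%- endcomment -%}
  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /collections/*?sort_by=' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}
```

This approach keeps every default rule Shopify ships and adds one extra Disallow to the general group, so your customization survives future changes to Shopify’s defaults.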
AI bot blocking works by creating specific User-agent rules for each AI crawler. When GPTBot, ClaudeBot, or other AI training bots encounter a Disallow directive, they are instructed not to crawl your content for training purposes. Note that blocking AI bots does not affect your search engine rankings, as these are separate crawlers from Googlebot and Bingbot.
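A typical generated file, with an illustrative domain and a representative subset of AI bots, looks like this:

```
# Shopify defaults: keep admin, cart, and checkout out of search
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /search
Disallow: /account

# AI training bots: block the entire store
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://example-store.com/sitemap.xml
```

Note the structure: each User-agent line opens a group, the Disallow lines below it apply only to that group, and the Sitemap directive stands outside any group and applies to all crawlers.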
Step-by-Step Guide: Setting Up robots.txt on Shopify
Configuring a custom robots.txt on Shopify requires a few specific steps that differ from traditional web hosting. Follow this guide to get your file live and verified.
- Generate your robots.txt using this tool. Select the crawlers you want to allow, check the Shopify paths to block, enable AI bot blocking if desired, and paste your sitemap URL. Click “Generate robots.txt” and review the output.
- Copy the generated output. Click the “Copy to Clipboard” button to copy the entire robots.txt content. Double-check that your sitemap URL is correct and all desired rules are included.
- Open the Shopify theme editor. In your Shopify admin, navigate to Online Store, then Themes. Click the three-dot menu on your active theme and select “Edit code.” This opens the theme code editor.
- Create or edit robots.txt.liquid. In the Templates section, look for robots.txt.liquid. If it does not exist, click “Add a new template” and select “robots.txt” from the template type dropdown. This creates the robots.txt.liquid file.
- Paste your generated content. Replace the entire contents of robots.txt.liquid with the output from this tool. Save the file.
- Verify the file is live. Visit yourdomain.com/robots.txt in your browser. You should see your custom rules. The change takes effect immediately.
- Validate in Google Search Console. In Google Search Console, open Settings, then Crawl Stats, and check the robots.txt report to confirm Google fetched your file without errors. Use the URL Inspection tool to confirm that important pages are not accidentally blocked.
- Monitor crawl behavior. Over the following weeks, check Google Search Console’s Page indexing report to confirm that your desired pages remain indexed and blocked pages are excluded as expected.
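Before pasting rules into your theme, you can sanity-check them locally. The sketch below uses Python’s standard-library parser with a hypothetical rule set; note that `urllib.robotparser` does simple prefix matching only, so RFC 9309 wildcards (`*`, `$`) are not evaluated here — test only plain path prefixes this way.

```python
from urllib.robotparser import RobotFileParser

# A minimal rule set mirroring the generator's Shopify defaults
# plus a full block for one AI bot (paths illustrative).
robots_txt = """\
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Search engines keep product pages but lose the private paths
print(parser.can_fetch("Googlebot", "/products/blue-shirt"))  # True
print(parser.can_fetch("Googlebot", "/admin"))                # False

# The AI bot group blocks everything for GPTBot
print(parser.can_fetch("GPTBot", "/products/blue-shirt"))     # False
```

This catches the most damaging class of mistake — a Disallow rule that accidentally covers your product or collection paths — before the file ever goes live.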
Why This Matters for Your Shopify Store
A well-configured robots.txt prevents search engines from wasting their crawl budget on pages that should not be indexed, like your cart, checkout, and internal search results. When Googlebot spends less time on irrelevant pages, it can discover and index your product and collection pages faster, which directly improves your organic visibility.
Blocking AI training bots is increasingly important for content-heavy Shopify stores. If you invest in writing detailed product descriptions, blog posts, or buying guides, AI crawlers can scrape that content and use it to train models that may then generate similar content for your competitors. Blocking these bots protects your intellectual property while keeping your store fully accessible to customers and search engines.
Crawl budget optimization is not just theoretical. Google’s documentation notes that crawl budget becomes a real factor for larger sites — roughly 10,000 or more frequently changing pages — affecting how quickly new pages are discovered and indexed. Many Shopify stores generate thousands of URL variations through filtered collections, paginated search results, and tag-based URLs. Blocking these low-value URLs keeps Google focused on the canonical product and collection pages that actually drive revenue from organic search.
Real-World Examples
Here are practical robots.txt configurations for different types of Shopify stores, showing how the rules change based on business needs.
Example 1: Small Apparel Store (Under 500 Products)
A small clothing brand with a blog and 300 products needs basic crawl control. They want full search engine access, AI bot blocking to protect their styled product descriptions, and standard Shopify path blocking.
| Setting | Configuration |
|---|---|
| Googlebot | Allowed (full access) |
| Bingbot | Allowed (full access) |
| All other crawlers | Allowed |
| Blocked Shopify paths | /admin, /cart, /checkout, /search, /account |
| AI bots blocked | GPTBot, ClaudeBot, Google-Extended, CCBot |
| Custom rules | Disallow: /collections/*?sort_by= |
| Sitemap | https://example-apparel.com/sitemap.xml |
The custom rule blocking sorted collection views prevents search engines from crawling duplicate versions of collection pages with different sort orders, which can multiply the number of crawled URLs five- to tenfold.
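Assembled into a single file (the domain and rule order are illustrative), this configuration reads:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /search
Disallow: /account
Disallow: /collections/*?sort_by=

Sitemap: https://example-apparel.com/sitemap.xml
```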
Example 2: Large Electronics Store (5,000+ Products)
A high-volume electronics retailer with extensive product filtering needs aggressive crawl budget management. They generate thousands of filtered URLs through price ranges, brand filters, and specification parameters.
| Setting | Configuration |
|---|---|
| Googlebot | Allowed (full access) |
| Bingbot | Allowed (full access) |
| All other crawlers | Allowed |
| Blocked Shopify paths | /admin, /cart, /checkout, /search, /account, /orders, /policies |
| AI bots blocked | All six AI bots |
| Custom rules | Disallow: /collections/*?sort_by=<br>Disallow: /collections/*+*<br>Disallow: /collections/*%2B*<br>Disallow: /*?variant= |
| Sitemap | https://example-electronics.com/sitemap.xml |
This configuration blocks tag-combined collection URLs (the + and %2B patterns), sorted collections, and variant-specific URLs. For a store with 5,000 products across 200 collections, this can reduce crawlable URLs from over 100,000 to under 15,000, dramatically improving crawl efficiency.
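Written out as directives, the four custom rules from this configuration are:

```
# Sorted collection views
Disallow: /collections/*?sort_by=

# Tag-combined collection URLs (raw and URL-encoded plus sign)
Disallow: /collections/*+*
Disallow: /collections/*%2B*

# Variant-specific product URLs
Disallow: /*?variant=
```

The %2B pattern is needed because crawlers may encounter either the raw `+` or its URL-encoded form in tag-filtered links, and robots.txt matching is literal.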
Example 3: Content-Heavy DTC Brand with Blog
A direct-to-consumer beauty brand publishes weekly blog content, buying guides, and ingredient education pages. Their content is their competitive advantage, and they need maximum AI protection while keeping search engines fully engaged.
| Setting | Configuration |
|---|---|
| Googlebot | Allowed (full access) |
| Bingbot | Allowed (full access) |
| All other crawlers | Allowed |
| Blocked Shopify paths | /admin, /cart, /checkout, /search, /account |
| AI bots blocked | All six AI bots |
| Custom rules | Disallow: /blogs/*?page=<br>Disallow: /collections/*?page= |
| Sitemap | https://example-beauty.com/sitemap.xml |
Blocking paginated blog and collection URLs prevents search engines from crawling page 2, 3, 4, and so on of these listings, which repeat links to the same products and articles. The first page of each collection and blog category is usually sufficient for discovery, with the sitemap handling the rest. Monitor your indexing reports after applying this, since blocking pagination also removes an internal link-discovery path.
Robots.txt vs Other Crawl Control Methods
Understanding when to use robots.txt versus other methods is essential for effective crawl management. Each approach has distinct strengths and limitations.
| Method | What It Does | Best For | Limitations |
|---|---|---|---|
| robots.txt | Prevents crawlers from accessing specified URLs | Blocking entire directories, AI bots, internal search pages, filtered URLs | Advisory only; does not remove already-indexed pages; cannot target specific content within a page |
| Meta noindex | Allows crawling but prevents indexing | Pages that need to be crawled (for link discovery) but not shown in search results | Uses crawl budget since the page must be fetched; if robots.txt blocks the page, Google cannot see the noindex tag |
| Canonical tags | Tells search engines which version of a page is the primary one | Duplicate content from URL parameters, filtered views, pagination | Does not prevent crawling; search engines may ignore the hint if the pages differ significantly |
| X-Robots-Tag HTTP header | Same as meta noindex but applied at the server level | Non-HTML files (PDFs, images) and pages where you cannot add meta tags | Requires server-level configuration, which is limited on Shopify |
| Nofollow links | Tells crawlers not to follow specific links | Preventing PageRank flow to low-value pages | Does not prevent crawling if the page is discoverable through other links or sitemap |
| Password protection | Requires authentication to access the page | Truly private content like wholesale pricing or member-only areas | Not suitable for public-facing content; creates friction for legitimate visitors |
For most Shopify stores, the optimal strategy combines robots.txt (to block entire path patterns like /search and /cart), canonical tags (to handle collection filtering and pagination), and sitemap inclusion (to ensure all important pages are discovered). Using all three methods together gives you comprehensive crawl control without relying on any single mechanism.
Common Mistakes to Avoid
- Blocking your sitemap URL in robots.txt. If you add a Disallow rule that accidentally covers your sitemap path, search engines cannot access it even though it is referenced in the same file. Always verify that /sitemap.xml is not blocked by any of your rules; Google Search Console’s robots.txt report shows fetch errors and the rules Google actually sees.
- Blocking CSS and JavaScript files. Some merchants block /assets/ or other resource directories thinking it will hide their theme. This prevents Googlebot from rendering your pages correctly, causing them to appear as blank pages in Google’s index. Google has explicitly stated that blocking CSS and JS files can negatively impact rankings.
- Using robots.txt to hide sensitive information. Robots.txt is a publicly accessible file. Anyone can visit yourdomain.com/robots.txt and see exactly which paths you are blocking. If you list /pages/wholesale-pricing in your Disallow rules, you are advertising that the page exists. Use password protection for truly sensitive content.
- Forgetting to add the sitemap directive. The Sitemap line in robots.txt is one of the primary ways search engines discover your sitemap. While you can also submit it through Google Search Console and Bing Webmaster Tools, including it in robots.txt ensures every compliant crawler finds it automatically.
- Setting overly broad Disallow rules. A rule like “Disallow: /collections” blocks ALL collection pages, not just filtered versions. Be specific with your rules: use “Disallow: /collections/*?sort_by=” to target only sorted views while keeping the main collection pages crawlable.
- Not updating robots.txt after store changes. When you install a new app that creates public pages, add a new page template, or restructure your navigation, your robots.txt may need updating. A quarterly review ensures new content is properly handled.
- Assuming robots.txt blocks indexing. A common misconception is that blocking a URL in robots.txt removes it from Google. If other sites link to a blocked page, Google may still index the URL (showing a title and description inferred from external links) even though it cannot crawl the actual content. Use meta noindex together with robots.txt for complete removal.
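To illustrate the difference between a broad block and a targeted one from the list above:

```
# Too broad: hides every collection page from search engines
Disallow: /collections

# Specific: blocks only sorted duplicates,
# keeping the main collection pages crawlable
Disallow: /collections/*?sort_by=
```

Because robots.txt matching is prefix-based, the first rule covers /collections/summer-dresses and every other collection URL, not just a /collections landing page.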
When to Use This Tool
Different situations call for different robots.txt configurations. Use this reference to determine when and how to update your file.
| Scenario | Recommended Action | Priority |
|---|---|---|
| Launching a new Shopify store | Generate a robots.txt with standard Shopify blocks and your sitemap URL before going live | High |
| Noticing duplicate content in Google Search Console | Add Disallow rules for filtered, sorted, and paginated URL patterns | High |
| Publishing original blog content regularly | Block AI training bots to protect your content investment | Medium |
| Migrating from another platform to Shopify | Create a new robots.txt that reflects Shopify’s URL structure and block old platform paths that may still be crawled | High |
| Installing apps that create public pages | Review whether new app-generated pages should be crawled or blocked | Medium |
| Store has over 1,000 products | Optimize crawl budget by blocking low-value URL patterns like tag combinations and search results | Medium |
| Running seasonal or flash sale pages | Ensure sale pages are crawlable during the sale, then block them after to prevent stale content indexing | Low |
| Quarterly SEO audit | Re-evaluate all robots.txt rules against current site structure and crawl reports | Medium |
Tips and Best Practices
- Always include your sitemap URL. Adding a Sitemap directive helps search engines discover all your pages efficiently. Your Shopify sitemap is automatically generated at yourdomain.com/sitemap.xml and includes products, collections, pages, and blog posts.
- Do not block CSS or JavaScript files. Googlebot needs to render your pages to understand their content. Blocking access to CSS or JS files can prevent Google from seeing your pages correctly, which hurts your rankings.
- Test your robots.txt before deploying. Use Google Search Console’s robots.txt report and URL Inspection tool to verify that your rules do not accidentally block important pages. A misplaced Disallow rule can de-index your entire product catalog.
- Remember that robots.txt is advisory, not enforcement. Well-behaved bots honor robots.txt, but malicious scrapers ignore it. For truly sensitive content, use password protection or Shopify’s built-in access controls instead.
- Review your robots.txt quarterly. As you add new pages, sections, or apps to your store, revisit your robots.txt to ensure new content is accessible to search engines and unwanted paths remain blocked.
- Use specific path patterns rather than broad blocks. Instead of blocking an entire directory, target specific URL parameters or patterns. This gives you precise control without accidentally blocking important content.
- Keep your robots.txt file organized with comments. Adding comment lines (starting with #) to explain each section makes it easier for you or your developer to understand and maintain the file over time.
Related Tools
- robots.txt AI Bot Checker – Analyze any website’s robots.txt to see which AI bots are currently blocked or allowed.
- Sitemap Checker – Validate your Shopify sitemap to ensure all important pages are included and properly formatted.
- Shopify SEO Checker – Run a comprehensive SEO audit on your Shopify store, including robots.txt validation, meta tags, and page speed.
Our Shopify Apps
- Rubik Variant Images
- Rubik Combined Listings
- Smart Bulk Image Upload
- Export Product Images
- Bulk Delete Products
How do I add a custom robots.txt to my Shopify store?
In Online Store 2.0 themes, go to Online Store, then Themes, click the three-dot menu on your active theme, and select Edit code. Under Templates, look for robots.txt.liquid. If it does not exist, click Add a new template and select robots.txt. Replace its contents with the output from this generator. Save, then verify by visiting yourdomain.com/robots.txt in your browser.
Will blocking AI bots affect my Google rankings?
No. AI training bots (GPTBot, ClaudeBot, Google-Extended, etc.) are separate from search engine crawlers. Blocking GPTBot does not affect how Google indexes your site, and blocking Google-Extended only prevents Google from using your content for AI training while keeping Googlebot’s search indexing fully functional.
What is the difference between robots.txt and meta noindex?
Robots.txt prevents crawlers from accessing a page. Meta noindex allows crawling but tells search engines not to include the page in search results. For pages you want completely hidden from search, use both: robots.txt to save crawl budget and meta noindex as a safety net. Note that if a page is blocked by robots.txt, Google cannot see a noindex tag on it.
Should I block the /collections/all path?
It depends. The /collections/all page shows every product in your store and can be useful for search engines to discover all your products. However, it can also create duplicate content issues if your products are already accessible through individual collection pages. If you have a large catalog with well-organized collections, blocking /collections/all can help focus crawl budget on your curated collection pages.
Can robots.txt fix duplicate content issues?
Partially. Blocking duplicate pages (like filtered collection URLs or paginated search results) from crawling prevents search engines from indexing them. However, canonical tags are the proper solution for duplicate content. Use robots.txt to block pages that should never be crawled, and canonical tags for pages that might have duplicate versions.
What happens if I accidentally block all crawlers?
If you set Disallow: / for all user agents, search engines will stop crawling and eventually de-index your entire site. Your pages will disappear from Google search results within days to weeks. If this happens, immediately fix your robots.txt and request re-indexing through Google Search Console. Recovery can take several weeks.
Should I block Shopify’s internal search from crawling?
Yes. Shopify’s internal search results pages (/search?q=…) generate an unlimited number of URLs that add no SEO value and waste crawl budget. Blocking /search is included in Shopify’s default robots.txt for this reason. Keep this block enabled unless you have a specific reason to allow search result pages to be indexed.
How do AI bots differ from regular search engine bots?
Search engine bots (Googlebot, Bingbot) crawl your site to index pages for search results. AI training bots (GPTBot, ClaudeBot, etc.) crawl your site to collect text data for training large language models. Blocking AI bots has no effect on your search visibility. You can allow search engines full access while completely blocking AI training crawlers.
Does robots.txt affect Shopify apps and integrations?
No. Robots.txt only affects external web crawlers that access your site through HTTP requests. Shopify apps, integrations, and internal processes use the Shopify API, which is not affected by robots.txt rules. Your inventory management, order processing, and analytics will continue to work normally regardless of your robots.txt configuration.
Can I use wildcards in Shopify’s robots.txt?
Yes. Shopify’s robots.txt.liquid supports standard wildcard patterns. Use * to match any sequence of characters (e.g., Disallow: /collections/*?sort_by= to block sorted collection views) and $ to indicate the end of a URL. Wildcards are useful for blocking URL parameters and filtered views without listing every possible combination.
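For example (patterns illustrative):

```
# * matches any character sequence:
# blocks every sorted view of every collection
Disallow: /collections/*?sort_by=

# $ anchors the end of the URL: blocks URLs ending in .json,
# but not /products/shirt.json?view=quick
Disallow: /*.json$
```

Without the `$` anchor, the second rule would also match any URL that merely contains `.json` somewhere in its path.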
How often does Google check my robots.txt file?
Google typically re-fetches robots.txt files approximately once every 24 hours, though high-traffic sites may see more frequent checks. After you update your robots.txt, it can take up to a day for Google to detect the changes. You can request a recrawl of the file through the robots.txt report in Google Search Console to speed up the process. Bing and other search engines follow similar schedules.
What is crawl budget and why does it matter for Shopify?
Crawl budget is the number of pages a search engine will crawl on your site within a given timeframe. Google allocates crawl budget based on your site’s perceived importance and server capacity. For Shopify stores with thousands of product variants, filtered collection URLs, and search result pages, the total number of crawlable URLs can exceed what Google will actually crawl. Optimizing robots.txt to block low-value URLs ensures Google spends its budget on your most important pages, leading to faster indexing and better visibility for new products.
Can I block specific countries or IP addresses with robots.txt?
No. Robots.txt only supports User-agent (bot name) and path-based rules. It cannot block specific countries, IP addresses, or individual users. For geographic restrictions, you need server-level firewall rules or Shopify’s built-in fraud prevention settings. Robots.txt is solely for controlling which automated crawlers can access which parts of your site.
Should I include Crawl-delay in my Shopify robots.txt?
The Crawl-delay directive is supported by Bing and Yandex but ignored by Google. It tells compliant crawlers to wait a specified number of seconds between requests. For most Shopify stores, Crawl-delay is unnecessary because Shopify’s infrastructure handles high crawl rates without performance issues. However, if you notice server slowdowns during peak crawl periods, adding “Crawl-delay: 10” for specific bots can reduce the load.
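If you do need it, the directive goes inside a bot-specific group (the 10-second value is an example, not a recommendation for every store):

```
# Ask Bing's crawler to wait 10 seconds between requests.
# Google ignores Crawl-delay entirely.
User-agent: bingbot
Crawl-delay: 10
```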
What happens to my robots.txt if I change Shopify themes?
Your robots.txt.liquid file is part of your theme, so switching themes can overwrite your custom robots.txt configuration. Before changing themes, copy your current robots.txt.liquid content and save it. After activating the new theme, check whether it includes a robots.txt.liquid template and update it with your saved rules. This is a commonly overlooked step during theme migrations that can temporarily expose blocked paths or remove AI bot protections.
