Robots.txt Examples
Learn how robots.txt controls search engine crawling. Study basic and advanced examples, then generate or fetch a robots.txt for any website.
What Is a Robots.txt File?
A robots.txt file tells search engine crawlers (such as Googlebot and Bingbot) which pages or sections of your website they may or may not crawl. It must sit at the root of the host it applies to, for example yourwebsite.com/robots.txt.
- Block crawlers from accessing private or irrelevant sections
- Focus crawler resources on your most important pages
- Follow the Robots Exclusion Protocol, which all major search engines recognize
- Point crawlers to your sitemap for better URL discovery
Basic Robots.txt Examples
A basic robots.txt uses User-agent to target crawlers and Disallow to block paths. An empty Disallow: means everything is allowed.
Allow All Crawlers
    # Generated by CheckSEO (https://checkseo.in/robots-txt-generator.html)
    User-agent: *
    Disallow:

    Sitemap: https://example.com/sitemap.xml
Block All Crawlers
    # Generated by CheckSEO (https://checkseo.in/robots-txt-generator.html)
    User-agent: *
    Disallow: /
Block Specific Directories
    # Generated by CheckSEO (https://checkseo.in/robots-txt-generator.html)
    User-agent: *
    Disallow: /admin/
    Disallow: /private/
    Disallow: /tmp/
    Allow: /

    Sitemap: https://example.com/sitemap.xml
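To see how a crawler might read the "Block Specific Directories" rules, the sketch below parses them with Python's standard `urllib.robotparser` and checks two URLs. `ExampleBot` is a hypothetical crawler name used only for illustration. Note that Python's parser applies the first matching rule in file order, whereas Google documents a longest-match rule; for this particular file both approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

# The "Block Specific Directories" rules, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def load_rules(robots_txt):
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" is a hypothetical crawler; it falls under the "*" group.
admin_ok = rules.can_fetch("ExampleBot", "https://example.com/admin/users")
blog_ok = rules.can_fetch("ExampleBot", "https://example.com/blog/first-post")

print("admin crawlable:", admin_ok)  # False: matches Disallow: /admin/
print("blog crawlable:", blog_ok)    # True: only Allow: / matches
```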
Directive Reference
| Directive | Required | Description |
|---|---|---|
| User-agent | Yes | Which crawler the rules apply to. Use * for all crawlers. |
| Disallow | Yes, unless the group uses Allow | Path or prefix to block. Empty value means nothing is blocked. |
| Allow | No | Override a broader Disallow for a specific sub-path. |
| Sitemap | No | Full URL of your XML sitemap. Can appear multiple times. |
Advanced Robots.txt Examples
Advanced robots.txt files use multiple user-agent blocks, wildcards, Crawl-delay, and specific rules for different bots.
Multiple User-Agents with Different Rules
    # Generated by CheckSEO (https://checkseo.in/robots-txt-generator.html)

    # Rules for all crawlers
    User-agent: *
    Disallow: /admin/
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /search?
    Allow: /

    # Googlebot gets full access
    User-agent: Googlebot
    Disallow:

    # Block AI training bots
    User-agent: GPTBot
    Disallow: /

    User-agent: ChatGPT-User
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: anthropic-ai
    Disallow: /

    Sitemap: https://example.com/sitemap.xml
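A crawler follows only the group that names it most specifically and falls back to the `*` group otherwise, which is why Googlebot ignores the `/admin/` block in the example above. The following sketch illustrates this with a shortened version of that file and Python's standard `urllib.robotparser`. `ExampleBot` is a hypothetical crawler name, and the helper `access_report` is introduced here purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the multi-agent example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/

User-agent: Googlebot
Disallow:

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def access_report(url, agents):
    """Map each crawler name to whether it may fetch the URL."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# "ExampleBot" is hypothetical and has no group of its own,
# so the "*" rules apply to it.
report = access_report(
    "https://example.com/admin/",
    ["Googlebot", "GPTBot", "ExampleBot"],
)
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Googlebot is allowed because its own group has an empty Disallow, GPTBot is blocked everywhere, and ExampleBot is blocked by the `*` group's `/admin/` rule.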
E-Commerce Robots.txt
    # Generated by CheckSEO (https://checkseo.in/robots-txt-generator.html)
    User-agent: *
    Disallow: /cart/
    Disallow: /checkout/
    Disallow: /account/
    Disallow: /wishlist/
    Disallow: /search?
    Disallow: /*?sort=
    Disallow: /*?filter=
    Disallow: /*?page=
    Allow: /products/
    Allow: /categories/
    Allow: /blog/
    Allow: /
    Crawl-delay: 1

    Sitemap: https://example.com/sitemap.xml
    Sitemap: https://example.com/sitemap-products.xml
    Sitemap: https://example.com/sitemap-blog.xml
WordPress Robots.txt
    # Generated by CheckSEO (https://checkseo.in/robots-txt-generator.html)
    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/cache/
    Disallow: /trackback/
    Disallow: /feed/
    Disallow: /*?replytocom=
    Disallow: /*?s=
    Allow: /wp-admin/admin-ajax.php
    Allow: /wp-content/uploads/
    Allow: /

    Sitemap: https://example.com/sitemap_index.xml
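The WordPress file relies on `Allow: /wp-admin/admin-ajax.php` winning over the broader `Disallow: /wp-admin/`. Google's documentation and RFC 9309 describe this as a longest-match rule: the most specific matching path decides, and Allow wins a tie. Below is a minimal sketch of that precedence, assuming plain prefix matching only (no `*` or `$`) and using a subset of the rules above. The function name `is_allowed` and the `RULES` list are illustrative, not part of any library.

```python
# Subset of the WordPress rules as (directive, path-prefix) pairs.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("allow", "/wp-content/uploads/"),
    ("allow", "/"),
]


def is_allowed(path, rules):
    """Longest matching prefix wins; on equal length, allow wins."""
    best_len = -1
    allowed = True  # a path that matches no rule is crawlable
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and is_allow
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = is_allow
    return allowed


for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/hello"):
    print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

Because the decision depends on path length rather than file order, the Allow line can appear after the Disallow line, as it does in the example.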
Wildcard Patterns
Google and Bing support * (match any sequence) and $ (match end of URL) in Disallow and Allow directives.
    # Generated by CheckSEO (https://checkseo.in/robots-txt-generator.html)
    User-agent: *

    # Block all PDF files
    Disallow: /*.pdf$

    # Block all URLs with query parameters
    Disallow: /*?

    # Block all URLs containing /print/
    Disallow: /*/print/

    # Allow specific file types
    Allow: /*.js$
    Allow: /*.css$
    Allow: /*.png$
    Allow: /*.jpg$

    Sitemap: https://example.com/sitemap.xml
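Python's standard `urllib.robotparser` does not interpret `*` or `$`, so the sketch below translates a robots.txt path pattern into a regular expression to show how the wildcard rules above could be evaluated. It is a simplified illustration under stated assumptions: patterns match from the start of the path, `*` matches any sequence of characters, and a trailing `$` anchors the end of the URL. The function names are hypothetical, and percent-encoding is not handled.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces, then rejoin them with ".*" for each "*".
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    regex = ".*".join(pieces)
    if anchored:
        regex += r"\Z"  # "$" in robots.txt means end of the URL
    return re.compile(regex)


def pattern_matches(pattern, path):
    """True if the pattern matches the path, starting at its beginning."""
    return robots_pattern_to_regex(pattern).match(path) is not None


checks = [
    ("/*.pdf$", "/docs/report.pdf"),
    ("/*.pdf$", "/docs/report.pdf?download=1"),
    ("/*?", "/products?page=2"),
    ("/*/print/", "/articles/print/42"),
]
for pattern, path in checks:
    print(f"{pattern!r} vs {path!r}: {pattern_matches(pattern, path)}")
```

The second check shows why the `$` anchor matters: a PDF URL with a query string no longer ends in `.pdf`, so `/*.pdf$` does not match it.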
Advanced Directive Reference
| Directive | Support | Description |
|---|---|---|
| Crawl-delay | Partial | Seconds between requests. Honored by some crawlers such as Bingbot; ignored by Google. |
| * (wildcard) | Google/Bing | Match any sequence of characters in Allow/Disallow paths. |
| $ (end match) | Google/Bing | Match end of URL. Example: /*.pdf$ blocks all PDFs. |
| Multiple Sitemap | All | Declare multiple sitemap files. Place outside User-agent blocks. |
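The Crawl-delay and Sitemap directives in the table can also be read programmatically. The sketch below uses Python's standard `urllib.robotparser` (`site_maps()` requires Python 3.8 or later) on a shortened version of the e-commerce file. `ExampleBot` is a hypothetical crawler name; whether a real crawler honors the delay is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the e-commerce example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Crawl-delay: 1

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-products.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay (in seconds) that applies to a hypothetical crawler, or None.
delay = parser.crawl_delay("ExampleBot")

# All Sitemap URLs declared in the file, or None if there are none.
sitemaps = parser.site_maps()

print("crawl delay:", delay)
print("sitemaps:", sitemaps)
```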
Google Robots.txt Guidelines
Do
- Place robots.txt at the root of your domain
- Use UTF-8 encoding
- Keep rules simple and test them with the robots.txt report in Google Search Console
- Declare your sitemap URL in robots.txt
- Use Allow to override broader Disallow rules
- Use a specific User-agent group when rules differ by bot
Don't
- Use robots.txt as a security tool (it doesn't hide content)
- Block CSS/JS files that Google needs to render pages
- Block pages you want kept out of the index (use a noindex meta tag instead, which crawlers can only see on pages they are allowed to crawl)
- Forget that Disallow only prevents crawling, not indexing
- Set Crawl-delay too high (it slows discovery)
- Use robots.txt to remove pages from search results
Fetch Robots.txt from Website
Enter any website URL to fetch its current robots.txt file. If no robots.txt exists, we will generate a recommended default for you.
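Fetching a site's robots.txt by hand comes down to requesting `/robots.txt` at the root of the host. The following is a minimal sketch using only Python's standard library. It assumes the input URL includes a scheme such as `https://`, and it treats a 404 response as "no robots.txt exists". The function names `robots_url` and `fetch_robots_txt` are illustrative.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the host of any page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(page_url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the robots.txt body, or None if the server answers 404."""
    try:
        with urlopen(robots_url(page_url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are left for the caller to handle


# Building the URL needs no network access:
print(robots_url("https://example.com/blog/post?x=1#top"))
```

Calling `fetch_robots_txt("https://example.com/")` performs the actual request; other network failures (timeouts, DNS errors) surface as exceptions in this sketch.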
Build Robots.txt
Use this visual builder to create a custom robots.txt file. Fill in the fields and download your file.