
Overview

AI Crawler Control helps you manage how AI-powered bots (PerplexityBot, GPTBot, CCBot, anthropic-ai) access your content. Generate robots.txt rules and HTTP header snippets to allow or block AI crawlers.
Blocking AI crawlers prevents them from training on your content but also removes you from their answer engines. This is not recommended for most sites seeking AI visibility.

Supported AI Crawlers

PerplexityBot

Used by: Perplexity AI. Crawls content for Perplexity’s answer engine.

GPTBot

Used by: OpenAI (ChatGPT). Collects training data for GPT models.

CCBot

Used by: Common Crawl. Archives web content; used by multiple AI systems.

anthropic-ai

Used by: Anthropic (Claude). Gathers data for Claude AI training.
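The four tokens above appear as substrings inside full User-Agent headers. If you tag requests in your own analytics, a small helper can map a header to its operator — a sketch (the `identify_ai_crawler` name and the token-to-operator mapping are illustrative, taken from the list above):

```python
# Map the User-Agent tokens documented above to their operators.
# Real headers contain the token as a substring, e.g.
# "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)".
AI_CRAWLERS = {
    "PerplexityBot": "Perplexity AI",
    "GPTBot": "OpenAI",
    "CCBot": "Common Crawl",
    "anthropic-ai": "Anthropic",
}

def identify_ai_crawler(user_agent: str):
    """Return the operator name if the UA contains a known AI crawler token."""
    ua = user_agent.lower()
    for token, operator in AI_CRAWLERS.items():
        if token.lower() in ua:
            return operator
    return None
```

Matching is case-insensitive and substring-based, mirroring how server-level rules typically identify these bots.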

Configuration

1

Access Crawler Settings

Navigate to Settings → GEO AI → Crawlers & Robots
2

Select Bots to Block

Check the boxes for AI crawlers you want to block:
  • PerplexityBot
  • GPTBot (ChatGPT)
  • CCBot (Common Crawl)
  • anthropic-ai
3

View Generated Rules

Scroll down to see the generated robots.txt rules
4

Copy Rules

Copy the suggested rules to your robots.txt file
GEO AI does not write to your robots.txt file automatically. You must manually add the rules to your server.

Implementation Details

Admin Interface

includes/class-geoai-admin.php
// Renders the Crawlers & Robots settings tab with block checkboxes.
private function render_crawlers_tab() {
    $prefs = get_option( 'geoai_crawler_prefs', array() );
    ?>
    <h2><?php esc_html_e( 'AI Crawler Controls', 'geo-ai' ); ?></h2>
    <p class="description">
        <?php esc_html_e( 'These settings generate suggested robots.txt rules. Blocking is not guaranteed as crawlers may not respect these directives.', 'geo-ai' ); ?>
    </p>
    <table class="form-table">
        <tr>
            <th scope="row"><?php esc_html_e( 'Block Crawlers', 'geo-ai' ); ?></th>
            <td>
                <label>
                    <input type="checkbox" name="geoai_crawler_prefs[block_perplexity]" value="1" 
                           <?php checked( $prefs['block_perplexity'] ?? false, true ); ?> /> 
                    PerplexityBot
                </label><br/>
                <label>
                    <input type="checkbox" name="geoai_crawler_prefs[block_gptbot]" value="1" 
                           <?php checked( $prefs['block_gptbot'] ?? false, true ); ?> /> 
                    GPTBot (ChatGPT)
                </label><br/>
                <label>
                    <input type="checkbox" name="geoai_crawler_prefs[block_ccbot]" value="1" 
                           <?php checked( $prefs['block_ccbot'] ?? false, true ); ?> /> 
                    CCBot (Common Crawl)
                </label><br/>
                <label>
                    <input type="checkbox" name="geoai_crawler_prefs[block_anthropic]" value="1" 
                           <?php checked( $prefs['block_anthropic'] ?? false, true ); ?> /> 
                    anthropic-ai
                </label>
            </td>
        </tr>
    </table>
    <?php $this->render_robots_preview( $prefs ); ?>
    <?php
}

Robots.txt Preview Generation

includes/class-geoai-admin.php
// Builds a "Disallow: /" rule for each blocked bot and prints a copyable preview.
private function render_robots_preview( $prefs ) {
    $rules = array();
    if ( ! empty( $prefs['block_perplexity'] ) ) {
        $rules[] = "User-agent: PerplexityBot\nDisallow: /";
    }
    if ( ! empty( $prefs['block_gptbot'] ) ) {
        $rules[] = "User-agent: GPTBot\nDisallow: /";
    }
    if ( ! empty( $prefs['block_ccbot'] ) ) {
        $rules[] = "User-agent: CCBot\nDisallow: /";
    }
    if ( ! empty( $prefs['block_anthropic'] ) ) {
        $rules[] = "User-agent: anthropic-ai\nDisallow: /";
    }

    if ( ! empty( $rules ) ) {
        ?>
        <h3><?php esc_html_e( 'Suggested robots.txt Rules', 'geo-ai' ); ?></h3>
        <textarea readonly rows="10" class="large-text code"><?php
            // Echo inline so no leading whitespace ends up in the textarea value.
            echo esc_textarea( implode( "\n\n", $rules ) );
        ?></textarea>
        <p class="description">
            <?php esc_html_e( 'Copy these rules to your robots.txt file. GEO AI does not write server files.', 'geo-ai' ); ?>
        </p>
        <?php
    }
}

Generated Rules Example

When blocking all AI crawlers, GEO AI generates:
robots.txt
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
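You can sanity-check the generated block offline with Python's standard-library robots.txt parser before uploading it — a sketch using the example rules above:

```python
from urllib.robotparser import RobotFileParser

# The full-block rules generated by GEO AI (example above).
rules = """User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Every listed bot is denied everywhere; other agents are unaffected.
for bot in ("PerplexityBot", "GPTBot", "CCBot", "anthropic-ai"):
    assert not parser.can_fetch(bot, "https://example.com/any/page")
assert parser.can_fetch("Googlebot", "https://example.com/any/page")
```

This checks only that the rules parse as intended; whether a given crawler honors them is up to the crawler (see Important Caveats below).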

Adding Rules to robots.txt

1

Access robots.txt

Connect to your server via FTP or file manager
2

Locate File

Find robots.txt in your WordPress root directory (same level as wp-config.php)
3

Edit File

Open robots.txt in a text editor
4

Paste Rules

Add the generated rules from GEO AI settings
5

Save

Save the file and upload if using FTP
6

Test

Visit https://yoursite.com/robots.txt to verify

Selective Blocking

Block crawlers from specific sections only:
robots.txt
# Block AI bots from premium content only
User-agent: PerplexityBot
Disallow: /members/
Disallow: /premium/

User-agent: GPTBot
Disallow: /members/
Disallow: /premium/

# Allow AI bots on blog for visibility
User-agent: PerplexityBot
Allow: /blog/

User-agent: GPTBot
Allow: /blog/

HTTP Headers (Advanced)

For stricter control, block AI crawlers at the server level by matching their User-Agent header:
.htaccess
# Block AI crawlers via HTTP headers
<IfModule mod_setenvif.c>
    SetEnvIfNoCase User-Agent "PerplexityBot" block_bot
    SetEnvIfNoCase User-Agent "GPTBot" block_bot
    SetEnvIfNoCase User-Agent "CCBot" block_bot
    SetEnvIfNoCase User-Agent "anthropic-ai" block_bot
</IfModule>

<IfModule mod_rewrite.c>
    RewriteCond %{ENV:block_bot} ^1$
    RewriteRule .* - [F,L]
</IfModule>
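The `.htaccess` logic above amounts to a case-insensitive substring match on the User-Agent header followed by a 403 response. Modeled in Python (a sketch; `status_for` and the token list are illustrative, not part of the plugin):

```python
# Tokens from the SetEnvIfNoCase lines above, lowercased for matching.
BLOCKED_TOKENS = ("perplexitybot", "gptbot", "ccbot", "anthropic-ai")

def status_for(user_agent: str) -> int:
    """Mirror SetEnvIfNoCase + RewriteRule [F]: 403 when a blocked token appears."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_TOKENS):
        return 403  # Forbidden, like the [F] flag
    return 200

print(status_for("Mozilla/5.0 (compatible; GPTBot/1.1)"))   # 403
print(status_for("Mozilla/5.0 (Windows NT 10.0) Firefox"))  # 200
```

You can spot-check a live deployment with `curl -I -A "GPTBot" https://yoursite.com/` and confirm the server answers `403 Forbidden`. Unlike robots.txt, this actually refuses the request, but bots can still evade it by changing their User-Agent string.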

Verification

Verify your robots.txt is working:
1

Visit robots.txt

Go to https://yoursite.com/robots.txt in a browser
2

Check Rules

Verify your AI crawler rules appear correctly
3

Test with Google

Use a robots.txt testing tool (such as the robots.txt report in Google Search Console) to confirm the file parses and the rules apply to the expected paths
4

Monitor Logs

Check server logs for blocked crawler requests (optional)

Important Caveats

Critical: AI crawlers may NOT respect robots.txt directives. Blocking is a request, not enforcement.

Not Guaranteed

Crawlers can ignore robots.txt; only well-behaved bots comply.

Reduces AI Visibility

Blocking prevents your content from appearing in AI answer engines.

May Not Stop Training

Some AI models may already have your content from past crawls.

No Legal Protection

robots.txt is a suggestion, not a legally binding restriction.

When to Block AI Crawlers

Consider blocking AI crawlers from accessing content behind paywalls or membership areas.
User-agent: GPTBot
Disallow: /members/
Disallow: /premium/
Consider blocking if you have unique research or data you don’t want in AI training sets.
Consider blocking to prevent AI from generating competing product descriptions.

When to Allow AI Crawlers

Marketing Content

Allow crawlers to increase visibility in AI answer engines.

Blog Posts

Get free exposure through AI-powered search results.

Educational Content

Help AI systems provide accurate information.

Public Information

Content meant to be widely accessible benefits from AI indexing.

Alternative Approaches

AI-TXT Standard

Some organizations are developing an ai.txt specification:
ai.txt
# Allow AI training and indexing
User-agent: *
Allow: /

# Exceptions
Disallow: /private/
Disallow: /members/

# Attribution
Contact: [email protected]
Terms: https://example.com/ai-usage-terms
ai.txt is not yet a widely adopted standard. Most crawlers still use robots.txt.

Monitoring Crawler Activity

Track AI crawler visits in server logs:
# Apache access.log
grep -i "perplexitybot\|gptbot\|ccbot\|anthropic-ai" /var/log/apache2/access.log

# Count by bot
grep -i "perplexitybot" /var/log/apache2/access.log | wc -l
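If you want per-bot tallies in one pass rather than one grep per crawler, the same matching can be done with a short script — a sketch (the `count_ai_hits` helper and sample log lines are illustrative):

```python
import re
from collections import Counter

# Same tokens the grep commands above search for, case-insensitive.
BOT_PATTERN = re.compile(r"perplexitybot|gptbot|ccbot|anthropic-ai", re.IGNORECASE)

def count_ai_hits(log_lines):
    """Tally access-log lines per AI crawler token."""
    counts = Counter()
    for line in log_lines:
        match = BOT_PATTERN.search(line)
        if match:
            counts[match.group(0).lower()] += 1
    return counts

sample = [
    '1.2.3.4 - - [01/Jan/2025] "GET /blog/ HTTP/1.1" 200 "-" "GPTBot/1.1"',
    '5.6.7.8 - - [01/Jan/2025] "GET / HTTP/1.1" 200 "-" "CCBot/2.0"',
    '9.9.9.9 - - [01/Jan/2025] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0"',
]
print(count_ai_hits(sample))  # gptbot: 1, ccbot: 1
```

In practice you would feed it the access log, e.g. `count_ai_hits(open("/var/log/apache2/access.log"))`, and watch the counts before and after changing your rules.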

Best Practices

Default: Allow

Unless you have specific reasons, allow AI crawlers for better visibility.

Be Selective

Block crawlers only from sensitive sections, not entire site.

Monitor Impact

Track referral traffic from AI engines before/after blocking.

Understand Limitations

Remember robots.txt is advisory, not enforceable.

Document Decisions

Keep notes on why you blocked certain crawlers.

Review Regularly

Re-evaluate your blocking strategy quarterly.

Troubleshooting

Rules not taking effect? Check:
  • Rules are correctly added to robots.txt in root directory
  • Syntax is correct (user-agent tokens match each crawler’s published name exactly)
  • File is accessible at yoursite.com/robots.txt
  • No caching plugin serving old robots.txt
Crawlers still visiting despite the rules? Remember:
  • Crawlers may ignore robots.txt (not enforceable)
  • Check user agent string in logs to confirm bot identity
  • Consider HTTP header blocking for stricter control
No robots.txt file on your site? Solution:
  • Create robots.txt file in WordPress root directory
  • Ensure file has correct permissions (644)
  • Clear any caching plugins

AI Audit

Optimize content for AI visibility

Sitemaps

Help crawlers find your content

Meta Tags

Robots meta for indexing control
