
Overview

AI Crawler Control helps you manage how AI-powered bots (PerplexityBot, GPTBot, CCBot, anthropic-ai) access your content. Generate robots.txt rules and HTTP header snippets to allow or block AI crawlers.
Blocking AI crawlers prevents them from training on your content but also removes you from their answer engines. This is not recommended for most sites seeking AI visibility.

Supported AI Crawlers

PerplexityBot

Used by: Perplexity AI. Crawls content for Perplexity’s answer engine.

GPTBot

Used by: OpenAI (ChatGPT). Collects training data for GPT models.

CCBot

Used by: Common Crawl. Archives web content; used by multiple AI systems.

anthropic-ai

Used by: Anthropic (Claude). Gathers data for Claude AI training.
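The four tokens above appear as substrings inside full User-Agent headers. If you tag requests in your own analytics, a small helper can map a header to its operator — a sketch (the `identify_ai_crawler` name and the token-to-operator mapping are illustrative, taken from the list above):

```python
# Map the User-Agent tokens documented above to their operators.
# Real headers contain the token as a substring, e.g.
# "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)".
AI_CRAWLERS = {
    "PerplexityBot": "Perplexity AI",
    "GPTBot": "OpenAI",
    "CCBot": "Common Crawl",
    "anthropic-ai": "Anthropic",
}

def identify_ai_crawler(user_agent: str):
    """Return the operator name if the UA contains a known AI crawler token."""
    ua = user_agent.lower()
    for token, operator in AI_CRAWLERS.items():
        if token.lower() in ua:
            return operator
    return None
```

Matching is case-insensitive and substring-based, mirroring how server-level rules typically identify these bots.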

Configuration

1

Access Crawler Settings

Navigate to Settings → GEO AI → Crawlers & Robots
2

Select Bots to Block

Check the boxes for AI crawlers you want to block:
  • PerplexityBot
  • GPTBot (ChatGPT)
  • CCBot (Common Crawl)
  • anthropic-ai
3

View Generated Rules

Scroll down to see the generated robots.txt rules
4

Copy Rules

Copy the suggested rules to your robots.txt file
GEO AI does not write to your robots.txt file automatically. You must manually add the rules to your server.

Implementation Details

Admin Interface

includes/class-geoai-admin.php
// Renders the Crawlers & Robots settings tab with block checkboxes.
private function render_crawlers_tab() {
    $prefs = get_option( 'geoai_crawler_prefs', array() );
    ?>
    <h2><?php esc_html_e( 'AI Crawler Controls', 'geo-ai' ); ?></h2>
    <p class="description">
        <?php esc_html_e( 'These settings generate suggested robots.txt rules. Blocking is not guaranteed as crawlers may not respect these directives.', 'geo-ai' ); ?>
    </p>
    <table class="form-table">
        <tr>
            <th scope="row"><?php esc_html_e( 'Block Crawlers', 'geo-ai' ); ?></th>
            <td>
                <label>
                    <input type="checkbox" name="geoai_crawler_prefs[block_perplexity]" value="1" 
                           <?php checked( $prefs['block_perplexity'] ?? false, true ); ?> /> 
                    PerplexityBot
                </label><br/>
                <label>
                    <input type="checkbox" name="geoai_crawler_prefs[block_gptbot]" value="1" 
                           <?php checked( $prefs['block_gptbot'] ?? false, true ); ?> /> 
                    GPTBot (ChatGPT)
                </label><br/>
                <label>
                    <input type="checkbox" name="geoai_crawler_prefs[block_ccbot]" value="1" 
                           <?php checked( $prefs['block_ccbot'] ?? false, true ); ?> /> 
                    CCBot (Common Crawl)
                </label><br/>
                <label>
                    <input type="checkbox" name="geoai_crawler_prefs[block_anthropic]" value="1" 
                           <?php checked( $prefs['block_anthropic'] ?? false, true ); ?> /> 
                    anthropic-ai
                </label>
            </td>
        </tr>
    </table>
    <?php $this->render_robots_preview( $prefs ); ?>
    <?php
}

Robots.txt Preview Generation

includes/class-geoai-admin.php
// Builds a "Disallow: /" rule for each blocked bot and prints a copyable preview.
private function render_robots_preview( $prefs ) {
    $rules = array();
    if ( ! empty( $prefs['block_perplexity'] ) ) {
        $rules[] = "User-agent: PerplexityBot\nDisallow: /";
    }
    if ( ! empty( $prefs['block_gptbot'] ) ) {
        $rules[] = "User-agent: GPTBot\nDisallow: /";
    }
    if ( ! empty( $prefs['block_ccbot'] ) ) {
        $rules[] = "User-agent: CCBot\nDisallow: /";
    }
    if ( ! empty( $prefs['block_anthropic'] ) ) {
        $rules[] = "User-agent: anthropic-ai\nDisallow: /";
    }

    if ( ! empty( $rules ) ) {
        ?>
        <h3><?php esc_html_e( 'Suggested robots.txt Rules', 'geo-ai' ); ?></h3>
        <textarea readonly rows="10" class="large-text code"><?php
            // Echo inline so no leading whitespace ends up in the textarea value.
            echo esc_textarea( implode( "\n\n", $rules ) );
        ?></textarea>
        <p class="description">
            <?php esc_html_e( 'Copy these rules to your robots.txt file. GEO AI does not write server files.', 'geo-ai' ); ?>
        </p>
        <?php
    }
}

Generated Rules Example

When blocking all AI crawlers, GEO AI generates:
robots.txt
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
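You can sanity-check the generated block offline with Python's standard-library robots.txt parser before uploading it — a sketch using the example rules above:

```python
from urllib.robotparser import RobotFileParser

# The full-block rules generated by GEO AI (example above).
rules = """User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Every listed bot is denied everywhere; other agents are unaffected.
for bot in ("PerplexityBot", "GPTBot", "CCBot", "anthropic-ai"):
    assert not parser.can_fetch(bot, "https://example.com/any/page")
assert parser.can_fetch("Googlebot", "https://example.com/any/page")
```

This checks only that the rules parse as intended; whether a given crawler honors them is up to the crawler (see Important Caveats below).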

Adding Rules to robots.txt

1

Access robots.txt

Connect to your server via FTP or file manager
2

Locate File

Find robots.txt in your WordPress root directory (same level as wp-config.php)
3

Edit File

Open robots.txt in a text editor
4

Paste Rules

Add the generated rules from GEO AI settings
5

Save

Save the file and upload if using FTP
6

Test

Visit https://yoursite.com/robots.txt to verify

Selective Blocking

Block crawlers from specific sections only:
robots.txt
# Block AI bots from premium content only
User-agent: PerplexityBot
Disallow: /members/
Disallow: /premium/

User-agent: GPTBot
Disallow: /members/
Disallow: /premium/

# Allow AI bots on blog for visibility
User-agent: PerplexityBot
Allow: /blog/

User-agent: GPTBot
Allow: /blog/

HTTP Headers (Advanced)

For stricter control, block AI crawlers at the server level by matching their User-Agent header:
.htaccess
# Block AI crawlers via HTTP headers
<IfModule mod_setenvif.c>
    SetEnvIfNoCase User-Agent "PerplexityBot" block_bot
    SetEnvIfNoCase User-Agent "GPTBot" block_bot
    SetEnvIfNoCase User-Agent "CCBot" block_bot
    SetEnvIfNoCase User-Agent "anthropic-ai" block_bot
</IfModule>

<IfModule mod_rewrite.c>
    RewriteCond %{ENV:block_bot} ^1$
    RewriteRule .* - [F,L]
</IfModule>
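The `.htaccess` logic above amounts to a case-insensitive substring match on the User-Agent header followed by a 403 response. Modeled in Python (a sketch; `status_for` and the token list are illustrative, not part of the plugin):

```python
# Tokens from the SetEnvIfNoCase lines above, lowercased for matching.
BLOCKED_TOKENS = ("perplexitybot", "gptbot", "ccbot", "anthropic-ai")

def status_for(user_agent: str) -> int:
    """Mirror SetEnvIfNoCase + RewriteRule [F]: 403 when a blocked token appears."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_TOKENS):
        return 403  # Forbidden, like the [F] flag
    return 200

print(status_for("Mozilla/5.0 (compatible; GPTBot/1.1)"))   # 403
print(status_for("Mozilla/5.0 (Windows NT 10.0) Firefox"))  # 200
```

You can spot-check a live deployment with `curl -I -A "GPTBot" https://yoursite.com/` and confirm the server answers `403 Forbidden`. Unlike robots.txt, this actually refuses the request, but bots can still evade it by changing their User-Agent string.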

Verification

Verify your robots.txt is working:
1

Visit robots.txt

Go to https://yoursite.com/robots.txt in a browser
2

Check Rules

Verify your AI crawler rules appear correctly
3

Test with Google

Use a robots.txt testing tool (such as the robots.txt report in Google Search Console) to confirm the file parses and the rules apply to the expected paths
4

Monitor Logs

Check server logs for blocked crawler requests (optional)

Important Caveats

Critical: AI crawlers may NOT respect robots.txt directives. Blocking is a request, not enforcement.

Not Guaranteed

Crawlers can ignore robots.txt; only well-behaved bots comply.

Reduces AI Visibility

Blocking prevents your content from appearing in AI answer engines.

May Not Stop Training

Some AI models may already have your content from past crawls.

No Legal Protection

robots.txt is a suggestion, not a legally binding restriction.

When to Block AI Crawlers

Consider blocking AI crawlers from accessing content behind paywalls or membership areas.
User-agent: GPTBot
Disallow: /members/
Disallow: /premium/
Consider blocking if you have unique research or data you don’t want in AI training sets.
Consider blocking to prevent AI from generating competing product descriptions.

When to Allow AI Crawlers

Marketing Content

Allow crawlers to increase visibility in AI answer engines.

Blog Posts

Get free exposure through AI-powered search results.

Educational Content

Help AI systems provide accurate information.

Public Information

Content meant to be widely accessible benefits from AI indexing.

Alternative Approaches

AI-TXT Standard

Some organizations are developing an ai.txt specification:
ai.txt
# Allow AI training and indexing
User-agent: *
Allow: /

# Exceptions
Disallow: /private/
Disallow: /members/

# Attribution
Contact: [email protected]
Terms: https://example.com/ai-usage-terms
ai.txt is not yet a widely adopted standard. Most crawlers still use robots.txt.

Monitoring Crawler Activity

Track AI crawler visits in server logs:
# Apache access.log
grep -i "perplexitybot\|gptbot\|ccbot\|anthropic-ai" /var/log/apache2/access.log

# Count by bot
grep -i "perplexitybot" /var/log/apache2/access.log | wc -l
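If you want per-bot tallies in one pass rather than one grep per crawler, the same matching can be done with a short script — a sketch (the `count_ai_hits` helper and sample log lines are illustrative):

```python
import re
from collections import Counter

# Same tokens the grep commands above search for, case-insensitive.
BOT_PATTERN = re.compile(r"perplexitybot|gptbot|ccbot|anthropic-ai", re.IGNORECASE)

def count_ai_hits(log_lines):
    """Tally access-log lines per AI crawler token."""
    counts = Counter()
    for line in log_lines:
        match = BOT_PATTERN.search(line)
        if match:
            counts[match.group(0).lower()] += 1
    return counts

sample = [
    '1.2.3.4 - - [01/Jan/2025] "GET /blog/ HTTP/1.1" 200 "-" "GPTBot/1.1"',
    '5.6.7.8 - - [01/Jan/2025] "GET / HTTP/1.1" 200 "-" "CCBot/2.0"',
    '9.9.9.9 - - [01/Jan/2025] "GET / HTTP/1.1" 200 "-" "Mozilla/5.0"',
]
print(count_ai_hits(sample))  # gptbot: 1, ccbot: 1
```

In practice you would feed it the access log, e.g. `count_ai_hits(open("/var/log/apache2/access.log"))`, and watch the counts before and after changing your rules.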

Best Practices

Default: Allow

Unless you have specific reasons, allow AI crawlers for better visibility.

Be Selective

Block crawlers only from sensitive sections, not entire site.

Monitor Impact

Track referral traffic from AI engines before/after blocking.

Understand Limitations

Remember robots.txt is advisory, not enforceable.

Document Decisions

Keep notes on why you blocked certain crawlers.

Review Regularly

Re-evaluate your blocking strategy quarterly.

Troubleshooting

Rules not taking effect? Check:
  • Rules are correctly added to robots.txt in root directory
  • Syntax is correct (user-agent tokens match each crawler’s published name exactly)
  • File is accessible at yoursite.com/robots.txt
  • No caching plugin serving old robots.txt
Crawlers still visiting despite the rules? Remember:
  • Crawlers may ignore robots.txt (not enforceable)
  • Check user agent string in logs to confirm bot identity
  • Consider HTTP header blocking for stricter control
No robots.txt file on your site? Solution:
  • Create robots.txt file in WordPress root directory
  • Ensure file has correct permissions (644)
  • Clear any caching plugins

AI Audit

Optimize content for AI visibility

Sitemaps

Help crawlers find your content

Meta Tags

Robots meta for indexing control
