Bot rules are the core of Anubis policy configuration. They define how to identify and respond to different types of traffic.

Rule Structure

A bot rule consists of matchers and an action:
bots:
  - name: amazonbot
    user_agent_regex: Amazonbot
    action: DENY

Required Fields

Field    Type     Description
name     string   Unique identifier for the rule
action   string   Action to take: ALLOW, DENY, CHALLENGE, WEIGH

Matchers

Rules must include at least one matcher:
Matcher            Type              Description
user_agent_regex   regex string      Match the User-Agent header
path_regex         regex string      Match the request path
headers_regex      map[string]regex  Match arbitrary headers
remote_addresses   []CIDR            Match client IP ranges
expression         CEL expression    Advanced matching logic

Matcher Types

User Agent Matching

- name: googlebot
  user_agent_regex: "Googlebot"
  action: ALLOW

Path Matching

- name: api-endpoints
  path_regex: "^/api/.*$"
  action: ALLOW

Header Matching

- name: cloudflare-workers
  headers_regex:
    CF-Worker: ".*"
  action: DENY

IP Range Matching

- name: internal-network
  action: ALLOW
  remote_addresses:
    - 10.0.0.0/8
    - 192.168.0.0/16
    - 100.64.0.0/10

Combined Matching

Combine IP ranges with other matchers:
- name: qwantbot
  user_agent_regex: "\\+https\\://help\\.qwant\\.com/bot/"
  action: ALLOW
  remote_addresses:
    - 91.242.162.0/24

CEL Expressions

For advanced matching, use Common Expression Language (CEL) expressions:

Single Expression

- name: no-user-agent
  action: DENY
  expression: userAgent == ""

Multiple Conditions (all)

All conditions must be true:
- name: api-json-requests
  action: ALLOW
  expression:
    all:
      - '"Accept" in headers'
      - 'headers["Accept"] == "application/json"'
      - 'path.startsWith("/api/")'

Multiple Conditions (any)

At least one condition must be true:
- name: banned-ips
  action: DENY
  expression:
    any:
      - remoteAddress == "8.8.8.8"
      - remoteAddress == "1.1.1.1"

Available Variables

Variable        Type               Example
remoteAddress   string             "1.2.3.4"
userAgent       string             "Mozilla/5.0..."
path            string             "/api/users"
method          string             "GET", "POST"
host            string             "example.com"
headers         map[string]string  {"User-Agent": "..."}
query           map[string]string  {"page": "1"}
contentLength   int64              1024
load_1m         double             2.5 (1-minute system load average)
load_5m         double             3.1 (5-minute system load average)
load_15m        double             2.8 (15-minute system load average)
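
Several of these variables can be combined in one rule. As an illustrative sketch (the rule name and values here are hypothetical, not from the Anubis defaults):

```yaml
# Hypothetical rule: challenge large POST bodies sent to search endpoints
- name: large-search-posts
  action: CHALLENGE
  expression:
    all:
      - 'method == "POST"'
      - 'path.startsWith("/search")'
      - 'contentLength > 4096'
```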

DNS Functions

# Verify Forward-Confirmed Reverse DNS
- name: require-fcrdns
  action: DENY
  expression: "!verifyFCrDNS(remoteAddress)"

# Check reverse DNS pattern
- name: googlebot
  action: ALLOW
  expression:
    all:
      - 'userAgent.matches("Googlebot")'
      - 'verifyFCrDNS(remoteAddress, "\\.googlebot\\.com$")'

Available DNS functions:
  • reverseDNS(ip) - Get PTR records
  • lookupHost(hostname) - Get A/AAAA records
  • verifyFCrDNS(ip) - Verify FCrDNS
  • verifyFCrDNS(ip, pattern) - Verify FCrDNS with regex pattern
  • arpaReverseIP(ip) - Convert to ARPA notation

Helper Functions

# Check for missing headers
- name: old-chrome
  action: WEIGH
  weight:
    adjust: 10
  expression:
    all:
      - 'userAgent.matches("Chrome/[1-9][0-9]?\\.0\\.0\\.0")'
      - 'missingHeader(headers, "Sec-Ch-Ua")'

# Random behavior (use sparingly)
- name: deny-sometimes
  action: DENY
  expression: 'randInt(4) == 0'  # 25% chance

# Path segments
- name: deep-paths
  action: WEIGH
  weight:
    adjust: 5
  expression: 'size(segments(path)) > 5'

Rule Actions

ALLOW

Bypass all checks and forward to backend:
- name: health-check
  path_regex: "^/health$"
  action: ALLOW

DENY

Block with a deceptive success page:
- name: scrapers
  user_agent_regex: "(?i:scraper|crawler)"
  action: DENY

CHALLENGE

Present a proof-of-work challenge:
- name: browsers
  user_agent_regex: "Mozilla"
  action: CHALLENGE
  challenge:
    algorithm: fast
    difficulty: 2

WEIGH

Adjust request suspicion score:
# Remove suspicion
- name: session-cookie
  action: WEIGH
  expression: 'headers["Cookie"].contains("session=")'
  weight:
    adjust: -5

# Add suspicion
- name: high-load
  action: WEIGH
  expression: 'load_1m >= 10.0'
  weight:
    adjust: 20

Rule Evaluation Order

Rules are evaluated in the order they appear in the policy file. The first matching rule determines the action.
bots:
  # Specific rules first
  - name: googlebot-verified
    user_agent_regex: "Googlebot"
    expression: 'verifyFCrDNS(remoteAddress, "\\.googlebot\\.com$")'
    action: ALLOW
  
  # Generic rules last
  - name: all-bots
    user_agent_regex: "(?i:bot)"
    action: DENY

Weight-Based Rules

Weight rules accumulate. All matching WEIGH rules apply:
- name: has-session
  action: WEIGH
  expression: 'headers["Cookie"].contains("session_id")'
  weight:
    adjust: -10

- name: high-load
  action: WEIGH
  expression: 'load_1m >= 8.0'
  weight:
    adjust: 15

# Final weight determines threshold action
thresholds:
  - name: low-suspicion
    expression: 'weight < 5'
    action: ALLOW
  
  - name: moderate-suspicion
    expression:
      all:
        - weight >= 5
        - weight < 15
    action: CHALLENGE
    challenge:
      algorithm: metarefresh
      difficulty: 1

Regular Expression Syntax

Anubis uses Go’s regexp package (RE2 syntax):
# Case-insensitive
user_agent_regex: "(?i:bot|crawler|scraper)"

# Character classes
path_regex: "^/[a-z0-9]+$"

# Anchors
user_agent_regex: "^curl/"  # Starts with
path_regex: "\\.json$"       # Ends with

# Escaping special characters
user_agent_regex: "example\\.com"  # Literal dot

Test regexes at regex101.com (select the Golang flavor).

Common Patterns

Allow Static Assets

- name: static-assets
  path_regex: "\\.(css|js|jpg|png|gif|svg|woff2?)$"
  action: ALLOW

Block Known Bad Actors

- name: scrapers
  user_agent_regex: "(?i:scraper|download|extract|harvest)"
  action: DENY

Protect POST Endpoints

- name: post-requests
  action: CHALLENGE
  expression:
    all:
      - 'method == "POST"'
      - '!verifyFCrDNS(remoteAddress)'
  challenge:
    algorithm: fast
    difficulty: 3

Dynamic Load Protection

- name: high-load-stricter
  action: WEIGH
  expression: 'load_1m >= 16.0'
  weight:
    adjust: 30

- name: low-load-lenient
  action: WEIGH
  expression: 'load_15m <= 2.0'
  weight:
    adjust: -15

Best Practices

  1. Order matters: Place specific ALLOW rules before generic DENY rules
  2. Test expressions: Use --debug-benchmark-js to test without blocking
  3. Use FCrDNS: Verify bot IP addresses with verifyFCrDNS()
  4. Prefer CHALLENGE over DENY: Legitimate users can solve challenges
  5. Monitor metrics: Track rule matches via Prometheus metrics
  6. Use weights: Build gradual suspicion instead of binary decisions
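
Putting several of these practices together, a policy skeleton might look like the following. This is an illustrative sketch, not a recommended default; the rule names, the reverse-DNS pattern, and the weight values are all assumptions:

```yaml
bots:
  # Practices 1 and 3: specific, FCrDNS-verified ALLOW rules first
  - name: bingbot-verified
    user_agent_regex: "bingbot"
    expression: 'verifyFCrDNS(remoteAddress, "\\.search\\.msn\\.com$")'
    action: ALLOW

  # Practice 6: add gradual suspicion instead of an outright block
  - name: generic-bot-suspicion
    user_agent_regex: "(?i:bot)"
    action: WEIGH
    weight:
      adjust: 10

  # Practice 4: challenge, rather than deny, anything that looks like a browser
  - name: browsers
    user_agent_regex: "Mozilla"
    action: CHALLENGE
```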

Generating Rules from robots.txt

Anubis includes the robots2policy tool to automatically convert robots.txt files into Anubis policy rules.

Usage

# Convert local robots.txt file
robots2policy -input robots.txt -output policy.yaml

# Convert from URL
robots2policy -input https://example.com/robots.txt -format json

# Read from stdin
curl https://example.com/robots.txt | robots2policy -input -

Options

Flag                 Default            Description
-input               (required)         Path to robots.txt file, URL, or - for stdin
-output              stdout             Output file path or - for stdout
-format              yaml               Output format: yaml or json
-action              CHALLENGE          Default action for disallowed paths
-deny-user-agents    DENY               Action for blocked user agents
-name                robots-txt-policy  Name for the generated policy
-crawl-delay-weight  0                  Weight adjustment based on crawl-delay
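
For example, the defaults can be overridden at conversion time (a usage sketch, assuming robots2policy is on your PATH; the flag values shown are arbitrary):

```shell
# Deny (rather than challenge) disallowed paths, and
# apply a weight adjustment for crawl-delay directives
robots2policy -input robots.txt -output policy.yaml \
  -action DENY -crawl-delay-weight 3
```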

Example Output

Input robots.txt:
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /admin/
Disallow: /api/private/

Generated policy:
bots:
  - name: robots-txt-gptbot
    action: DENY
    expression:
      all:
        - userAgent.contains("GPTBot")
  
  - name: robots-txt-admin
    action: CHALLENGE
    expression:
      all:
        - path.startsWith("/admin/")
  
  - name: robots-txt-api-private
    action: CHALLENGE
    expression:
      all:
        - path.startsWith("/api/private/")
