Extractors

Extractors allow you to extract and reuse data from protocol responses. They can extract values using regex, JSON paths, XPath queries, key-value pairs, or DSL expressions. Extracted data can be used in subsequent requests or displayed in output.

Extractor types

The available types are Regex, KVal, JSON, XPath, and DSL.

Regex

Extract data using regular expressions with optional capture groups.

extractors:
  - type: regex
    name: token
    regex:
      - "(access_token\\$production\\$[0-9a-z]{16}\\$[0-9a-f]{32})"
      - "Author:(?:[A-Za-z0-9 -\\_=\"]+)?<span(?:[A-Za-z0-9 -\\_=\"]+)?>([A-Za-z0-9]+)<\\/span>"
    group: 1  # Return capture group 1 from each pattern (each pattern needs one)

The Go regex engine (RE2 syntax) does not support lookaheads, lookbehinds, or backreferences.
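To make the `group` option and the RE2 restriction concrete, here is a small Go sketch using the standard `regexp` package. `extractGroup` is a hypothetical helper written for illustration, not Nuclei's internal code: it returns the chosen capture group from every match, skips matches that lack that group, and surfaces the compile error Go returns for a lookahead.

```go
package main

import (
	"fmt"
	"regexp"
)

// extractGroup is a hypothetical helper (not part of Nuclei) that mirrors the
// documented `group` option: it returns capture group `group` from every match
// of pattern in corpus. Group 0 is the full match; matches that lack the
// requested group are skipped.
func extractGroup(pattern, corpus string, group int) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		// Go's RE2 syntax rejects lookarounds such as (?=...) at compile time.
		return nil, err
	}
	var out []string
	for _, m := range re.FindAllStringSubmatch(corpus, -1) {
		if group >= 0 && group < len(m) {
			out = append(out, m[group])
		}
	}
	return out, nil
}

func main() {
	body := "Version: v1.2.3\nVersion: 2.0"
	versions, err := extractGroup(`Version:\s*v?([0-9.]+)`, body, 1)
	fmt.Println(versions, err) // [1.2.3 2.0] <nil>

	_, err = extractGroup(`token(?=:)`, "token:", 0)
	fmt.Println(err != nil) // true
}
```

A pattern with no capture group, such as `AKIA[0-9A-Z]{16}`, yields nothing in this sketch when group 1 is requested.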

KVal

Extract key-value pairs from headers and cookies (case-insensitive).

extractors:
  - type: kval
    name: session
    kval:
      - "server"        # Extract Server header
      - "phpsessid"     # Extract PHPSESSID cookie
      - "content_type"  # Extract Content-Type (note: use _ instead of -)

KVal keys are case-insensitive and cannot contain dashes (-); replace dashes with underscores (_). For example, Content-Type becomes content_type.

JSON

Extract data using jq-style JSON queries.

extractors:
  - type: json
    name: ids
    json:
      - ".[] | .id"
      - ".batters | .batter | .[] | .id"
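For readers less familiar with jq, the first query, `.[] | .id`, takes each element of a top-level array and outputs its `id` field. The Go sketch below is a hand-written equivalent using `encoding/json`, not a jq engine, and `idsOfArray` is a hypothetical name. It handles only a top-level array of objects and returns each `id` as raw JSON text, with `null` for a missing key, the way jq prints it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// idsOfArray is a hand-written equivalent of the jq query `.[] | .id` for a
// top-level array of objects. Each id is returned as its raw JSON text
// (strings keep their quotes); a missing "id" key yields "null".
func idsOfArray(doc string) ([]string, error) {
	var items []map[string]json.RawMessage
	if err := json.Unmarshal([]byte(doc), &items); err != nil {
		return nil, err // not an array of objects
	}
	out := make([]string, 0, len(items))
	for _, item := range items {
		if raw, ok := item["id"]; ok {
			out = append(out, string(raw))
		} else {
			out = append(out, "null")
		}
	}
	return out, nil
}

func main() {
	ids, err := idsOfArray(`[{"id":"0001","type":"donut"},{"id":2},{"name":"x"}]`)
	fmt.Println(ids, err) // ["0001" 2 null] <nil>
}
```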

XPath

Extract data using XPath queries from HTML/XML responses.

extractors:
  - type: xpath
    name: links
    xpath:
      - "/html/body/div/p[2]/a"
    attribute: href  # Extract href attribute

DSL

Extract data using Domain Specific Language expressions.

extractors:
  - type: dsl
    name: info
    dsl:
      - "'Server: ' + header['Server']"
      - "status_code + ' - ' + content_length"

Extractor options

name (string, required)

Name of the extractor, used to reference the extracted value in later requests (for example, {{token}}) and to label it in output. Must be lowercase and must not contain spaces or underscores.

extractors:
  - type: regex
    name: "api-token"
    regex:
      - "token=([a-zA-Z0-9]+)"

part (string, default: body)

Part of the response to extract from. Each protocol exposes different parts.

extractors:
  - type: kval
    name: cookies
    part: header
    kval:
      - "set_cookie"  # Set-Cookie header (dashes become underscores)

internal (boolean, default: false)

When true, extracted values can be used in subsequent requests but won’t appear in output.

extractors:
  - type: regex
    name: csrf
    internal: true
    regex:
      - "csrf_token:\\s*([a-zA-Z0-9]+)"

group (integer, default: 0)

Regex capture group to extract. Use 0 for full match, 1+ for specific groups.

extractors:
  - type: regex
    name: version
    group: 1
    regex:
      - "Version:\\s*v?([0-9.]+)"

attribute (string)

XPath attribute to extract from matched elements.

extractors:
  - type: xpath
    name: image-urls
    xpath:
      - "//img"
    attribute: src

case-insensitive (boolean, default: false)

Enable case-insensitive extraction for regex extractors.

extractors:
  - type: regex
    name: errors
    case-insensitive: true
    regex:
      - "error"  # Also matches ERROR and Error; no (?i) flag needed
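One way to get the behavior this option describes with Go's `regexp` package is to prefix the pattern with the `(?i)` flag. `compileExtractor` below is a hypothetical helper illustrating that idea; whether Nuclei implements the option exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileExtractor is a hypothetical helper: when caseInsensitive is true it
// prefixes the pattern with Go's (?i) flag before compiling it.
func compileExtractor(pattern string, caseInsensitive bool) (*regexp.Regexp, error) {
	if caseInsensitive {
		pattern = "(?i)" + pattern
	}
	return regexp.Compile(pattern)
}

func main() {
	re, _ := compileExtractor("error", true)
	fmt.Println(re.FindAllString("ERROR: disk full; Error again", -1)) // [ERROR Error]
}
```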

Internal extractors

Internal extractors are essential for multi-step templates. They extract data from one request and make it available to subsequent requests without adding it to the output:

http:
  - raw:
      - |
        POST /login HTTP/1.1
        Host: {{Hostname}}
        Content-Type: application/x-www-form-urlencoded
        
        username=admin&password=test
      - |
        GET /api/data?token={{token}} HTTP/1.1
        Host: {{Hostname}}
    
    extractors:
      - type: regex
        name: token
        internal: true
        group: 1
        regex:
          - "Token: '([A-Za-z0-9]+)'"
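Conceptually, the engine captures the token from the first response and substitutes it into the `{{token}}` placeholder of the second request. The Go sketch below models that with a variable map and `strings.NewReplacer`; it is a simplified illustration under that assumption, and `extractToken` and `render` are hypothetical names, not Nuclei internals.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as the token extractor above; group 1 holds the token.
var tokenRe = regexp.MustCompile(`Token: '([A-Za-z0-9]+)'`)

// extractToken returns capture group 1 of the first match, if any.
func extractToken(body string) (string, bool) {
	m := tokenRe.FindStringSubmatch(body)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// render replaces every {{name}} placeholder with its value from vars.
func render(raw string, vars map[string]string) string {
	pairs := make([]string, 0, 2*len(vars))
	for name, value := range vars {
		pairs = append(pairs, "{{"+name+"}}", value)
	}
	return strings.NewReplacer(pairs...).Replace(raw)
}

func main() {
	vars := map[string]string{"Hostname": "example.com"}
	// Response to the first (login) request.
	if tok, ok := extractToken("Login OK. Token: 'abc123'"); ok {
		vars["token"] = tok // kept for later requests, not printed as a result
	}
	// Second request, with the stored value substituted.
	fmt.Print(render("GET /api/data?token={{token}} HTTP/1.1\r\nHost: {{Hostname}}\r\n", vars))
}
```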

Chaining requests with extractors

Extract data in one request and use it in later requests. In this example, a flow runs the second request once for each extracted email:

id: extract-and-iterate

info:
  name: Extract Emails and Check
  author: pdteam
  severity: info

flow: |
  http(1)
  for (let email of iterate(template["emails"])) {
    set("email", email);
    http(2);
  }

http:
  - method: GET
    path:
      - "{{BaseURL}}"
    
    extractors:
      - type: regex
        name: emails
        internal: true
        regex:
          - "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
  
  - method: GET
    path:
      - "{{BaseURL}}/user/{{base64(email)}}"
    
    matchers:
      - type: word
        words:
          - "Welcome"
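The loop in `flow` can be modelled in plain Go to show which paths the second request would hit. This sketch reuses the template's email pattern and assumes the `base64` helper uses standard padded base64 (`base64.StdEncoding`); `userPaths` is a hypothetical name.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"regexp"
)

// Same pattern as the emails extractor in the template above.
var emailRe = regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)

// userPaths mimics the flow loop: one /user/<base64(email)> path per email
// found in body, assuming standard padded base64 for the base64() helper.
func userPaths(baseURL, body string) []string {
	var paths []string
	for _, email := range emailRe.FindAllString(body, -1) {
		encoded := base64.StdEncoding.EncodeToString([]byte(email))
		paths = append(paths, baseURL+"/user/"+encoded)
	}
	return paths
}

func main() {
	body := "Contact alice@example.com or bob@example.org"
	for _, p := range userPaths("https://example.com", body) {
		fmt.Println(p)
	}
}
```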

Protocol-specific parts

HTTP

body - Response body (default)
header - Response headers
raw - Raw HTTP response
request - HTTP request
all - Body + headers
cookies_from_response - Cookies in name:value format
headers_from_response - Headers in name:value format

DNS

raw - Raw DNS response (default)
answer - DNS answer field
question - DNS question field
ns - DNS nameserver field
extra - DNS extra field

Network

raw - Raw network response (default)
data - Response data

Real-world examples

id: aws-token-extract

info:
  name: Extract AWS Tokens
  author: pdteam
  severity: info

file:
  - extensions:
      - all
    
    extractors:
      - type: regex
        name: aws-keys
        regex:
          - "AKIA[0-9A-Z]{16}"
          - "amzn\\.mws\\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
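A quick way to sanity-check these patterns before running the template is to try them against sample text with Go's `regexp` package. `AKIAIOSFODNN7EXAMPLE` is the placeholder access key ID used in AWS documentation; the MWS token in the sample is made up for illustration, and `findAWSKeys` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// The two patterns from the aws-keys extractor.
var awsPatterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),
	regexp.MustCompile(`amzn\.mws\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`),
}

// findAWSKeys returns every match of every pattern, in pattern order.
func findAWSKeys(content string) []string {
	var found []string
	for _, re := range awsPatterns {
		found = append(found, re.FindAllString(content, -1)...)
	}
	return found
}

func main() {
	sample := "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\nmws = amzn.mws.12345678-abcd-ef01-2345-6789abcdef01"
	fmt.Println(findAWSKeys(sample))
}
```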

Extractor output

By default, non-internal extractors display their results in the output:

[server-info] [http] [info] https://example.com
  [server-info] nginx/1.18.0
  [x_powered_by] PHP/7.4.3
  [regex] nginx/1.18.0
  [dsl] Server: nginx/1.18.0 | Status: 200

Internal extractors are only available in template variables and don’t appear in output.

Best practices

Use internal extractors for chaining - Mark extractors as internal when their values are only needed in subsequent requests
Name extractors descriptively - Use clear names that indicate what data is being extracted
Extract minimal data - Only extract the data you need to reduce memory usage
Use capture groups - For regex extractors, use capture groups to extract specific parts
Validate extracted data - Use matchers to confirm that the data you extract is in the expected format
Combine extractor types - Use multiple extractor types for different data formats

Common patterns

Extract and validate

extractors:
  - type: regex
    name: version
    group: 1
    regex:
      - "Version:\\s*([0-9.]+)"

matchers:
  - type: regex
    regex:
      - "[0-9]+\\.[0-9]+\\.[0-9]+"  # Checks the response for an x.y.z version string
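A matcher checks the response, not the extractor's output. To validate the extracted value itself, anchor the check to the whole captured string, as in this Go sketch. `extractVersion` is a hypothetical helper, and the x.y.z check is a simplification, not full semver validation.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Same pattern as the version extractor above.
	versionRe = regexp.MustCompile(`Version:\s*([0-9.]+)`)
	// Anchored so the entire extracted value must look like x.y.z.
	xyzRe = regexp.MustCompile(`^[0-9]+\.[0-9]+\.[0-9]+$`)
)

// extractVersion returns group 1 of the first match and whether it is x.y.z.
func extractVersion(body string) (string, bool) {
	m := versionRe.FindStringSubmatch(body)
	if m == nil {
		return "", false
	}
	return m[1], xyzRe.MatchString(m[1])
}

func main() {
	fmt.Println(extractVersion("Version: 1.18.0")) // 1.18.0 true
	fmt.Println(extractVersion("Version: 2.0"))    // 2.0 false
}
```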

Extract multiple values

extractors:
  - type: regex
    name: endpoints
    regex:
      - "/api/v[0-9]+/[a-z]+"
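A regex extractor reports every match, so one pattern can collect many endpoints. The Go sketch below mirrors that with `FindAllString` and collects results into a set so repeated endpoints appear once; the de-duplication and sorting are choices made for this illustration, and `uniqueEndpoints` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Same pattern as the endpoints extractor above.
var endpointRe = regexp.MustCompile(`/api/v[0-9]+/[a-z]+`)

// uniqueEndpoints returns every distinct match once, sorted for stable output.
func uniqueEndpoints(body string) []string {
	seen := make(map[string]struct{})
	for _, m := range endpointRe.FindAllString(body, -1) {
		seen[m] = struct{}{}
	}
	out := make([]string, 0, len(seen))
	for m := range seen {
		out = append(out, m)
	}
	sort.Strings(out)
	return out
}

func main() {
	body := `fetch("/api/v1/users"); fetch("/api/v2/orders"); fetch("/api/v1/users")`
	fmt.Println(uniqueEndpoints(body)) // [/api/v1/users /api/v2/orders]
}
```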

Extract and transform

extractors:
  - type: dsl
    name: normalized-domain
    dsl:
      - "to_lower(trim_suffix(cname, '.'))"
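The DSL expression strips a trailing dot (DNS names are often returned fully qualified, e.g. `cdn.example.com.`) and lowercases the result. Assuming `trim_suffix` behaves like Go's `strings.TrimSuffix`, an equivalent in Go is the following; `normalizeDomain` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeDomain mirrors to_lower(trim_suffix(cname, '.')): remove a single
// trailing dot, then lowercase the name.
func normalizeDomain(cname string) string {
	return strings.ToLower(strings.TrimSuffix(cname, "."))
}

func main() {
	fmt.Println(normalizeDomain("CDN.Example.COM.")) // cdn.example.com
}
```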

Related

Matchers - Pattern matching
Variables - Dynamic values
Helper Functions - DSL helper functions
Flow Control - Conditional execution
