Extractors allow you to extract and reuse data from protocol responses. They can extract values using regex, JSON paths, XPath queries, key-value pairs, or DSL expressions. Extracted data can be used in subsequent requests or displayed in output.

Extractor types

Extract data using regular expressions with optional capture groups.
extractors:
  - type: regex
    name: token
    regex:
      - "access_token\\$production\\$[0-9a-z]{16}\\$[0-9a-f]{32}"
      - "Author:(?:[A-Za-z0-9 -\\_=\"]+)?<span(?:[A-Za-z0-9 -\\_=\"]+)?>([A-Za-z0-9]+)<\\/span>"
    group: 1  # Extract first capture group
The Go regex engine (RE2 syntax) does not support lookaheads, lookbehinds, or backreferences.
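The other extractor types named in the introduction follow the same shape as the regex extractor. The fragment below is a minimal sketch rather than a complete template: the extractor names, the jq-style filter, the header key, and the XPath query are illustrative assumptions, not values taken from a real target.

```yaml
extractors:
  # JSON: jq-style filter applied to a JSON response body
  - type: json
    name: user-ids
    json:
      - ".[] | .id"

  # Key-value: read a response header by key
  # (dashes in the key are written as underscores)
  - type: kval
    name: content-type
    kval:
      - content_type

  # XPath: query an HTML/XML body; attribute is optional
  - type: xpath
    name: link-urls
    xpath:
      - "//a"
    attribute: href

  # DSL: evaluate a helper-function expression over response fields
  - type: dsl
    name: body-length
    dsl:
      - "len(body)"
```

Each entry uses a type-specific key (`json`, `kval`, `xpath`, `dsl`) in place of `regex`, while the shared options described in the next section apply to all of them.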

Extractor options

name
string
required
Name of the extractor. Used to reference extracted values. Must be lowercase without spaces or underscores.
extractors:
  - type: regex
    name: "api-token"
    regex:
      - "token=([a-zA-Z0-9]+)"
part
string
default:"body"
Part of the response to extract from. Each protocol exposes different parts.
extractors:
  - type: kval
    part: header
    kval:
      - "set_cookie"  # kval keys use underscores in place of dashes
internal
boolean
default:"false"
When true, extracted values can be used in subsequent requests but won’t appear in output.
extractors:
  - type: regex
    name: csrf
    internal: true
    regex:
      - "csrf_token:\\s*([a-zA-Z0-9]+)"
group
integer
default:"0"
Regex capture group to extract. Use 0 for full match, 1+ for specific groups.
extractors:
  - type: regex
    name: version
    group: 1
    regex:
      - "Version:\\s*v?([0-9.]+)"
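Since the Go regex engine has no lookarounds, a capture group is the usual way to drop surrounding context from a match. The sketch below assumes a hypothetical `session=` value made of 32 hex characters; the extractor name and the pattern are illustrative only.

```yaml
extractors:
  - type: regex
    name: session-id
    group: 1  # Return only the captured value, not the "session=" prefix
    regex:
      # A lookbehind such as (?<=session=)[a-f0-9]{32} is not supported by
      # the Go engine; match the prefix and capture the value instead.
      - "session=([a-f0-9]{32})"
```

With `group: 0` (the default) the same pattern would return the full match, including the `session=` prefix.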
attribute
string
Name of the attribute whose value is extracted from each element matched by an XPath extractor.
extractors:
  - type: xpath
    name: image-urls
    xpath:
      - "//img"
    attribute: src
case-insensitive
boolean
default:"false"
Enable case-insensitive extraction for regex extractors.
extractors:
  - type: regex
    case-insensitive: true
    regex:
      - "(?i)error"

Internal extractors

Internal extractors are crucial for multi-step templates. They extract data from one request and make it available to subsequent requests:
http:
  - raw:
      - |
        POST /login HTTP/1.1
        Host: {{Hostname}}
        
        username=admin&password=test
      - |
        GET /api/data?token={{token}} HTTP/1.1
        Host: {{Hostname}}
    
    extractors:
      - type: regex
        name: token
        internal: true
        group: 1
        regex:
          - "Token: '([A-Za-z0-9]+)'"

Chaining requests with extractors

Extract data and use it in subsequent requests:
id: extract-and-iterate

info:
  name: Extract Emails and Check
  author: pdteam
  severity: info

flow: |
  http(1)
  for (let email of template["emails"]) {
    set("email", email);
    http(2);
  }

http:
  - method: GET
    path:
      - "{{BaseURL}}"
    
    extractors:
      - type: regex
        name: emails
        internal: true
        regex:
          - "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
  
  - method: GET
    path:
      - "{{BaseURL}}/user/{{base64(email)}}"
    
    matchers:
      - type: word
        words:
          - "Welcome"

Protocol-specific parts

For HTTP responses, the following parts are available:

  • body - Response body (default)
  • header - Response headers
  • raw - Raw HTTP response
  • request - HTTP request
  • all - Body + headers
  • cookies_from_response - Cookies in name:value format
  • headers_from_response - Headers in name:value format
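To show how a part from the list above is selected, here is a small sketch that reads a value from the response headers instead of the default body. The extractor name and the regex are assumptions chosen for illustration.

```yaml
extractors:
  - type: regex
    name: server
    part: header  # Search the response headers rather than the body
    group: 1
    regex:
      # (?i) makes the header name match regardless of case
      - "(?i)server:\\s*([^\\r\\n]+)"
```

Omitting `part` would run the same pattern against the response body, which is the default.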

Real-world examples

id: aws-token-extract

info:
  name: Extract AWS Tokens
  author: pdteam
  severity: info

file:
  - extensions:
      - all
    
    extractors:
      - type: regex
        name: aws-keys
        regex:
          - "AKIA[0-9A-Z]{16}"
          - "amzn\\.mws\\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

Extractor output

By default, non-internal extractors display their results in the output:
[server-info] [http] [info] https://example.com
  [server-info] nginx/1.18.0
  [x_powered_by] PHP/7.4.3
  [regex] nginx/1.18.0
  [dsl] Server: nginx/1.18.0 | Status: 200
Internal extractors are only available in template variables and don’t appear in output.

Best practices

  1. Use internal extractors for chaining - Mark extractors as internal when their values are only needed in subsequent requests
  2. Name extractors descriptively - Use clear names that indicate what data is being extracted
  3. Extract minimal data - Only extract the data you need to reduce memory usage
  4. Use capture groups - For regex extractors, use capture groups to extract specific parts
  5. Validate extracted data - Use matchers to verify extracted data meets expected format
  6. Combine extractor types - Use multiple extractor types for different data formats
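Several of these practices can be combined in a single request block. The following is a sketch under stated assumptions: the `/login` path, the `csrf_token` field name, and both extractor names are hypothetical.

```yaml
http:
  - method: GET
    path:
      - "{{BaseURL}}/login"

    extractors:
      # Internal: needed only by a later request, so kept out of the output
      - type: regex
        name: csrf
        part: body
        internal: true
        group: 1  # Capture only the token value
        regex:
          - 'name="csrf_token" value="([a-zA-Z0-9]+)"'

      # Reported: a different extractor type for a different data format
      - type: kval
        name: server
        kval:
          - server
```

The regex extractor keeps its scope narrow with a capture group, while the kval extractor handles header data without needing a pattern at all.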

Common patterns

Extract and validate

extractors:
  - type: regex
    name: version
    group: 1
    regex:
      - "Version:\\s*([0-9.]+)"

matchers:
  - type: regex
    regex:
      - "[0-9]+\\.[0-9]+\\.[0-9]+"  # Match only if the response contains a semver-like string

Extract multiple values

extractors:
  - type: regex
    name: endpoints
    regex:
      - "/api/v[0-9]+/[a-z]+"

Extract and transform

extractors:
  - type: dsl
    name: normalized-domain
    dsl:
      - "to_lower(trim_suffix(cname, '.'))"
