Extract

Overview

The Extract node (Data Extractor) extracts structured data from multiple similar elements on a web page. It’s ideal for scraping lists, tables, product cards, search results, and other repeating patterns. Results can be stored in context and optionally saved to CSV.

Configuration

containerSelector

string

required

Selector for the container elements. Each matching element is processed.

Example: .product-card, tr.data-row, div.search-result

Supports variable interpolation: ${data.containerClass}

containerSelectorType

string

default:"css"

Type of selector for containers: css, xpath, text

fields

array

required

Array of field definitions specifying what data to extract from each container.

Field Definition:

name: Field name (becomes object key)
selector: Element selector within container
selectorType: css, xpath, or text
extract: What to extract: text, attribute, or innerHTML
attribute: Attribute name (required if extract is attribute)
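The field rules above can be checked before a workflow runs. The following is a minimal sketch, not part of the product: `validateField` is a hypothetical helper, and it assumes that `name`, `selector`, and `extract` are mandatory while `selectorType` is optional (the later examples omit it).

```javascript
// Hypothetical helper: checks one field definition against the rules listed above.
const SELECTOR_TYPES = ['css', 'xpath', 'text'];
const EXTRACT_TYPES = ['text', 'attribute', 'innerHTML'];

function validateField(field) {
  const errors = [];
  // Assumption: name and selector are mandatory.
  if (!field.name) errors.push('name is required');
  if (!field.selector) errors.push('selector is required');
  // Assumption: selectorType may be omitted.
  if (field.selectorType !== undefined && !SELECTOR_TYPES.includes(field.selectorType)) {
    errors.push(`unknown selectorType: ${field.selectorType}`);
  }
  if (!EXTRACT_TYPES.includes(field.extract)) {
    errors.push(`unknown extract type: ${field.extract}`);
  }
  // Stated rule: attribute is required if extract is "attribute".
  if (field.extract === 'attribute' && !field.attribute) {
    errors.push('attribute is required when extract is "attribute"');
  }
  return errors;
}

console.log(validateField({ name: 'url', selector: 'a', extract: 'attribute' }));
```

An empty array means the definition passed every check; otherwise each entry names one problem.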

outputVariable

string

default:"extractedData"

Context variable name to store the extracted data array.

limit

number

default:"0"

Maximum number of containers to process. 0 means process all.

waitForSelector

boolean

default:"true"

Wait for the first container element to be visible before extraction.

timeout

number

default:"30000"

Maximum time in milliseconds to wait for elements.

failSilently

boolean

default:"false"

If true, missing fields in containers will be set to null instead of throwing errors.

CSV Export

saveToCSV

boolean

default:"false"

Enable CSV file export of extracted data.

csvFilePath

string

Path to save the CSV file. Required if saveToCSV is enabled.

Supports variable interpolation: exports/data-${data.timestamp}.csv
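To illustrate what variable interpolation does to a path like the one above, here is a rough sketch. `interpolate` and the `scope` object are hypothetical names, and the real resolver may treat missing keys differently; this version simply leaves unresolved placeholders in place.

```javascript
// Hypothetical sketch of ${...} interpolation against a context-like object.
function interpolate(template, scope) {
  return template.replace(/\$\{([^}]+)\}/g, (match, path) => {
    // Walk dotted paths such as "data.timestamp".
    const value = path.trim().split('.').reduce(
      (obj, key) => (obj == null ? undefined : obj[key]),
      scope
    );
    // Assumption: leave the placeholder untouched when the path does not resolve.
    return value === undefined ? match : String(value);
  });
}

// Illustrative value only; single quotes keep ${...} literal in JavaScript.
const scope = { data: { timestamp: '2024-01-15' } };
console.log(interpolate('exports/data-${data.timestamp}.csv', scope));
// prints "exports/data-2024-01-15.csv"
```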

csvDelimiter

string

default:","

CSV column delimiter: comma (,), semicolon (;), tab (\t), or a custom string.
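The delimiter matters most when cell values contain it. The sketch below shows one way an array of extracted objects could be serialized. `toCSV` is a hypothetical function, and the quoting follows the common RFC 4180 convention, which is an assumption about the node's output rather than a documented guarantee.

```javascript
// Hypothetical sketch: serialize an array of extracted objects to CSV text.
function toCSV(rows, delimiter = ',') {
  if (rows.length === 0) return '';
  // Column order is taken from the first row's keys.
  const headers = Object.keys(rows[0]);
  const escapeCell = (value) => {
    // null/undefined (e.g. from failSilently) become empty cells.
    const text = value == null ? '' : String(value);
    // Quote cells containing the delimiter, double quotes, or line breaks.
    const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
    return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const lines = [headers.map(escapeCell).join(delimiter)];
  for (const row of rows) {
    lines.push(headers.map((h) => escapeCell(row[h])).join(delimiter));
  }
  return lines.join('\n');
}

// Illustrative data only.
const rows = [
  { name: 'Ada', department: 'R&D; Lab' },
  { name: 'Lin', department: null },
];
console.log(toCSV(rows, ';'));
```

With a semicolon delimiter, the first department value is wrapped in double quotes because it contains a semicolon itself.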

Examples

Extract Product List

{
  "type": "dataExtractor",
  "data": {
    "containerSelector": ".product-card",
    "containerSelectorType": "css",
    "fields": [
      {
        "name": "title",
        "selector": ".product-title",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "price",
        "selector": ".product-price",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "url",
        "selector": "a.product-link",
        "selectorType": "css",
        "extract": "attribute",
        "attribute": "href"
      },
      {
        "name": "image",
        "selector": "img.product-image",
        "selectorType": "css",
        "extract": "attribute",
        "attribute": "src"
      }
    ],
    "outputVariable": "products",
    "limit": 20
  }
}

Extract Table Data

{
  "type": "dataExtractor",
  "data": {
    "containerSelector": "//table[@id='data-table']/tbody/tr",
    "containerSelectorType": "xpath",
    "fields": [
      {
        "name": "id",
        "selector": "td:nth-child(1)",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "name",
        "selector": "td:nth-child(2)",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "status",
        "selector": "td:nth-child(3)",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "link",
        "selector": ".//a",
        "selectorType": "xpath",
        "extract": "attribute",
        "attribute": "href"
      }
    ],
    "outputVariable": "tableData"
  }
}

Extract Search Results

Search Results

{
  "type": "dataExtractor",
  "data": {
    "containerSelector": ".search-result",
    "fields": [
      {
        "name": "title",
        "selector": "h3.result-title",
        "extract": "text"
      },
      {
        "name": "snippet",
        "selector": ".result-snippet",
        "extract": "text"
      },
      {
        "name": "url",
        "selector": "a.result-link",
        "extract": "attribute",
        "attribute": "href"
      },
      {
        "name": "rating",
        "selector": ".rating",
        "extract": "attribute",
        "attribute": "data-score"
      }
    ],
    "outputVariable": "searchResults",
    "limit": 10,
    "failSilently": true
  }
}

Extract and Save to CSV

With CSV Export

{
  "type": "dataExtractor",
  "data": {
    "containerSelector": ".employee-card",
    "fields": [
      {
        "name": "name",
        "selector": ".emp-name",
        "extract": "text"
      },
      {
        "name": "department",
        "selector": ".emp-dept",
        "extract": "text"
      },
      {
        "name": "email",
        "selector": ".emp-email",
        "extract": "text"
      },
      {
        "name": "phone",
        "selector": ".emp-phone",
        "extract": "text"
      }
    ],
    "outputVariable": "employees",
    "saveToCSV": true,
    "csvFilePath": "exports/employees.csv",
    "csvDelimiter": ","
  }
}

Extract with Limit

First 5 Items

{
  "type": "dataExtractor",
  "data": {
    "containerSelector": ".blog-post",
    "fields": [
      {
        "name": "title",
        "selector": "h2.post-title",
        "extract": "text"
      },
      {
        "name": "author",
        "selector": ".post-author",
        "extract": "text"
      },
      {
        "name": "date",
        "selector": "time",
        "extract": "attribute",
        "attribute": "datetime"
      }
    ],
    "outputVariable": "recentPosts",
    "limit": 5
  }
}

Accessing Extracted Data

const products = context.getData('products');

console.log(`Extracted ${products.length} products`);

// Process each product
products.forEach((product, index) => {
  console.log(`${index + 1}. ${product.title} - ${product.price}`);
});

// Filter data (price can be null if the extractor ran with failSilently)
const expensive = products.filter(p => {
  const price = parseFloat((p.price ?? '').replace(/[^0-9.]/g, ''));
  return price > 50;
});

context.setData('expensiveProducts', expensive);

Field Extract Types

Type	Description	Use Case
`text`	Extract text content	Visible text, labels, descriptions
`attribute`	Extract attribute value	URLs (href), images (src), IDs (data-id)
`innerHTML`	Extract inner HTML	Rich content, formatted text
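In DOM terms, the three extract types in the table could map onto element properties roughly as sketched below. `extractValue` is a hypothetical function; reading `textContent` (rather than `innerText`) and trimming whitespace are assumptions, not documented behavior.

```javascript
// Hypothetical sketch of how each extract type could map onto DOM properties.
function extractValue(element, field) {
  switch (field.extract) {
    case 'text':
      // Assumption: text means trimmed textContent.
      return (element.textContent || '').trim();
    case 'attribute':
      return element.getAttribute(field.attribute);
    case 'innerHTML':
      return element.innerHTML;
    default:
      throw new Error(`Unsupported extract type: ${field.extract}`);
  }
}

// A plain object standing in for a DOM element, so the sketch runs outside a browser.
const fakeLink = {
  textContent: '  Read more  ',
  innerHTML: '<b>Read</b> more',
  getAttribute: (name) => (name === 'href' ? '/posts/1' : null),
};

console.log(extractValue(fakeLink, { extract: 'text' }));
console.log(extractValue(fakeLink, { extract: 'attribute', attribute: 'href' }));
console.log(extractValue(fakeLink, { extract: 'innerHTML' }));
```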

Notes

The extractor processes each container element in sequence. For large datasets, consider using the limit parameter to control processing time.

If a field selector doesn’t match any element within a container, the field value will be null (if failSilently is true) or the extraction will fail (if failSilently is false).
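The failSilently rule can be pictured as the per-container logic below. This is only a sketch: `extractRecord` and `lookup` are hypothetical names, with `lookup` standing in for whatever resolves a field inside a container and returning undefined when nothing matches.

```javascript
// Hypothetical sketch of the failSilently rule for a single container.
function extractRecord(fields, lookup, failSilently) {
  const record = {};
  for (const field of fields) {
    const value = lookup(field);
    if (value === undefined) {
      if (!failSilently) {
        // failSilently = false: the whole extraction fails.
        throw new Error(`Field "${field.name}" not found in container`);
      }
      record[field.name] = null; // failSilently = true: missing field becomes null
    } else {
      record[field.name] = value;
    }
  }
  return record;
}

// Illustrative data: "rating" has no matching element.
const fields = [{ name: 'title' }, { name: 'rating' }];
const lookup = (field) => (field.name === 'title' ? 'Result A' : undefined);

console.log(extractRecord(fields, lookup, true)); // { title: 'Result A', rating: null }
```

Downstream code that reads such records should expect null values for any field that was missing.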

CSV export automatically creates parent directories if they don’t exist. Relative paths are resolved from the project root.

Best Practices

{
  "containerSelector": ".item",
  "fields": [...],
  "failSilently": true,  // Don't fail on missing fields
  "waitForSelector": true,  // Wait for content
  "timeout": 15000
}

Common Patterns

Pagination Loop

[
  {
    "type": "dataExtractor",
    "data": {
      "containerSelector": ".item",
      "fields": [...],
      "outputVariable": "pageData"
    }
  },
  {
    "type": "action",
    "data": {
      "action": "click",
      "selector": ".next-page"
    }
  }
]
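In the pattern above, each pass writes to the same pageData variable, so earlier pages would presumably be overwritten. One possible approach is a JavaScript Code step between the extract and the click that appends each page to a running list. The sketch below assumes the `context.getData` / `context.setData` calls shown earlier; `accumulatePage` and the `allItems` variable are hypothetical, and the stub context exists only so the snippet runs on its own.

```javascript
// Minimal stand-in for the workflow context so the sketch is self-contained.
// In a real JavaScript Code node, `context` is assumed to be provided.
const store = new Map();
const context = {
  getData: (key) => store.get(key),
  setData: (key, value) => store.set(key, value),
};

// Hypothetical step: append the current page's rows to a running list.
function accumulatePage(ctx, pageKey = 'pageData', allKey = 'allItems') {
  const page = ctx.getData(pageKey) || [];
  const all = ctx.getData(allKey) || [];
  ctx.setData(allKey, all.concat(page));
  return page.length; // 0 can signal that the loop should stop
}

// Simulate two loop iterations with illustrative rows.
context.setData('pageData', [{ title: 'A' }, { title: 'B' }]);
accumulatePage(context);
context.setData('pageData', [{ title: 'C' }]);
accumulatePage(context);
console.log(context.getData('allItems').length); // 3
```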

Conditional Extraction

const count = context.getData('itemCount');
if (count > 0) {
  // Proceed with extraction
  context.setData('shouldExtract', true);
}

Related Nodes

Get Text - Extract from single elements
Loop - Process extracted data
JavaScript Code - Transform data
API Request - Send extracted data
