Overview

The Extract node (Data Extractor) extracts structured data from multiple similar elements on a web page. It’s ideal for scraping lists, tables, product cards, search results, and other repeating patterns. Results can be stored in context and optionally saved to CSV.

Configuration

containerSelector
string
required
Selector for the container elements. Each matching element will be processed.
Example: .product-card, tr.data-row, div.search-result
Supports variable interpolation: ${data.containerClass}
containerSelectorType
string
default:"css"
Type of selector for containers: css, xpath, text
fields
array
required
Array of field definitions specifying what data to extract from each container.
Field Definition:
  • name: Field name (becomes object key)
  • selector: Element selector within container
  • selectorType: css, xpath, or text
  • extract: What to extract: text, attribute, or innerHTML
  • attribute: Attribute name (required if extract is attribute)
outputVariable
string
default:"extractedData"
Context variable name to store the extracted data array.
limit
number
default:"0"
Maximum number of containers to process. 0 means process all.
waitForSelector
boolean
default:"true"
Wait for the first container element to be visible before extraction.
timeout
number
default:"30000"
Maximum time in milliseconds to wait for elements.
failSilently
boolean
default:"false"
If true, missing fields in containers will be set to null instead of throwing errors.
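
The options above amount to two required keys plus a set of defaults. As a minimal sketch, assuming a hypothetical helper named normalizeExtractConfig (it is not part of the product's API), a config object could be checked like this before it is used:

```javascript
// Hypothetical helper: fills in the documented defaults and checks required keys.
// The default values mirror the parameter list above; the function itself is illustrative.
const EXTRACT_DEFAULTS = {
  containerSelectorType: 'css',
  outputVariable: 'extractedData',
  limit: 0,              // 0 means "process all containers"
  waitForSelector: true,
  timeout: 30000,        // milliseconds
  failSilently: false,
};

function normalizeExtractConfig(data) {
  if (typeof data.containerSelector !== 'string' || data.containerSelector === '') {
    throw new Error('containerSelector is required');
  }
  if (!Array.isArray(data.fields)) {
    throw new Error('fields is required and must be an array');
  }
  for (const field of data.fields) {
    // "attribute" is required only when extract is "attribute"
    if (field.extract === 'attribute' && !field.attribute) {
      throw new Error(`field "${field.name}" needs an attribute name`);
    }
  }
  // Explicit values in data win over the defaults
  return { ...EXTRACT_DEFAULTS, ...data };
}

const config = normalizeExtractConfig({
  containerSelector: '.product-card',
  fields: [{ name: 'title', selector: '.product-title', extract: 'text' }],
});
console.log(config.limit, config.outputVariable);
```

Running a check like this early surfaces a missing attribute name before any page is loaded.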

CSV Export

saveToCSV
boolean
default:"false"
Enable CSV file export of extracted data.
csvFilePath
string
Path to save the CSV file. Required if saveToCSV is enabled.
Supports variable interpolation: exports/data-${data.timestamp}.csv
csvDelimiter
string
default:","
CSV column delimiter: comma (,), semicolon (;), tab (\t), or a custom delimiter.
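
To show what the delimiter setting changes in the output file, here is a rough sketch of turning an extracted array into CSV text. The toCsv function and the sample rows are hypothetical, and the quoting follows the common RFC 4180 convention; the node's actual writer may quote values differently.

```javascript
// Illustrative only: the node's actual CSV writer and quoting rules are not documented here.
function toCsv(rows, delimiter = ',') {
  if (rows.length === 0) return '';
  // Column order is taken from the keys of the first row
  const headers = Object.keys(rows[0]);
  const escapeCell = (value) => {
    const text = value === null || value === undefined ? '' : String(value);
    // Quote cells containing the delimiter, a double quote, or a line break
    const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
    return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const lines = [headers.map(escapeCell).join(delimiter)];
  for (const row of rows) {
    lines.push(headers.map((h) => escapeCell(row[h])).join(delimiter));
  }
  return lines.join('\n');
}

// Made-up sample rows
const sample = [
  { name: 'Ada Lovelace', department: 'R&D, Analytics' },
  { name: 'Alan Turing', department: null },
];
console.log(toCsv(sample, ';'));
```

With a comma delimiter, the first department value would need quoting because it contains a comma; with a semicolon it does not.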

Examples

Extract Product List

{
  "type": "dataExtractor",
  "data": {
    "containerSelector": ".product-card",
    "containerSelectorType": "css",
    "fields": [
      {
        "name": "title",
        "selector": ".product-title",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "price",
        "selector": ".product-price",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "url",
        "selector": "a.product-link",
        "selectorType": "css",
        "extract": "attribute",
        "attribute": "href"
      },
      {
        "name": "image",
        "selector": "img.product-image",
        "selectorType": "css",
        "extract": "attribute",
        "attribute": "src"
      }
    ],
    "outputVariable": "products",
    "limit": 20
  }
}

Extract Table Data

{
  "type": "dataExtractor",
  "data": {
    "containerSelector": "//table[@id='data-table']/tbody/tr",
    "containerSelectorType": "xpath",
    "fields": [
      {
        "name": "id",
        "selector": "td:nth-of-type(1)",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "name",
        "selector": "td:nth-of-type(2)",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "status",
        "selector": "td:nth-of-type(3)",
        "selectorType": "css",
        "extract": "text"
      },
      {
        "name": "link",
        "selector": ".//a",
        "selectorType": "xpath",
        "extract": "attribute",
        "attribute": "href"
      }
    ],
    "outputVariable": "tableData"
  }
}

Extract Search Results

Search Results
{
  "type": "dataExtractor",
  "data": {
    "containerSelector": ".search-result",
    "fields": [
      {
        "name": "title",
        "selector": "h3.result-title",
        "extract": "text"
      },
      {
        "name": "snippet",
        "selector": ".result-snippet",
        "extract": "text"
      },
      {
        "name": "url",
        "selector": "a.result-link",
        "extract": "attribute",
        "attribute": "href"
      },
      {
        "name": "rating",
        "selector": ".rating",
        "extract": "attribute",
        "attribute": "data-score"
      }
    ],
    "outputVariable": "searchResults",
    "limit": 10,
    "failSilently": true
  }
}
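
Because failSilently is enabled in this example, any field may come back as null. A small sketch of downstream handling follows; the sample rows are made up, and in a workflow the array would come from context.getData('searchResults').

```javascript
// Made-up sample rows standing in for context.getData('searchResults')
const searchResults = [
  { title: 'First hit', snippet: 'Text', url: '/a', rating: '4.5' },
  { title: 'Second hit', snippet: null, url: '/b', rating: null },
];

// Attribute values arrive as strings (or null), so convert the score to a number
const cleaned = searchResults.map((r) => ({
  ...r,
  rating: r.rating === null ? null : Number(r.rating),
}));

// Keep only results that actually carry a rating
const rated = cleaned.filter((r) => r.rating !== null);
console.log(rated.length);
```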

Extract and Save to CSV

With CSV Export
{
  "type": "dataExtractor",
  "data": {
    "containerSelector": ".employee-card",
    "fields": [
      {
        "name": "name",
        "selector": ".emp-name",
        "extract": "text"
      },
      {
        "name": "department",
        "selector": ".emp-dept",
        "extract": "text"
      },
      {
        "name": "email",
        "selector": ".emp-email",
        "extract": "text"
      },
      {
        "name": "phone",
        "selector": ".emp-phone",
        "extract": "text"
      }
    ],
    "outputVariable": "employees",
    "saveToCSV": true,
    "csvFilePath": "exports/employees.csv",
    "csvDelimiter": ","
  }
}

Extract with Limit

First 5 Items
{
  "type": "dataExtractor",
  "data": {
    "containerSelector": ".blog-post",
    "fields": [
      {
        "name": "title",
        "selector": "h2.post-title",
        "extract": "text"
      },
      {
        "name": "author",
        "selector": ".post-author",
        "extract": "text"
      },
      {
        "name": "date",
        "selector": "time",
        "extract": "attribute",
        "attribute": "datetime"
      }
    ],
    "outputVariable": "recentPosts",
    "limit": 5
  }
}

Accessing Extracted Data

const products = context.getData('products');

console.log(`Extracted ${products.length} products`);

// Process each product
products.forEach((product, index) => {
  console.log(`${index + 1}. ${product.title} - ${product.price}`);
});

// Filter data
const expensive = products.filter(p => {
  // price can be null when failSilently is enabled, so guard before parsing
  const price = parseFloat(String(p.price ?? '').replace(/[^0-9.]/g, ''));
  return price > 50;
});

context.setData('expensiveProducts', expensive);

Field Extract Types

Type      | Description             | Use Case
text      | Extract text content    | Visible text, labels, descriptions
attribute | Extract attribute value | URLs (href), images (src), IDs (data-id)
innerHTML | Extract inner HTML      | Rich content, formatted text
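
The three extract types correspond naturally to standard DOM reads. The sketch below is an assumption about that mapping, not the node's implementation: in particular, it treats text as trimmed textContent, whereas the node may use innerText or skip trimming. readField and fakeLink are hypothetical names.

```javascript
// Sketch of how the three extract types could map onto standard DOM reads.
function readField(element, field) {
  switch (field.extract) {
    case 'text':
      return element.textContent.trim();
    case 'attribute':
      return element.getAttribute(field.attribute);
    case 'innerHTML':
      return element.innerHTML;
    default:
      throw new Error(`Unknown extract type: ${field.extract}`);
  }
}

// Minimal stand-in for a DOM element so the sketch runs outside a browser
const fakeLink = {
  textContent: '  Blue Widget ',
  innerHTML: '<b>Blue</b> Widget',
  getAttribute: (name) => ({ href: '/widgets/blue' }[name] ?? null),
};

console.log(readField(fakeLink, { extract: 'text' }));
console.log(readField(fakeLink, { extract: 'attribute', attribute: 'href' }));
console.log(readField(fakeLink, { extract: 'innerHTML' }));
```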

Notes

The extractor processes each container element in sequence. For large datasets, consider using the limit parameter to control processing time.
If a field selector doesn’t match any element within a container, the field value will be null (if failSilently is true) or the extraction will fail (if failSilently is false).
CSV export automatically creates parent directories if they don’t exist. Relative paths are resolved from the project root.
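
The interaction between limit and failSilently described in these notes can be modeled in a few lines. This is an illustrative model of the documented behavior, not the node's source code: each container is represented by a plain object keyed by selector, and extractAll is a hypothetical name.

```javascript
// Model of the documented semantics: sequential processing, optional limit,
// and null-or-throw handling for fields that do not match.
function extractAll(containers, fields, { limit = 0, failSilently = false } = {}) {
  // limit 0 means process every container
  const selected = limit > 0 ? containers.slice(0, limit) : containers;
  return selected.map((container, index) => {
    const row = {};
    for (const field of fields) {
      const value = container[field.selector];
      if (value === undefined) {
        if (!failSilently) {
          throw new Error(`Container ${index}: no match for "${field.selector}"`);
        }
        row[field.name] = null; // missing field recorded as null
      } else {
        row[field.name] = value;
      }
    }
    return row;
  });
}

// Made-up containers; the second one has no price element
const containers = [
  { '.title': 'A', '.price': '$10' },
  { '.title': 'B' },
  { '.title': 'C', '.price': '$30' },
];
const fields = [
  { name: 'title', selector: '.title' },
  { name: 'price', selector: '.price' },
];

console.log(extractAll(containers, fields, { limit: 2, failSilently: true }));
```

Calling the same function with failSilently left at false throws on the second container, which matches the failure mode the notes describe.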

Best Practices

{
  "containerSelector": ".item",
  "fields": [...],
  "failSilently": true,  // Don't fail on missing fields
  "waitForSelector": true,  // Wait for content
  "timeout": 15000
}

Common Patterns

Pagination Loop

[
  {
    "type": "dataExtractor",
    "data": {
      "containerSelector": ".item",
      "fields": [...],
      "outputVariable": "pageData"
    }
  },
  {
    "type": "action",
    "data": {
      "action": "click",
      "selector": ".next-page"
    }
  }
]
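
Assuming each run of the extractor overwrites pageData, results need to be accumulated between iterations. The sketch below shows one way a script step could do that; appendPage and the allItems variable are hypothetical, and the context object is a stand-in so the code runs on its own.

```javascript
// Stand-in for the workflow context; in a real workflow, context is provided by the runtime.
const store = new Map();
const context = {
  getData: (key) => store.get(key),
  setData: (key, value) => store.set(key, value),
};

// Hypothetical step to run after each extraction: append the current page to a running list.
function appendPage(ctx) {
  const page = ctx.getData('pageData') || [];
  const all = ctx.getData('allItems') || []; // 'allItems' is an assumed variable name
  ctx.setData('allItems', all.concat(page));
  return page.length > 0; // false suggests there is nothing more to collect
}

// Simulate two pages of results
context.setData('pageData', [{ title: 'Item 1' }, { title: 'Item 2' }]);
appendPage(context);
context.setData('pageData', [{ title: 'Item 3' }]);
appendPage(context);

console.log(context.getData('allItems').length);
```

The boolean return value gives the loop a simple stop condition when a page yields no items.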

Conditional Extraction

const count = context.getData('itemCount');
if (count > 0) {
  // Proceed with extraction
  context.setData('shouldExtract', true);
}
