
Overview

ScrapeGraphAI includes an optional telemetry system that helps the development team understand usage patterns and improve the library. This page explains what data is collected, how to opt out, and the privacy considerations involved.

What Data is Collected

The telemetry system collects anonymous usage data when graphs are executed:
~/workspace/source/scrapegraphai/telemetry/telemetry.py
{
    "user_prompt": "Extract product information",
    "json_schema": "{...}",
    "website_content": "<extracted content>",
    "llm_response": "{...}",
    "llm_model": "gpt-4o",
    "url": "https://example.com"
}

Data Points

  • user_prompt: The prompt you provide to extract data
  • json_schema: Schema used for structured extraction (if provided)
  • website_content: The content extracted from the website
  • llm_response: The LLM’s generated response
  • llm_model: Name of the LLM model used
  • url: The source URL being scraped
  • anonymous_id: A random UUID (not linked to your identity)
  • version: ScrapeGraphAI library version

What is NOT Collected

  • Personal identifying information (PII)
  • API keys or credentials
  • IP addresses
  • System information
  • Local file paths
  • Error details or stack traces

Telemetry data is sent asynchronously in a background thread and does not impact scraping performance. Failed telemetry sends are silently ignored.

How to Opt-Out

There are three ways to disable telemetry.

Method 1: Environment Variable

Set the environment variable before running your script:
export SCRAPEGRAPHAI_TELEMETRY_ENABLED=false
Or in your Python code:
import os
os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"

# Import after setting env var
from scrapegraphai.graphs import SmartScraperGraph

Method 2: Configuration File

Create or edit ~/.scrapegraphai.conf:
~/.scrapegraphai.conf
[DEFAULT]
telemetry_enabled = false
anonymous_id = your-existing-uuid
The anonymous_id is automatically generated on first use. You can keep it or remove it when disabling telemetry.

Method 3: Programmatic

Disable telemetry in your code:
from scrapegraphai.telemetry import disable_telemetry

# Call before creating any graphs
disable_telemetry()

from scrapegraphai.graphs import SmartScraperGraph

# Telemetry is now disabled
scraper = SmartScraperGraph(...)

Telemetry Implementation

Rate Limiting

Telemetry has built-in rate limiting:
MAX_COUNT_SESSION = 1000

def is_telemetry_enabled() -> bool:
    if g_telemetry_enabled:
        global CALL_COUNTER
        CALL_COUNTER += 1
        if CALL_COUNTER > MAX_COUNT_SESSION:
            return False  # Stop after 1000 calls per session
        return True
    return False
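To see the cutoff in action, here is a self-contained, runnable version of the counter above (the global names mirror the snippet; in the real module, `g_telemetry_enabled` is set by the config and environment-variable logic):

```python
MAX_COUNT_SESSION = 1000  # per-session cap on telemetry events
CALL_COUNTER = 0
g_telemetry_enabled = True  # assumed: telemetry was not disabled

def is_telemetry_enabled() -> bool:
    global CALL_COUNTER
    if g_telemetry_enabled:
        CALL_COUNTER += 1
        if CALL_COUNTER > MAX_COUNT_SESSION:
            return False  # stop after 1000 calls per session
        return True
    return False

# The first 1000 checks pass; every later one is rate-limited away.
results = [is_telemetry_enabled() for _ in range(1005)]
```

Because the counter is per-process, a long-running worker stops sending telemetry after its first 1000 graph executions; a fresh process starts a fresh session.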

Conditional Collection

Telemetry only sends data when all required fields are present:
# Telemetry requires all of these
required_fields = [
    prompt,
    json_schema,
    content,
    llm_response,
    url
]

if not all(required_fields):
    # Telemetry is skipped
    return None
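A quick illustration with made-up field values: because `all()` treats `None` and empty strings as falsy, leaving out just the JSON schema is enough to skip the send:

```python
# Hypothetical inputs for one graph execution
prompt = "Extract product information"
json_schema = None            # no schema was provided
content = "<html>...</html>"
llm_response = '{"products": []}'
url = "https://example.com"

required_fields = [prompt, json_schema, content, llm_response, url]

# Mirrors the check above: one missing field disables collection
telemetry_sent = all(required_fields)
```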

Error Handling

Telemetry never crashes your application:
def log_graph_execution(...):
    if not is_telemetry_enabled():
        return  # Silently skip

    if error_node is not None:
        return  # Don't send error cases

    payload = _build_telemetry_payload(...)
    if payload is None:
        logger.debug("Telemetry skipped: missing required fields")
        return

    _send_telemetry_threaded(payload)  # Non-blocking
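The name `_send_telemetry_threaded` suggests a fire-and-forget daemon-thread sender. A minimal sketch of that pattern (not the library's exact code) using only the standard library could look like this:

```python
import json
import threading
import urllib.request

TELEMETRY_ENDPOINT = "https://sgai-oss-tracing.onrender.com/v1/telemetry"

def _send(payload: dict) -> None:
    # Any failure (no network, timeout, bad response) is swallowed,
    # so telemetry can never crash or slow the caller's code path.
    try:
        req = urllib.request.Request(
            TELEMETRY_ENDPOINT,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=2)  # 2-second timeout
    except Exception:
        pass

def send_telemetry_threaded(payload: dict) -> threading.Thread:
    # Daemon thread: the interpreter can exit without waiting on it
    thread = threading.Thread(target=_send, args=(payload,), daemon=True)
    thread.start()
    return thread
```

The daemon flag is what makes the send non-blocking end to end: even if the request is still in flight when your script finishes, Python exits without waiting for it.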

Privacy Considerations

Data Transmission

  • Sent over HTTPS to https://sgai-oss-tracing.onrender.com/v1/telemetry
  • 2-second timeout prevents hanging
  • Failures are logged but don’t affect execution

Data Storage

The anonymous ID is stored locally in ~/.scrapegraphai.conf:
DEFAULT_CONFIG_LOCATION = os.path.expanduser("~/.scrapegraphai.conf")

def _load_config(config_location: str):
    config = configparser.ConfigParser()
    config.read(config_location)  # empty result if the file is missing
    # Generate a unique ID on first use if one does not exist yet
    if "anonymous_id" not in config["DEFAULT"]:
        config["DEFAULT"]["anonymous_id"] = str(uuid.uuid4())
This ID:
  • Is randomly generated
  • Is not linked to your identity
  • Helps group sessions for understanding usage patterns
  • Can be removed by deleting the config file
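Rotating the ID without deleting the whole file amounts to rewriting that one key with configparser (a sketch, assuming the default config location shown above):

```python
import configparser
import os
import uuid

config_path = os.path.expanduser("~/.scrapegraphai.conf")
config = configparser.ConfigParser()
config.read(config_path)  # harmless no-op if the file does not exist yet

# Replace (or create) the anonymous ID with a fresh random UUID
config["DEFAULT"]["anonymous_id"] = str(uuid.uuid4())
with open(config_path, "w") as f:
    config.write(f)
```

Future sessions will be grouped under the new ID, with no link back to the old one.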

Sensitive Data Protection

If you’re scraping sensitive content, we strongly recommend disabling telemetry:
import os
os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"
This prevents any scraped content from being transmitted.

Verifying Telemetry Status

Check if telemetry is enabled:
from scrapegraphai.telemetry import is_telemetry_enabled

if is_telemetry_enabled():
    print("Telemetry is ENABLED")
else:
    print("Telemetry is DISABLED")
View telemetry configuration:
import configparser
import os

config_path = os.path.expanduser("~/.scrapegraphai.conf")
config = configparser.ConfigParser()
config.read(config_path)

print(f"Telemetry enabled: {config.get('DEFAULT', 'telemetry_enabled', fallback='true')}")
print(f"Anonymous ID: {config.get('DEFAULT', 'anonymous_id', fallback='not set')}")

Docker and Production Environments

Docker

Disable telemetry in Dockerfile:
FROM python:3.11

ENV SCRAPEGRAPHAI_TELEMETRY_ENABLED=false

RUN pip install scrapegraphai

COPY . /app
WORKDIR /app

CMD ["python", "scraper.py"]

Docker Compose

docker-compose.yml
version: '3.8'
services:
  scraper:
    build: .
    environment:
      - SCRAPEGRAPHAI_TELEMETRY_ENABLED=false
      - OPENAI_API_KEY=${OPENAI_API_KEY}

Kubernetes

deployment.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: scraper-config
data:
  SCRAPEGRAPHAI_TELEMETRY_ENABLED: "false"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scraper
spec:
  template:
    spec:
      containers:
      - name: scraper
        image: my-scraper:latest
        envFrom:
        - configMapRef:
            name: scraper-config

CI/CD

.github/workflows/scrape.yml
name: Run Scraper

on:
  schedule:
    - cron: '0 0 * * *'

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install scrapegraphai
      - name: Run scraper
        env:
          SCRAPEGRAPHAI_TELEMETRY_ENABLED: false
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: python scraper.py

Complete Opt-Out Example

Here’s a complete script with telemetry disabled:
scraper_no_telemetry.py
import os
from dotenv import load_dotenv

# IMPORTANT: Disable telemetry BEFORE importing scrapegraphai
os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.telemetry import is_telemetry_enabled

load_dotenv()

# Verify telemetry is disabled
print(f"Telemetry enabled: {is_telemetry_enabled()}")  # Should print: False

graph_config = {
    "llm": {
        "model": "openai/gpt-4o",
        "api_key": os.getenv("OPENAI_API_KEY"),
    },
    "verbose": True,
}

scraper = SmartScraperGraph(
    prompt="Extract product information",
    source="https://example.com/products",
    config=graph_config,
)

result = scraper.run()
print(result)

# No telemetry data is sent

FAQ

Is telemetry enabled by default?
Yes, telemetry is enabled by default, but it can be easily disabled using any of the methods above.

Does telemetry collect personal data?
No. Only an anonymous random UUID is used, which is not linked to any personal information.

Does telemetry slow down scraping?
No. Telemetry is sent asynchronously in a background thread with a 2-second timeout, and failures are ignored.

Where is the telemetry data sent?
Data is sent to https://sgai-oss-tracing.onrender.com/v1/telemetry over HTTPS.

Can I inspect exactly what is sent?
Yes, you can inspect the payload by checking the source code at ~/workspace/source/scrapegraphai/telemetry/telemetry.py:80

Does disabling telemetry affect functionality?
No. Disabling telemetry has no impact on ScrapeGraphAI’s functionality. All features work identically.
