Troubleshooting SafeNetworking

Connection Errors to Elasticsearch

Elasticsearch connectivity issues are common and can prevent SafeNetworking from processing events.

Symptoms

Application fails to start
ERROR messages about connection timeouts or refused connections
Events not being processed

Log indicators:

[ERROR] : Received a connection timeout error to elasticsearch: Connection timeout
[ERROR] : Received an error connecting to elasticsearch: Connection refused
[ERROR] : Transport Error working with abc123: Connection refused

Solutions

Verify Elasticsearch is Running

Check if Elasticsearch is running:

# Check Elasticsearch status
sudo systemctl status elasticsearch

# If not running, start it
sudo systemctl start elasticsearch

# Test connectivity
curl -X GET "localhost:9200"

Expected response:

{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "version" : { ... },
  "tagline" : "You Know, for Search"
}

Check Configuration

Verify Elasticsearch connection settings in .panrc:

ELASTICSEARCH_HOST = "localhost"
ELASTICSEARCH_PORT = "9200"

If Elasticsearch is on a different host or port, update these settings and restart SafeNetworking.
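To confirm which host and port are actually configured, the values can be read programmatically. The following is a minimal sketch that assumes `.panrc` consists of simple `KEY = value` lines like the ones shown above; `read_settings` is a hypothetical helper and not part of SafeNetworking.

```python
import ast


def read_settings(text):
    """Parse simple KEY = value lines into a dict (assumed .panrc layout)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, full-line comments, and anything without "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw_value = line.partition("=")
        # literal_eval understands quoted strings, numbers, and True/False,
        # and it ignores a trailing "# comment" on the same line.
        settings[key.strip()] = ast.literal_eval(raw_value.strip())
    return settings


sample = '''
ELASTICSEARCH_HOST = "localhost"
ELASTICSEARCH_PORT = "9200"
'''
config = read_settings(sample)
print(f'{config["ELASTICSEARCH_HOST"]}:{config["ELASTICSEARCH_PORT"]}')
```

To inspect a real file, pass its contents instead of `sample`, for example `read_settings(open(".panrc").read())`.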

Check Network Connectivity

Test network connectivity to Elasticsearch:

# Test connection
telnet localhost 9200

# Or using netcat
nc -zv localhost 9200

If connection fails, check firewall rules or network configuration.
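Where `telnet` and `nc` are not installed, the same TCP reachability test can be done from Python. This is a sketch using only the standard library; `port_open` is a hypothetical helper name, and the host and port are the defaults used throughout this page.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if port_open("localhost", 9200) else "unreachable"
    print(f"localhost:9200 is {state}")
```

Like `nc -zv`, this only confirms that something accepts connections on the port; it does not verify that the listener is a healthy Elasticsearch node.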

Review Elasticsearch Logs

Check Elasticsearch logs for errors:

# Systemd logs
sudo journalctl -u elasticsearch -n 100

# Or log files
sudo tail -f /var/log/elasticsearch/elasticsearch.log

Look for:

Out of memory errors
Disk space issues
Configuration errors

Verify Index Health

Check Elasticsearch cluster and index health:

# Cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Check indices
curl -X GET "localhost:9200/_cat/indices?v"

If indices are red, you may have shard allocation issues or data corruption.

Prevention

Configure Elasticsearch to start automatically: sudo systemctl enable elasticsearch
Monitor Elasticsearch health regularly
Ensure adequate disk space for Elasticsearch data
Configure SafeNetworking service to depend on Elasticsearch

AutoFocus API Issues

Problems with the AutoFocus API can prevent threat intelligence enrichment.

Rate Limit Exceeded (Daily)

Symptoms:

Processing slows down or stops
WARNING messages about point exhaustion
AF_POINTS_MODE activated

Log indicators:

[WARNING] : We have exceeded the daily allotment of points for AutoFocus - going into hibernation mode
[INFO] : Slowing down execution because daily point total is 4500
[INFO] : Sleeping for 3600 seconds because daily point total is 450

What SafeNetworking Does Automatically:

Low Points Warning

When daily points drop below AF_POINTS_LOW (default: 5000), processing switches to single-threaded mode to conserve points.
Location: project/dns/dnsutils.py:90

Processing Halt

When daily points drop below AF_POINT_NOEXEC (default: 500), all processing stops.
Location: project/dns/dnsutils.py:80

Automatic Recovery

The application sleeps for AF_NOEXEC_CKTIME (default: 3600 seconds / 1 hour) and checks points again. When points refresh (typically at midnight UTC), processing resumes automatically.
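The threshold behavior described above can be summarized in a few lines. This is an illustrative sketch, not the code in dnsutils.py; the function name `processing_mode` and the mode labels are assumptions, while the default values come from the text.

```python
AF_POINTS_LOW = 5000    # default low-points threshold from the text
AF_POINT_NOEXEC = 500   # default halt threshold from the text


def processing_mode(daily_points, low=AF_POINTS_LOW, noexec=AF_POINT_NOEXEC):
    """Map the remaining daily points to the behavior described above."""
    if daily_points < noexec:
        return "halt"             # sleep, then check points again
    if daily_points < low:
        return "single-threaded"  # slow down to conserve points
    return "parallel"             # normal operation


for points in (12000, 4500, 450):
    print(points, processing_mode(points))
```

The values 4500 and 450 match the two INFO log lines shown earlier: the first triggers the slowdown, the second the halt.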

Manual Solutions:

# Edit .panrc to be more conservative
AF_POINTS_LOW = 10000  # Slow down earlier
AF_POINT_NOEXEC = 1000  # Stop with more cushion

Rate Limit Exceeded (Minute)

Symptoms:

Brief processing pauses
“Minute Bucket Exceeded” messages

Log indicators:

[WARNING] : We have exceeded the minute allotment of points for AutoFocus - going into hibernation mode

What SafeNetworking Does Automatically:

The application automatically waits 60 seconds when minute limits are exceeded, then retries the query.
Location: project/dns/dnsutils.py:98-100

Solution:

If minute limits are frequently exceeded, reduce DNS_POOL_COUNT. The combined total of DNS_POOL_COUNT and URL_POOL_COUNT should not exceed 16.

# Edit .panrc
DNS_POOL_COUNT = 12  # Reduce from 16
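The automatic wait-and-retry behavior for minute limits can be pictured as a small loop. The sketch below is an assumption-laden illustration, not the actual implementation: `MinuteLimitExceeded`, `query_with_retry`, and the `max_attempts` cap are hypothetical, and only the 60-second wait comes from the text.

```python
import time


class MinuteLimitExceeded(Exception):
    """Hypothetical signal that the per-minute point bucket is exhausted."""


def query_with_retry(run_query, wait_seconds=60, max_attempts=5, sleep=time.sleep):
    """Run a query, pausing and retrying when the minute limit is hit."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except MinuteLimitExceeded:
            if attempt == max_attempts:
                raise  # give up after the (assumed) attempt cap
            sleep(wait_seconds)  # wait for the minute bucket to refill
```

Each worker that hits the limit pauses for a full minute, which is why lowering DNS_POOL_COUNT reduces how often the pause occurs.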

API Key Issues

Symptoms:

Application exits immediately on startup
CRITICAL error about API key

Log indicators:

[CRITICAL] : API Key for Autofocus is not set in .panrc, exiting

Solution:

Set API Key

Edit .panrc in the application base directory:

AUTOFOCUS_API_KEY = "your-api-key-here"

Verify API Key

Test your API key using curl:

curl -X POST "https://autofocus.paloaltonetworks.com/api/v1.0/tag/WildFireTest" \
  -H "Content-Type: application/json" \
  -d '{"apiKey": "your-api-key-here"}'

You should receive a JSON response with tag information.

Restart SafeNetworking

sfn start

Query Timeouts

Symptoms:

Queries to AutoFocus take too long
Processing is slow
Partial results returned

Log indicators:

[INFO] : Search completion 15% for example.com at 2 minute(s)
[INFO] : No samples found for example.com in time allotted

Explanation:

AutoFocus queries can take 20+ minutes to complete across billions of samples. SafeNetworking uses a timeout system to balance thoroughness with performance.
Location: project/dns/dnsutils.py:426-438

Configuration:

# Edit .panrc to adjust timeout behavior

# Minutes to wait for query results (default: 2)
AF_LOOKUP_TIMEOUT = 2

# Minimum completion percentage to accept (default: 20%)
AF_LOOKUP_MAX_PERCENTAGE = 20

How it works:

SafeNetworking submits a query and receives a “cookie”
Every minute, it checks query completion percentage
If AF_LOOKUP_TIMEOUT expires OR completion ≥ AF_LOOKUP_MAX_PERCENTAGE, results are accepted
Increasing timeout improves result quality but slows processing
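The polling steps above can be sketched as follows. This is a simplified model rather than the code in dnsutils.py: `wait_for_results` and `check_completion` are hypothetical names, and the real lookup also handles the query cookie and API calls that are omitted here.

```python
import time


def wait_for_results(check_completion, timeout_minutes=2, min_percentage=20,
                     sleep=time.sleep):
    """Poll once a minute; return (completion_percentage, minutes_waited)."""
    percentage, minutes_waited = 0, 0
    while minutes_waited < timeout_minutes:
        sleep(60)
        minutes_waited += 1
        percentage = check_completion()
        # Accept early once enough of the search has completed.
        if percentage >= min_percentage:
            break
    return percentage, minutes_waited


# Simulated query that reports 5% and then 15% completion (no real waiting).
readings = iter([5, 15])
result = wait_for_results(lambda: next(readings), sleep=lambda seconds: None)
print(result)
```

In this simulated run the timeout expires at 2 minutes with 15% completion, which corresponds to the first log indicator shown above.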

Tuning Recommendations:

| Scenario | `AF_LOOKUP_TIMEOUT` (minutes) | `AF_LOOKUP_MAX_PERCENTAGE` (%) |
| --- | --- | --- |
| Fast processing, acceptable accuracy | 1 | 15 |
| Balanced (default) | 2 | 20 |
| High accuracy, slower processing | 3 | 40 |
| Maximum accuracy | 5 | 60 |

Processing Halts

SafeNetworking stops processing events without obvious errors.

Symptoms

No new events being processed
No ERROR messages in logs
SafeNetworking process is running
Unprocessed events accumulating in Elasticsearch

Diagnostic Steps

Check Process Status

Verify SafeNetworking is running:

ps aux | grep sfn

# Or if running as a service
sudo systemctl status safenetworking

Review Recent Logs

Check for ERROR or WARNING messages:

tail -100 log/sfn.log | grep -E "ERROR|WARNING|CRITICAL"
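For a quick overview of how many messages of each severity the log contains, a short script can tally them. This sketch assumes every log line contains a bracketed level such as `[ERROR]`, as in the examples on this page; the actual log file may carry additional fields such as timestamps, which the pattern tolerates.

```python
import re
from collections import Counter

LEVEL_PATTERN = re.compile(r"\[(DEBUG|INFO|WARNING|ERROR|CRITICAL)\]")


def count_levels(lines):
    """Count log lines per severity level."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[ERROR] : Received an error connecting to elasticsearch: Connection refused",
    "[INFO] : Slowing down execution because daily point total is 4500",
    "[ERROR] : Transport Error working with abc123: Connection refused",
]
print(dict(count_levels(sample)))
```

To run it against the real log, pass an open file handle: `count_levels(open("log/sfn.log"))`.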

Check AutoFocus Points

Query the af-details document:

curl -X GET "localhost:9200/sfn-details/_doc/af-details?pretty"

Check daily_points_remaining. If below 500, processing is paused automatically.
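The same check can be scripted by parsing the JSON returned by the query above. In this sketch the field path `_source.daily_points_remaining` follows the health check script later on this page; the helper name and the sample document are illustrative assumptions.

```python
import json


def daily_points_remaining(response_text):
    """Extract daily_points_remaining from an af-details GET response."""
    doc = json.loads(response_text)
    # Elasticsearch sets "found": false when the document does not exist.
    if not doc.get("found", False):
        return None
    return doc["_source"]["daily_points_remaining"]


# Illustrative response body, trimmed to the fields used here.
sample = ('{"_index": "sfn-details", "_id": "af-details", "found": true, '
          '"_source": {"daily_points_remaining": 450}}')
points = daily_points_remaining(sample)
status = "paused" if points is not None and points < 500 else "active"
print(points, status)
```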

Check Processing Flags

Verify processing is enabled in .panrc:

DNS_PROCESSING = True  # Should be True
IOT_PROCESSING = False  # True if you want IoT processing

Check for Unprocessed Events

Query Elasticsearch for pending events:

curl -X GET "localhost:9200/threat-*/_count?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "must": [
          { "match": { "tags": "DNS" }},
          { "match": { "SFN.processed": 0 }}
        ]
      }
    }
  }'

If count is 0, there are no events to process. If count > 0, processing should be active.
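When this check is automated, the query body and the interpretation of the result can be kept in two small functions. This is a sketch: `unprocessed_query` and `has_backlog` are hypothetical helper names, and the query mirrors the curl example above.

```python
import json


def unprocessed_query(tag="DNS"):
    """Build the _count query body for unprocessed events with the given tag."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"tags": tag}},
                    {"match": {"SFN.processed": 0}},
                ]
            }
        }
    }


def has_backlog(count_response_text):
    """Return True if the _count response reports pending events."""
    return json.loads(count_response_text)["count"] > 0


print(json.dumps(unprocessed_query(), indent=2))
```

The printed JSON can be passed to curl with `-d`, and the response body handed to `has_backlog`.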

Solutions

Restart SafeNetworking

Sometimes a simple restart resolves the issue:

# If running directly
pkill -f "sfn start"
sfn start

# If running as a service
sudo systemctl restart safenetworking

Wait for AutoFocus Points Reset

If processing stopped due to point exhaustion (below 500), wait for the daily reset:

AutoFocus points reset at midnight UTC
Processing resumes automatically when points refresh
Check current time vs. reset time

To manually verify:

curl -X GET "localhost:9200/sfn-details/_doc/af-details?pretty" | grep daily_bucket_start
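To compare the current time against the reset time, the remaining wait can be computed directly. This sketch assumes the reset happens exactly at midnight UTC, as stated above; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now=None):
    """Return the number of whole seconds until the next midnight UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now).total_seconds())


example = datetime(2026, 2, 2, 22, 30, tzinfo=timezone.utc)
print(seconds_until_utc_midnight(example))  # 5400 (1.5 hours)
```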

Check Thread Health

Background threads may have crashed. Check for thread-related errors:

grep -i "thread" log/sfn.log | tail -20
grep -i "exception" log/sfn.log | tail -20

If threads crashed, restart SafeNetworking.

Verify Elasticsearch Connectivity

Processing can stall if Elasticsearch connectivity is intermittent:

# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Check index status
curl -X GET "localhost:9200/_cat/indices/threat-*?v"

Resolve any Elasticsearch issues found.

Recovery Procedure

Stop SafeNetworking

sudo systemctl stop safenetworking
# Or: pkill -f "sfn start"

Verify Prerequisites

# Elasticsearch running
sudo systemctl status elasticsearch

# AutoFocus API key set
grep AUTOFOCUS_API_KEY .panrc

# Adequate disk space
df -h

Clear Any Stale Locks

If applicable, clear any application-level locks or state:

# This depends on your deployment
rm -f /tmp/sfn.lock

Start SafeNetworking

sudo systemctl start safenetworking
# Or: sfn start

Monitor Startup

Watch logs for successful initialization:

tail -f log/sfn.log

Look for:

[INFO] : SafeNetworking application initializing
[INFO] : ElasticSearch host is: localhost:9200
[INFO] : Background processes initialized
[INFO] : SafeNetworking server started @ localhost:5000

Verify Processing Resumes

Check that events are being processed:

# Should see processing activity
tail -f log/sfn.log | grep "Processing"

Debug Mode Configuration

Enable debug mode for detailed troubleshooting information.

Enable Debug Mode

Edit .panrc:

# Enable debug logging
LOG_LEVEL = "DEBUG"

# Process one event at a time for detailed tracking
DEBUG_MODE = True

# Enable Flask debug mode (shows debug on console)
DEBUG = True

DEBUG_MODE = True significantly slows processing by handling events sequentially. Use only for troubleshooting.
Location: project/dns/runner.py:88

What Debug Mode Does

Normal Mode

Processes multiple events in parallel using thread pools
Pool size determined by DNS_POOL_COUNT
Logs are concise (INFO level)
High throughput

multiProcNum = app.config['DNS_POOL_COUNT']  # e.g., 16
with Pool(multiProcNum) as pool:
    results = pool.map(searchDomain, priDocIds)

Debug Mode

Processes one event at a time sequentially
Detailed logs for every step (DEBUG level)
Easy to trace individual event processing
Low throughput

for event in priDocIds:
    results = searchDomain(event)
    app.logger.debug(f"Results: {results}")
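The two modes can be compared side by side in a self-contained form. The sketch below is not the code in runner.py: it substitutes `multiprocessing.pool.ThreadPool` for the `Pool` used in the excerpt so that it runs without the application, and `process_events` and `fake_search` are hypothetical stand-ins.

```python
from multiprocessing.pool import ThreadPool


def process_events(doc_ids, search, debug_mode=False, pool_count=16):
    """Process events sequentially in debug mode, otherwise with a pool."""
    if debug_mode:
        # One event at a time, which keeps log output easy to follow.
        return [search(doc_id) for doc_id in doc_ids]
    with ThreadPool(pool_count) as pool:
        # map() preserves input order even though work runs concurrently.
        return pool.map(search, doc_ids)


def fake_search(doc_id):
    """Stand-in for the real domain lookup."""
    return f"processed {doc_id}"


ids = ["a1", "b2", "c3"]
print(process_events(ids, fake_search, debug_mode=True))
print(process_events(ids, fake_search, pool_count=2))
```

Both calls return the same results in the same order; the difference is only in throughput and in how interleaved the log output becomes.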

Debug Mode Output

With debug mode enabled, you’ll see detailed logs like:

[DEBUG] : Gathering 1000 THREAT events from ElasticSearch
[DEBUG] : Calling getDomainDoc() for example.com
[DEBUG] : Querying local cache for example.com
[DEBUG] : Domain last updated can't be older than 2026-02-02T10:15:32
[DEBUG] : Gathering domain info for example.com (10 API-points)
[DEBUG] : Initial AF domain query returned {cookie: abc123}
[DEBUG] : Cookie abc123 returned for query of example.com
[DEBUG] : Checking cookie abc123 (2 API-points)
[DEBUG] : Search completion 20% for example.com at 1 minute(s)
[DEBUG] : Calling processTagList({hit data})
[DEBUG] : Found tag(s) ['ELFMirai', 'Coinminer'] in sample
[DEBUG] : Processing tag ELFMirai
[DEBUG] : Calling assessTags({domain tag data})
[DEBUG] : Working on tag ELFMirai with class of malware_family
[DEBUG] : Calculating confidence level: Day differential of 45
[DEBUG] : confidence_level for ELFMirai @ date 2026-01-18T10:00:00: 70 based on age of 45 days
[DEBUG] : Saved event doc with the following data: {event data}
[DEBUG] : abc123 save: SUCCESS

Disable Debug Mode

After troubleshooting, disable debug mode for normal performance:

# Edit .panrc
LOG_LEVEL = "INFO"
DEBUG_MODE = False
DEBUG = False

Restart SafeNetworking for changes to take effect.

Check System Health

Use the script and checklist below to run a comprehensive health check of the whole system.

Quick Health Check Script

#!/bin/bash

echo "=== SafeNetworking Health Check ==="
echo

# Check if SafeNetworking is running
echo "[1] SafeNetworking Process:"
if pgrep -f "sfn start" > /dev/null; then
    echo "✓ Running"
    ps aux | grep "sfn start" | grep -v grep
else
    echo "✗ Not running"
fi
echo

# Check Elasticsearch
echo "[2] Elasticsearch:"
if curl -s "localhost:9200" > /dev/null 2>&1; then
    echo "✓ Accessible"
    curl -s "localhost:9200/_cluster/health?pretty" | grep -E '"cluster_name"|"status"'
else
    echo "✗ Not accessible"
fi
echo

# Check AutoFocus points
echo "[3] AutoFocus Points:"
if curl -sf "localhost:9200/sfn-details/_doc/af-details" > /dev/null 2>&1; then
    POINTS=$(curl -s "localhost:9200/sfn-details/_doc/af-details" | 
             python3 -c "import sys, json; print(json.load(sys.stdin)['_source']['daily_points_remaining'])")
    echo "✓ Daily points remaining: $POINTS"
    if [ "$POINTS" -lt 500 ]; then
        echo "⚠ WARNING: Points below critical threshold (500)"
    elif [ "$POINTS" -lt 5000 ]; then
        echo "⚠ WARNING: Points below low threshold (5000)"
    fi
else
    echo "✗ Cannot retrieve point information"
fi
echo

# Check unprocessed events
echo "[4] Unprocessed Events:"
if curl -s "localhost:9200/threat-*/_count" > /dev/null 2>&1; then
    UNPROCESSED=$(curl -s "localhost:9200/threat-*/_count" -H 'Content-Type: application/json' -d '{
      "query": {
        "bool": {
          "must": [
            {"match": {"tags": "DNS"}},
            {"match": {"SFN.processed": 0}}
          ]
        }
      }
    }' | python3 -c "import sys, json; print(json.load(sys.stdin)['count'])")
    echo "Unprocessed DNS events: $UNPROCESSED"
else
    echo "✗ Cannot query events"
fi
echo

# Check recent errors in logs
echo "[5] Recent Errors (last 10):"
if [ -f "log/sfn.log" ]; then
    ERROR_COUNT=$(grep -E "ERROR|CRITICAL" log/sfn.log | wc -l)
    echo "Total errors in log: $ERROR_COUNT"
    grep -E "ERROR|CRITICAL" log/sfn.log | tail -10
else
    echo "✗ Log file not found"
fi
echo

# Check disk space
echo "[6] Disk Space:"
df -h | grep -E 'Filesystem|/$'
echo

echo "=== Health Check Complete ==="

Save as health_check.sh, make executable, and run:

chmod +x health_check.sh
./health_check.sh

Component Status Checklist

System Requirements

Operating System:

Linux distribution (Ubuntu, CentOS, RHEL)
Adequate CPU (multi-core recommended)
Sufficient RAM (4GB+ recommended)
Disk space for logs and Elasticsearch data

Check:

uname -a
free -h
df -h

SafeNetworking Application

Process running
Flask server responding on configured port
Background threads active (DNS, IoT, AF points)
No CRITICAL or ERROR messages in recent logs
Configuration file (.panrc) present and valid

Check:

ps aux | grep sfn
curl http://localhost:5000/
tail -50 log/sfn.log

Elasticsearch

Check:

curl "localhost:9200/_cluster/health?pretty"
curl "localhost:9200/_cat/indices?v"
curl "localhost:9200/_cat/shards?v" | grep -E "UNASSIGNED|FAILED"

AutoFocus API

API key configured in .panrc
Daily points remaining > 500
Minute points not consistently exceeded
Queries returning results
Reasonable response times

Check:

curl -X POST "https://autofocus.paloaltonetworks.com/api/v1.0/tag/WildFireTest" \
  -H "Content-Type: application/json" \
  -d '{"apiKey": "your-key-here"}'

curl "localhost:9200/sfn-details/_doc/af-details?pretty"

Event Processing

Events being retrieved from Elasticsearch
Domain lookups succeeding
Tags being assessed and applied
Events marked as processed (SFN.processed = 1)
Reasonable processing rate

Check:

grep "Processing" log/sfn.log | tail -10
grep "save: SUCCESS" log/sfn.log | tail -10
grep "save: FAIL" log/sfn.log | tail -10

Common Error Messages

Quick Reference

| Error Message | Severity | Likely Cause | Solution |
| --- | --- | --- | --- |
| `API Key for Autofocus is not set` | CRITICAL | Missing API key | Add `AUTOFOCUS_API_KEY` to `.panrc` |
| `Connection refused` | ERROR | Elasticsearch down | Start Elasticsearch service |
| `Connection timeout` | ERROR | Network/ES slow | Check ES health and network |
| `Daily Bucket Exceeded` | WARNING | AF points exhausted | Wait for reset or adjust thresholds |
| `Minute Bucket Exceeded` | WARNING | Too many parallel requests | Reduce `DNS_POOL_COUNT` |
| `Unable to work with event doc` | ERROR | ES or processing issue | Check ES connectivity and event structure |
| `Transport Error` | ERROR | ES communication failure | Restart ES or check network |
| `No local cache found` | INFO | Normal, creating cache | No action needed |
| `No samples found for domain` | INFO | AF query returned no results | Normal for some domains |
| `Slowing down execution` | INFO | Low AF points | Normal protection, wait for reset |
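The quick reference can also be applied to log lines automatically. The following sketch encodes the table as a lookup list; `suggest_fixes` is a hypothetical helper, and matching is a plain substring test, so a line that contains several known fragments returns several suggestions.

```python
KNOWN_MESSAGES = [
    ("API Key for Autofocus is not set", "Add AUTOFOCUS_API_KEY to .panrc"),
    ("Connection refused", "Start Elasticsearch service"),
    ("Connection timeout", "Check ES health and network"),
    ("Daily Bucket Exceeded", "Wait for reset or adjust thresholds"),
    ("Minute Bucket Exceeded", "Reduce DNS_POOL_COUNT"),
    ("Unable to work with event doc", "Check ES connectivity and event structure"),
    ("Transport Error", "Restart ES or check network"),
    ("No local cache found", "No action needed"),
    ("No samples found for", "Normal for some domains"),
    ("Slowing down execution", "Normal protection, wait for reset"),
]


def suggest_fixes(log_line):
    """Return the solutions for every known fragment found in the line."""
    return [fix for fragment, fix in KNOWN_MESSAGES if fragment in log_line]


line = "[ERROR] : Transport Error working with abc123: Connection refused"
print(suggest_fixes(line))
```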

Getting Additional Help

Enable Debug Logging

Set LOG_LEVEL = "DEBUG" in .panrc and review detailed logs for more context about errors.

Check Documentation

Review the monitoring guide for normal operational indicators and metrics to compare against.

Elasticsearch Documentation

Consult Elasticsearch documentation for cluster and index management issues.

AutoFocus Support

Contact Palo Alto Networks support for AutoFocus API issues or rate limit increases.

Next Steps

Monitoring

Running SafeNetworking
