
Connection Errors to Elasticsearch

Elasticsearch connectivity issues are common and can prevent SafeNetworking from processing events.

Symptoms

  • Application fails to start
  • ERROR messages about connection timeouts or refused connections
  • Events not being processed
Log indicators:
[ERROR] : Received a connection timeout error to elasticsearch: Connection timeout
[ERROR] : Received an error connecting to elasticsearch: Connection refused
[ERROR] : Transport Error working with abc123: Connection refused

Solutions

Check if Elasticsearch is running:
# Check Elasticsearch status
sudo systemctl status elasticsearch

# If not running, start it
sudo systemctl start elasticsearch

# Test connectivity
curl -X GET "localhost:9200"
Expected response (node name, cluster name, and version details will differ per installation):
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "version" : { ... },
  "tagline" : "You Know, for Search"
}
Verify Elasticsearch connection settings in .panrc:
ELASTICSEARCH_HOST = "localhost"
ELASTICSEARCH_PORT = "9200"
If Elasticsearch is on a different host or port, update these settings and restart SafeNetworking.
Test network connectivity to Elasticsearch:
# Test connection
telnet localhost 9200

# Or using netcat
nc -zv localhost 9200
If connection fails, check firewall rules or network configuration.
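If telnet and nc are not installed, Python's standard library can run the same TCP-level check. This is a minimal sketch: port_is_open is a hypothetical helper, and its defaults simply mirror the ELASTICSEARCH_HOST and ELASTICSEARCH_PORT values shown above.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 9200, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, port_is_open("localhost", 9200) returning False points to the same firewall or service problems as a failed nc test.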
Check Elasticsearch logs for errors:
# Systemd logs
sudo journalctl -u elasticsearch -n 100

# Or log files
sudo tail -f /var/log/elasticsearch/elasticsearch.log
Look for:
  • Out of memory errors
  • Disk space issues
  • Configuration errors
Check Elasticsearch cluster and index health:
# Cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Check indices
curl -X GET "localhost:9200/_cat/indices?v"
If any index reports red health, at least one of its primary shards is unassigned; this usually points to shard allocation issues or data corruption.
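To list only the red indices, you can request the _cat/indices output as JSON (format=json) and filter on its health column. This is a minimal sketch: fetch_cat_indices and red_indices are hypothetical helper names, and the base URL assumes the default localhost:9200.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_cat_indices(base_url: str = "http://localhost:9200") -> List[Dict[str, Any]]:
    """Fetch one row per index from the _cat/indices API in JSON form."""
    url = f"{base_url}/_cat/indices?format=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def red_indices(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted names of indices whose health column is 'red'."""
    # .get() tolerates rows without a health value (e.g. closed indices).
    return sorted(row["index"] for row in rows if row.get("health") == "red")
```

Usage: print(red_indices(fetch_cat_indices())) prints an empty list when no index is red.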

Prevention

  • Configure Elasticsearch to start automatically: sudo systemctl enable elasticsearch
  • Monitor Elasticsearch health regularly
  • Ensure adequate disk space for Elasticsearch data
  • Configure the SafeNetworking service to depend on Elasticsearch (in a systemd unit, add Requires=elasticsearch.service and After=elasticsearch.service)

AutoFocus API Issues

Problems with the AutoFocus API can prevent threat intelligence enrichment.

Rate Limit Exceeded (Daily)

Symptoms:
  • Processing slows down or stops
  • WARNING messages about point exhaustion
  • AF_POINTS_MODE activated
Log indicators:
[WARNING] : We have exceeded the daily allotment of points for AutoFocus - going into hibernation mode
[INFO] : Slowing down execution because daily point total is 4500
[INFO] : Sleeping for 3600 seconds because daily point total is 450
What SafeNetworking Does Automatically:
1. Low Points Warning

When daily points drop below AF_POINTS_LOW (default: 5000), processing switches to single-threaded mode to conserve points.
Location: project/dns/dnsutils.py:90

2. Processing Halt

When daily points drop below AF_POINT_NOEXEC (default: 500), all processing stops.
Location: project/dns/dnsutils.py:80

3. Automatic Recovery

The application sleeps for AF_NOEXEC_CKTIME (default: 3600 seconds / 1 hour) and checks points again. When points refresh (typically at midnight UTC), processing resumes automatically.
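Together, the three steps above amount to a simple threshold check. The sketch below restates that logic so you can reason about the effect of changing the .panrc values. It is an illustration, not the code in project/dns/dnsutils.py, and treating both thresholds as strict "less than" comparisons is an assumption based on the phrase "drop below".

```python
# Defaults quoted in this guide; a real deployment reads these from .panrc.
AF_POINTS_LOW = 5000
AF_POINT_NOEXEC = 500


def processing_mode(daily_points: int,
                    low: int = AF_POINTS_LOW,
                    noexec: int = AF_POINT_NOEXEC) -> str:
    """Classify the behaviour described above for a given daily point balance."""
    if daily_points < noexec:
        # All processing stops; points are re-checked every AF_NOEXEC_CKTIME seconds.
        return "halted"
    if daily_points < low:
        # Processing continues, but single-threaded to conserve points.
        return "single-threaded"
    return "parallel"


# The balances from the sample log lines above:
print(processing_mode(4500))  # single-threaded
print(processing_mode(450))   # halted
```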
Manual Solutions:
# Edit .panrc to be more conservative
AF_POINTS_LOW = 10000  # Slow down earlier
AF_POINT_NOEXEC = 1000  # Stop with more cushion

Rate Limit Exceeded (Minute)

Symptoms:
  • Brief processing pauses
  • “Minute Bucket Exceeded” messages
Log indicators:
[WARNING] : We have exceeded the minute allotment of points for AutoFocus - going into hibernation mode
What SafeNetworking Does Automatically: The application waits 60 seconds when the minute limit is exceeded, then retries the query.
Location: project/dns/dnsutils.py:98-100
Solution:
If minute limits are frequently exceeded, reduce DNS_POOL_COUNT. The combined total of DNS_POOL_COUNT and URL_POOL_COUNT should not exceed 16.
# Edit .panrc
DNS_POOL_COUNT = 12  # Reduce from 16
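A sketch of the wait-and-retry behaviour and the pool-size rule, under stated assumptions: MinuteBucketExceeded is a hypothetical exception standing in for however the minute-limit response is actually detected, max_attempts is an added safety cap that the description above does not mention, and check_pool_counts is a hypothetical validation helper, not part of SafeNetworking.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class MinuteBucketExceeded(Exception):
    """Hypothetical: raised when AutoFocus reports the per-minute limit is used up."""


def query_with_minute_retry(query: Callable[[], T],
                            wait_seconds: float = 60,
                            max_attempts: int = 5,
                            sleep: Callable[[float], None] = time.sleep) -> T:
    """Run query(); after a minute-limit error, wait and retry (up to max_attempts)."""
    attempt = 1
    while True:
        try:
            return query()
        except MinuteBucketExceeded:
            if attempt >= max_attempts:
                raise  # give up and surface the error (assumed cap)
            attempt += 1
            sleep(wait_seconds)  # let the per-minute allotment refill


def check_pool_counts(dns_pool_count: int, url_pool_count: int, limit: int = 16) -> None:
    """Raise ValueError if DNS_POOL_COUNT + URL_POOL_COUNT exceeds the limit."""
    total = dns_pool_count + url_pool_count
    if total > limit:
        raise ValueError(
            f"DNS_POOL_COUNT + URL_POOL_COUNT = {total}, which exceeds {limit}"
        )
```

For example, check_pool_counts(12, 4) passes, while check_pool_counts(16, 4) raises because the combined total is 20.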

API Key Issues

Symptoms:
  • Application exits immediately on startup
  • CRITICAL error about API key
Log indicators:
[CRITICAL] : API Key for Autofocus is not set in .panrc, exiting
Solution:
1. Set API Key

Edit .panrc in the application base directory:
AUTOFOCUS_API_KEY = "your-api-key-here"
2. Verify API Key

Test your API key using curl:
curl -X POST "https://autofocus.paloaltonetworks.com/api/v1.0/tag/WildFireTest" \
  -H "Content-Type: application/json" \
  -d '{"apiKey": "your-api-key-here"}'
You should receive a JSON response with tag information.
3. Restart SafeNetworking

sfn start
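To catch a missing or placeholder key before starting the service, you can check .panrc with a short script. This sketch assumes .panrc holds Python-style KEY = value assignments like the examples in this guide; read_panrc and api_key_problem are hypothetical helpers, and SafeNetworking's own configuration loader may parse the file differently.

```python
import ast
from typing import Any, Dict, Optional

PLACEHOLDER = "your-api-key-here"  # the example value used in this guide


def read_panrc(text: str) -> Dict[str, Any]:
    """Parse KEY = value lines; values are read as Python literals when possible."""
    settings: Dict[str, Any] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comment lines, and anything that is not an assignment
        key, _, value = line.partition("=")
        value = value.strip()
        try:
            # literal_eval ignores trailing "# comments" and handles "...", 10000, True.
            settings[key.strip()] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            settings[key.strip()] = value  # keep unparseable values as raw text
    return settings


def api_key_problem(settings: Dict[str, Any]) -> Optional[str]:
    """Describe what is wrong with AUTOFOCUS_API_KEY, or return None if it looks set."""
    key = settings.get("AUTOFOCUS_API_KEY")
    if not key:
        return "AUTOFOCUS_API_KEY is missing or empty"
    if key == PLACEHOLDER:
        return "AUTOFOCUS_API_KEY still has the placeholder value"
    return None
```

For example, run print(api_key_problem(read_panrc(open(".panrc").read()))) from the application base directory; it prints None when the key looks usable. Use the curl test in step 2 to confirm the key is actually accepted.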

Query Timeouts

Symptoms:
  • Queries to AutoFocus take too long
  • Processing is slow
  • Partial results returned
Log indicators:
[INFO] : Search completion 15% for example.com at 2 minute(s)
[INFO] : No samples found for example.com in time allotted
Explanation: AutoFocus queries can take 20+ minutes to complete across billions of samples. SafeNetworking uses a timeout system to balance thoroughness with performance.
Location: project/dns/dnsutils.py:426-438
Configuration:
# Edit .panrc to adjust timeout behavior

# Minutes to wait for query results (default: 2)
AF_LOOKUP_TIMEOUT = 2

# Minimum completion percentage to accept (default: 20%)
AF_LOOKUP_MAX_PERCENTAGE = 20
How it works:
  • SafeNetworking submits a query and receives a “cookie”
  • Every minute, it checks query completion percentage
  • If AF_LOOKUP_TIMEOUT expires OR completion ≥ AF_LOOKUP_MAX_PERCENTAGE, results are accepted
  • Increasing timeout improves result quality but slows processing
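The polling rule described above can be sketched as a loop. check_completion is a hypothetical callable standing in for the cookie status check, and sleep is injectable so the logic can be tried without waiting. This is an illustration, not the code at project/dns/dnsutils.py:426-438.

```python
import time
from typing import Callable, Tuple


def wait_for_results(check_completion: Callable[[], int],
                     timeout_minutes: int = 2,
                     min_percentage: int = 20,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[int, int]:
    """Poll once a minute until completion is high enough or the timeout expires.

    Returns (minutes_waited, last_completion_percentage).
    """
    minutes = 0
    while True:
        sleep(60)  # one polling interval
        minutes += 1
        completion = check_completion()  # percentage reported for the query cookie
        if completion >= min_percentage or minutes >= timeout_minutes:
            return minutes, completion
```

With the defaults (2 minutes, 20%), a query reporting 10% and then 25% is accepted after the second check, while one stuck at 5% is accepted at 5% when the 2-minute timeout expires.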
Tuning Recommendations:
| Scenario | AF_LOOKUP_TIMEOUT (minutes) | AF_LOOKUP_MAX_PERCENTAGE (%) |
|---|---|---|
| Fast processing, acceptable accuracy | 1 | 15 |
| Balanced (default) | 2 | 20 |
| High accuracy, slower processing | 3 | 40 |
| Maximum accuracy | 5 | 60 |

Processing Halts

SafeNetworking stops processing events without obvious errors.

Symptoms

  • No new events being processed
  • No ERROR messages in logs
  • SafeNetworking process is running
  • Unprocessed events accumulating in Elasticsearch

Diagnostic Steps

1. Check Process Status

Verify SafeNetworking is running:
ps aux | grep sfn

# Or if running as a service
sudo systemctl status safenetworking
2. Review Recent Logs

Check for ERROR or WARNING messages:
tail -100 log/sfn.log | grep -E "ERROR|WARNING|CRITICAL"
3. Check AutoFocus Points

Query the af-details document:
curl -X GET "localhost:9200/sfn-details/_doc/af-details?pretty"
Check daily_points_remaining. If below 500, processing is paused automatically.
4. Check Processing Flags

Verify processing is enabled in .panrc:
DNS_PROCESSING = True  # Should be True
IOT_PROCESSING = False  # True if you want IoT processing
5. Check for Unprocessed Events

Query Elasticsearch for pending events:
curl -X GET "localhost:9200/threat-*/_count?pretty" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "must": [
          { "match": { "tags": "DNS" }},
          { "match": { "SFN.processed": 0 }}
        ]
      }
    }
  }'
If count is 0, there are no events to process. If count > 0, processing should be active.
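If you prefer to script this check, the same query can be sent with Python's standard library. This is a minimal sketch: unprocessed_dns_query and count_unprocessed are hypothetical helper names, and the request uses POST (which the Elasticsearch _count API accepts) instead of a GET with a body.

```python
import json
import urllib.request
from typing import Any, Dict


def unprocessed_dns_query() -> Dict[str, Any]:
    """Same body as the curl example: DNS-tagged events not yet processed."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"tags": "DNS"}},
                    {"match": {"SFN.processed": 0}},
                ]
            }
        }
    }


def count_unprocessed(base_url: str = "http://localhost:9200") -> int:
    """POST the query to threat-*/_count and return the 'count' field."""
    req = urllib.request.Request(
        f"{base_url}/threat-*/_count",
        data=json.dumps(unprocessed_dns_query()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["count"]
```

Running print(count_unprocessed()) repeatedly over a few minutes shows whether the backlog is shrinking.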

Solutions

Sometimes a simple restart resolves the issue:
# If running directly
pkill -f "sfn start"
sfn start

# If running as a service
sudo systemctl restart safenetworking
If processing stopped due to point exhaustion (below 500), wait for the daily reset:
  • AutoFocus points reset at midnight UTC
  • Processing resumes automatically when points refresh
  • Compare the current UTC time (date -u) with the next reset time
To manually verify:
curl -X GET "localhost:9200/sfn-details/_doc/af-details?pretty" | grep daily_bucket_start
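To see how long the wait will be, compute the time remaining until the next 00:00 UTC. This is a minimal sketch: it assumes the reset happens exactly at midnight UTC, as stated above, and seconds_until_utc_midnight is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_utc_midnight(now: Optional[datetime] = None) -> int:
    """Seconds from `now` (timezone-aware, default: current time) until the next 00:00 UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    # Move to the next calendar day, then truncate to midnight.
    next_midnight = (now + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    return int((next_midnight - now).total_seconds())
```

For example, at 23:00 UTC the function returns 3600, meaning processing should resume within about an hour.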
Background threads may have crashed. Check for thread-related errors:
grep -i "thread" log/sfn.log | tail -20
grep -i "exception" log/sfn.log | tail -20
If threads crashed, restart SafeNetworking.
Processing can stall if Elasticsearch connectivity is intermittent:
# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Check index status
curl -X GET "localhost:9200/_cat/indices/threat-*?v"
Resolve any Elasticsearch issues found.

Recovery Procedure

1. Stop SafeNetworking

sudo systemctl stop safenetworking
# Or: pkill -f "sfn start"
2. Verify Prerequisites

# Elasticsearch running
sudo systemctl status elasticsearch

# AutoFocus API key set
grep AUTOFOCUS_API_KEY .panrc

# Adequate disk space
df -h
3. Clear Any Stale Locks

If applicable, clear any application-level locks or state:
# This depends on your deployment
rm -f /tmp/sfn.lock
4. Start SafeNetworking

sudo systemctl start safenetworking
# Or: sfn start
5. Monitor Startup

Watch logs for successful initialization:
tail -f log/sfn.log
Look for:
[INFO] : SafeNetworking application initializing
[INFO] : ElasticSearch host is: localhost:9200
[INFO] : Background processes initialized
[INFO] : SafeNetworking server started @ localhost:5000
6. Verify Processing Resumes

Check that events are being processed:
# Should see processing activity
tail -f log/sfn.log | grep "Processing"
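The startup check in step 5 can be automated by scanning the log for the four startup lines listed there. This is a minimal sketch: missing_startup_markers is a hypothetical helper, and because log/sfn.log may also contain lines from earlier runs, feed it only the lines written since the restart.

```python
from typing import Iterable, List

# Marker strings taken from the expected startup log lines in step 5.
STARTUP_MARKERS = [
    "SafeNetworking application initializing",
    "ElasticSearch host is:",
    "Background processes initialized",
    "SafeNetworking server started @",
]


def missing_startup_markers(lines: Iterable[str]) -> List[str]:
    """Return the startup markers that do not appear in the given log lines."""
    seen = set()
    for line in lines:
        for marker in STARTUP_MARKERS:
            if marker in line:
                seen.add(marker)
    # Preserve the startup order so the first missing step is listed first.
    return [m for m in STARTUP_MARKERS if m not in seen]
```

An empty result means every expected initialization line was found; otherwise the first entry shows where startup stopped.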

Debug Mode Configuration

Enable debug mode for detailed troubleshooting information.

Enable Debug Mode

Edit .panrc:
# Enable debug logging
LOG_LEVEL = "DEBUG"

# Process one event at a time for detailed tracking
DEBUG_MODE = True

# Enable Flask debug mode (shows debug on console)
DEBUG = True
DEBUG_MODE = True significantly slows processing by handling events sequentially. Use it only for troubleshooting.
Location: project/dns/runner.py:88

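What Debug Mode Changes

Normal operation (DEBUG_MODE = False):
  • Processes multiple events in parallel using thread pools
  • Pool size determined by DNS_POOL_COUNT
  • Logs are concise (INFO level)
  • High throughput
# Normal operation: spread domain searches across a pool of DNS_POOL_COUNT workers
multiProcNum = app.config['DNS_POOL_COUNT']  # e.g., 16
with Pool(multiProcNum) as pool:
    # searchDomain runs once per primary document ID; map() returns results in input order
    results = pool.map(searchDomain, priDocIds)
With DEBUG_MODE = True, events are processed one at a time instead, so the detailed log lines for each event appear together rather than interleaved.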
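The difference between the two modes can be illustrated with a small sketch. It uses concurrent.futures.ThreadPoolExecutor as a stand-in, since the import behind Pool in the snippet above is not shown (it may be a process pool rather than a thread pool); run_events and worker are hypothetical names, with worker playing the role of searchDomain.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_events(worker: Callable[[T], R], event_ids: Sequence[T],
               debug_mode: bool, pool_size: int = 16) -> List[R]:
    """Process events one at a time in debug mode, otherwise through a pool."""
    if debug_mode:
        # Sequential: one event finishes before the next starts, so logs stay grouped.
        return [worker(event_id) for event_id in event_ids]
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Parallel: map() still returns results in the same order as event_ids.
        return list(pool.map(worker, event_ids))
```

Both modes return identical results; only the ordering of side effects such as log lines and the overall throughput differ.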

Debug Mode Output

With debug mode enabled, you’ll see detailed logs like:
[DEBUG] : Gathering 1000 THREAT events from ElasticSearch
[DEBUG] : Calling getDomainDoc() for example.com
[DEBUG] : Querying local cache for example.com
[DEBUG] : Domain last updated can't be older than 2026-02-02T10:15:32
[DEBUG] : Gathering domain info for example.com (10 API-points)
[DEBUG] : Initial AF domain query returned {cookie: abc123}
[DEBUG] : Cookie abc123 returned for query of example.com
[DEBUG] : Checking cookie abc123 (2 API-points)
[DEBUG] : Search completion 20% for example.com at 1 minute(s)
[DEBUG] : Calling processTagList({hit data})
[DEBUG] : Found tag(s) ['ELFMirai', 'Coinminer'] in sample
[DEBUG] : Processing tag ELFMirai
[DEBUG] : Calling assessTags({domain tag data})
[DEBUG] : Working on tag ELFMirai with class of malware_family
[DEBUG] : Calculating confidence level: Day differential of 45
[DEBUG] : confidence_level for ELFMirai @ date 2026-01-18T10:00:00: 70 based on age of 45 days
[DEBUG] : Saved event doc with the following data: {event data}
[DEBUG] : abc123 save: SUCCESS

Disable Debug Mode

After troubleshooting, disable debug mode for normal performance:
# Edit .panrc
LOG_LEVEL = "INFO"
DEBUG_MODE = False
DEBUG = False
Restart SafeNetworking for changes to take effect.

Check System Health

Comprehensive system health check procedure.

Quick Health Check Script

#!/bin/bash

echo "=== SafeNetworking Health Check ==="
echo

# Check if SafeNetworking is running
echo "[1] SafeNetworking Process:"
if pgrep -f "sfn start" > /dev/null; then
    echo "✓ Running"
    ps aux | grep "sfn start" | grep -v grep
else
    echo "✗ Not running"
fi
echo

# Check Elasticsearch
echo "[2] Elasticsearch:"
if curl -s "localhost:9200" > /dev/null 2>&1; then
    echo "✓ Accessible"
    curl -s "localhost:9200/_cluster/health?pretty" | grep -E '"cluster_name"|"status"'
else
    echo "✗ Not accessible"
fi
echo

# Check AutoFocus points
echo "[3] AutoFocus Points:"
# -f makes curl fail on HTTP errors, so a missing af-details document lands in the else branch
if curl -sf "localhost:9200/sfn-details/_doc/af-details" > /dev/null 2>&1; then
    POINTS=$(curl -sf "localhost:9200/sfn-details/_doc/af-details" | 
             python3 -c "import sys, json; print(json.load(sys.stdin)['_source']['daily_points_remaining'])")
    echo "✓ Daily points remaining: $POINTS"
    if [ "$POINTS" -lt 500 ]; then
        echo "⚠ WARNING: Points below critical threshold (500)"
    elif [ "$POINTS" -lt 5000 ]; then
        echo "⚠ WARNING: Points below low threshold (5000)"
    fi
else
    echo "✗ Cannot retrieve point information"
fi
echo

# Check unprocessed events
echo "[4] Unprocessed Events:"
if curl -sf "localhost:9200/threat-*/_count" > /dev/null 2>&1; then
    UNPROCESSED=$(curl -sf "localhost:9200/threat-*/_count" -H 'Content-Type: application/json' -d '{
      "query": {
        "bool": {
          "must": [
            {"match": {"tags": "DNS"}},
            {"match": {"SFN.processed": 0}}
          ]
        }
      }
    }' | python3 -c "import sys, json; print(json.load(sys.stdin)['count'])")
    echo "Unprocessed DNS events: $UNPROCESSED"
else
    echo "✗ Cannot query events"
fi
echo

# Check recent errors in logs
echo "[5] Recent Errors (last 10):"
if [ -f "log/sfn.log" ]; then
    ERROR_COUNT=$(grep -E "ERROR|CRITICAL" log/sfn.log | wc -l)
    echo "Total errors in log: $ERROR_COUNT"
    grep -E "ERROR|CRITICAL" log/sfn.log | tail -10
else
    echo "✗ Log file not found"
fi
echo

# Check disk space
echo "[6] Disk Space:"
df -h | grep -E 'Filesystem|/$'
echo

echo "=== Health Check Complete ==="
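Save as health_check.sh in the SafeNetworking base directory (the script reads log/sfn.log by relative path), make it executable, and run it from there: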
chmod +x health_check.sh
./health_check.sh

Component Status Checklist

Operating System:
  • Linux distribution (Ubuntu, CentOS, RHEL)
  • Adequate CPU (multi-core recommended)
  • Sufficient RAM (4GB+ recommended)
  • Disk space for logs and Elasticsearch data
Check:
uname -a
free -h
df -h
SafeNetworking Application:
  • Process running
  • Flask server responding on configured port
  • Background threads active (DNS, IoT, AF points)
  • No CRITICAL or ERROR messages in recent logs
  • Configuration file (.panrc) present and valid
Check:
ps aux | grep sfn
curl http://localhost:5000/
tail -50 log/sfn.log
Elasticsearch:
  • Elasticsearch service running
  • Cluster status green or yellow
  • All required indices present
  • No red or unallocated shards
  • Adequate disk space
Check:
curl "localhost:9200/_cluster/health?pretty"
curl "localhost:9200/_cat/indices?v"
curl "localhost:9200/_cat/shards?v" | grep -E "UNASSIGNED|FAILED"
AutoFocus API:
  • API key configured in .panrc
  • Daily points remaining > 500
  • Minute points not consistently exceeded
  • Queries returning results
  • Reasonable response times
Check:
curl -X POST "https://autofocus.paloaltonetworks.com/api/v1.0/tag/WildFireTest" \
  -H "Content-Type: application/json" \
  -d '{"apiKey": "your-key-here"}'

curl "localhost:9200/sfn-details/_doc/af-details?pretty"
Event Processing:
  • Events being retrieved from Elasticsearch
  • Domain lookups succeeding
  • Tags being assessed and applied
  • Events marked as processed (SFN.processed = 1)
  • Reasonable processing rate
Check:
grep "Processing" log/sfn.log | tail -10
grep "save: SUCCESS" log/sfn.log | tail -10
grep "save: FAIL" log/sfn.log | tail -10

Common Error Messages

Quick Reference

| Error Message | Severity | Likely Cause | Solution |
|---|---|---|---|
| API Key for Autofocus is not set | CRITICAL | Missing API key | Add AUTOFOCUS_API_KEY to .panrc |
| Connection refused | ERROR | Elasticsearch down | Start Elasticsearch service |
| Connection timeout | ERROR | Network/ES slow | Check ES health and network |
| Daily Bucket Exceeded | WARNING | AF points exhausted | Wait for reset or adjust thresholds |
| Minute Bucket Exceeded | WARNING | Too many parallel requests | Reduce DNS_POOL_COUNT |
| Unable to work with event doc | ERROR | ES or processing issue | Check ES connectivity and event structure |
| Transport Error | ERROR | ES communication failure | Restart ES or check network |
| No local cache found | INFO | Normal, creating cache | No action needed |
| No samples found for domain | INFO | AF query returned no results | Normal for some domains |
| Slowing down execution | INFO | Low AF points | Normal protection, wait for reset |
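To triage a large log quickly, the table can be turned into a lookup. This is a minimal sketch: classify and KNOWN_MESSAGES are hypothetical, and some fragments come from the sample log lines earlier in this guide because the logged wording differs from the table (for example, the daily limit is logged as "exceeded the daily allotment"). More specific entries come first because one line can match several.

```python
from typing import List, Optional, Tuple

# (fragments to look for, severity, suggested action)
KNOWN_MESSAGES: List[Tuple[Tuple[str, ...], str, str]] = [
    (("API Key for Autofocus is not set",), "CRITICAL", "Add AUTOFOCUS_API_KEY to .panrc"),
    (("Transport Error",), "ERROR", "Restart ES or check network"),
    (("Connection refused",), "ERROR", "Start Elasticsearch service"),
    (("Connection timeout",), "ERROR", "Check ES health and network"),
    (("Daily Bucket Exceeded", "exceeded the daily allotment"), "WARNING",
     "Wait for reset or adjust thresholds"),
    (("Minute Bucket Exceeded", "exceeded the minute allotment"), "WARNING",
     "Reduce DNS_POOL_COUNT"),
    (("Unable to work with event doc",), "ERROR", "Check ES connectivity and event structure"),
    (("No local cache found",), "INFO", "No action needed"),
    (("No samples found for",), "INFO", "Normal for some domains"),
    (("Slowing down execution",), "INFO", "Normal protection, wait for reset"),
]


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (severity, action) for the first known message in a log line, else None."""
    lowered = line.lower()
    for fragments, severity, action in KNOWN_MESSAGES:
        if any(fragment.lower() in lowered for fragment in fragments):
            return severity, action
    return None
```

For example, loop over open("log/sfn.log") and print each line for which classify(line) is not None, together with the suggested action.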

Getting Additional Help

Enable Debug Logging

Set LOG_LEVEL = "DEBUG" in .panrc and review detailed logs for more context about errors.

Check Documentation

Review the monitoring guide for normal operational indicators and metrics to compare against.

Elasticsearch Documentation

Consult Elasticsearch documentation for cluster and index management issues.

AutoFocus Support

Contact Palo Alto Networks support for AutoFocus API issues or rate limit increases.

Next Steps

Monitoring

Set up monitoring to catch issues before they impact operations

Running SafeNetworking

Review configuration options to optimize performance
