This guide covers common issues encountered when running SSV Node, along with debugging techniques and solutions based on logs, metrics, and operational best practices.

Quick Diagnostics

Start with these quick checks when troubleshooting:
# Check node health
curl http://localhost:15000/health

# Check if metrics are being collected
curl http://localhost:15000/metrics | grep ssv_

# View recent logs (if using systemd)
journalctl -u ssv-node -n 100 --no-pager

# Check validator status
curl http://localhost:13000/api/v1/validators
Always check both logs and metrics when troubleshooting. Logs provide context, while metrics show trends and quantitative data.

Common Issues

Node Startup Issues

Configuration and Port Errors

Symptoms:
  • Node exits immediately on startup
  • Fatal error logs about configuration
  • Port already in use errors
Log examples:
{"level":"FATAL","msg":"failed to parse configuration","error":"invalid beacon node address"}
{"level":"FATAL","msg":"listen to 0.0.0.0:13001","error":"bind: address already in use"}
Solutions:
  1. Validate your configuration file:
./bin/ssvnode start-node --config=./config.yaml --dry-run
  2. Check for port conflicts:
# Check if ports are already in use
lsof -i :13000  # API port
lsof -i :13001  # P2P TCP port
lsof -i :12001  # P2P UDP port
lsof -i :15000  # Metrics port
  3. Verify beacon node connectivity:
curl http://your-beacon-node:5052/eth/v1/node/version
  4. Review required configuration fields (see the minimal example below):
  • eth2.BeaconNodeAddr
  • OperatorPrivateKey or KeyStore
  • Valid network configuration (ports, discovery)
Ensure your beacon node is fully synced before starting SSV Node.
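For reference, a minimal config.yaml touching these fields might look like the sketch below. Exact keys and defaults vary between node versions, so verify against the configuration reference for your release:
# config.yaml (illustrative sketch; keys are assumptions, verify for your version)
global:
  LogLevel: info
db:
  Path: ./data/db
eth2:
  BeaconNodeAddr: http://your-beacon-node:5052
p2p:
  TcpPort: 13001
  UdpPort: 12001
KeyStore:
  PrivateKeyFile: ./encrypted_private_key.json
  PasswordFile: ./password.txt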

Database Errors

Symptoms:
  • Error logs about database access
  • Permission denied errors
  • Corrupted database errors
Log examples:
{"level":"FATAL","msg":"failed to open database","error":"permission denied: ./data"}
{"level":"ERROR","msg":"database corruption detected","path":"./data/db"}
Solutions:
  1. Check directory permissions:
ls -la ./data
# Ensure SSV node process user has read/write access
chown -R ssv:ssv ./data
chmod 700 ./data
  2. If database is corrupted, restore from backup:
# Stop the node first
systemctl stop ssv-node

# Move corrupted database
mv ./data/db ./data/db.corrupted

# Restore from backup or resync from scratch
cp -r ./backups/db-latest ./data/db

# Restart node
systemctl start ssv-node
  3. Verify sufficient disk space:
df -h ./data

Operator Key Issues

Symptoms:
  • Authentication failures
  • Unable to participate in duties
  • “Invalid operator key” errors
Solutions:
  1. Verify operator key format:
# Inspect the stored key and confirm it matches the format produced by
# generate-operator-keys for your node version
cat operator-private-key.txt
  2. Re-generate operator keys if needed:
./bin/ssvnode generate-operator-keys --password-file=password.txt
  3. Ensure operator is registered on the SSV contract:
# Check operator registration status
# (Use SSV webapp or contract interaction tools)
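If you prefer the command line, one option is the public SSV explorer API; the endpoint layout below is an assumption, so adjust the network segment and operator ID to your setup:
# Query the SSV explorer API for an operator record (illustrative; replace 123 with your operator ID)
curl -s https://api.ssv.network/api/v4/mainnet/operators/123 | jq .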

Validator Issues

Validators Not Participating

Symptoms:
  • Validator shows as not_participating or no_index
  • No attestations or proposals being submitted
  • Missing validator duties
Metric checks:
# Check validator status
ssv_validator_validators_per_status{ssv_validator_status="not_participating"}
ssv_validator_validators_per_status{ssv_validator_status="no_index"}

# Check submission rates
rate(ssv_runner_submissions[5m])
rate(ssv_runner_submissions_failed[5m])
Log examples:
{"level":"WARN","msg":"validator not found","validator":"0x8234..."}
{"level":"WARN","msg":"validator not yet participating","validator_index":12345}
{"level":"DEBUG","msg":"validator not yet activated","validator":"0x8234..."}
Solutions:
  1. Verify validator is registered:
# Check validator on beacon chain
curl http://your-beacon-node:5052/eth/v1/beacon/states/head/validators/0x8234...
  2. Check validator shares are loaded:
# Query database for validator shares
curl "http://localhost:15000/database/count-by-collection?prefix=shares"

# Check validator API
curl http://localhost:13000/api/v1/validators
  3. Verify validator activation:
  • Check beacon chain validator status
  • Ensure sufficient balance (32 ETH)
  • Verify activation queue position
  4. Check operator cluster membership:
  • Ensure operator is part of validator’s cluster
  • Verify the cluster has a supported operator count (4/7/10/13)
  • Check operator IDs match cluster configuration
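After applying a fix, watch the status counters from the metric checks above to confirm validators leave the problem states:
# Re-check participation counters every 30 seconds (metric name as used above)
watch -n 30 'curl -s http://localhost:15000/metrics | grep ssv_validator_validators_per_status'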

Failed Duty Submissions

Symptoms:
  • Increasing ssv_runner_submissions_failed counter
  • Missed attestations or proposals
  • Error logs about submission failures
Metric checks:
# Failed submission rate
rate(ssv_runner_submissions_failed[5m])

# Failed submissions by role
sum by (ssv_beacon_role) (rate(ssv_runner_submissions_failed[5m]))

# Submission success rate
rate(ssv_runner_submissions[5m]) / 
  (rate(ssv_runner_submissions[5m]) + rate(ssv_runner_submissions_failed[5m]))
Log examples:
{"level":"ERROR","msg":"failed to submit attestation","error":"timeout exceeded","validator":"0x8234..."}
{"level":"ERROR","msg":"failed to submit proposal","error":"invalid signature","slot":394560}
Solutions:
  1. Check beacon node connectivity:
# Test beacon node API
curl http://your-beacon-node:5052/eth/v1/node/health

# Check submission endpoint
curl http://your-beacon-node:5052/eth/v1/beacon/pool/attestations
  2. Verify beacon node is synced:
curl http://your-beacon-node:5052/eth/v1/node/syncing
# Should return: {"data":{"is_syncing":false}}
  3. Check for slashing protection issues:
  • Review slashing protection database
  • Ensure no duplicate validator instances
  • Check for clock synchronization issues
  4. Monitor consensus duration:
# Should complete within 1-2 seconds
histogram_quantile(0.95, rate(ssv_runner_consensus_duration_bucket[5m]))
Running multiple instances of the same validator can cause slashing. Ensure only one node is running per validator.
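One defensive pattern is starting the node through a wrapper that holds an exclusive file lock, so a second copy on the same host cannot start. This is a sketch (script name, lock path, and paths are illustrative) and it only guards one machine; it cannot detect a duplicate running on another host:
#!/usr/bin/env bash
# run-ssv.sh - refuse to start if another instance already holds the lock
exec flock -n /var/lock/ssv-node.lock \
  ./bin/ssvnode start-node --config=./config.yaml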

Slashing Events

Symptoms:
  • ssv_validator_validators_per_status{ssv_validator_status="slashed"} > 0
  • Validator balance decreasing dramatically
  • Slashing event on beacon chain
CRITICAL - Immediate Actions:
  1. STOP THE NODE IMMEDIATELY:
systemctl stop ssv-node
  2. Investigate the cause:
  • Check if multiple instances were running
  • Review logs for double signing evidence
  • Check system clock synchronization
  • Verify slashing protection database integrity
  3. DO NOT RESTART until root cause is identified and resolved
  4. Common causes:
  • Running duplicate validator instances
  • Clock drift causing timing issues
  • Corrupted slashing protection database
  • Restored old database state
Slashing results in permanent loss of stake (minimum 1 ETH penalty, up to full stake). Prevention is critical.
Prevention:
  • Never run multiple instances of the same validator
  • Maintain accurate system time (use NTP)
  • Regular database backups
  • Proper shutdown procedures before migrations
  • Test failover procedures in testnet first
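A cold backup taken while the node is stopped avoids copying the database mid-write. A minimal sketch, assuming the systemd unit and paths used elsewhere in this guide:
#!/usr/bin/env bash
# backup-ssv-db.sh - stop, archive, restart (adjust unit name and paths to your setup)
set -euo pipefail
systemctl stop ssv-node
tar -czf "./backups/db-$(date +%F).tar.gz" ./data/db
systemctl start ssv-node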

Network and P2P Issues

Low Peer Count

Symptoms:
  • Zero or very few peers connected
  • Unable to participate in consensus
  • Isolated from network
Diagnostic commands:
# Check peer count
curl http://localhost:13000/api/v1/node/peers | jq '.data | length'

# Check P2P connectivity
netstat -an | grep -E ":(13001|12001)"
Solutions:
  1. Verify P2P ports are open:
# Test UDP port (discovery)
nc -u -v your-public-ip 12001

# Test TCP port (libp2p)
nc -v your-public-ip 13001
  2. Check firewall rules:
# Allow inbound P2P connections
ufw allow 13001/tcp
ufw allow 12001/udp
  3. Verify NAT configuration:
  • Configure port forwarding for 12001/udp and 13001/tcp
  • Set correct external IP in config if behind NAT
p2p:
  HostAddress: your-public-ip
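  # HostDNS may be available as an alternative to HostAddress if you have a
  # stable DNS name (assumption: confirm against your version's config reference)
  # HostDNS: node.example.com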
  4. Check bootnode connectivity:
# View logs for discovery events
journalctl -u ssv-node | grep -i "discovery\|peer"

Message Queue Congestion

Symptoms:
  • Logs showing “subscriber channel full, dropping the message”
  • Delayed consensus
  • Performance degradation
Log examples:
{"level":"WARN","msg":"subscriber channel full, dropping the message"}
{"level":"WARN","msg":"current slot and duty slot are not aligned"}
Metric checks:
# Queue size monitoring
ssv_queue_inbox_size

# Processing duration
histogram_quantile(0.95, rate(ssv_tracer_processing_duration_bucket[5m]))
Solutions:
  1. Check system resources:
# CPU usage
top -b -n 1 | grep ssvnode

# Memory usage
free -h

# Disk I/O
iostat -x 1 5
  2. Reduce log verbosity if using debug level:
LogLevel: "info"  # Change from debug to info
  3. Optimize database performance:
  • Ensure SSD storage for database
  • Check disk I/O wait times
  • Consider BadgerDB tuning parameters
  4. Scale hardware resources:
  • Increase CPU cores
  • Add more RAM
  • Use faster storage (NVMe)

Performance Issues

Slow Consensus and Round Changes

Symptoms:
  • High ssv_qbft_rounds_changed counter
  • Consensus taking >2 seconds
  • Frequent round timeouts
Metric checks:
# Round change rate
rate(ssv_qbft_rounds_changed[5m])

# Consensus duration (95th percentile)
histogram_quantile(0.95, rate(ssv_runner_consensus_duration_bucket[5m]))

# Per-phase durations
histogram_quantile(0.95, rate(ssv_runner_pre_consensus_duration_bucket[5m]))
histogram_quantile(0.95, rate(ssv_runner_post_consensus_duration_bucket[5m]))
Solutions:
  1. Check network latency to other operators:
# Ping cluster operators (if known)
ping operator-1.example.com
mtr operator-1.example.com
  2. Verify system clock synchronization:
# Check NTP sync status
timedatectl status

# Verify offset is minimal (<50ms)
chronyc tracking
  3. Check for underperforming cluster members:
  • Review operator performance in cluster
  • Consider replacing slow/unreliable operators
  • Verify all operators are online
  4. Monitor duty timing:
# Slot delay (should be minimal)
histogram_quantile(0.95, rate(ssv_scheduler_slot_delay_bucket[5m]))

High Resource Usage

Symptoms:
  • Node consuming excessive resources
  • System becoming unresponsive
  • OOM killer terminating node
Diagnostic commands:
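# Note: the pprof endpoints below are served only while profiling is enabled
# (EnableProfile: true in the node config)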
# Memory usage breakdown
curl http://localhost:15000/debug/pprof/heap > heap.prof
go tool pprof -http=:8080 heap.prof

# CPU profiling
curl http://localhost:15000/debug/pprof/profile?seconds=30 > cpu.prof
go tool pprof -http=:8080 cpu.prof

# Goroutine analysis
curl http://localhost:15000/debug/pprof/goroutine > goroutine.prof
Solutions:
  1. Check for memory leaks:
  • Review the heap profile for growing allocations
  • Monitor memory over time (see the sampling sketch after this list)
  • Report findings to the SSV team if a leak is detected
  2. Reduce load:
# Disable profiling if not needed
EnableProfile: false

# Use info level logging
LogLevel: "info"
  3. Optimize database:
# Check database size
du -sh ./data/db

# Database may need compaction (automatic in BadgerDB)
  4. Hardware recommendations:
  • Minimum: 4 CPU cores, 8GB RAM
  • Recommended: 8+ CPU cores, 16GB+ RAM
  • Storage: SSD/NVMe with 100GB+ free space
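To track memory growth over time without a full monitoring stack, a simple sampling loop is enough; this sketch assumes the process binary is named ssvnode, as in the examples above:
# Log resident memory (KiB) once a minute; stop with Ctrl-C
while true; do
  echo "$(date +%T) $(ps -o rss= -p "$(pgrep -x ssvnode)") KiB"
  sleep 60
done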

Beacon Node Integration Issues

Beacon Node Connection Failures

Symptoms:
  • Repeated connection errors in logs
  • Unable to fetch duties
  • No duty execution
Log examples:
{"level":"ERROR","msg":"failed to fetch duties for current epoch","error":"connection refused"}
{"level":"ERROR","msg":"couldn't fetch node version","error":"context deadline exceeded"}
Solutions:
  1. Verify beacon node is running and accessible:
curl http://your-beacon-node:5052/eth/v1/node/health
curl http://your-beacon-node:5052/eth/v1/node/version
  2. Check network connectivity:
# Test connection
telnet your-beacon-node 5052

# Check DNS resolution
nslookup your-beacon-node
  3. Verify beacon node is synced:
curl http://your-beacon-node:5052/eth/v1/node/syncing
  4. Configure multiple beacon nodes for redundancy:
eth2:
  BeaconNodeAddr: http://beacon-1:5052,http://beacon-2:5052

Chain Reorgs

Symptoms:
  • Frequent reorg event logs
  • Duty execution errors after reorgs
  • Inconsistent state
Log examples:
{"level":"INFO","msg":"🔀 reorg event received","event":{"slot":394560,"depth":2}}
Analysis:
  1. Check reorg frequency and depth:
# Count reorgs in last hour
journalctl -u ssv-node --since "1 hour ago" | grep "reorg event" | wc -l

# Check reorg depth
journalctl -u ssv-node | grep "reorg event" | grep -o '"depth":[0-9]*'
  2. Shallow reorgs (1-2 blocks) are normal
  3. Deep reorgs (>3 blocks) indicate beacon chain issues
Solutions:
  • Ensure beacon node is well-connected to network
  • Verify beacon node is following correct chain
  • Check beacon node peers and sync status
  • Monitor beacon chain health (external tools)
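The standard beacon API exposes peer and sync information that makes these checks concrete:
# Beacon node peer count and sync status (standard beacon API endpoints)
curl -s http://your-beacon-node:5052/eth/v1/node/peer_count
curl -s http://your-beacon-node:5052/eth/v1/node/syncing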

Debugging Tools

Health Check Endpoint

# Basic health check
curl -v http://localhost:15000/health

# Expected response when healthy:
# HTTP/1.1 200 OK

# Error response example:
# HTTP/1.1 500 Internal Server Error
# {"errors":["beacon node unreachable"]}

Metrics Inspection

# Check specific metric
curl -s http://localhost:15000/metrics | grep ssv_validator_validators_per_status

# Export all metrics for analysis
curl -s http://localhost:15000/metrics > metrics-snapshot.txt

# Check metric value over time
watch -n 5 'curl -s http://localhost:15000/metrics | grep ssv_runner_submissions'

Log Analysis Tools

# Real-time log following (JSON); -o cat strips the journald prefix so jq can parse lines
journalctl -u ssv-node -f -o cat | jq .

# Filter errors from the last hour
journalctl -u ssv-node --since "1 hour ago" -o cat | jq 'select(.level == "ERROR")'

# Create an error summary
journalctl -u ssv-node --since "24 hours ago" -o cat | 
  jq -r 'select(.level == "ERROR") | .msg' | 
  sort | uniq -c | sort -rn

Database Inspection

# Count total keys
curl -s http://localhost:15000/database/count-by-collection | jq '.count'

# Count validator shares
curl -s "http://localhost:15000/database/count-by-collection?prefix=shares" | jq '.count'

# Database size
du -sh ./data/db

Getting Help

Information to Collect

When seeking help, gather:
  1. Node Information:
    ./bin/ssvnode version
    
  2. Configuration (sanitized - remove private keys):
    grep -viE 'private|key|password' config.yaml
    
  3. Recent Logs:
    journalctl -u ssv-node --since "1 hour ago" --no-pager > logs.txt
    
  4. Metrics Snapshot:
    curl -s http://localhost:15000/metrics > metrics.txt
    
  5. System Info:
    uname -a
    free -h
    df -h
    

Support Channels

Discord Community

Join the SSV community for real-time support and discussions

GitHub Issues

Report bugs and track known issues

Documentation

Review comprehensive documentation and guides

API Reference

Explore API endpoints for monitoring and management
Never share:
  • Operator private keys
  • Validator private keys or keystores
  • Complete configuration files (may contain sensitive data)
  • Production wallet addresses or seed phrases
Always sanitize logs and configs before sharing publicly.

Preventive Maintenance

Regular Checks

  • Daily: Review error logs and metrics dashboards
  • Weekly: Check disk space, database size, system updates
  • Monthly: Review performance trends, backup verification, security updates

Monitoring Best Practices

  1. Set up alerts for the following (example rules are sketched after this list):
    • Node health check failures
    • High failed submission rates
    • Validator status changes
    • Resource exhaustion (disk, memory)
  2. Maintain backups:
    • Database backups (before upgrades)
    • Configuration backups
    • Operator key backups (encrypted, secure storage)
  3. Keep software updated:
    • Monitor SSV releases
    • Test updates in testnet first
    • Follow upgrade procedures carefully
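If Prometheus scrapes the metrics used in this guide, alert rules can encode these checks. A minimal sketch; thresholds, durations, and labels are illustrative, not recommendations:
# prometheus-rules.yaml (illustrative)
groups:
  - name: ssv-node
    rules:
      - alert: SSVFailedSubmissions
        expr: rate(ssv_runner_submissions_failed[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "SSV node is failing duty submissions"
      - alert: SSVValidatorsNotParticipating
        expr: ssv_validator_validators_per_status{ssv_validator_status="not_participating"} > 0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "One or more validators are not participating"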

Next Steps

Metrics Setup

Configure Prometheus and Grafana for comprehensive monitoring

Logging Configuration

Optimize logging for better debugging and analysis
