Troubleshooting
This guide covers common issues you may encounter when running a Harmonic Salsa validator and their solutions.
Connection Problems
Validator Not Connecting to Network
Symptoms:
- Validator shows 0 peers in gossip
- Cannot find entrypoints
- Network timeout errors
Solutions:
- Check network connectivity
- Verify entrypoints are correct
- Check firewall settings
- Verify ports are open
- Check if the validator is binding to the correct interface
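As one way to carry out the connectivity and port checks above, here is a minimal Python sketch that tests whether a TCP port is reachable. It only covers TCP; UDP traffic is not tested. The host and port in the usage comment are placeholders, not values taken from this guide.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage (hypothetical entrypoint address, replace with your own):
#   tcp_port_open("entrypoint.example.com", 8001)
```

A result of False from the validator host but True from another machine usually points at a local firewall or routing problem rather than at the remote endpoint.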
Low Peer Count
Symptoms:
- Fewer than 50 peers
- Intermittent connectivity
Solutions:
- Add more entrypoints
- Check for rate limiting:
  - Verify your IP is not being rate-limited
  - Check with your hosting provider
- Verify system resources
- Increase connection limits if needed
RPC Connection Refused
Symptoms:
- Cannot connect to local RPC
- “Connection refused” errors
Solutions:
- Verify RPC is enabled
- Check the RPC bind address
- Test RPC locally
- Check if the port is in use
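The local RPC test above can be sketched in Python as follows. This assumes the validator exposes a Solana-compatible JSON-RPC interface with the getHealth method, and that it listens on 127.0.0.1:8899; neither is confirmed by this guide, so adjust the URL to match your configuration.

```python
import json
import urllib.request


def build_rpc_request(method: str, params=None, request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 request body."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return json.dumps(body).encode("utf-8")


def rpc_call(url: str, method: str, params=None, timeout: float = 5.0) -> dict:
    """POST a JSON-RPC request and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=build_rpc_request(method, params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def local_rpc_healthy(url: str = "http://127.0.0.1:8899") -> bool:
    """True if getHealth returns "ok"; False on errors or unhealthy replies."""
    try:
        return rpc_call(url, "getHealth").get("result") == "ok"
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts (URLError subclasses it);
        # ValueError covers a response body that is not valid JSON.
        return False
```

If local_rpc_healthy() returns False while the validator process is running, the bind address and port checks in the list above are the next things to look at.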
Sync Issues
Validator Not Catching Up
Symptoms:
- Catchup percentage stuck below 100%
- Slot number far behind network
- High “distance from network” value
Solutions:
- Check system resources
- Verify snapshot download
- Try downloading from a different validator
- Clear ledger and resync
- Check bandwidth
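To tell a validator that is slowly catching up from one that is truly stuck, it can help to compare the slot distance across several samples. The sketch below is illustrative only: it assumes you have already obtained pairs of (local slot, network slot) by some means, for example from two RPC endpoints, and the function names are hypothetical.

```python
from typing import List, Tuple


def slots_behind(local_slot: int, network_slot: int) -> int:
    """Distance from the network tip in slots (never negative)."""
    return max(0, network_slot - local_slot)


def is_catching_up(samples: List[Tuple[int, int]]) -> bool:
    """Return True if the gap shrank between the first and last sample.

    samples holds (local_slot, network_slot) pairs in chronological order.
    """
    gaps = [slots_behind(local, network) for local, network in samples]
    return len(gaps) >= 2 and gaps[-1] < gaps[0]
```

A gap that stays flat or grows over several minutes suggests a resource or bandwidth bottleneck rather than a transient delay.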
Slow Snapshot Download
Symptoms:
- Snapshot download taking hours
- Slow download speed in logs
Solutions:
- Increase minimum download speed
- Add more known validators
- Check network congestion
- Consider pre-downloading the snapshot:
  - Download snapshot archive from a trusted source
  - Extract to ledger directory
Validator Keeps Restarting During Catchup
Symptoms:
- Validator crashes during replay
- Out of memory errors
- Database corruption errors
Solutions:
- Check memory usage
- Verify disk space
- Check for disk errors
- Increase swap if needed
- Verify RocksDB settings
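For the memory check above, one option on Linux is to read /proc/meminfo directly. The following is a minimal sketch under that assumption; the helper names are illustrative and any alert threshold is up to the operator.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts and carry no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(meminfo: dict) -> float:
    """Fraction of total memory the kernel estimates is still available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


# Example usage on a Linux host:
#   with open("/proc/meminfo") as f:
#       info = parse_meminfo(f.read())
#   print(f"available: {available_fraction(info):.0%}")
```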
Performance Problems
High CPU Usage
Symptoms:
- CPU consistently above 90%
- Validator lagging behind network
- High system load
Solutions:
- Check CPU allocation
- Optimize thread settings
- Reduce QUIC endpoints if needed
- Check for competing processes
- Verify CPU governor
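The governor check above can be sketched on Linux by reading the cpufreq entries in sysfs. This assumes the "performance" governor is the desired setting, which is a common recommendation for latency-sensitive workloads but is not stated in this guide; on hosts without cpufreq support the directory is absent and the result is simply empty.

```python
import glob
import os

GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"


def read_governors(pattern: str = GOVERNOR_GLOB) -> dict:
    """Return {"cpuN": governor} for every CPU that exposes cpufreq."""
    governors = {}
    for path in sorted(glob.glob(pattern)):
        # .../cpuN/cpufreq/scaling_governor -> "cpuN"
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


def not_performance(governors: dict) -> list:
    """List CPUs whose governor is anything other than "performance"."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")
```

An empty list from not_performance(read_governors()) means either every core uses the performance governor or the host exposes no cpufreq information at all, so check that read_governors() is non-empty before relying on it.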
High Memory Usage
Symptoms:
- Memory usage growing over time
- System using swap heavily
- OOM killer terminating processes
Solutions:
- Enable disk-based accounts index
- Reduce accounts cache
- Limit snapshot retention
- Monitor for memory leaks
Slow Disk I/O
Symptoms:
- High I/O wait time
- Slow ledger replay
- Banking stage lag
Solutions:
- Verify disk performance
- Use separate drives
- Check disk health
- Optimize RocksDB
- Monitor disk usage
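For the disk usage monitoring step above, a small Python sketch using only the standard library is shown below. The paths in the usage comment and the 90% threshold are placeholders chosen for illustration, not values from this guide.

```python
import shutil


def disk_used_percent(path: str) -> float:
    """Percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(path: str, threshold_percent: float) -> bool:
    """True if usage of the filesystem at path exceeds threshold_percent."""
    return disk_used_percent(path) > threshold_percent


# Example usage (hypothetical mount points, replace with your own):
#   for mount in ("/mnt/ledger", "/mnt/accounts"):
#       if over_threshold(mount, 90.0):
#           print(f"{mount} is above 90% full")
```

Note that this figure is used/total, so it can differ slightly from the Use% column of df, which excludes blocks reserved for root.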
Network Bandwidth Exhausted
Symptoms:
- High packet loss
- Slow block reception
- Network timeouts
Solutions:
- Monitor bandwidth usage
- Limit QUIC connections
- Verify network hardware:
  - Check for 1 Gbps link
  - Verify no errors on interface
- Consider upgrade:
  - Upgrade to 10 Gbps if available
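The interface error check above can be done on Linux by reading the counters in /proc/net/dev. The sketch below parses that file's standard 16-column layout; the interface name in the usage comment is a placeholder.

```python
def parse_net_dev(text: str) -> dict:
    """Parse /proc/net/dev content into per-interface error and drop counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        # Columns 0-7 are receive counters, columns 8-15 are transmit counters.
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


# Example usage on a Linux host (hypothetical interface name):
#   with open("/proc/net/dev") as f:
#       print(parse_net_dev(f.read()).get("eth0"))
```

The counters are cumulative since boot, so compare two readings taken a few minutes apart; values that keep climbing point at cabling, NIC, or driver problems.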
Voting Issues
Validator Not Voting
Symptoms:
- No votes being submitted
- Vote credits not increasing
- Validator marked as delinquent
Solutions:
- Verify vote account configuration
- Check validator identity has SOL
- Verify vote account authority
- Check for the --no-voting flag
- Review logs for vote errors
High Skip Rate
Symptoms:
- Block production skip rate above 5%
- Missing leader slots
- Poor performance metrics
Solutions:
- Check system performance
- Verify network connectivity
- Optimize block production
- Check PoH speed
- Consider hardware upgrade:
  - Faster CPU for PoH
  - Faster NVMe for ledger access
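To make the 5% symptom threshold above concrete, skip rate can be computed as the share of assigned leader slots in which no block was produced. This is a minimal sketch; where the two counts come from (for example, block production statistics for your identity) is left as an assumption.

```python
def skip_rate_percent(leader_slots: int, blocks_produced: int) -> float:
    """Percentage of assigned leader slots that did not yield a block."""
    if leader_slots <= 0:
        return 0.0  # no leader slots assigned, so nothing could be skipped
    skipped = leader_slots - blocks_produced
    return 100.0 * skipped / leader_slots


def skip_rate_is_high(leader_slots: int, blocks_produced: int,
                      threshold_percent: float = 5.0) -> bool:
    """True if the skip rate exceeds the threshold (5% by default)."""
    return skip_rate_percent(leader_slots, blocks_produced) > threshold_percent
```

For example, producing 190 blocks out of 200 leader slots gives a skip rate of exactly 5%, which sits at the threshold but does not exceed it.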
Log Analysis
Understanding Common Errors
“Slot X is not a descendant of Y”
Cause: Fork mismatch during replay
Solution:
- Usually resolves automatically
- If persistent, may need to clear ledger and resync
“Transaction would exceed account data limit”
Cause: Transaction trying to allocate too much data
Solution:
- Not a validator issue
- Transaction submitter needs to fix
“Blockstore error: SlotNotRooted”
Cause: Accessing non-rooted slot data
Solution:
- Usually transient
- If persistent, check ledger integrity
“Tower vote failed”
Cause: Issue with tower voting logic
Solution:
- Check tower file integrity
- Review recent consensus changes
- May need to reset tower (only if instructed)
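When scanning a large log file, the four messages above can be matched mechanically. The sketch below maps each message to the cause listed in this section. It assumes the messages appear in log lines as written here, with X and Y as decimal slot numbers; the real log format may differ.

```python
import re
from collections import Counter

# Patterns are taken from the error messages listed above; causes are quoted as given.
KNOWN_ERRORS = [
    (re.compile(r"Slot \d+ is not a descendant of \d+"),
     "Fork mismatch during replay"),
    (re.compile(r"Transaction would exceed account data limit"),
     "Transaction trying to allocate too much data"),
    (re.compile(r"Blockstore error: SlotNotRooted"),
     "Accessing non-rooted slot data"),
    (re.compile(r"Tower vote failed"),
     "Issue with tower voting logic"),
]


def classify(line: str):
    """Return the documented cause for a log line, or None if it is not recognized."""
    for pattern, cause in KNOWN_ERRORS:
        if pattern.search(line):
            return cause
    return None


def summarize(lines) -> Counter:
    """Count how often each known cause appears in an iterable of log lines."""
    return Counter(cause for cause in map(classify, lines) if cause is not None)
```

A count that keeps growing for a message described as "usually transient" is the signal to move on to the persistent-case remedy for that error.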
Log Filtering
Set appropriate log levels.
Database Issues
RocksDB Corruption
Symptoms:
- “Corruption: bad block contents” errors
- Validator crashes on startup
- Database errors in logs
Solutions:
- Stop the validator
- Back up the current state
- Clear the corrupted database
- Restart the validator
- If the issue persists, do a full resync
Accounts Database Issues
Symptoms:
- Account verification failures
- Accounts hash mismatch
- Bank hash mismatch
Solutions:
- Verify accounts integrity
- Clear accounts cache
- Full accounts resync
Emergency Procedures
Validator Completely Stuck
- Check if the process is responsive
- Try a graceful shutdown
- If unresponsive, force kill
- Check for core dumps
- Review system logs
- Restart the validator
Recovering from Delinquency
Steps:
- Verify the validator is caught up
- Check vote account status
- Monitor for vote submission
- Wait for next epoch:
  - Delinquency clears at epoch boundary
  - Continue monitoring vote credits
- If still delinquent, check:
  - System resources
  - Network connectivity
  - Vote account balance
Getting Help
Diagnostic Information to Collect
When seeking help, gather:
- Validator version
- System info
- Recent logs
- Configuration
- Network status
Community Resources
- Harmonic Discord: Technical support channel
- Solana Discord: General validator questions
- GitHub Issues: Report bugs at https://github.com/harmonic/salsa/issues
Preventive Maintenance
Regular Backups
Backup keypairs and tower state regularly.
Monitor Metrics
Set up comprehensive monitoring and alerts.
Keep Updated
Stay current with validator software updates.
Test Changes
Test configuration changes on testnet first.
Next Steps
Configuration
Review and optimize your configuration
Monitoring
Set up better monitoring to catch issues early
Operations
Learn more operational best practices