Upgrade Procedures
This guide covers best practices for upgrading Harmonic Salsa validators, including version upgrades, rollback procedures, and zero-downtime updates.
Version Upgrade Strategy
Release Channels
Harmonic Salsa follows the Agave release channel model:
Edge Channel:
- Tracks the master branch
- Least stable
- Bleeding edge features
- For testing only, never use in production
Beta Channel:
- Tracks the latest vX.Y stabilization branch
- More stable than edge
- Suitable for testnet validators
- Feature testing before mainnet deployment
Stable Channel:
- Tracks the second-latest vX.Y stabilization branch
- Most stable
- Recommended for mainnet production validators
- Battle-tested on testnet and beta
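As a rough illustration of the channel rules above, the following Python sketch maps a set of stabilization branch names to channels. The function name and the sample branch names in the usage note are hypothetical; only the edge/beta/stable rules come from this guide.

```python
import re


def channel_branches(branches):
    """Map release channels to branches using the rules described above.

    `branches` is a list of stabilization branch names such as "v3.0".
    Illustrative helper only, not part of any validator tooling.
    """
    def sort_key(name):
        match = re.fullmatch(r"v(\d+)\.(\d+)", name)
        if match is None:
            raise ValueError(f"not a vX.Y branch name: {name!r}")
        # Compare numerically so that v2.10 sorts above v2.9.
        return int(match.group(1)), int(match.group(2))

    ordered = sorted(branches, key=sort_key, reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two stabilization branches")
    return {
        "edge": "master",      # edge tracks the master branch
        "beta": ordered[0],    # latest stabilization branch
        "stable": ordered[1],  # second-latest stabilization branch
    }
```

For example, `channel_branches(["v2.3", "v3.0", "v3.1"])` would place `v3.1` on beta and `v3.0` on stable.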
Version Numbering
Format: vX.Y.Z
- X - Major version (breaking changes)
- Y - Minor version (new features, stabilization branch)
- Z - Patch version (bug fixes)
Example: v3.1.5
- Major: 3
- Minor: 1
- Patch: 5
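The vX.Y.Z format above is straightforward to parse in a script, for instance to detect a major-version jump before upgrading. This is a minimal sketch; `Version`, `parse_version`, and `is_major_upgrade` are hypothetical names, not part of any validator tooling.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int  # X: breaking changes
    minor: int  # Y: new features, stabilization branch
    patch: int  # Z: bug fixes


_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> Version:
    """Parse a string such as "v3.1.5" into its three components."""
    match = _VERSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a vX.Y.Z version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def is_major_upgrade(current: Version, target: Version) -> bool:
    """True when the target crosses a major version boundary."""
    return target.major > current.major
```

Because `Version` is a tuple, two parsed versions compare in the expected order, so `parse_version("v3.1.5") > parse_version("v3.0.12")` holds.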
Upgrade Planning
Before Upgrading:
- Read Release Notes:
  - Check CHANGELOG.md in repository
  - Review GitHub releases: https://github.com/anza-xyz/agave/releases
  - Understand breaking changes and deprecations
- Test on Testnet:
  - Always test new versions on testnet first
  - Run for at least 24-48 hours
  - Monitor performance and stability
- Check Compatibility:
  - Verify snapshot format compatibility
  - Check for removed command-line arguments
  - Review any feature gate activations
- Plan Timing:
  - Avoid upgrading during high-stake events
  - Consider cluster epoch boundaries
  - Schedule during low-activity periods
  - Coordinate with stake delegators if needed
Pre-Upgrade Checklist
Critical Steps:
- Backup Configuration
- Document Current Version
- Verify Validator Health
- Check Disk Space
- Verify Account Balances
- Review Breaking Changes:
  - Check CHANGELOG.md for version-specific changes
  - Note any deprecated arguments to remove
  - Identify new required arguments
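Two of the checklist items above, backing up configuration and checking disk space, lend themselves to a small script. The sketch below uses only the Python standard library. The function names, the backup naming scheme, and the free-space threshold are assumptions chosen for illustration; this guide does not prescribe specific values.

```python
import shutil
import time
from pathlib import Path


def backup_config(config_path, backup_dir):
    """Copy a config file into backup_dir with a timestamp suffix.

    Returns the path of the backup copy. The naming scheme is hypothetical.
    """
    src = Path(config_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.bak"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest


def has_free_space(path, required_gib):
    """True if the filesystem holding `path` has at least required_gib free.

    The threshold is supplied by the caller; no value is recommended here.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * 1024**3
```

A wrapper script could refuse to proceed with the upgrade when either step fails.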
Upgrade Procedure
Standard Upgrade (With Downtime)
This is the safest upgrade method, accepting brief downtime.
Step 1: Build New Version
After the validator restarts on the new version:
- Check skip rate
- Verify vote credits increasing
- Monitor resource usage
- Review logs for errors
- Confirm no delinquency
Zero-Downtime Upgrade
For critical validators requiring minimal downtime, use restart windows.
Prerequisites:
- Good understanding of cluster restart windows
- Monitoring and alerting in place
- Tested upgrade procedure
Procedure:
- Identify Restart Window
- Prepare Binary in Advance
- Quick Swap During Window
- Monitor Recovery
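One common way to make the swap step above fast is to stage each build in its own directory and repoint a symlink to the active one. The sketch below assumes that layout, which is not described in this guide, and assumes a POSIX filesystem. The function name and paths are hypothetical, and the validator process still has to be stopped and restarted separately.

```python
import os
from pathlib import Path


def switch_active_release(link_path, new_target):
    """Atomically repoint the symlink at link_path to new_target.

    Returns the previous target (or None if there was no link) so the
    caller can keep it for a quick rollback.
    """
    link = Path(link_path)
    previous = os.readlink(link) if link.is_symlink() else None
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # clear a leftover from an interrupted earlier swap
    os.symlink(new_target, tmp)
    # Renaming over the old link replaces it in one step on POSIX systems,
    # so the link always points at a complete release directory.
    os.replace(tmp, link)
    return previous
```

Rolling back then amounts to calling the same function with the returned previous target.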
Automated Upgrade Script
WARNING: Only use automation after thoroughly testing manual upgrades.
Rollback Procedures
When to Rollback
Roll back if:
- Validator fails to start after upgrade
- Critical errors in logs
- Unexpected performance degradation
- High skip rate or delinquency
- Incompatibility with cluster
Quick Rollback
Step 1: Stop Current Version
Handling Data Incompatibility
If the new version modified the blockstore format:
Option 1: Keep New Data (Forward Only)
- Some versions create incompatible blockstore changes
- Rollback may require fresh snapshot download
- Review release notes for compatibility
Snapshot Compatibility
Understanding Snapshot Versions
Snapshot Format Changes:
- Major versions may change snapshot format
- Typically backward compatible for one minor version
- For example, v2.2 snapshots are compatible with v2.1 but not v2.0
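The rule of thumb above can be written down as a conservative pre-rollback check. This is only a sketch of the "one minor version back" heuristic stated in the list; the function name is hypothetical, the major-version rule is a cautious reading of "may change", and release notes remain the authority on actual compatibility.

```python
def can_roll_back_with_snapshot(snapshot_version, target_version):
    """Heuristic: can a validator at target_version read this snapshot?

    Both arguments are (major, minor) tuples. Models only the downgrade
    direction described above: same major, at most one minor version back.
    """
    if target_version > snapshot_version:
        raise ValueError("target is newer than the snapshot; not a rollback")
    snap_major, snap_minor = snapshot_version
    target_major, target_minor = target_version
    if snap_major != target_major:
        # Major versions may change the format, so be conservative.
        return False
    return snap_minor - target_minor <= 1
```

With the example from the list, a v2.2 snapshot passes for a v2.1 target and fails for v2.0.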
Managing Snapshot Upgrades
Before Upgrading:
After Upgrading:
- New format snapshots generated automatically
- Old format snapshots remain readable (if compatible)
- Can rollback while old snapshots available
Maintenance Windows
Planning Maintenance
Best Times for Upgrades:
- Mid-Epoch:
  - After leader schedule generation
  - Before critical voting periods
  - Check with: solana epoch-info
- Low Stake Periods:
  - When validator has fewer leader slots
  - Minimize impact on network
- Coordinated Windows:
  - Follow cluster upgrade schedules
  - Monitor Discord announcements
  - Align with other validators when possible
Avoid Upgrading During:
- Epoch boundaries
- High-stakes events or campaigns
- Known cluster instability
- Your leader slot assignments
- Network-wide upgrades (wait for stability)
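The mid-epoch guidance above can be approximated numerically once the current slot index and the epoch length are known, for example from the output of solana epoch-info. The sketch below takes those two numbers as plain arguments. The function names and the 10% margin around epoch boundaries are assumptions for illustration, not values from this guide.

```python
def epoch_progress(slot_index, slots_in_epoch):
    """Fraction of the current epoch that has elapsed, from 0.0 to 1.0."""
    if slots_in_epoch <= 0:
        raise ValueError("slots_in_epoch must be positive")
    return slot_index / slots_in_epoch


def in_mid_epoch_window(slot_index, slots_in_epoch, margin=0.1):
    """True when the epoch is at least `margin` away from both boundaries.

    The default margin of 0.1 is an arbitrary illustrative choice.
    """
    progress = epoch_progress(slot_index, slots_in_epoch)
    return margin <= progress <= 1.0 - margin
```

An upgrade script could call `in_mid_epoch_window` as one gate among several, alongside a check of the validator's upcoming leader slots.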
Communication
Notify Stakeholders:
If running a public validator:
- Announce maintenance window in advance
- Provide expected downtime
- Share upgrade version and reasons
- Post-upgrade status update
Communication Channels:
- Twitter/social media
- Validator website
- Discord announcements
- Stake pool communications
Version-Specific Upgrade Notes
Upgrading to v3.0+
Breaking Changes:
- XDP requires additional capabilities
- Many deprecated arguments removed
- Dynamic port range now includes client ports (need 25+ ports)
- Snapshot format changes
- Legacy shreds no longer supported
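The port-range item above is easy to check before restarting. The sketch below parses a "START-END" string and compares its width against the 25-port minimum from the list. The function names are hypothetical, and counting the width as END minus START is a deliberately conservative assumption about whether the end port itself is usable.

```python
def parse_port_range(text):
    """Parse "START-END" into a (start, end) tuple of ints."""
    start_s, sep, end_s = text.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {text!r}")
    start, end = int(start_s), int(end_s)
    if not 0 < start <= end <= 65535:
        raise ValueError(f"invalid port range: {text!r}")
    return start, end


def has_enough_ports(text, minimum=25):
    """True if the range spans at least `minimum` ports.

    Width is counted as end - start (assumption: end port excluded).
    """
    start, end = parse_port_range(text)
    return end - start >= minimum
```

Under this counting, "8000-8025" passes and "8000-8020" does not.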
Upgrading to v2.0+
Breaking Changes:
- Removed obsolete RPC v1 endpoints
- Deprecated RpcClient methods removed
- Snapshot format changes (SIMD-215)
Post-Upgrade Monitoring
Critical Metrics (First 24 Hours)
Immediate (0-1 Hour):
- Process running
- No critical errors in logs
- Validator in gossip
- Beginning catchup
- Catchup completed
- Voting resumed
- Skip rate normal (less than 5%)
- Resource usage normal
- Vote credits increasing
- No delinquency
- Performance stable
- No memory leaks
- Consistent performance
- No degradation trends
- Stakeholder feedback positive
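Several of the metrics above reduce to simple arithmetic once the raw numbers are collected. The sketch below computes a skip rate and flags the conditions named in the list, using the 5% skip-rate threshold mentioned there. The function names and the way inputs are gathered are assumptions; the numbers would come from whatever monitoring is already in place.

```python
def skip_rate_percent(leader_slots, blocks_produced):
    """Percentage of assigned leader slots that produced no block."""
    if blocks_produced > leader_slots:
        raise ValueError("blocks_produced cannot exceed leader_slots")
    if leader_slots == 0:
        return 0.0  # no leader slots yet, so nothing was skipped
    return 100.0 * (leader_slots - blocks_produced) / leader_slots


def post_upgrade_warnings(leader_slots, blocks_produced,
                          credits_before, credits_after,
                          delinquent, max_skip_rate=5.0):
    """Return human-readable warnings for the checks listed above."""
    warnings = []
    rate = skip_rate_percent(leader_slots, blocks_produced)
    if rate >= max_skip_rate:
        warnings.append(f"skip rate {rate:.1f}% is not below {max_skip_rate}%")
    if credits_after <= credits_before:
        warnings.append("vote credits are not increasing")
    if delinquent:
        warnings.append("validator is delinquent")
    return warnings
```

An empty list means none of these particular checks fired; it does not cover memory or longer-term trends.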
Monitoring Commands
Troubleshooting Upgrades
Upgrade Failed to Start
Issue: Validator won’t start after upgrade
Resolution:
- Check logs for specific error
- Verify all deprecated arguments removed
- Check for new required arguments
- Verify binary compatibility
- Rollback if needed
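Checking for removed arguments, as suggested above, can be partly automated by scanning the validator's startup arguments against a list compiled from the release notes. In this sketch the function name is hypothetical and the flags in the usage note are placeholders, not real validator arguments; the actual list has to come from the CHANGELOG for the target version.

```python
def find_removed_args(argv, removed_flags):
    """Return the flags in argv that appear in removed_flags, in order.

    Handles both "--flag value" and "--flag=value" forms. removed_flags
    must be filled in by the operator from the release notes.
    """
    hits = []
    for arg in argv:
        flag = arg.split("=", 1)[0]  # strip any "=value" suffix
        if flag in removed_flags and flag not in hits:
            hits.append(flag)
    return hits
```

For instance, with a placeholder set `{"--example-removed-flag"}`, a startup line containing `--example-removed-flag=1` would be reported before the new binary is ever started.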
Performance Degradation
Issue: Higher skip rate or resource usage after upgrade
Resolution:
- Monitor for 6-12 hours (may stabilize)
- Check release notes for known issues
- Verify system resources adequate
- Review new default settings
- Consider rollback if persistent
Snapshot Download Loop
Issue: Continuously downloading snapshots
Resolution:
- Verify snapshot compatibility
- Check disk space
- Verify known validators are responsive
- Check network connectivity
- May need to download from specific validator
Best Practices Summary
Always:
- Read release notes thoroughly
- Test on testnet first
- Backup configuration before upgrade
- Use graceful shutdown (v2.3+)
- Monitor for 24+ hours post-upgrade
- Keep previous binary for quick rollback
Never:
- Upgrade without reading changelog
- Skip testnet testing for major versions
- Upgrade during epoch boundaries
- Use debug builds in production
- Upgrade multiple major versions at once
Remember:
- Validator uptime directly impacts rewards
- Rushed upgrades cause more downtime than planned ones
- Community support available in Discord
- When in doubt, wait for cluster stability
Resources:
- RELEASE.md in repository
- CHANGELOG.md for version-specific changes
- GitHub releases for detailed notes
- Discord #validator-support for help