Understanding pyinfra Performance
pyinfra’s performance is affected by:
- Number of hosts: Operations execute in parallel across hosts
- Number of operations: Each operation involves fact gathering and command execution
- Network latency: SSH connections and command execution time
- Fact gathering: Frequent fact queries can slow deployments
- Operation complexity: Complex operations with many conditionals
Parallel Execution
pyinfra uses gevent for concurrent execution across hosts.
Controlling Parallelism
Adjust the number of parallel operations with the --parallel flag.
Optimal Parallel Settings
Rules of thumb:
- Small clusters (< 10 hosts): Use default (10)
- Medium clusters (10-100 hosts): Set to 20-50
- Large clusters (> 100 hosts): Set to 50-100
- Very large clusters (> 1000 hosts): Consider batching (see below)
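As a rough illustration, the rules of thumb above can be encoded in a small helper. This is a sketch and not part of pyinfra: suggest_parallel and needs_batching are hypothetical names, and the thresholds are simply the ones listed above.

```python
def suggest_parallel(host_count):
    """Return a suggested (low, high) parallelism range.

    Returns None for small clusters, meaning "keep the default".
    Hypothetical helper; the thresholds mirror the rules of thumb.
    """
    if host_count < 0:
        raise ValueError("host_count must not be negative")
    if host_count < 10:
        return None
    if host_count <= 100:
        return (20, 50)
    return (50, 100)


def needs_batching(host_count):
    """Very large clusters (> 1000 hosts) are candidates for batching."""
    return host_count > 1000


for count in (5, 60, 500, 1500):
    print(count, suggest_parallel(count), needs_batching(count))
```

The helper returns a range rather than a single number because the right value within that range depends on the control machine and the network, which is best found by measurement.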
Fact Caching
Facts are cached per deployment, but repeated queries in operations can still be slow.
Avoid Repeated Fact Queries
Bad: querying the same fact multiple times.
Preload Facts
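A minimal sketch of the preload pattern, using a stand-in get_fact function instead of pyinfra's real fact API so the example runs on its own. The names get_fact, preload_facts, plan_packages, and the fact values are all hypothetical.

```python
FACT_CALLS = []  # records every simulated remote query


def get_fact(name):
    """Hypothetical stand-in for a remote fact lookup (not pyinfra's API)."""
    FACT_CALLS.append(name)
    return {"os": "Linux", "arch": "x86_64"}[name]


def preload_facts(names):
    """Query each needed fact exactly once and return the results as a dict."""
    return {name: get_fact(name) for name in names}


def plan_packages(facts):
    """Make several decisions from the preloaded values without re-querying."""
    packages = ["curl"]
    if facts["os"] == "Linux":
        packages.append("htop")
    if facts["os"] == "Linux" and facts["arch"] == "x86_64":
        packages.append("strace")
    return packages


facts = preload_facts(["os", "arch"])
packages = plan_packages(facts)
```

Even though plan_packages reads the "os" value twice, FACT_CALLS contains one entry per fact, because the lookups happen once up front.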
For operations that always need certain facts, query them upfront.
Batch Operations
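One simple way to batch is to slice the host list into fixed-size chunks and deploy one chunk at a time. The sketch below covers only the slicing step. The function name batch_hosts, the host names, and the batch size of 1000 are assumptions for illustration.

```python
def batch_hosts(hosts, batch_size):
    """Split a list of hosts into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


# Hypothetical inventory of 2500 hosts.
hosts = [f"web-{n:04d}.example.com" for n in range(2500)]
batches = batch_hosts(hosts, 1000)

for index, batch in enumerate(batches, start=1):
    print(f"batch {index}: {len(batch)} hosts, first is {batch[0]}")
```

Each batch could then be written to its own inventory or handed to a separate run, so that only one batch's connections and stored output are held at a time.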
For very large deployments, batch hosts into groups.
Connection Reuse
SSH connections are expensive. Reuse them where possible.
SSH ControlMaster
Enable SSH connection multiplexing.
Keep Connections Alive
Set SSH keep-alive to prevent connection timeouts.
Optimize Operations
Minimize Commands
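A small sketch of the idea: join several shell commands with && so they travel as one command and stop at the first failure. The helper name combine_commands and the example commands are assumptions, not taken from pyinfra.

```python
def combine_commands(commands):
    """Join shell commands with '&&' so they run as one command line.

    With '&&', a later command only runs if the previous one succeeded.
    """
    if not commands:
        raise ValueError("at least one command is required")
    return " && ".join(commands)


# Three commands that would otherwise be sent separately (illustrative only).
separate = [
    "apt-get update",
    "apt-get install -y nginx",
    "systemctl enable nginx",
]
combined = combine_commands(separate)
print(combined)
```

The trade-off is coarser error reporting: a failure is reported for the combined command as a whole rather than for the individual step.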
Combine multiple commands into one. Bad: running three separate commands.
Use Idempotent Checks
Skip operations that don’t need to run.
Reduce Logging Output
Logging can slow down deployments with many operations.
Adjust Log Level
Disable Fact Output
File Transfer Optimization
Use rsync for Large Files
pyinfra supports rsync for efficient file transfers.
Compress Files Before Transfer
Memory Optimization
For deployments with many hosts, memory usage can be significant.
Limit Stored Output
By default, all command output is stored in memory.
Clean Up After Operations
Delete temporary files during deployment.
Profiling Deployments
Time Individual Operations
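A minimal timing sketch using only the standard library. The timed context manager and the TIMINGS dict are hypothetical helpers, and the sleep calls stand in for real work. It measures the wall-clock time of whatever runs inside the block, so whether that matches remote execution time depends on how pyinfra schedules the operations placed inside it.

```python
import time
from contextlib import contextmanager

TIMINGS = {}  # label -> elapsed seconds


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TIMINGS[label] = time.perf_counter() - start


with timed("install packages"):
    time.sleep(0.02)   # placeholder for real deploy work

with timed("write config"):
    time.sleep(0.005)  # placeholder for real deploy work

slowest = max(TIMINGS, key=TIMINGS.get)
for label, seconds in TIMINGS.items():
    print(f"{label}: {seconds:.3f}s")
```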
Add timing to your deploy script.
Deployment Summary
After deployment, review the summary.
Caching Strategy
Cache Expensive Operations
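As one possible approach, an expensive lookup can be memoized in-process with functools.lru_cache from the standard library. The function resolve_download_url and its arguments are hypothetical. A cache that must survive between runs would need to be stored on disk instead.

```python
from functools import lru_cache

COMPUTE_CALLS = []  # records each time the expensive work actually runs


@lru_cache(maxsize=None)
def resolve_download_url(name, version):
    """Hypothetical expensive step; computed once per (name, version) pair."""
    COMPUTE_CALLS.append((name, version))
    return f"https://example.com/{name}/{version}.tar.gz"


first = resolve_download_url("app", "1.4.2")
second = resolve_download_url("app", "1.4.2")  # served from the cache
third = resolve_download_url("app", "1.5.0")   # new key, computed again
```

The arguments form the cache key, so anything that should invalidate the cached result, such as a version number, needs to be passed in as an argument.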
For operations that rarely change, cache their results.
Network Optimization
Reduce Round Trips
Minimize commands that require remote state checks.
Use Local Execution
Run operations that don’t need remote execution locally.
Database and Service Operations
Batch Database Operations
Best Practices Summary
- Increase parallelism for large deployments (--parallel flag)
- Cache facts by querying once and reusing results
- Batch operations for very large host counts
- Enable SSH ControlMaster for connection reuse
- Combine commands to reduce round trips
- Use idempotent checks to skip unnecessary work
- Reduce logging in production deployments
- Use rsync for large file transfers
- Clean up temporary files to save memory
- Profile deployments to identify bottlenecks
Benchmarking Example
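A hedged sketch of a before/after comparison using the standard timeit module. The functions deploy_before and deploy_after are placeholders that only sleep. In practice they would be replaced by whatever launches the unoptimized and optimized deployments.

```python
import time
import timeit


def compare(before, after, repeat=3):
    """Return the best-of-repeat wall-clock time for each callable."""
    t_before = min(timeit.repeat(before, number=1, repeat=repeat))
    t_after = min(timeit.repeat(after, number=1, repeat=repeat))
    return t_before, t_after


def deploy_before():
    """Placeholder: three separate slow steps."""
    for _ in range(3):
        time.sleep(0.01)


def deploy_after():
    """Placeholder: the same work combined into one step."""
    time.sleep(0.01)


t_before, t_after = compare(deploy_before, deploy_after)
print(f"before: {t_before:.3f}s  after: {t_after:.3f}s")
```

Taking the minimum over several repeats reduces the influence of one-off delays such as a slow first connection.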
Compare before and after optimization.
Next Steps
- Learn about Debugging when things go wrong
- Explore Writing Operations for efficient operation design
- See API Reference for configuration options
