## Overview
Masar Eagle is designed as a distributed microservices architecture that can scale horizontally. This guide covers scaling strategies, performance optimization, and capacity planning.

## Architecture Scalability
### Service Independence
Each service can scale independently.

### Stateless Services
All API services are stateless, enabling easy horizontal scaling:

- No in-memory session state
- JWT-based authentication (no server-side sessions)
- Shared database for persistence
- Message queue for asynchronous communication
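Statelessness hinges on every instance being able to validate a token on its own, with no shared session store. A minimal sketch of the bearer-token setup, assuming the standard `AddJwtBearer` extension from `Microsoft.AspNetCore.Authentication.JwtBearer`; the `Jwt:*` configuration keys are illustrative, not the project's actual names:

```csharp
// Any instance can validate the JWT locally; no server-side session needed.
builder.Services
    .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidIssuer = builder.Configuration["Jwt:Issuer"],
            ValidateAudience = true,
            ValidAudience = builder.Configuration["Jwt:Audience"],
            ValidateIssuerSigningKey = true,
            IssuerSigningKey = new SymmetricSecurityKey(
                Encoding.UTF8.GetBytes(builder.Configuration["Jwt:Key"]!))
        };
    });
```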
## Scaling Strategies

### Horizontal Scaling

Run multiple instances of each stateless service behind a load balancer and add instances as load grows.

## Database Scaling
### Read Replicas
Configure PostgreSQL read replicas for read-heavy workloads.

Use Cases:
- Trip search queries
- Report generation
- Analytics queries
- Admin dashboard data
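One way to route read-only traffic to replicas, assuming Npgsql 6+ multiple-host support; the connection-string names, hosts, and credentials below are hypothetical:

```json
{
  "ConnectionStrings": {
    "TripsDb": "Host=pg-primary;Database=trips;Username=app;Password=<secret>",
    "TripsDbRead": "Host=pg-replica-1,pg-replica-2;Target Session Attributes=prefer-standby;Database=trips;Username=app;Password=<secret>"
  }
}
```

`Target Session Attributes=prefer-standby` tells Npgsql to prefer standby nodes and fall back to the primary if none are available.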
### Connection Pooling
PostgreSQL connection pooling is enabled by default in Npgsql.

Tuning:
- `MinPoolSize`: keep connections warm (5-10)
- `MaxPoolSize`: limit per service instance (50-100)
- Monitor: `npgsql_connection_pools_num_connections`
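Those pool bounds map directly onto Npgsql connection-string keywords; a sketch (host and credentials are placeholders):

```
Host=postgres;Database=users;Username=app;Password=<secret>;Minimum Pool Size=5;Maximum Pool Size=100;Connection Idle Lifetime=300
```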
### Database Sharding
Consider sharding at very high scale.

By Geography:
- Separate databases per region/city
- Users/Trips partitioned by zone

By Service Domain:
- Already separated: Users, Trips, Notifications, Auth
- Can further split if individual databases grow large
### Indexing Strategy
Key indexes for performance:
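As an illustration, indexes along these lines typically serve the workloads above (trip search, zone lookups, user lookup); the table and column names are assumptions, not the project's actual schema:

```sql
-- Trip search: filter by status, order by departure time.
CREATE INDEX CONCURRENTLY ix_trips_status_departure
    ON trips (status, departure_time);

-- Zone-scoped queries; partial index keeps only active trips.
CREATE INDEX CONCURRENTLY ix_trips_origin_zone
    ON trips (origin_zone_id) WHERE status = 'active';

-- Case-insensitive user lookup by email.
CREATE INDEX CONCURRENTLY ix_users_email
    ON users (lower(email));
```

`CONCURRENTLY` avoids locking writes while the index builds, at the cost of a slower build.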
## Message Queue Scaling
### RabbitMQ Clustering
Scale RabbitMQ for high message throughput.

Benefits:
- High availability
- Message distribution
- Increased throughput
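Forming a cluster is a few `rabbitmqctl` commands; a sketch, where node names are placeholders:

```
# Run on the node joining the cluster.
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@node1
rabbitmqctl start_app

# Verify membership.
rabbitmqctl cluster_status
```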
### Queue Configuration
Optimize queue settings (AppHost.cs:21-23).

Production Settings:
- Enable lazy queues for large queues
- Set message TTL for transient messages
- Configure dead letter exchanges
- Use quorum queues for critical data
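Message TTL and dead-lettering can be applied after the fact with a policy; a sketch using `rabbitmqctl set_policy` (the queue pattern and exchange name are illustrative). Note that quorum queues are chosen at declaration time via the `x-queue-type` argument, not via policy:

```
# 60s TTL + dead-letter exchange for transient notification queues.
rabbitmqctl set_policy transient-ttl "^notifications\." \
  '{"message-ttl":60000,"dead-letter-exchange":"dlx"}' \
  --apply-to queues
```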
### Wolverine Outbox
The transactional outbox ensures reliable messaging (Program.cs:116-122).

Scaling Considerations:
- Outbox polling can be CPU-intensive
- Adjust polling interval based on message volume
- Monitor outbox table size
## Performance Optimization

### Response Compression
Compression is enabled by default (Users.Api/Program.cs:41-55):

- 60-80% reduction in JSON response size
- Lower bandwidth costs
- Faster responses for mobile clients
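The setup in Users.Api/Program.cs is along these lines; this is a sketch of the standard ASP.NET Core response-compression registration, not the file's exact contents:

```csharp
// Brotli first (better ratio), gzip as fallback for older clients.
builder.Services.AddResponseCompression(options =>
{
    options.EnableForHttps = true;
    options.Providers.Add<BrotliCompressionProvider>();
    options.Providers.Add<GzipCompressionProvider>();
});
builder.Services.Configure<BrotliCompressionProviderOptions>(
    o => o.Level = CompressionLevel.Fastest);

var app = builder.Build();
app.UseResponseCompression();
```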
### Caching Strategy

#### HTTP Response Caching
Add response caching for static data:
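For static lookup data, ASP.NET Core's output caching is one option; a sketch, where the endpoint and policy are illustrative:

```csharp
builder.Services.AddOutputCache();

var app = builder.Build();
app.UseOutputCache();

// Cache the zones list for 10 minutes; repeated requests skip the handler.
app.MapGet("/api/zones", GetZones)
   .CacheOutput(policy => policy.Expire(TimeSpan.FromMinutes(10)));
```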
#### Distributed Cache
Use Redis for shared cache:
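A sketch using the `Microsoft.Extensions.Caching.StackExchangeRedis` package; the connection-string name and key prefix are assumptions:

```csharp
// Registers IDistributedCache backed by Redis, shared across instances.
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = builder.Configuration.GetConnectionString("redis");
    options.InstanceName = "masar:";   // key prefix to avoid collisions
});
```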
#### In-Memory Caching
Cache configuration data:
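A sketch with `IMemoryCache`; the `ZoneLookup`, `Zone`, and `AppDbContext` types are hypothetical stand-ins for rarely-changing configuration data:

```csharp
builder.Services.AddMemoryCache();

// Loads zones once per 30 minutes per instance instead of per request.
public class ZoneLookup(IMemoryCache cache, AppDbContext db)
{
    public async Task<IReadOnlyList<Zone>> GetZonesAsync() =>
        await cache.GetOrCreateAsync("zones", async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(30);
            return await db.Zones.AsNoTracking().ToListAsync();
        }) ?? [];
}
```

In-memory caches are per-instance; after horizontal scaling, each instance warms its own copy, which is fine for small configuration data.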
#### EF Core Query Caching

EF Core automatically caches compiled query plans (not query results), so repeated queries skip translation overhead; cache results explicitly using the approaches above.
### Background Jobs
Hangfire is configured for background processing (Trips.Api/Program.cs:115-127):

- Trip reminders (TripReminder, cron: `*/5 * * * *`)
- Auto-cancel overdue trips (AutoCancel, cron: `*/10 * * * *`)
- Account purge (AccountPurgeWorker)
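Registration presumably looks something like this; only the class names and cron schedules come from above, while the job IDs and method signatures are assumptions:

```csharp
// Recurring Hangfire jobs keyed by stable IDs so redeploys update in place.
RecurringJob.AddOrUpdate<TripReminder>(
    "trip-reminder", job => job.RunAsync(), "*/5 * * * *");
RecurringJob.AddOrUpdate<AutoCancel>(
    "auto-cancel-overdue", job => job.RunAsync(), "*/10 * * * *");
```

When scaling out, Hangfire coordinates through its storage, so multiple instances do not double-fire a recurring job.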
### Request Size Limits
Large file uploads are supported (Users.Api/Program.cs:26-33):

- Set reasonable limits (e.g., 10MB for images)
- Use streaming for large files
- Offload to object storage (S3, Azure Blob)
- Implement chunked uploads for >5MB
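A sketch of enforcing a 10MB cap at both the Kestrel and multipart-form layers, using standard ASP.NET Core options (the exact values in Users.Api/Program.cs may differ):

```csharp
// Kestrel-level cap: rejects oversized bodies before buffering them.
builder.WebHost.ConfigureKestrel(o =>
    o.Limits.MaxRequestBodySize = 10 * 1024 * 1024);

// Multipart form cap for file-upload endpoints.
builder.Services.Configure<FormOptions>(o =>
    o.MultipartBodyLengthLimit = 10 * 1024 * 1024);
```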
## Monitoring for Scale

### Key Metrics
- Service Metrics: request rate, latency percentiles, error rate
- Database Metrics: connection pool usage, query duration, replication lag
- Queue Metrics: queue depth, publish/consume rates, consumer utilization
- Runtime Metrics: CPU, memory, GC pauses, thread pool queue length
### SLO Targets
| Metric | Target | Critical Threshold |
|---|---|---|
| Availability | > 99.9% | < 99% |
| P95 Latency | < 500ms | > 2s |
| Error Rate | < 1% | > 5% |
| DB Connection Pool Usage | < 80% | > 95% |
| Queue Lag | < 100 msgs | > 1000 msgs |
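The P95 target can be wired into Prometheus alerting; a sketch, assuming a standard `http_request_duration_seconds` histogram (your metric names may differ):

```yaml
# Illustrative Prometheus alerting rule for the P95 latency SLO.
groups:
  - name: slo
    rules:
      - alert: HighP95Latency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
          ) > 2
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "P95 latency above critical threshold (2s)"
```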
## Capacity Planning

### Resource Estimates

#### Per Service Instance
Minimum Resources:
- CPU: 0.5 cores
- Memory: 256MB
- Disk: 100MB

Recommended Resources:
- CPU: 1-2 cores
- Memory: 512MB-1GB
- Disk: 1GB (for logs)

Expected Throughput:
- ~200-500 req/sec per instance
- Scales linearly with CPU
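Those throughput figures turn into an instance-count estimate with simple arithmetic; a sketch, where the 70% headroom factor is a common rule of thumb rather than a project value:

```python
import math

def instances_needed(peak_rps: float, per_instance_rps: float = 300,
                     headroom: float = 0.7) -> int:
    """Instances required to serve peak load while keeping each
    instance at or below `headroom` (e.g. 70%) of its sustainable rate."""
    return math.ceil(peak_rps / (per_instance_rps * headroom))

print(instances_needed(2000))  # → 10 (2000 req/s at ~300 req/s/instance, 70% headroom)
```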
#### PostgreSQL
Sizing example (16GB RAM, 4 cores):

- `max_connections`: 200
- `shared_buffers`: 4GB
- `effective_cache_size`: 8GB
- `work_mem`: 60MB
#### RabbitMQ
Minimum:
- CPU: 1 core
- Memory: 512MB
- Disk: 10GB

Recommended:
- CPU: 2-4 cores
- Memory: 2-4GB
- Disk: 50GB+ (message persistence)

Expected Throughput:
- ~10,000 messages/sec
- 50,000+ concurrent connections
### Growth Projections

#### Baseline Metrics
Collect 1-2 weeks of production metrics:
- Average requests/sec
- Peak requests/sec
- Database query count
- Message queue throughput
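Baseline numbers can then be projected forward with simple compounding; a sketch, where the 10% monthly growth rate and 500 req/s baseline are illustrative values, not measurements:

```python
def project(value: float, monthly_growth: float, months: int) -> float:
    """Compound a baseline metric forward (e.g. peak req/s at 10%/month)."""
    return value * (1 + monthly_growth) ** months

print(round(project(500, 0.10, 12), 1))  # → 1569.2
```

Feed the projected peak back into the instance-count estimate to plan capacity a quarter or a year ahead.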
## Deployment Strategies

### Blue-Green Deployment

Deploy the new version to a second, identical environment and switch traffic once health checks pass; switching back gives instant rollback.

### Rolling Updates

Replace instances a few at a time so a healthy subset always serves traffic; stateless services make this safe.
## Best Practices

### Load Test Before Scaling
Use tools like k6, JMeter, or Artillery to simulate production load:
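A minimal k6 script that mirrors the SLO table's targets; the staging URL is a placeholder:

```javascript
// Run with: k6 run loadtest.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // ramp up to 100 virtual users
    { duration: '5m', target: 100 },  // hold at peak
    { duration: '1m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // SLO: P95 < 500ms
    http_req_failed: ['rate<0.01'],   // SLO: error rate < 1%
  },
};

export default function () {
  const res = http.get('https://staging.example.com/api/trips');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```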
### Monitor Everything
Use the observability stack:
- Metrics: Prometheus + Grafana
- Logs: Loki
- Traces: Jaeger
- Alerts: Prometheus Alertmanager
### Implement Circuit Breakers

Protect against cascading failures; resilience handlers are already configured via ServiceDefaults.
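ServiceDefaults in Aspire-style projects typically wires this up through `Microsoft.Extensions.Http.Resilience`; a sketch of what an explicit registration looks like (the client name and thresholds are illustrative, not the project's values):

```csharp
// Standard pipeline: rate limiter, total timeout, retry,
// circuit breaker, and per-attempt timeout.
builder.Services.AddHttpClient("trips")
    .AddStandardResilienceHandler(options =>
    {
        options.CircuitBreaker.FailureRatio = 0.5;                   // open at 50% failures
        options.CircuitBreaker.BreakDuration = TimeSpan.FromSeconds(30);
    });
```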
### Use Async All The Way
Never block threads:
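For example (the `ITripRepository` type is a hypothetical stand-in):

```csharp
// DON'T: blocking on async work ties up a thread-pool thread
// and can deadlock or starve the pool under load.
// var trip = repository.GetTripAsync(id).Result;

// DO: stay async end-to-end; the thread is released while I/O is in flight.
app.MapGet("/api/trips/{id}", async (Guid id, ITripRepository repo) =>
    await repo.GetTripAsync(id) is { } trip
        ? Results.Ok(trip)
        : Results.NotFound());
```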
## Next Steps

- Monitoring: set up comprehensive monitoring
- Troubleshooting: debug performance issues