Overview
This guide covers production-ready deployment configurations, security hardening, and operational best practices for running InterviewGuide at scale.
Pre-Deployment Checklist
Environment Configuration
Essential Configuration
- All environment variables set and validated
- AI API keys secured (use secrets management)
- Database credentials rotated from defaults
- Redis password configured
- Object storage access keys generated with minimal permissions
- CORS origins restricted to production domains
- JPA ddl-auto set to validate or none
Infrastructure Readiness
Resource Requirements
Minimum Production Specs:
- Backend: 2 CPU cores, 4GB RAM, 20GB storage
- PostgreSQL: 2 CPU cores, 8GB RAM, 100GB SSD (expandable)
- Redis: 1 CPU core, 2GB RAM, 10GB storage
- Object Storage: 500GB+ (grows with usage)
Recommended Production Specs:
- Backend: 4-8 CPU cores, 8-16GB RAM (horizontal scaling)
- PostgreSQL: 4-8 CPU cores, 16-32GB RAM, NVMe SSD with replication
- Redis: 2 CPU cores, 4GB RAM with persistence enabled
Security Hardening
Security Checklist
- TLS/SSL certificates installed for all public endpoints
- Database not exposed to public internet
- Redis protected by password and firewall rules
- Object storage buckets have proper access policies
- Rate limiting enabled on API endpoints
- File upload validation and antivirus scanning
- Security headers configured (CSP, HSTS, X-Frame-Options)
- Dependency vulnerability scans passed
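The file upload validation item can be partly covered in application code before a file ever reaches the antivirus scanner. The sketch below checks size and leading "magic" bytes. The class name, the 10 MB limit, and the choice of PDF and DOCX as allowed types are assumptions for illustration, not something InterviewGuide is known to enforce.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: rejects uploads that are too large or whose leading bytes match no allowed type. */
final class UploadValidator {
    // Assumed limit for illustration; tune it to your own upload policy.
    static final long MAX_BYTES = 10L * 1024 * 1024;

    // PDF files start with the ASCII text "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // DOCX files are ZIP archives, which start with the bytes 'P' 'K' 0x03 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    static boolean isAllowed(byte[] content) {
        if (content == null || content.length == 0 || content.length > MAX_BYTES) {
            return false;
        }
        return startsWith(content, PDF_MAGIC) || startsWith(content, ZIP_MAGIC);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A magic-byte check only filters out obviously mislabeled files. It does not replace antivirus scanning, since a well-formed PDF or ZIP container can still carry malicious content.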
Monitoring & Observability
Observability Checklist
- Application logging configured (JSON format recommended)
- Log aggregation system connected (ELK, Grafana Loki, CloudWatch)
- Metrics collection enabled (Prometheus, CloudWatch, Datadog)
- Health check endpoints monitored
- Alerting rules configured for critical errors
- Distributed tracing enabled (optional but recommended)
Backup & Disaster Recovery
Backup Checklist
- Automated PostgreSQL backups scheduled (daily minimum)
- Backup retention policy defined (30+ days recommended)
- Backup restoration tested successfully
- Object storage versioning enabled
- Redis persistence configured (RDB + AOF)
- Database replication configured for high availability
Production Configuration
Database Configuration
application-prod.yml
JPA ddl-auto Settings Explained
| Mode | Behavior | Production Safe? |
|---|---|---|
| create | Drops and recreates all tables on startup | ❌ Never - causes data loss |
| create-drop | Creates on startup, drops on shutdown | ❌ Never - causes data loss |
| update | Automatically modifies schema to match entities | ⚠️ Risky - can cause data corruption |
| validate | Only validates schema, fails if mismatch | ✅ Recommended for production |
| none | Does nothing, full manual control | ✅ Best for production |
Connection Pool Tuning
HikariCP Settings:
- maximum-pool-size: Maximum active connections (typically 2-3× CPU cores)
- minimum-idle: Keep warm connections ready (20-30% of max)
- connection-timeout: How long to wait for a connection (30s default)
- idle-timeout: Close idle connections after this time (10 min)
- max-lifetime: Force connection renewal (30 min, prevents stale connections)
For a 4-core server with SSD, the common (CPU cores × 2) + effective spindle count rule gives:
(4 × 2) + 1 = 9 connections (round up to 10-20)
Vector Store Configuration
application-prod.yml
Redis Configuration
application-prod.yml
Redis server configuration (redis.conf):
redis.conf
Redis Persistence Strategies
RDB (Snapshotting):
- Periodic point-in-time snapshots
- Faster restart times
- Risk: May lose data since last snapshot
AOF (Append-Only File):
- Logs every write operation
- More durable (can sync every second or every write)
- Larger file size, slower restart
Object Storage Configuration
application-prod.yml
AWS S3
- Enable versioning for accidental deletion recovery
- Configure lifecycle policies for cost optimization
- Use CloudFront CDN for global distribution
- Enable server-side encryption (SSE-S3 or SSE-KMS)
Alibaba Cloud OSS
- Enable versioning and Cross-Region Replication
- Use CDN for faster content delivery in China
- Configure bucket policies for least-privilege access
- Enable server-side encryption (AES256 or KMS)
Self-Hosted MinIO
- Deploy in distributed mode (4+ nodes) for HA
- Configure erasure coding for data protection
- Set up replication to secondary datacenter
- Enable MinIO KES for encryption key management
Backup Strategy
- Enable object versioning
- Configure lifecycle rules to archive old versions
- Replicate critical buckets to separate region
- Test restoration procedures regularly
Security Configuration
application-prod.yml
nginx.conf
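The pre-deployment checklist calls for rate limiting on API endpoints. If that is not handled at the proxy layer, one application-level option is a token bucket. The following is a minimal single-node sketch, not InterviewGuide's actual implementation: the `TokenBucket` class is hypothetical, and the injectable clock exists only so the behavior can be tested deterministically.

```java
import java.util.function.LongSupplier;

/** Hypothetical in-memory token bucket: allows bursts up to capacity, then a steady refill rate. */
final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity;                       // start full so initial bursts are allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Returns true if the request may proceed, false if it should be rejected (for example with HTTP 429). */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production the clock would typically be `System::nanoTime`, with one bucket per client key. Because the state lives in one JVM, several backend instances behind a load balancer would each enforce their own limit. A shared limit needs a shared store such as Redis or enforcement at the proxy.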
Monitoring & Logging
Application Logging
application-prod.yml
logback-spring.xml
Health Checks
application-prod.yml
- Liveness: /actuator/health/liveness - Is the app running?
- Readiness: /actuator/health/readiness - Can it accept traffic?
- Startup: /actuator/health/startup - Has initialization completed?
Metrics Collection
- Prometheus
- CloudWatch
- Datadog
application-prod.yml
prometheus.yml
Backup & Disaster Recovery
PostgreSQL Backup Strategy
Replication for High Availability
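Configure streaming replication to a standby server:
Primary Server (postgresql.conf):
Standby Server (recovery.conf; on PostgreSQL 12 and later, these settings live in postgresql.conf alongside a standby.signal file):
Redis Backup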
Scaling Considerations
Horizontal Scaling
Backend Instances
Stateless Design enables easy horizontal scaling:
- Session state stored in Redis (shared across instances)
- No local file storage (uploads go to S3 or another configured object store)
- Load balancer distributes traffic (Nginx, ALB, HAProxy)
Database Read Replicas
Read-Heavy Workloads:
- Configure read replicas for report generation
- Use read/write splitting in application
- Monitor replication lag (under 1s target)
Performance Tuning
JVM Tuning
- Heap Size: 50-75% of container memory
- GC: G1GC for predictable pause times
- Monitoring: Enable JMX for heap analysis
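As a worked example of the heap guideline, the sketch below turns a container memory limit into a heap size and a G1 flag. `HeapSizing` and its methods are hypothetical names; only `-Xmx` and `-XX:+UseG1GC` are real JVM options.

```java
/** Hypothetical helper that turns a container memory limit into JVM heap options. */
final class HeapSizing {
    /** Heap size in MiB for a container limit and a heap fraction between 0.50 and 0.75. */
    static long heapMiB(long containerMiB, double fraction) {
        if (fraction < 0.50 || fraction > 0.75) {
            throw new IllegalArgumentException("fraction should stay within 0.50-0.75");
        }
        return (long) Math.floor(containerMiB * fraction);
    }

    /** JVM options with a fixed maximum heap and the G1 collector. */
    static String jvmOptions(long containerMiB, double fraction) {
        return "-Xmx" + heapMiB(containerMiB, fraction) + "m -XX:+UseG1GC";
    }
}
```

For a 4096 MiB container at 75%, `HeapSizing.jvmOptions(4096, 0.75)` yields `-Xmx3072m -XX:+UseG1GC`. The remaining memory is left for metaspace, thread stacks, and direct buffers. On container-aware JVMs, `-XX:MaxRAMPercentage` is an alternative that avoids hard-coding the size.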
Connection Pool Sizing
PostgreSQL:
Redis:
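For the PostgreSQL pool, the (4 × 2) + 1 = 9 arithmetic from the Connection Pool Tuning section generalizes to a small helper. This is a sketch of that rule of thumb only. `PoolSizing` is a hypothetical class, and the 25% idle ratio is an assumed midpoint of the 20-30% guideline.

```java
/** Hypothetical helper for the (CPU cores x 2) + effective spindles rule of thumb. */
final class PoolSizing {
    /** Baseline maximum pool size for one database server. */
    static int baseline(int cpuCores, int effectiveSpindles) {
        if (cpuCores < 1 || effectiveSpindles < 0) {
            throw new IllegalArgumentException("cpuCores must be >= 1 and effectiveSpindles >= 0");
        }
        return cpuCores * 2 + effectiveSpindles;
    }

    /** minimum-idle as 25% of the maximum pool size, rounded up. */
    static int minimumIdle(int maximumPoolSize) {
        return (int) Math.ceil(maximumPoolSize * 0.25);
    }

    /** Connections the database sees when every backend instance fills its pool. */
    static int totalConnections(int maximumPoolSize, int backendInstances) {
        return maximumPoolSize * backendInstances;
    }
}
```

`PoolSizing.baseline(4, 1)` gives 9, matching the worked example. When scaling horizontally, `totalConnections` is the number to compare against PostgreSQL's `max_connections`, because each backend instance opens its own pool.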
Caching Strategy
Cost Optimization
AI API Costs
Optimization Strategies:
- Use cheaper models for simple tasks (qwen-plus vs qwen-max)
- Implement request deduplication
- Cache common AI responses
- Set token limits per request
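Request deduplication and response caching can be combined in one small structure. The sketch below is an in-process illustration under stated assumptions: `DedupingAiCache` is a hypothetical class, the AI client is modeled as a plain `Function<String, String>`, and the cache key is the raw prompt.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache that shares one AI call between identical prompts. */
final class DedupingAiCache {
    private final ConcurrentHashMap<String, CompletableFuture<String>> responses =
            new ConcurrentHashMap<>();
    private final Function<String, String> aiCall;

    DedupingAiCache(Function<String, String> aiCall) {
        this.aiCall = aiCall;
    }

    /** Returns the cached or in-flight response for a prompt, calling the model at most once per prompt. */
    CompletableFuture<String> ask(String prompt) {
        CompletableFuture<String> fresh = new CompletableFuture<>();
        CompletableFuture<String> existing = responses.putIfAbsent(prompt, fresh);
        if (existing != null) {
            return existing;                      // duplicate request: reuse the first caller's result
        }
        try {
            fresh.complete(aiCall.apply(prompt)); // first caller performs the actual call
        } catch (RuntimeException e) {
            responses.remove(prompt, fresh);      // do not cache failures
            fresh.completeExceptionally(e);
        }
        return fresh;
    }
}
```

This version has no expiry or size bound, so a real deployment would add both. The key should also include the model name and any parameters that change the output. With several backend instances, a shared cache in Redis would deduplicate across instances, which this in-memory map cannot do.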
Storage Costs
Lifecycle Policies:
- Archive old resumes to Glacier/Archive after 90 days
- Delete analysis reports after 1 year
- Compress uploaded documents
S3 Lifecycle Rule
Database Optimization
Cost Reduction:
- Enable compression for large text columns
- Partition large tables by date
- Archive old interview sessions
- Use appropriate instance types
Compute Efficiency
Right-sizing:
- Monitor actual resource usage
- Use auto-scaling during peak hours
- Consider spot instances for non-critical workloads
- Enable CPU/memory limits in containers
Troubleshooting Production Issues
High Database CPU Usage
Symptoms: Slow queries, connection pool exhaustion
Debug Steps:
Solutions:
- Add indexes on frequently queried columns
- Enable query result caching
- Increase connection pool size (only if the database has CPU headroom, since more connections add load)
- Consider read replicas
Memory Leaks
Symptoms: Container memory grows over time, eventual OOM kills
Debug Steps:
Common Causes:
- Unclosed streams or connections
- Large objects held in cache
- ThreadLocal leaks in web applications
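ThreadLocal leaks usually come from values set on a pooled request thread and never cleared. The sketch below shows the usual set-in-try, remove-in-finally pattern. `RequestContext` is a hypothetical class used only to illustrate it.

```java
import java.util.function.Supplier;

/** Hypothetical per-request context holder showing the cleanup pattern. */
final class RequestContext {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static String currentUser() {
        return CURRENT_USER.get();
    }

    /** Runs work with the user bound to the current thread, then always clears the binding. */
    static <T> T runAs(String user, Supplier<T> work) {
        CURRENT_USER.set(user);
        try {
            return work.get();
        } finally {
            // Pooled request threads are reused; without remove() the value outlives the request.
            CURRENT_USER.remove();
        }
    }
}
```

Because servlet containers reuse worker threads, a value that is never removed stays reachable for the life of the thread. It can also leak one request's data into the next request handled on that thread.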
Redis Stream Backlog
Symptoms: Messages not processed, growing stream length
Debug Steps:
Solutions:
- Scale up consumer instances
- Increase consumer concurrency
- Check for stuck messages (claim and retry)
- Monitor consumer error rates
Object Storage Failures
Symptoms: File upload errors, 403/404 responses
Debug Steps:
Solutions:
- Verify IAM permissions
- Check bucket CORS configuration
- Enable S3 access logs for debugging
- Implement retry logic with exponential backoff
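The retry item above can be sketched as a small generic helper. This is an illustration under assumptions, not the storage client's built-in retry behavior: the `Retry` class is hypothetical, and it treats every `RuntimeException` as retryable, which a real implementation would narrow.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper that doubles the wait after each failed attempt. */
final class Retry {
    static <T> T withBackoff(Supplier<T> operation, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;                      // out of attempts: surface the last failure
                }
                sleep(delay);
                delay *= 2;                       // exponential backoff: 1x, 2x, 4x, ...
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();   // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

Retries help with transient failures such as timeouts or throttling. A 403 caused by missing IAM permissions will fail the same way on every attempt, so it should be excluded from retry. Adding random jitter to the delay is a common refinement when many clients retry at once.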
Security Incident Response
Incident Detection
Monitor for:
- Unusual API traffic patterns
- Failed authentication attempts
- Unauthorized file access
- SQL injection attempts
- Abnormal resource usage
Investigation
- Review access logs for unauthorized activity
- Check database audit logs
- Analyze file upload history
- Verify data integrity
Recovery
- Restore from clean backup if data compromised
- Apply security patches
- Update firewall rules
- Force password resets for affected users
Compliance & Auditing
Data Privacy
GDPR/CCPA Compliance:
- Implement data retention policies
- Provide data export functionality
- Support “right to be forgotten” (data deletion)
- Log all data access for audit trails
Audit Logging
Track:
- User authentication events
- Resume uploads and deletions
- Configuration changes
- Database schema modifications
Next Steps
Monitoring Setup
Implement comprehensive observability
CI/CD Pipeline
Automate testing and deployment
Architecture Guide
Deep dive into system design
API Reference
Explore REST API documentation
