Topology planning
Choose deployment pattern
Review topology patterns to select the best configuration for your latency and resiliency requirements.
Node distribution
- Deploy at least 3 nodes for fault tolerance
- Use at least 3 nodes per region for multi-region deployments
- Distribute nodes across availability zones
- Use identical hardware for all nodes
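The distribution rules above can be checked mechanically before provisioning. The following is a minimal sketch; the `Node` record and the `check_distribution` helper are hypothetical names used for illustration and are not part of any CockroachDB tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    region: str
    zone: str  # availability zone within the region


def check_distribution(nodes: list[Node]) -> list[str]:
    """Return a list of problems with the plan; an empty list means it passes."""
    problems = []
    if len(nodes) < 3:
        problems.append(f"only {len(nodes)} nodes; need at least 3")

    per_region = Counter(n.region for n in nodes)
    # The per-region minimum only applies to multi-region deployments.
    if len(per_region) > 1:
        for region, count in sorted(per_region.items()):
            if count < 3:
                problems.append(f"region {region} has {count} nodes; need at least 3")

    # Nodes in a region should not all share one availability zone.
    for region in sorted(per_region):
        zones = {n.zone for n in nodes if n.region == region}
        if per_region[region] > 1 and len(zones) < 2:
            problems.append(f"region {region} uses a single availability zone")
    return problems
```

An empty result means the plan satisfies the node-count and zone-spread rules; the hardware-uniformity rule is not modeled here.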
Hardware requirements
CPU sizing
Minimum
4 vCPUs per node
Absolute minimum for production stability. Below this, foreground workloads compete with background tasks.
Recommended
8-16 vCPUs per node
Optimal range for most workloads. Maximum tested: 32 vCPUs per node.
Memory provisioning
Recommendation: 4 GiB RAM per vCPU
- Minimum acceptable: 2 GiB per vCPU (testing only)
- Benefits decrease beyond 4 GiB per vCPU as CPU count increases
- Disable memory swap on Linux systems
Under-provisioning RAM causes reduced caching, disk spilling, and potential OOM crashes.
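The CPU and memory thresholds above can be expressed as a small classification helper. This is a sketch only; the function names and the returned labels are illustrative assumptions, while the numeric thresholds come from the sizing guidance in this section.

```python
def check_cpu(vcpus: int) -> str:
    """Classify a node's vCPU count against the sizing guidance."""
    if vcpus < 4:
        return "below-minimum"
    if vcpus > 32:
        return "above-tested-maximum"
    if 8 <= vcpus <= 16:
        return "recommended"
    return "acceptable"


def check_memory(vcpus: int, ram_gib: float) -> str:
    """Classify the RAM-to-vCPU ratio of a node."""
    ratio = ram_gib / vcpus
    if ratio >= 4:
        return "recommended"
    if ratio >= 2:
        return "testing-only"
    return "under-provisioned"
```

For example, an 8 vCPU node with 32 GiB of RAM has a ratio of 4 GiB per vCPU and is classified as `recommended` on both checks.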
Storage specifications
| Metric | Recommendation |
|---|---|
| Capacity per vCPU | 100-150 GiB |
| Maximum per node | 10 TiB |
| IOPS per vCPU | 500 |
| Throughput per vCPU | 30 MB/s |
| Filesystem | ext4 or XFS |
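Because most of the rows above are per-vCPU figures, it can help to translate them into per-node targets. The sketch below does that arithmetic; `storage_targets` is a hypothetical helper name, and the constants are taken directly from the table.

```python
GIB_PER_TIB = 1024
MAX_NODE_CAPACITY_GIB = 10 * GIB_PER_TIB  # 10 TiB per-node ceiling from the table


def storage_targets(vcpus: int) -> dict:
    """Translate the per-vCPU recommendations into per-node targets."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return {
        # 100-150 GiB per vCPU, capped at the per-node maximum.
        "capacity_gib_min": min(100 * vcpus, MAX_NODE_CAPACITY_GIB),
        "capacity_gib_max": min(150 * vcpus, MAX_NODE_CAPACITY_GIB),
        "iops": 500 * vcpus,
        "throughput_mb_s": 30 * vcpus,
    }
```

For a 16 vCPU node this gives 1,600 to 2,400 GiB of capacity, 8,000 IOPS, and 480 MB/s of throughput.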
Use SSDs
Always use solid-state drives. HDDs don’t provide sufficient IOPS for production workloads.
Separate volumes
- Store data on dedicated volume (not OS disk)
- Keep logs on separate volume from data
Monitor capacity
- Maintain 10-15% free space at all times
- Set up alerts at 80% usage
- CockroachDB automatically creates ballast files that can be deleted to free space in an emergency
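The capacity thresholds above can be checked with a few lines of standard-library Python. This is a minimal sketch, not a replacement for a real alerting system; the function names and the `ok`/`alert`/`critical` labels are assumptions, with `critical` meaning that less than 10% of the volume is free.

```python
import shutil


def classify_usage(used_percent: float) -> str:
    """Map a disk usage percentage to the thresholds in this section."""
    if used_percent >= 90.0:  # less than 10% free space remains
        return "critical"
    if used_percent >= 80.0:  # alert threshold
        return "alert"
    return "ok"


def check_store(path: str) -> str:
    """Classify the volume that contains `path` (for example, the store directory)."""
    usage = shutil.disk_usage(path)
    return classify_usage(usage.used / usage.total * 100)
```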
Cloud-specific recommendations
- AWS
- GCP
- Azure
AWS instance types:
- i3.xlarge, i3.2xlarge, i3.4xlarge (local SSD)
- m6i.xlarge, m6i.2xlarge (network storage)
AWS storage volumes:
- gp3 volumes (cost-effective, 3000 IOPS default)
- io2 volumes (higher IOPS, provision separately)
- Provision IOPS and throughput to meet 500 IOPS and 30 MB/s per vCPU
Security configuration
TLS certificates
Generate certificates
Use cockroach cert or openssl to create:
- CA certificate and key
- Node certificates (common name: node)
- Client certificates (common name: username)
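As a sketch of the `cockroach cert` route, the helper below assembles the three commands without running them, so they can be reviewed or passed to `subprocess.run`. The directory names, hostnames, and the `cert_commands` function itself are placeholders chosen for illustration.

```python
def cert_commands(
    certs_dir: str, ca_key: str, node_hosts: list[str], client_user: str
) -> list[list[str]]:
    """Build (but do not run) the cockroach cert commands for one node and one client."""
    common = [f"--certs-dir={certs_dir}", f"--ca-key={ca_key}"]
    return [
        # 1. CA certificate and key.
        ["cockroach", "cert", "create-ca", *common],
        # 2. Node certificate, valid for every address the node is reached by.
        ["cockroach", "cert", "create-node", *node_hosts, *common],
        # 3. Client certificate for the given SQL user.
        ["cockroach", "cert", "create-client", client_user, *common],
    ]
```

Keeping the CA key path outside the certificates directory makes it easier to store the key off the cluster afterwards.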
Distribute certificates
- Place CA cert and node cert/key on each node
- Store CA key in secure location (off cluster)
- Distribute client certificates to application servers
Authentication methods
- Recommended: Client certificates for applications
- Alternative: Password authentication with strong passwords
- Enterprise: SSO/SAML integration
Networking configuration
Required ports
| Port | Purpose |
|---|---|
| 26257 | Inter-node and client connections (SQL) |
| 8080 | DB Console (HTTP) |
Network flags
- Single network (private)
- Single network (public)
- Multi-network
--advertise-addr defaults to --listen-addr.
Load balancing
Load balancing is essential for performance and reliability. It distributes traffic and routes around failed nodes.
Health checks
Configure load balancers to use the readiness endpoint (/health?ready=1 on the DB Console HTTP port).
High availability
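For the load balancer health check described under Health checks, the probe is an HTTP GET that treats anything other than a 200 response as "not ready". The sketch below assumes the `/health?ready=1` path on the DB Console port (8080); the helper names are illustrative, and in a secure cluster with a private CA the probe also needs that CA in its trust store, which this sketch does not configure.

```python
import urllib.error
import urllib.request


def readiness_url(host: str, http_port: int = 8080, scheme: str = "https") -> str:
    """Build the readiness URL for one node."""
    return f"{scheme}://{host}:{http_port}/health?ready=1"


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the node answers the readiness check with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections, TLS failures, and timeouts
        # are all treated as "not ready".
        return False
```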
Connection pooling
Critical for performance: Applications must use connection pools.
Sizing guidelines
- Minimum: 4-10 connections
- Small applications: 10-20 connections
- Large applications: 20-50 connections per application instance
Configuration parameters
- max_connections: Maximum pool size
- min_connections: Minimum idle connections
- max_lifetime: Connection lifetime (prevent stale connections)
- idle_timeout: Close idle connections
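The parameters above map onto most pooling libraries under slightly different names. The sketch below captures them in a library-neutral dataclass with two example profiles; the `PoolSettings` class, the profile names, and the 30-minute lifetime and 5-minute idle timeout are assumptions for illustration, while the connection counts follow the sizing guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSettings:
    max_connections: int  # maximum pool size
    min_connections: int  # minimum idle connections
    max_lifetime_s: int   # recycle connections after this many seconds
    idle_timeout_s: int   # close connections idle for this many seconds

    def __post_init__(self) -> None:
        if not 0 <= self.min_connections <= self.max_connections:
            raise ValueError("min_connections must be between 0 and max_connections")
        if self.idle_timeout_s > self.max_lifetime_s:
            raise ValueError("idle_timeout_s should not exceed max_lifetime_s")


# Example profiles per application instance; the timeouts are assumed values.
SMALL_APP = PoolSettings(max_connections=20, min_connections=4,
                         max_lifetime_s=1800, idle_timeout_s=300)
LARGE_APP = PoolSettings(max_connections=50, min_connections=10,
                         max_lifetime_s=1800, idle_timeout_s=300)
```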
Cache and memory tuning
Default settings (not recommended for production)
Production settings
Increasing cache improves read performance. Increasing SQL memory allows more concurrent connections and complex queries.
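The concrete production values are not shown in this section, so the following sketch relies on assumptions: it sizes `--cache` and `--max-sql-memory` at 25% of RAM each, and it applies the guard that twice the SQL memory plus the cache should stay within 80% of RAM. Both figures are commonly cited CockroachDB guidance but should be verified against the documentation for your version. `memory_flags` is a hypothetical helper name.

```python
def memory_flags(
    ram_gib: float, cache_fraction: float = 0.25, sql_fraction: float = 0.25
) -> list[str]:
    """Return --cache and --max-sql-memory flags sized from total RAM."""
    cache_gib = ram_gib * cache_fraction
    sql_gib = ram_gib * sql_fraction
    # Assumed guard: leave headroom so the node is not killed for running out of memory.
    if 2 * sql_gib + cache_gib > 0.8 * ram_gib:
        raise ValueError("cache and SQL memory leave too little headroom")
    return [
        f"--cache={int(cache_gib * 1024)}MiB",
        f"--max-sql-memory={int(sql_gib * 1024)}MiB",
    ]
```

For a node with 32 GiB of RAM and the default fractions, this yields `--cache=8192MiB` and `--max-sql-memory=8192MiB`.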
Monitoring and alerting
Essential metrics
CPU usage
Alert at >80% sustained usage
Memory usage
Alert at >85% of available RAM
Disk capacity
Alert at >80% full
Disk IOPS
Monitor against provisioned limits
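The thresholds above can be written down as data and evaluated against a metrics sample. This is a simplified sketch: the metric key names are made up for illustration, and it evaluates a single sample, whereas the CPU alert is meant to fire only on sustained usage, which a real alerting rule would express with a time window.

```python
# Alert thresholds from this section, keyed by hypothetical metric names.
THRESHOLDS = {
    "cpu_percent": 80.0,
    "memory_percent": 85.0,
    "disk_percent": 80.0,
}


def firing_alerts(sample: dict) -> list[str]:
    """Return the names of metrics in `sample` that exceed their threshold."""
    alerts = [
        name for name, limit in THRESHOLDS.items()
        if sample.get(name, 0.0) > limit
    ]
    # Disk IOPS has no fixed threshold; compare against the provisioned limit.
    if sample.get("iops", 0.0) > sample.get("provisioned_iops", float("inf")):
        alerts.append("iops")
    return sorted(alerts)
```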
Monitoring tools
- DB Console: Built-in metrics at http://<node>:8080
- Prometheus: Scrape /var/lib/cockroach/cockroach-data/cockroach.prometheus
- Grafana: Use official CockroachDB dashboards
- Alertmanager: Configure alerts for critical conditions
Backup and restore
Use cloud storage
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
Clock synchronization
Configuration
File descriptors
Requirements
- Minimum: 1956 (1700 per store + 256 for networking)
- Recommended: 15000+ (10000 per store + 5000 for networking)
- Best: Unlimited
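The requirement tiers above are simple arithmetic over the number of stores, so a node's current limit can be classified before starting the process. The sketch below uses Python's standard `resource` module, which is available on Unix-like systems only; the function names and tier labels are illustrative assumptions.

```python
import resource


def required_fds(stores: int, per_store: int, networking: int) -> int:
    """File descriptors needed for `stores` stores plus networking."""
    return stores * per_store + networking


def fd_status(soft_limit: int, stores: int = 1) -> str:
    """Classify a soft file descriptor limit against the tiers in this section."""
    if soft_limit == resource.RLIM_INFINITY:
        return "best"
    if soft_limit >= required_fds(stores, 10000, 5000):
        return "recommended"
    if soft_limit >= required_fds(stores, 1700, 256):
        return "minimum"
    return "too-low"


def current_fd_status(stores: int = 1) -> str:
    """Classify the limit of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return fd_status(soft, stores)
```

With one store, the minimum works out to 1,956 descriptors and the recommended value to 15,000, matching the figures listed above.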
Linux configuration
- With systemd
- Without systemd
Add to service definition:
Reload systemd:
Kubernetes-specific
Storage
- Use local SSDs, not network storage
- Configure storage class appropriately
Resources
- Set CPU and memory limits
- Make requests equal to limits
- Avoid burstable instances
Topology
- Configure pod anti-affinity
- Use topology spread constraints
- One pod per Kubernetes node
Operator
- Use CockroachDB Operator for management
- Configure PodDisruptionBudget
- Set up proper RBAC
Transaction retry handling
Transaction contention can cause retries. Your application should:
- Catch transaction retry errors
- Retry the entire transaction
- Use exponential backoff
- Set maximum retry limits
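The four steps above can be sketched as a small retry wrapper. This example is driver-neutral: `RetryableError` is a stand-in for whatever exception your driver raises for a transaction retry error (SQLSTATE 40001), and `txn` is assumed to be a callable that runs the whole transaction from start to commit. The default limits are arbitrary illustrative values.

```python
import random
import time


class RetryableError(Exception):
    """Stand-in for the driver's transaction retry error (SQLSTATE 40001)."""


def run_with_retries(txn, max_retries: int = 5, base_delay: float = 0.1,
                     sleep=time.sleep):
    """Run `txn()` and re-run the entire transaction on retry errors."""
    for attempt in range(max_retries + 1):
        try:
            return txn()
        except RetryableError:
            if attempt == max_retries:
                raise  # retry budget exhausted; surface the error to the caller
            # Exponential backoff with jitter: 0.1s, 0.2s, 0.4s, ... plus a random offset.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

The `sleep` parameter exists so tests can substitute a no-op; in application code the default `time.sleep` is used. Non-retryable exceptions propagate immediately because only `RetryableError` is caught.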
Pre-deployment checklist
Hardware
- Sufficient CPU (min 4 vCPUs, recommended 8+)
- Adequate RAM (4 GiB per vCPU)
- SSD storage with required IOPS
- Network connectivity between nodes
Security
- TLS certificates generated and distributed
- CA key stored securely
- Firewall rules configured
- Network isolation in place
Configuration
- Clock synchronization configured
- File descriptor limits increased
- Cache and SQL memory tuned
- Load balancer deployed and tested
Operations
- Monitoring and alerting configured
- Backup schedule created
- Restore procedure tested
- Incident response plan documented
Next steps
- Review topology patterns
- Set up monitoring
- Configure backup strategies
- Learn about disaster recovery