Overview
Managing data lifecycle in Permission Mongo involves retention policies, archival strategies, MongoDB TTL indexes, and horizontal scaling techniques for growing datasets.
Data Retention Strategies
Time-Based Retention
Implement retention policies using MongoDB TTL indexes:
// Auto-delete audit logs after 90 days
db.audit_logs.createIndex(
  { "created_at": 1 },
  { expireAfterSeconds: 7776000 }  // 90 days
)

// Auto-delete document versions after 1 year
db.document_versions.createIndex(
  { "created_at": 1 },
  { expireAfterSeconds: 31536000 }  // 365 days
)

// Short-lived session data (1 hour)
db.sessions.createIndex(
  { "created_at": 1 },
  { expireAfterSeconds: 3600 }
)
TTL Configuration:
// pkg/store/mongo.go - Creating TTL indexes
ttl := 90 * 24 * time.Hour
indexConfig := &IndexConfig{
    Fields: []string{"created_at"},
    Order:  []int{1},
    TTL:    &ttl,
    Name:   "ttl_audit_logs",
}
err := store.CreateIndex(ctx, "audit_logs", indexConfig)
Status-Based Retention
Archive documents based on status changes:
# Archive completed orders after 30 days
archival:
  collections:
    orders:
      conditions:
        status: "completed"
        updated_at: { $lt: "30d_ago" }
      target: "orders_archive"
Implementation:
// Archive query
filter := bson.M{
    "status":     "completed",
    "updated_at": bson.M{"$lt": time.Now().AddDate(0, 0, -30)},
}

// Move to archive collection
docs, err := store.Find(ctx, "orders", filter, &FindOptions{})
if err != nil {
    return err
}
for _, doc := range docs {
    // In production, check both errors: delete only after a successful copy
    store.Create(ctx, "orders_archive", doc)
    store.Delete(ctx, "orders", doc["_id"].(string))
}
Archival Patterns
Cold Storage Archive
Move old data to separate archive collections with different indexing:
// Primary collection - heavily indexed
db.orders.createIndex({ "company_id": 1, "status": 1 })
db.orders.createIndex({ "created_by": 1 })
db.orders.createIndex({ "created_at": 1 })

// Archive collection - minimal indexes
db.orders_archive.createIndex({ "company_id": 1, "archived_at": 1 })
Benefits:
Faster queries on primary collection (smaller dataset)
Lower index overhead for writes
Cost savings (archive to cheaper storage tier)
Time-Partitioned Collections
Create monthly/yearly partitions:
orders_2025_01
orders_2025_02
orders_2025_03
...
orders_archive_2024 (consolidated)
Implementation:
// Write to current month's collection
func getCollectionName(prefix string) string {
    return fmt.Sprintf("%s_%s", prefix, time.Now().Format("2006_01"))
}

collection := getCollectionName("orders")
store.Create(ctx, collection, doc)

// Query across partitions
func queryAllPartitions(prefix string, filter bson.M) []map[string]interface{} {
    collections := []string{"orders_2025_01", "orders_2025_02", "orders_2025_03"}
    var results []map[string]interface{}
    for _, coll := range collections {
        docs, _ := store.Find(ctx, coll, filter, nil)
        results = append(results, docs...)
    }
    return results
}
Audit Log Lifecycle
Audit logs follow a staged lifecycle:
1. Hot Storage (0-30 days) → audit_logs (full indexes)
2. Warm Storage (30-90 days) → audit_logs_warm (basic indexes)
3. Cold Storage (90+ days) → audit_logs_archive (minimal indexes)
4. Expiry (365+ days) → Auto-deleted via TTL
Cron Job for Lifecycle Management:
func archiveAuditLogs(ctx context.Context, store Store) error {
    // Move to warm storage (30 days)
    warmCutoff := time.Now().AddDate(0, 0, -30)
    filter := bson.M{"created_at": bson.M{"$lt": warmCutoff}}
    moveDocs(ctx, store, "audit_logs", "audit_logs_warm", filter)

    // Move to cold storage (90 days)
    coldCutoff := time.Now().AddDate(0, 0, -90)
    filter = bson.M{"created_at": bson.M{"$lt": coldCutoff}}
    moveDocs(ctx, store, "audit_logs_warm", "audit_logs_archive", filter)

    return nil
}
Document Versioning Lifecycle
Document versions accumulate over time - implement retention:
Version Retention Policy
versioning:
  retention:
    max_versions: 50         # Keep last 50 versions
    max_age_days: 365        # Delete versions older than 1 year
    archive_after_days: 90   # Archive old versions
Implementation:
// Clean up old versions
func cleanupVersions(ctx context.Context, store Store, docID string) error {
    // Keep only the last 50 versions
    filter := bson.M{"document_id": docID}
    opts := &FindOptions{
        Sort: bson.D{{"version", -1}},
        Skip: 50, // skip the 50 newest; everything returned is excess
    }
    oldVersions, _ := store.Find(ctx, "document_versions", filter, opts)

    // Delete excess versions
    for _, version := range oldVersions {
        store.Delete(ctx, "document_versions", version["_id"].(string))
    }
    return nil
}
Version Compaction
Periodically compact version history:
// Keep hourly snapshots for last 24h, daily for last 30d, monthly thereafter
func compactVersions(ctx context.Context, store Store, docID string) error {
    versions, _ := store.Find(ctx, "document_versions",
        bson.M{"document_id": docID},
        &FindOptions{Sort: bson.D{{"created_at", 1}}},
    )

    var toKeep []string
    now := time.Now()

    for _, v := range versions {
        created := v["created_at"].(time.Time)
        age := now.Sub(created)

        switch {
        case age < 24*time.Hour: // hourly for the last 24h
            if shouldKeepHourly(created) {
                toKeep = append(toKeep, v["_id"].(string))
            }
        case age < 30*24*time.Hour: // daily for the last 30d
            if shouldKeepDaily(created) {
                toKeep = append(toKeep, v["_id"].(string))
            }
        default: // monthly thereafter
            if shouldKeepMonthly(created) {
                toKeep = append(toKeep, v["_id"].(string))
            }
        }
    }

    // Delete versions not in toKeep list
    // ...
    return nil
}
Horizontal Scaling
MongoDB Sharding
Scale horizontally using MongoDB sharding for large datasets:
Shard Key Selection:
// Shard by company_id (tenant isolation)
sh.enableSharding("permission_mongo")
sh.shardCollection("permission_mongo.orders", { "company_id": 1 })

// Compound shard key for even distribution
sh.shardCollection("permission_mongo.orders",
  { "company_id": 1, "created_at": 1 }
)

// Hashed shard key for random distribution
sh.shardCollection("permission_mongo.audit_logs",
  { "_id": "hashed" }
)
Shard Key Guidelines:
Use company_id/tenant_id for multi-tenant isolation. This keeps each tenant's data together on the same shard for query efficiency:
sh.shardCollection("permission_mongo.documents", { "company_id": 1, "_id": 1 })
Add a time component for time-series data. This prevents hot shards by distributing writes across time ranges:
sh.shardCollection("permission_mongo.audit_logs", { "company_id": 1, "created_at": 1 })
Use hashed sharding for uniform distribution.
Read Replicas
Scale read operations using MongoDB replicas:
// Configure read preference
opts := options.Client().
    ApplyURI(uri).
    SetReadPreference(readpref.SecondaryPreferred()) // Read from replicas

store := NewMongoStoreWithOptions(uri, database, opts)
Read Preference Options:
Preference          Use Case
Primary             Consistent reads (default)
PrimaryPreferred    Reads from primary, fallback to secondary
Secondary           All reads from replicas (eventual consistency)
SecondaryPreferred  Reads from replicas, fallback to primary
Nearest             Lowest latency (geo-distributed)
Connection Pool Scaling
Scale connection pools based on workload:
# High throughput config
mongodb:
  max_pool_size: 300   # Scale with concurrent requests
  min_pool_size: 50    # Keep warm connections

# Per-tenant connection pools (advanced)
tenants:
  acme_corp:
    mongodb:
      max_pool_size: 100
  startup_inc:
    mongodb:
      max_pool_size: 20
Data Growth Monitoring
Collection Size Tracking
// Monitor collection sizes
db.orders.stats()
// {
//   "size": 1073741824,          // 1GB
//   "count": 1000000,
//   "avgObjSize": 1024,
//   "storageSize": 536870912,    // 512MB
//   "totalIndexSize": 104857600  // 100MB
// }

// Track growth over time
db.collection_stats.insertOne({
  collection: "orders",
  timestamp: new Date(),
  size: db.orders.stats().size,
  count: db.orders.stats().count,
})
Prometheus Metrics
Expose collection size metrics:
// Custom metric for collection sizes
var collectionSize = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "permission_mongo_collection_size_bytes",
        Help: "Size of collections in bytes",
    },
    []string{"collection"},
)

// Update periodically
func updateCollectionMetrics(ctx context.Context, db *mongo.Database) {
    collections := []string{"orders", "audit_logs", "document_versions"}
    for _, coll := range collections {
        var stats bson.M
        db.RunCommand(ctx, bson.D{{"collStats", coll}}).Decode(&stats)
        // Note: collStats may decode "size" as int32 for small collections;
        // handle both types in production code.
        size := stats["size"].(int64)
        collectionSize.WithLabelValues(coll).Set(float64(size))
    }
}
Backup and Recovery
Backup Strategy
backup:
  schedule: "0 2 * * *"   # Daily at 2 AM
  retention: 30           # Keep 30 daily backups
  method: "mongodump"     # or "snapshot"
  collections:
    - orders
    - policies
    - schemas
    - users
  # Exclude large, transient data
  exclude:
    - audit_logs   # Use archive instead
    - sessions     # Recreated on login
Backup Script:
#!/bin/bash
# Backup critical collections
DATE=$(date +%Y%m%d)
BACKUP_DIR="/backups/permission-mongo-$DATE"

mongodump \
  --uri="mongodb://localhost:27017" \
  --db=permission_mongo \
  --out="$BACKUP_DIR" \
  --excludeCollection=audit_logs \
  --excludeCollection=sessions

# Compress
tar -czf "$BACKUP_DIR.tar.gz" -C /backups "permission-mongo-$DATE"
rm -rf "$BACKUP_DIR"

# Upload to S3
aws s3 cp "$BACKUP_DIR.tar.gz" s3://backups/permission-mongo/

# Clean old backups (keep 30 days)
find /backups -name "permission-mongo-*.tar.gz" -mtime +30 -delete
Point-in-Time Recovery
Enable MongoDB oplog for PITR:
# Replica set config
replication:
  replSetName: "rs0"
  oplogSizeMB: 10240   # 10GB oplog for ~24h retention
Restore to specific timestamp:
# Restore to 2025-03-04 10:00:00 UTC (Unix timestamp 1741082400)
mongorestore \
  --uri="mongodb://localhost:27017" \
  --oplogReplay \
  --oplogLimit="1741082400:0" \
  /backups/permission-mongo-20250304
Index Optimization
Optimize indexes as data grows:
// Covering indexes for common queries
db.orders.createIndex(
  { "company_id": 1, "status": 1, "created_at": 1 },
  { name: "query_active_orders" }
)

// Partial indexes for sparse data
db.orders.createIndex(
  { "approved_by": 1 },
  {
    partialFilterExpression: { "status": "approved" },
    name: "approved_orders_only"
  }
)

// Drop unused indexes
db.orders.dropIndex("old_index_name")
Query Optimization
Optimize queries for large collections:
// Use pagination (skip+limit is slow for large offsets)
func paginateEfficiently(ctx context.Context, store Store) {
    // Bad: skip is slow for large offsets
    opts := &FindOptions{Skip: 10000, Limit: 100}

    // Good: use range queries on an indexed field
    filter := bson.M{
        "_id":        bson.M{"$gt": lastSeenID},
        "company_id": tenantID,
    }
    opts = &FindOptions{Limit: 100}
    docs, _ := store.Find(ctx, "orders", filter, opts)
}

// Use projections to reduce data transfer. MongoDB does not allow mixing
// inclusion and exclusion in one projection (except for _id), so list only
// the fields you need; large fields like metadata and attachments are
// implicitly excluded.
opts := &FindOptions{
    Projection: bson.M{
        "_id":        1,
        "status":     1,
        "created_at": 1,
    },
}
Scaling Checklist
Implement TTL indexes for time-based data retention
Set up archival strategy for old/completed documents
Configure MongoDB sharding when approaching 1TB per collection
Use read replicas for read-heavy workloads
Monitor collection sizes and growth rates
Optimize indexes (covering, partial, compound)
Implement efficient pagination with range queries
Set up automated backups with 30-day retention
Test restore procedures quarterly
Archive audit logs to cold storage after 90 days
Next Steps
Performance Tuning: Optimize for high throughput and low latency
Caching Strategy: Reduce database load with Redis caching