Overview
Permission Mongo exposes comprehensive Prometheus metrics at the /metrics endpoint. All metrics use the permission_mongo namespace to avoid conflicts.
HTTP Metrics
Track HTTP request behavior, latency, and throughput.
permission_mongo_http_requests_total
Type: Counter
Labels: method, path, status
Total number of HTTP requests by method, path, and status code.
# Total request rate per second (all series summed)
sum(rate(permission_mongo_http_requests_total[1m]))
# Request rate by method
sum by (method) (rate(permission_mongo_http_requests_total[1m]))
# Error rate (5xx responses)
sum(rate(permission_mongo_http_requests_total{status=~"5.."}[5m]))
/ sum(rate(permission_mongo_http_requests_total[5m]))
# Requests by endpoint
sum by (path) (rate(permission_mongo_http_requests_total[1m]))
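The queries above rely on rate(). The following Go sketch shows roughly what it computes: a per-second rate derived from raw counter samples, where any decrease is treated as a counter reset (for example, after a restart). The Sample type and the scrape values are hypothetical. The sketch also skips the window-edge extrapolation that PromQL's rate() performs, so its numbers can differ slightly from a real query.

```go
package main

import "fmt"

// Sample is one scrape of a counter: a timestamp in seconds and the counter value.
type Sample struct {
	T float64
	V float64
}

// simpleRate returns the per-second increase of a counter across samples.
// A drop in value is treated as a counter reset, so the post-reset value is
// counted as fresh increase. Unlike PromQL's rate(), this sketch does not
// extrapolate to the edges of the range window.
func simpleRate(samples []Sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].V, samples[i].V
		if cur < prev {
			// Counter reset: the counter restarted from zero.
			increase += cur
		} else {
			increase += cur - prev
		}
	}
	elapsed := samples[len(samples)-1].T - samples[0].T
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Hypothetical scrapes of permission_mongo_http_requests_total, 15s apart,
	// with a restart between the third and fourth scrape.
	samples := []Sample{{0, 100}, {15, 250}, {30, 400}, {45, 50}, {60, 200}}
	fmt.Printf("%.2f req/s\n", simpleRate(samples))
}
```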
permission_mongo_http_request_duration_seconds
Type: Histogram
Labels: method, path
Buckets: .001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10
HTTP request latency in seconds.
# P95 latency across all endpoints
histogram_quantile(0.95,
sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le)
)
# P50, P90, P99 latencies
histogram_quantile(0.50, sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.90, sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le))
# Latency by endpoint
histogram_quantile(0.95,
sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le, path)
)
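histogram_quantile() estimates a quantile by finding the cumulative bucket that contains the target rank and interpolating linearly inside it. Below is a simplified Go sketch of that procedure. It assumes positive bucket bounds, and the bucket counts are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Bucket is one cumulative histogram bucket: the count of observations <= UpperBound.
type Bucket struct {
	UpperBound float64
	Count      float64
}

// quantile estimates the q-quantile from cumulative buckets sorted by UpperBound,
// the last of which must be +Inf. It interpolates linearly inside the bucket
// that holds the target rank, in the manner of PromQL's histogram_quantile.
// Simplifications: the lowest bucket is assumed to start at 0, and an
// out-of-range q or empty histogram returns NaN.
func quantile(q float64, buckets []Bucket) float64 {
	if len(buckets) < 2 || q < 0 || q > 1 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].Count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	// Find the first bucket whose cumulative count reaches the rank.
	i := 0
	for i < len(buckets)-1 && buckets[i].Count < rank {
		i++
	}
	if math.IsInf(buckets[i].UpperBound, 1) {
		// The rank falls in the +Inf bucket: report the highest finite bound.
		return buckets[len(buckets)-2].UpperBound
	}
	lower, prevCount := 0.0, 0.0
	if i > 0 {
		lower = buckets[i-1].UpperBound
		prevCount = buckets[i-1].Count
	}
	inBucket := buckets[i].Count - prevCount
	if inBucket == 0 {
		return buckets[i].UpperBound
	}
	return lower + (buckets[i].UpperBound-lower)*(rank-prevCount)/inBucket
}

// exampleBuckets returns hypothetical cumulative counts for
// permission_mongo_http_request_duration_seconds_bucket.
func exampleBuckets() []Bucket {
	return []Bucket{{0.1, 50}, {0.5, 90}, {1, 100}, {math.Inf(1), 100}}
}

func main() {
	b := exampleBuckets()
	fmt.Printf("p50=%.3f p90=%.3f p95=%.3f\n",
		quantile(0.5, b), quantile(0.9, b), quantile(0.95, b))
}
```

Because the result is interpolated, its precision is bounded by bucket width. With these counts, all observations between 0.5 s and 1 s look alike, so the P95 of 0.75 s is an estimate rather than a measured value.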
permission_mongo_http_request_size_bytes
Type: Histogram
Labels: method, path
Buckets: Exponential from 100B to 100MB
HTTP request body size in bytes.
# Average request size
rate(permission_mongo_http_request_size_bytes_sum[5m])
/ rate(permission_mongo_http_request_size_bytes_count[5m])
# P95 request size
histogram_quantile(0.95,
sum(rate(permission_mongo_http_request_size_bytes_bucket[5m])) by (le)
)
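The exact bucket layout is only given as a range. Assuming powers of ten, 100 B to 100 MB works out to seven upper bounds. The stdlib-only sketch below mirrors the behavior of client_golang's prometheus.ExponentialBuckets(start, factor, count). The factor of 10 is an assumption, not something the source states.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds starting at start, each one
// factor times the previous, and panics on invalid input the way
// client_golang's prometheus.ExponentialBuckets does.
func exponentialBuckets(start, factor float64, count int) []float64 {
	if count < 1 || start <= 0 || factor <= 1 {
		panic("exponentialBuckets needs count >= 1, start > 0, factor > 1")
	}
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	// Assumed layout: 100 B to 100 MB in powers of ten.
	fmt.Println(exponentialBuckets(100, 10, 7))
}
```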
permission_mongo_http_response_size_bytes
Type: Histogram
Labels: method, path
Buckets: Exponential from 100B to 100MB
HTTP response body size in bytes.
# Average response size
rate(permission_mongo_http_response_size_bytes_sum[5m])
/ rate(permission_mongo_http_response_size_bytes_count[5m])
# Total response bandwidth (bytes/sec)
sum(rate(permission_mongo_http_response_size_bytes_sum[1m]))
permission_mongo_http_active_requests
Type: Gauge
Number of currently active HTTP requests.
# Current active requests
permission_mongo_http_active_requests
# Peak active requests over the last hour
max_over_time(permission_mongo_http_active_requests[1h])
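An active-requests gauge is usually maintained by middleware that increments on entry and decrements on exit. Here is a minimal sketch of that pattern. It assumes a plain net/http handler chain and uses atomic.Int64 (Go 1.19+) in place of a Prometheus gauge. The names trackActive and measure are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// activeRequests stands in for the permission_mongo_http_active_requests gauge.
var activeRequests atomic.Int64

// trackActive increments the gauge when a request starts and decrements it
// when the handler returns, so the value counts in-flight requests.
func trackActive(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		activeRequests.Add(1)
		defer activeRequests.Add(-1)
		next.ServeHTTP(w, r)
	})
}

// measure serves one request through trackActive and reports the gauge value
// seen inside the handler and the value after the handler returns.
func measure() (during, after int64) {
	h := trackActive(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		during = activeRequests.Load()
	}))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return during, activeRequests.Load()
}

func main() {
	d, a := measure()
	fmt.Printf("during=%d after=%d\n", d, a)
}
```

The decrement runs in a defer, so it still runs during panic unwinding. The gauge therefore stays correct even when a handler panics and the panic is recovered further up the chain.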
MongoDB Metrics
Monitor database operations and connection pool health.
permission_mongo_mongo_operations_total
Type: Counter
Labels: collection, operation
Total number of MongoDB operations by collection and operation type (find, insert, update, delete).
# Operations per second by type
sum by (operation) (rate(permission_mongo_mongo_operations_total[1m]))
# Operations by collection
sum by (collection) (rate(permission_mongo_mongo_operations_total[1m]))
# Write operations rate
sum(rate(permission_mongo_mongo_operations_total{operation=~"insert|update|delete"}[1m]))
permission_mongo_mongo_operation_duration_seconds
Type: Histogram
Labels: collection, operation
Buckets: .001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5
MongoDB operation latency in seconds.
# P95 latency by operation type
histogram_quantile(0.95,
sum(rate(permission_mongo_mongo_operation_duration_seconds_bucket[5m])) by (le, operation)
)
# Average operation latency per collection and operation
rate(permission_mongo_mongo_operation_duration_seconds_sum[5m])
/ rate(permission_mongo_mongo_operation_duration_seconds_count[5m])
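The _bucket, _sum, and _count series used above are all updated by a single observation. The toy histogram below (not the client_golang implementation) shows what one observation updates. It uses the bucket bounds listed for this metric and hypothetical durations.

```go
package main

import "fmt"

// histogram is a minimal stand-in for a Prometheus histogram: cumulative
// bucket counts plus the values exposed as _sum and _count.
type histogram struct {
	bounds []float64 // upper bounds, ascending; +Inf is implicit
	counts []uint64  // cumulative count per bound (the _bucket series)
	sum    float64   // the _sum series
	count  uint64    // the _count series, equal to the implicit +Inf bucket
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe records one value, incrementing every bucket whose bound is >= v.
func (h *histogram) observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

// mean returns _sum / _count over the histogram's lifetime (0 when empty,
// where PromQL would yield NaN). rate(_sum) / rate(_count) computes the same
// ratio restricted to the query window.
func (h *histogram) mean() float64 {
	if h.count == 0 {
		return 0
	}
	return h.sum / float64(h.count)
}

func main() {
	// Bucket bounds listed for permission_mongo_mongo_operation_duration_seconds.
	h := newHistogram([]float64{.001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5})
	for _, d := range []float64{0.002, 0.02, 0.2} { // hypothetical durations
		h.observe(d)
	}
	fmt.Println(h.counts, h.count, h.mean())
}
```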
permission_mongo_mongo_errors_total
Type: Counter
Labels: collection, operation
Total number of MongoDB errors by collection and operation.
# Error rate per second
sum(rate(permission_mongo_mongo_errors_total[5m]))
# Errors by collection
sum by (collection) (rate(permission_mongo_mongo_errors_total[1m]))
permission_mongo_mongo_pool_size
Type: Gauge
Current MongoDB connection pool size.
# Current pool size
permission_mongo_mongo_pool_size
# Peak pool size over the last hour
max_over_time(permission_mongo_mongo_pool_size[1h])
Cache Metrics
Track cache effectiveness and performance.
permission_mongo_cache_hits_total
Type: Counter
Labels: type
Total number of cache hits by cache type (policy, hierarchy, schema).
# Cache hit rate
sum(rate(permission_mongo_cache_hits_total[5m]))
/ (sum(rate(permission_mongo_cache_hits_total[5m])) + sum(rate(permission_mongo_cache_misses_total[5m])))
# Hit rate by cache type
sum by (type) (rate(permission_mongo_cache_hits_total[5m]))
/ (sum by (type) (rate(permission_mongo_cache_hits_total[5m])) + sum by (type) (rate(permission_mongo_cache_misses_total[5m])))
# Hits per second
sum(rate(permission_mongo_cache_hits_total[1m]))
permission_mongo_cache_misses_total
Type: Counter
Labels: type
Total number of cache misses by cache type.
# Miss rate
sum(rate(permission_mongo_cache_misses_total[5m]))
/ (sum(rate(permission_mongo_cache_hits_total[5m])) + sum(rate(permission_mongo_cache_misses_total[5m])))
# Misses per second by type
sum by (type) (rate(permission_mongo_cache_misses_total[1m]))
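Hit and miss counters are only meaningful together. The small Go sketch below computes the ratio that the hit-rate queries evaluate, using hypothetical per-type rates. Like PromQL, it returns NaN when a cache type saw no traffic.

```go
package main

import (
	"fmt"
	"math"
)

// hitRatio returns hits / (hits + misses). With no traffic it returns NaN,
// matching PromQL, where 0 / 0 evaluates to NaN.
func hitRatio(hits, misses float64) float64 {
	total := hits + misses
	if total == 0 {
		return math.NaN()
	}
	return hits / total
}

func main() {
	// Hypothetical per-second rates by cache type: {hits, misses}.
	rates := map[string][2]float64{
		"policy":    {950, 50},
		"hierarchy": {300, 100},
		"schema":    {0, 0},
	}
	for _, typ := range []string{"policy", "hierarchy", "schema"} {
		r := rates[typ]
		fmt.Printf("%s: %.2f\n", typ, hitRatio(r[0], r[1]))
	}
}
```

In PromQL, a comparison such as `< 0.8` is false for NaN. An alert on this ratio therefore does not fire for a cache type that received no requests.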
permission_mongo_cache_operation_duration_seconds
Type: Histogram
Labels: operation
Buckets: .0001, .0005, .001, .005, .01, .025, .05, .1
Cache operation latency in seconds.
# P95 cache operation latency
histogram_quantile(0.95,
sum(rate(permission_mongo_cache_operation_duration_seconds_bucket[5m])) by (le)
)
# Average cache operation time
rate(permission_mongo_cache_operation_duration_seconds_sum[5m])
/ rate(permission_mongo_cache_operation_duration_seconds_count[5m])
RBAC Metrics
Monitor RBAC policy evaluation performance.
permission_mongo_rbac_evaluations_total
Type: Counter
Labels: action, result
Total number of RBAC policy evaluations by action and result (allowed, denied).
# Evaluations per second
sum(rate(permission_mongo_rbac_evaluations_total[1m]))
# Allow vs deny rate
sum by (result) (rate(permission_mongo_rbac_evaluations_total[1m]))
# Deny percentage
sum(rate(permission_mongo_rbac_evaluations_total{result="denied"}[5m]))
/ sum(rate(permission_mongo_rbac_evaluations_total[5m]))
permission_mongo_rbac_evaluation_duration_seconds
Type: Histogram
Buckets: .0001, .0005, .001, .005, .01, .025, .05
RBAC policy evaluation latency in seconds.
# P95 evaluation latency
histogram_quantile(0.95,
sum(rate(permission_mongo_rbac_evaluation_duration_seconds_bucket[5m])) by (le)
)
# Average evaluation time
rate(permission_mongo_rbac_evaluation_duration_seconds_sum[5m])
/ rate(permission_mongo_rbac_evaluation_duration_seconds_count[5m])
permission_mongo_rbac_cache_size
Type: Gauge
Number of cached RBAC AST expressions.
# Current cache size
permission_mongo_rbac_cache_size
# Cache growth rate
deriv(permission_mongo_rbac_cache_size[5m])
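deriv() estimates the per-second slope of a gauge by fitting a simple linear regression to the samples in the range. Below is an ordinary least-squares sketch of the same calculation, using hypothetical RBAC cache sizes sampled once a minute. The same function applies to any gauge.

```go
package main

import "fmt"

// Point is one sample of a gauge: timestamp in seconds and value.
type Point struct {
	T, V float64
}

// slope fits v = a + b*t by ordinary least squares and returns b, the
// per-second rate of change. Using every sample makes it less sensitive to
// one noisy point than comparing only the first and last samples.
// Simplification: fewer than two points returns 0 (PromQL returns no result).
func slope(points []Point) float64 {
	n := float64(len(points))
	if n < 2 {
		return 0
	}
	var sumT, sumV, sumTT, sumTV float64
	for _, p := range points {
		sumT += p.T
		sumV += p.V
		sumTT += p.T * p.T
		sumTV += p.T * p.V
	}
	den := n*sumTT - sumT*sumT
	if den == 0 {
		return 0
	}
	return (n*sumTV - sumT*sumV) / den
}

func main() {
	// Hypothetical permission_mongo_rbac_cache_size samples, one per minute.
	pts := []Point{{0, 10}, {60, 16}, {120, 22}}
	fmt.Printf("growth: %.3f entries/s\n", slope(pts))
}
```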
Audit Metrics
Track audit logging behavior and queue health.
permission_mongo_audit_logs_total
Type: Counter
Labels: action, success
Total number of audit log entries by action and success status.
# Audit logs per second
sum(rate(permission_mongo_audit_logs_total[1m]))
# Logs by action
sum by (action) (rate(permission_mongo_audit_logs_total[1m]))
# Failed actions rate
sum(rate(permission_mongo_audit_logs_total{success="false"}[1m]))
permission_mongo_audit_logs_dropped_total
Type: Counter
Total number of audit logs dropped due to full buffer.
# Dropped logs per second
rate(permission_mongo_audit_logs_dropped_total[1m])
# Dropped logs since the process started (the counter resets on restart)
permission_mongo_audit_logs_dropped_total
Any non-zero rate of dropped audit logs means the async queue is full and audit entries are being lost. Treat this as critical and investigate immediately.
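Dropped entries typically come from a non-blocking send into a bounded buffer. The source does not show how the queue is implemented. The sketch below assumes a buffered Go channel and a hypothetical auditQueue type. Its drop counter stands in for permission_mongo_audit_logs_dropped_total, and its length stands in for permission_mongo_audit_queue_size. It uses atomic.Uint64 (Go 1.19+).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// auditQueue is a hypothetical bounded async queue. enqueue never blocks the
// request path: when the buffer is full, the entry is dropped and counted.
type auditQueue struct {
	ch      chan string
	dropped atomic.Uint64
}

func newAuditQueue(capacity int) *auditQueue {
	return &auditQueue{ch: make(chan string, capacity)}
}

// enqueue reports whether the entry was queued (false means it was dropped).
func (q *auditQueue) enqueue(entry string) bool {
	select {
	case q.ch <- entry:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

// size is the current queue depth.
func (q *auditQueue) size() int { return len(q.ch) }

func main() {
	q := newAuditQueue(2)
	for _, e := range []string{"create", "update", "delete"} {
		q.enqueue(e)
	}
	fmt.Println("queued:", q.size(), "dropped:", q.dropped.Load())
}
```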
permission_mongo_audit_queue_size
Type: Gauge
Current number of audit logs in the async queue.
# Current queue depth
permission_mongo_audit_queue_size
# Average queue depth
avg_over_time(permission_mongo_audit_queue_size[5m])
permission_mongo_audit_batch_size
Type: Histogram
Buckets: 1, 5, 10, 25, 50, 100
Size of audit log batches written to MongoDB.
# Average batch size
rate(permission_mongo_audit_batch_size_sum[5m])
/ rate(permission_mongo_audit_batch_size_count[5m])
# P95 batch size
histogram_quantile(0.95,
sum(rate(permission_mongo_audit_batch_size_bucket[5m])) by (le)
)
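Batch size is recorded once per write. The sketch below shows a writer draining the queue into batches. It assumes a maximum batch size of 100 (the largest bucket bound) and uses a hypothetical drainBatches helper. The actual MongoDB insert is left as a comment.

```go
package main

import "fmt"

// drainBatches reads everything currently buffered in ch and splits it into
// batches of at most maxBatch entries, returning each batch's size. Each size
// is the value a writer would observe into permission_mongo_audit_batch_size
// before one insert. ch must remain open: a closed channel would never hit
// the default case and the loop would not terminate.
func drainBatches(ch chan string, maxBatch int) []int {
	if maxBatch < 1 {
		maxBatch = 1
	}
	var sizes []int
	batch := make([]string, 0, maxBatch)
	flush := func() {
		if len(batch) > 0 {
			// A real writer would insert batch into MongoDB here.
			sizes = append(sizes, len(batch))
			batch = batch[:0]
		}
	}
	for {
		select {
		case e := <-ch:
			batch = append(batch, e)
			if len(batch) == maxBatch {
				flush()
			}
		default:
			// Queue is empty: write whatever is left and stop.
			flush()
			return sizes
		}
	}
}

func main() {
	ch := make(chan string, 300)
	for i := 0; i < 250; i++ {
		ch <- fmt.Sprintf("entry-%d", i)
	}
	fmt.Println(drainBatches(ch, 100))
}
```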
Server Metrics
Track overall server health and resource usage.
permission_mongo_server_info
Type: Gauge
Labels: version, go_version
Server build information. The value is always 1; the build details are carried in the version and go_version labels.
# Server info
permission_mongo_server_info
permission_mongo_server_uptime_seconds
Type: Gauge
Server uptime in seconds.
# Uptime in hours
permission_mongo_server_uptime_seconds / 3600
# Uptime in days
permission_mongo_server_uptime_seconds / 86400
permission_mongo_server_goroutines
Type: Gauge
Number of active goroutines.
# Current goroutine count
permission_mongo_server_goroutines
# Goroutine growth rate
deriv(permission_mongo_server_goroutines[5m])
A goroutine count above 10,000, or one that keeps rising while load stays flat, may indicate a goroutine leak.
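Both gauges map directly to standard library calls. The sketch below assumes the exporter reads time.Since on a process-start timestamp and calls runtime.NumGoroutine(). The serverStats and leak helpers are hypothetical. leak reproduces the blocked-forever pattern that makes the goroutine count climb.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// processStart is captured when the package initializes.
var processStart = time.Now()

// serverStats reads the values behind the uptime and goroutine gauges.
// A real exporter would set these on each scrape or on a timer.
func serverStats() (uptimeSeconds float64, goroutines int) {
	return time.Since(processStart).Seconds(), runtime.NumGoroutine()
}

// leak starts n goroutines that block forever on a channel nobody sends to,
// the classic shape of a goroutine leak.
func leak(n int) {
	block := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() { <-block }()
	}
}

func main() {
	_, before := serverStats()
	leak(5)
	up, after := serverStats()
	fmt.Printf("uptime=%.3fs goroutines: %d -> %d\n", up, before, after)
}
```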
Connection Pool Metrics
Monitor Redis connection pool health.
permission_mongo_redis_pool_size
Type: Gauge
Current Redis connection pool size.
permission_mongo_redis_pool_idle_connections
Type: Gauge
Current Redis idle connection count.
# Pool utilization
(permission_mongo_redis_pool_size - permission_mongo_redis_pool_idle_connections)
/ permission_mongo_redis_pool_size
Adding Custom Metrics
To add a new metric to Permission Mongo:
Define the metric
Add your metric definition in pkg/metrics/metrics.go:
var MyCustomMetric = promauto.NewCounter(
	prometheus.CounterOpts{
		Namespace: namespace,
		Name:      "my_custom_metric_total",
		Help:      "Description of my metric",
	},
)
Instrument your code
Use the metric in your application code:
metrics.MyCustomMetric.Inc()
Verify exposure
Check that the metric appears at the /metrics endpoint, for example as permission_mongo_my_custom_metric_total.
Avoid high-cardinality labels (user IDs, request IDs, timestamps) to prevent Prometheus performance issues.
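Cardinality multiplies. A metric produces at most one series per combination of label values, and a histogram multiplies that again by its buckets plus _sum and _count. The worked Go sketch below uses hypothetical label counts. The 12 finite buckets are those listed for permission_mongo_http_request_duration_seconds.

```go
package main

import "fmt"

// seriesFor returns the maximum number of series one metric can produce:
// the product of the number of distinct values of each label.
func seriesFor(distinctPerLabel ...int) int {
	n := 1
	for _, d := range distinctPerLabel {
		n *= d
	}
	return n
}

// histogramSeries adds a histogram's per-label-set overhead: one _bucket
// series per finite bound plus +Inf, and one _sum and one _count series.
func histogramSeries(labelSets, finiteBuckets int) int {
	return labelSets * (finiteBuckets + 1 + 2)
}

func main() {
	// Hypothetical cardinalities: 5 methods, 20 paths, 10 status codes.
	fmt.Println("requests_total:", seriesFor(5, 20, 10))
	// Duration histogram: labels method and path, 12 finite buckets.
	fmt.Println("request_duration:", histogramSeries(seriesFor(5, 20), 12))
	// Adding a hypothetical user_id label with 10,000 users.
	fmt.Println("with user_id:", seriesFor(5, 20, 10, 10000))
}
```

With one added user_id label, 1,000 series can become 10,000,000. Per-user or per-request identifiers belong in logs or traces rather than labels.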