Overview

Permission Mongo exposes Prometheus metrics at the /metrics endpoint. All metrics use the permission_mongo namespace, so every metric name starts with permission_mongo_ and avoids conflicts with metrics from other sources.

HTTP Metrics

Track HTTP request behavior, latency, and throughput.

permission_mongo_http_requests_total

Type: Counter
Labels: method, path, status
Total number of HTTP requests by method, path, and status code.
Example Queries
# Total request rate per second
sum(rate(permission_mongo_http_requests_total[1m]))

# Request rate by method
sum by (method) (rate(permission_mongo_http_requests_total[1m]))

# Error rate (5xx responses)
sum(rate(permission_mongo_http_requests_total{status=~"5.."}[5m])) 
  / sum(rate(permission_mongo_http_requests_total[5m]))

# Requests by endpoint
sum by (path) (rate(permission_mongo_http_requests_total[1m]))

permission_mongo_http_request_duration_seconds

Type: Histogram
Labels: method, path
Buckets: .001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10
HTTP request latency in seconds.
Example Queries
# P95 latency across all endpoints
histogram_quantile(0.95, 
  sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le)
)

# P50, P90, P99 latencies
histogram_quantile(0.50, sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.90, sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le))

# Latency by endpoint
histogram_quantile(0.95, 
  sum(rate(permission_mongo_http_request_duration_seconds_bucket[5m])) by (le, path)
)

permission_mongo_http_request_size_bytes

Type: Histogram
Labels: method, path
Buckets: Exponential from 100B to 100MB
HTTP request body size in bytes.
Example Queries
# Average request size (per method and path)
rate(permission_mongo_http_request_size_bytes_sum[5m]) 
  / rate(permission_mongo_http_request_size_bytes_count[5m])

# P95 request size
histogram_quantile(0.95, 
  sum(rate(permission_mongo_http_request_size_bytes_bucket[5m])) by (le)
)
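
The exact bucket boundaries are not listed above. As one layout consistent with "exponential from 100B to 100MB", the Go sketch below generates seven buckets with a factor of 10, treating 100MB as 1e8 bytes. Both the factor and the bucket count are assumptions, and exponentialBuckets is a hypothetical standard-library stand-in for the kind of helper that Prometheus client libraries provide.

```go
package main

import "fmt"

// exponentialBuckets returns count upper bounds, starting at start and
// multiplying by factor for each subsequent bucket.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, 0, count)
	for i := 0; i < count; i++ {
		buckets = append(buckets, start)
		start *= factor
	}
	return buckets
}

func main() {
	// Assumed layout: factor 10, 7 buckets, from 100 bytes up to 1e8 bytes.
	for _, b := range exponentialBuckets(100, 10, 7) {
		fmt.Printf("%.0f\n", b)
	}
}
```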

permission_mongo_http_response_size_bytes

Type: Histogram
Labels: method, path
Buckets: Exponential from 100B to 100MB
HTTP response body size in bytes.
Example Queries
# Average response size (per method and path)
rate(permission_mongo_http_response_size_bytes_sum[5m]) 
  / rate(permission_mongo_http_response_size_bytes_count[5m])

# Total response bandwidth (bytes/sec)
sum(rate(permission_mongo_http_response_size_bytes_sum[1m]))

permission_mongo_http_active_requests

Type: Gauge
Number of currently active HTTP requests.
Example Queries
# Current active requests
permission_mongo_http_active_requests

# Peak active requests over last hour
max_over_time(permission_mongo_http_active_requests[1h])

MongoDB Metrics

Monitor database operations and connection pool health.

permission_mongo_mongo_operations_total

Type: Counter
Labels: collection, operation
Total number of MongoDB operations by collection and operation type (find, insert, update, delete).
Example Queries
# Operations per second by type
sum by (operation) (rate(permission_mongo_mongo_operations_total[1m]))

# Operations by collection
sum by (collection) (rate(permission_mongo_mongo_operations_total[1m]))

# Write operations rate
sum(rate(permission_mongo_mongo_operations_total{operation=~"insert|update|delete"}[1m]))

permission_mongo_mongo_operation_duration_seconds

Type: Histogram
Labels: collection, operation
Buckets: .001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5
MongoDB operation latency in seconds.
Example Queries
# P95 latency by operation type
histogram_quantile(0.95, 
  sum(rate(permission_mongo_mongo_operation_duration_seconds_bucket[5m])) by (le, operation)
)

# Average operation latency (per collection and operation)
rate(permission_mongo_mongo_operation_duration_seconds_sum[5m]) 
  / rate(permission_mongo_mongo_operation_duration_seconds_count[5m])

permission_mongo_mongo_errors_total

Type: Counter
Labels: collection, operation
Total number of MongoDB errors by collection and operation.
Example Queries
# Error rate per second
sum(rate(permission_mongo_mongo_errors_total[5m]))

# Errors by collection
sum by (collection) (rate(permission_mongo_mongo_errors_total[1m]))

permission_mongo_mongo_pool_size

Type: Gauge
Current MongoDB connection pool size.
Example Queries
# Current pool size
permission_mongo_mongo_pool_size

# Average pool size over the last hour
avg_over_time(permission_mongo_mongo_pool_size[1h])

Cache Metrics

Track cache effectiveness and performance.

permission_mongo_cache_hits_total

Type: Counter
Labels: type
Total number of cache hits by cache type (policy, hierarchy, schema).
Example Queries
# Cache hit rate
sum(rate(permission_mongo_cache_hits_total[5m])) 
  / (sum(rate(permission_mongo_cache_hits_total[5m])) + sum(rate(permission_mongo_cache_misses_total[5m])))

# Hit rate by cache type
sum by (type) (rate(permission_mongo_cache_hits_total[5m])) 
  / (sum by (type) (rate(permission_mongo_cache_hits_total[5m])) + sum by (type) (rate(permission_mongo_cache_misses_total[5m])))

# Hits per second
sum(rate(permission_mongo_cache_hits_total[1m]))
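
The hit rate expressions above divide hits by hits plus misses, which evaluates to NaN in PromQL when a cache saw no lookups in the window. The small Go sketch below shows the same arithmetic with that case made explicit; hitRatio is a hypothetical helper, not part of Permission Mongo.

```go
package main

import "fmt"

// hitRatio returns hits / (hits + misses). The boolean is false when there
// were no lookups at all, the case in which the ratio is undefined.
func hitRatio(hits, misses float64) (float64, bool) {
	total := hits + misses
	if total == 0 {
		return 0, false
	}
	return hits / total, true
}

func main() {
	if r, ok := hitRatio(3, 1); ok {
		fmt.Printf("hit ratio: %.2f\n", r)
	}
	if _, ok := hitRatio(0, 0); !ok {
		fmt.Println("no lookups in this window")
	}
}
```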

permission_mongo_cache_misses_total

Type: Counter
Labels: type
Total number of cache misses by cache type.
Example Queries
# Miss rate
sum(rate(permission_mongo_cache_misses_total[5m])) 
  / (sum(rate(permission_mongo_cache_hits_total[5m])) + sum(rate(permission_mongo_cache_misses_total[5m])))

# Misses per second by type
sum by (type) (rate(permission_mongo_cache_misses_total[1m]))

permission_mongo_cache_operation_duration_seconds

Type: Histogram
Labels: operation
Buckets: .0001, .0005, .001, .005, .01, .025, .05, .1
Cache operation latency in seconds.
Example Queries
# P95 cache operation latency
histogram_quantile(0.95, 
  sum(rate(permission_mongo_cache_operation_duration_seconds_bucket[5m])) by (le)
)

# Average cache operation time (per operation)
rate(permission_mongo_cache_operation_duration_seconds_sum[5m]) 
  / rate(permission_mongo_cache_operation_duration_seconds_count[5m])

RBAC Metrics

Monitor RBAC policy evaluation performance.

permission_mongo_rbac_evaluations_total

Type: Counter
Labels: action, result
Total number of RBAC policy evaluations by action and result (allowed, denied).
Example Queries
# Evaluations per second
sum(rate(permission_mongo_rbac_evaluations_total[1m]))

# Allow vs deny rate
sum by (result) (rate(permission_mongo_rbac_evaluations_total[1m]))

# Deny ratio (0 to 1; multiply by 100 for a percentage)
sum(rate(permission_mongo_rbac_evaluations_total{result="denied"}[5m])) 
  / sum(rate(permission_mongo_rbac_evaluations_total[5m]))

permission_mongo_rbac_evaluation_duration_seconds

Type: Histogram
Buckets: .0001, .0005, .001, .005, .01, .025, .05
RBAC policy evaluation latency in seconds.
Example Queries
# P95 evaluation latency
histogram_quantile(0.95, 
  sum(rate(permission_mongo_rbac_evaluation_duration_seconds_bucket[5m])) by (le)
)

# Average evaluation time
rate(permission_mongo_rbac_evaluation_duration_seconds_sum[5m]) 
  / rate(permission_mongo_rbac_evaluation_duration_seconds_count[5m])

permission_mongo_rbac_cache_size

Type: Gauge
Number of cached RBAC AST expressions.
Example Queries
# Current cache size
permission_mongo_rbac_cache_size

# Cache growth rate
deriv(permission_mongo_rbac_cache_size[5m])

Audit Metrics

Track audit logging behavior and queue health.

permission_mongo_audit_logs_total

Type: Counter
Labels: action, success
Total number of audit log entries by action and success status.
Example Queries
# Audit logs per second
sum(rate(permission_mongo_audit_logs_total[1m]))

# Logs by action
sum by (action) (rate(permission_mongo_audit_logs_total[1m]))

# Failed actions rate
sum(rate(permission_mongo_audit_logs_total{success="false"}[1m]))

permission_mongo_audit_logs_dropped_total

Type: Counter
Total number of audit logs dropped due to full buffer.
Example Queries
# Dropped logs per second
rate(permission_mongo_audit_logs_dropped_total[1m])

# Total dropped logs
permission_mongo_audit_logs_dropped_total
Any dropped audit logs indicate the queue is overflowing. This is a critical issue that requires immediate attention.
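
To illustrate how a full buffer can lead to dropped entries, the following Go sketch shows a bounded queue with a non-blocking enqueue that counts drops. This is a generic pattern under stated assumptions, not Permission Mongo's actual implementation; auditQueue, enqueue, and depth are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// auditQueue is a hypothetical bounded queue for audit entries.
type auditQueue struct {
	entries chan string
	dropped atomic.Int64 // would feed a counter such as audit_logs_dropped_total
}

func newAuditQueue(capacity int) *auditQueue {
	return &auditQueue{entries: make(chan string, capacity)}
}

// enqueue adds an entry without blocking. When the buffer is full, the entry
// is discarded and the drop counter is incremented.
func (q *auditQueue) enqueue(entry string) bool {
	select {
	case q.entries <- entry:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

// depth reports how many entries are waiting, comparable to a queue size gauge.
func (q *auditQueue) depth() int { return len(q.entries) }

func main() {
	q := newAuditQueue(2)
	for _, e := range []string{"create", "update", "delete"} {
		q.enqueue(e)
	}
	fmt.Printf("depth=%d dropped=%d\n", q.depth(), q.dropped.Load())
}
```

The trade-off in this pattern is that request handling never blocks on audit writes, at the cost of losing entries once the consumer falls behind.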

permission_mongo_audit_queue_size

Type: Gauge
Current number of audit logs in the async queue.
Example Queries
# Current queue depth
permission_mongo_audit_queue_size

# Average queue depth
avg_over_time(permission_mongo_audit_queue_size[5m])

permission_mongo_audit_batch_size

Type: Histogram
Buckets: 1, 5, 10, 25, 50, 100
Size of audit log batches written to MongoDB.
Example Queries
# Average batch size
rate(permission_mongo_audit_batch_size_sum[5m]) 
  / rate(permission_mongo_audit_batch_size_count[5m])

# P95 batch size
histogram_quantile(0.95, 
  sum(rate(permission_mongo_audit_batch_size_bucket[5m])) by (le)
)

Server Metrics

Track overall server health and resource usage.

permission_mongo_server_info

Type: Gauge
Labels: version, go_version
Server build information (always 1).
Example Queries
# Server info
permission_mongo_server_info

permission_mongo_server_uptime_seconds

Type: Gauge
Server uptime in seconds.
Example Queries
# Uptime in hours
permission_mongo_server_uptime_seconds / 3600

# Uptime in days
permission_mongo_server_uptime_seconds / 86400

permission_mongo_server_goroutines

Type: Gauge
Number of active goroutines.
Example Queries
# Current goroutine count
permission_mongo_server_goroutines

# Goroutine growth rate
deriv(permission_mongo_server_goroutines[5m])
Goroutine counts above 10,000 may indicate goroutine leaks.
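
For a quick local check of the same signal, the sketch below reads the goroutine count from the Go runtime and compares it with the 10,000 guideline. It assumes the gauge reflects runtime.NumGoroutine(), which the text above does not confirm; goroutineStatus and leakThreshold are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
)

// leakThreshold mirrors the 10,000 guideline mentioned above.
const leakThreshold = 10000

// goroutineStatus returns the current goroutine count and whether it
// exceeds the given threshold.
func goroutineStatus(threshold int) (int, bool) {
	n := runtime.NumGoroutine()
	return n, n > threshold
}

func main() {
	n, suspicious := goroutineStatus(leakThreshold)
	fmt.Printf("goroutines=%d above_threshold=%t\n", n, suspicious)
}
```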

Connection Pool Metrics

Monitor Redis connection pool health.

permission_mongo_redis_pool_size

Type: Gauge
Current Redis connection pool size.

permission_mongo_redis_pool_idle_connections

Type: Gauge
Current Redis idle connection count.
Example Queries
# Pool utilization
(permission_mongo_redis_pool_size - permission_mongo_redis_pool_idle_connections) 
  / permission_mongo_redis_pool_size

Adding Custom Metrics

To add new metrics to your deployment:
1. Define the metric

Add your metric definition in pkg/metrics/metrics.go:
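// Requires the prometheus and promauto packages from
// github.com/prometheus/client_golang.
// promauto registers the counter with the default registry on creation.
var MyCustomMetric = promauto.NewCounter(
    prometheus.CounterOpts{
        Namespace: namespace, // package constant, assumed to hold permission_mongo
        Name:      "my_custom_metric_total",
        Help:      "Description of my metric",
    },
)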
2. Instrument your code

Use the metric in your application code:
metrics.MyCustomMetric.Inc()
3. Verify exposure

Check that the metric appears at the /metrics endpoint.
Avoid high-cardinality labels (user IDs, request IDs, timestamps) to prevent Prometheus performance issues.
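
One common way to keep a label such as path low in cardinality is to replace identifier-like segments with a placeholder before using the value as a label. The Go sketch below illustrates this; normalizePath, the ":id" placeholder, and the example paths are hypothetical, and this is not necessarily how Permission Mongo derives its path label.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// idSegment matches path segments that look like identifiers: all digits,
// or a 24-character hex string (the textual form of a MongoDB ObjectID).
var idSegment = regexp.MustCompile(`^(\d+|[0-9a-fA-F]{24})$`)

// normalizePath replaces identifier-like segments with ":id" so that many
// concrete URLs collapse into a single label value.
func normalizePath(path string) string {
	segments := strings.Split(path, "/")
	for i, s := range segments {
		if idSegment.MatchString(s) {
			segments[i] = ":id"
		}
	}
	return strings.Join(segments, "/")
}

func main() {
	// Hypothetical request paths.
	for _, p := range []string{"/users/42/roles", "/docs/507f1f77bcf86cd799439011", "/health"} {
		fmt.Println(normalizePath(p))
	}
}
```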
