Skip to main content
KubeLB exposes comprehensive Prometheus metrics for monitoring the health and performance of your load balancing infrastructure.

Metrics Overview

KubeLB components expose metrics on the following endpoints:
ComponentDefault PortEndpoint
Manager:9443/metrics
CCM:9445/metrics
Envoy Proxy:19001/stats/prometheus

Manager Metrics

Metrics exposed by the KubeLB Manager component running in the management cluster.

LoadBalancer Metrics

Track LoadBalancer resources and reconciliation performance:
# Current number of LoadBalancer resources by tenant
kubelb_manager_loadbalancers{namespace="tenant-production", tenant="production", topology="shared"}

# LoadBalancer reconciliation rate
rate(kubelb_manager_loadbalancer_reconcile_total{result="success"}[5m])

# LoadBalancer reconciliation errors
rate(kubelb_manager_loadbalancer_reconcile_total{result="error"}[5m])

# P95 reconciliation latency
histogram_quantile(0.95, rate(kubelb_manager_loadbalancer_reconcile_duration_seconds_bucket[5m]))
MetricTypeLabelsDescription
kubelb_manager_loadbalancersGaugenamespace, tenant, topologyCurrent number of LoadBalancer resources
kubelb_manager_loadbalancer_reconcile_totalCounternamespace, resultTotal LoadBalancer reconciliation attempts
kubelb_manager_loadbalancer_reconcile_duration_secondsHistogramnamespaceLoadBalancer reconciliation duration

Route Metrics

Monitor Layer 7 routing (Ingress/Gateway API) resources:
# Current number of routes by type
kubelb_manager_routes{route_type="ingress"}
kubelb_manager_routes{route_type="httproute"}

# Route reconciliation errors by type
rate(kubelb_manager_route_reconcile_total{result="error"}[5m])

# P99 route reconciliation latency
histogram_quantile(0.99, rate(kubelb_manager_route_reconcile_duration_seconds_bucket[5m]))
MetricTypeLabelsDescription
kubelb_manager_routesGaugenamespace, tenant, route_typeCurrent number of Route resources
kubelb_manager_route_reconcile_totalCounternamespace, route_type, resultTotal Route reconciliation attempts
kubelb_manager_route_reconcile_duration_secondsHistogramnamespaceRoute reconciliation duration

Tenant Metrics

Track tenant resources and reconciliation:
# Total number of tenants
kubelb_manager_tenants

# Tenant reconciliation success rate
rate(kubelb_manager_tenant_reconcile_total{result="success"}[5m])
MetricTypeLabelsDescription
kubelb_manager_tenantsGauge-Current number of Tenant resources
kubelb_manager_tenant_reconcile_totalCounterresultTotal Tenant reconciliation attempts
kubelb_manager_tenant_reconcile_duration_secondsHistogram-Tenant reconciliation duration

Envoy Control Plane Metrics

Monitor the xDS control plane for Envoy proxies:
# Current number of Envoy proxies
kubelb_manager_envoy_proxies{topology="shared"}

# Envoy snapshot resources
kubelb_manager_envoycp_clusters{snapshot_name="default"}
kubelb_manager_envoycp_listeners{snapshot_name="default"}
kubelb_manager_envoycp_endpoints{snapshot_name="default"}

# Snapshot update rate
rate(kubelb_manager_envoycp_snapshot_updates_total[5m])

# Control plane reconciliation latency
histogram_quantile(0.95, rate(kubelb_manager_envoycp_reconcile_duration_seconds_bucket[5m]))
MetricTypeLabelsDescription
kubelb_manager_envoy_proxiesGaugenamespace, topologyCurrent number of Envoy proxy deployments
kubelb_manager_envoycp_clustersGaugesnapshot_nameNumber of clusters in Envoy snapshot
kubelb_manager_envoycp_listenersGaugesnapshot_nameNumber of listeners in Envoy snapshot
kubelb_manager_envoycp_endpointsGaugesnapshot_nameNumber of endpoints in Envoy snapshot
kubelb_manager_envoycp_snapshot_updates_totalCountersnapshot_nameTotal Envoy snapshot updates
kubelb_manager_envoycp_reconcile_totalCounterresultTotal Envoy control plane reconciliations
kubelb_manager_envoycp_reconcile_duration_secondsHistogram-Envoy control plane reconciliation duration

Port Allocator Metrics

Monitor port allocation for Layer 4 load balancing:
# Allocated ports
kubelb_manager_port_allocator_allocated_ports

# Tracked endpoints
kubelb_manager_port_allocator_endpoints
MetricTypeDescription
kubelb_manager_port_allocator_allocated_portsGaugeCurrent number of allocated ports
kubelb_manager_port_allocator_endpointsGaugeCurrent number of tracked endpoints

SyncSecret Metrics

Track secret synchronization between tenant and management clusters:
# SyncSecret reconciliation rate
rate(kubelb_manager_sync_secret_reconcile_total{result="success"}[5m])
MetricTypeLabelsDescription
kubelb_manager_sync_secret_reconcile_totalCounternamespace, resultTotal SyncSecret reconciliation attempts
kubelb_manager_sync_secret_reconcile_duration_secondsHistogramnamespaceSyncSecret reconciliation duration

CCM Metrics

Metrics exposed by the KubeLB CCM component running in tenant clusters.

Service Metrics

Monitor LoadBalancer service reconciliation:
# Managed LoadBalancer services
kubelb_ccm_managed_services{namespace="default"}

# Service reconciliation rate
rate(kubelb_ccm_service_reconcile_total{result="success"}[5m])

# Service reconciliation errors
rate(kubelb_ccm_service_reconcile_total{result="error"}[5m])
MetricTypeLabelsDescription
kubelb_ccm_managed_servicesGaugenamespaceCurrent number of managed LoadBalancer services
kubelb_ccm_service_reconcile_totalCounternamespace, resultTotal Service reconciliation attempts
kubelb_ccm_service_reconcile_duration_secondsHistogramnamespaceService reconciliation duration

Ingress Metrics

Track Ingress resource management:
# Managed Ingresses
kubelb_ccm_managed_ingresses{namespace="default"}

# Ingress reconciliation rate
rate(kubelb_ccm_ingress_reconcile_total{result="success"}[5m])
MetricTypeLabelsDescription
kubelb_ccm_managed_ingressesGaugenamespaceCurrent number of managed Ingresses
kubelb_ccm_ingress_reconcile_totalCounternamespace, resultTotal Ingress reconciliation attempts
kubelb_ccm_ingress_reconcile_duration_secondsHistogramnamespaceIngress reconciliation duration

Gateway API Metrics

Monitor Gateway API resources:
# Managed Gateways
kubelb_ccm_managed_gateways{namespace="default"}

# HTTPRoute reconciliation rate
rate(kubelb_ccm_httproute_reconcile_total{result="success"}[5m])

# GRPCRoute reconciliation rate
rate(kubelb_ccm_grpcroute_reconcile_total{result="success"}[5m])
MetricTypeLabelsDescription
kubelb_ccm_managed_gatewaysGaugenamespaceCurrent number of managed Gateways
kubelb_ccm_managed_httproutesGaugenamespaceCurrent number of managed HTTPRoutes
kubelb_ccm_managed_grpcroutesGaugenamespaceCurrent number of managed GRPCRoutes
kubelb_ccm_gateway_reconcile_totalCounternamespace, resultTotal Gateway reconciliation attempts
kubelb_ccm_httproute_reconcile_totalCounternamespace, resultTotal HTTPRoute reconciliation attempts
kubelb_ccm_grpcroute_reconcile_totalCounternamespace, resultTotal GRPCRoute reconciliation attempts
kubelb_ccm_gateway_reconcile_duration_secondsHistogramnamespaceGateway reconciliation duration
kubelb_ccm_httproute_reconcile_duration_secondsHistogramnamespaceHTTPRoute reconciliation duration
kubelb_ccm_grpcroute_reconcile_duration_secondsHistogramnamespaceGRPCRoute reconciliation duration

Node Metrics

Track node endpoints for load balancing:
# Total nodes in tenant cluster
kubelb_ccm_nodes

# Node reconciliation rate
rate(kubelb_ccm_node_reconcile_total{result="success"}[5m])
MetricTypeLabelsDescription
kubelb_ccm_nodesGauge-Current number of nodes in the cluster
kubelb_ccm_node_reconcile_totalCounterresultTotal Node reconciliation attempts
kubelb_ccm_node_reconcile_duration_secondsHistogram-Node reconciliation duration

KubeLB Cluster Connection Metrics

Monitor connectivity between CCM and the management cluster:
# KubeLB cluster connection status (1=connected, 0=disconnected)
kubelb_ccm_kubelb_cluster_connected

# Operations to KubeLB cluster
rate(kubelb_ccm_kubelb_cluster_operations_total[5m])

# P95 operation latency
histogram_quantile(0.95, rate(kubelb_ccm_kubelb_cluster_latency_seconds_bucket[5m]))
MetricTypeLabelsDescription
kubelb_ccm_kubelb_cluster_connectedGauge-Connection status (1=connected, 0=disconnected)
kubelb_ccm_kubelb_cluster_operations_totalCounteroperation, resultTotal operations to management cluster
kubelb_ccm_kubelb_cluster_latency_secondsHistogramoperationOperation latency to management cluster

SyncSecret Metrics

# SyncSecret reconciliation in tenant cluster
rate(kubelb_ccm_sync_secret_reconcile_total{result="success"}[5m])
MetricTypeLabelsDescription
kubelb_ccm_sync_secret_reconcile_totalCounternamespace, resultTotal SyncSecret reconciliation attempts
kubelb_ccm_sync_secret_reconcile_duration_secondsHistogramnamespaceSyncSecret reconciliation duration

EnvoyCP Metrics

Metrics exposed by the Envoy Control Plane component (embedded in Manager).

Cache Metrics

Monitor xDS cache performance:
# Cache hit rate
rate(kubelb_envoy_control_plane_cache_hits_total[5m]) / 
  (rate(kubelb_envoy_control_plane_cache_hits_total[5m]) + 
   rate(kubelb_envoy_control_plane_cache_misses_total[5m]))

# Cache clears
rate(kubelb_envoy_control_plane_cache_clears_total[5m])
MetricTypeLabelsDescription
kubelb_envoy_control_plane_cache_hits_totalCountersnapshot_nameTotal cache hits for snapshot lookups
kubelb_envoy_control_plane_cache_misses_totalCountersnapshot_nameTotal cache misses for snapshot lookups
kubelb_envoy_control_plane_cache_clears_totalCountersnapshot_nameTotal cache clears

gRPC Metrics

Monitor xDS gRPC connections:
# Active gRPC connections from Envoy proxies
kubelb_envoy_control_plane_grpc_connections

# xDS request rate by resource type
rate(kubelb_envoy_control_plane_grpc_requests_total[5m])

# xDS response rate by resource type
rate(kubelb_envoy_control_plane_grpc_responses_total[5m])
MetricTypeLabelsDescription
kubelb_envoy_control_plane_grpc_connectionsGauge-Current number of active gRPC connections
kubelb_envoy_control_plane_grpc_requests_totalCountertype_urlTotal gRPC xDS requests
kubelb_envoy_control_plane_grpc_responses_totalCountertype_urlTotal gRPC xDS responses

Snapshot Metrics

# Snapshot generation latency
histogram_quantile(0.95, rate(kubelb_envoy_control_plane_snapshot_generation_duration_seconds_bucket[5m]))

# Snapshot resources
kubelb_envoy_control_plane_clusters{snapshot_name="default"}
kubelb_envoy_control_plane_listeners{snapshot_name="default"}
kubelb_envoy_control_plane_routes{snapshot_name="default"}
kubelb_envoy_control_plane_secrets{snapshot_name="default"}
MetricTypeLabelsDescription
kubelb_envoy_control_plane_snapshotsGauge-Current number of active Envoy snapshots
kubelb_envoy_control_plane_snapshot_updates_totalCountersnapshot_nameTotal Envoy snapshot updates
kubelb_envoy_control_plane_snapshot_generation_duration_secondsHistogramsnapshot_nameSnapshot generation duration
kubelb_envoy_control_plane_clustersGaugesnapshot_nameNumber of clusters in snapshot
kubelb_envoy_control_plane_listenersGaugesnapshot_nameNumber of listeners in snapshot
kubelb_envoy_control_plane_routesGaugesnapshot_nameNumber of routes in snapshot
kubelb_envoy_control_plane_secretsGaugesnapshot_nameNumber of secrets in snapshot

Label Reference

Common labels used across KubeLB metrics:
LabelDescriptionExample Values
namespaceKubernetes namespacetenant-production, kubelb
tenantKubeLB tenant identifierproduction, staging
resultReconciliation resultsuccess, error, skipped
route_typeType of route resourceingress, gateway, httproute, grpcroute
topologyEnvoy proxy topologyshared
operationOperation typecreate, update, delete
snapshot_nameEnvoy xDS snapshot IDdefault, tenant-production
type_urlEnvoy xDS resource typetype.googleapis.com/envoy.config.cluster.v3.Cluster

Prometheus Configuration

ServiceMonitor (Prometheus Operator)

Create ServiceMonitors to automatically discover KubeLB metrics endpoints:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelb-manager
  namespace: kubelb
spec:
  selector:
    matchLabels:
      app: kubelb-manager
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics

Manual Prometheus Scrape Configuration

If not using Prometheus Operator, add scrape configs manually:
prometheus.yml
scrape_configs:
  - job_name: 'kubelb-manager'
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - kubelb
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: kubelb-manager
        action: keep

  - job_name: 'kubelb-ccm'
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - kube-system
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: kubelb-ccm
        action: keep

Alerting Rules

groups:
  - name: kubelb_alerts
    rules:
      - alert: KubeLBHighErrorRate
        expr: |
          sum(rate(kubelb_manager_loadbalancer_reconcile_total{result="error"}[5m])) 
          / sum(rate(kubelb_manager_loadbalancer_reconcile_total[5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "KubeLB Manager has high error rate"
          description: "More than 10% of LoadBalancer reconciliations are failing"

Grafana Dashboards

Dashboard Templates

KubeLB provides pre-built Grafana dashboard templates. Import these dashboards using the Grafana UI or provision them via ConfigMaps.
1

Manager Dashboard

Monitors Manager metrics including LoadBalancer reconciliation, Envoy control plane, and port allocation.Key panels:
  • LoadBalancer count by tenant
  • Reconciliation success rate
  • Envoy proxy count and snapshot resources
  • P95/P99 reconciliation latency
2

CCM Dashboard

Tracks CCM metrics including service reconciliation, node endpoints, and management cluster connectivity.Key panels:
  • Managed services count
  • CCM connection status
  • Reconciliation error rate
  • Operation latency to management cluster
3

Envoy Control Plane Dashboard

Visualizes xDS control plane metrics including cache performance and gRPC connections.Key panels:
  • Active gRPC connections
  • xDS request/response rate
  • Cache hit rate
  • Snapshot generation time

Example Dashboard Queries

sum(rate(kubelb_manager_loadbalancer_reconcile_total{result="success"}[5m])) 
/ sum(rate(kubelb_manager_loadbalancer_reconcile_total[5m]))
topk(10, sum by (namespace) (kubelb_manager_loadbalancers))
sum by (pod) (container_memory_working_set_bytes{pod=~"kubelb-envoy-.*"})
histogram_quantile(0.95, 
  sum(rate(kubelb_ccm_kubelb_cluster_latency_seconds_bucket[5m])) by (le)
)

Health Checks

Both Manager and CCM expose health check endpoints:
# Manager health checks
curl http://localhost:8081/healthz
curl http://localhost:8081/readyz

# CCM health checks
curl http://localhost:8081/healthz
curl http://localhost:8081/readyz
These endpoints return HTTP 200 when healthy and are used by Kubernetes liveness and readiness probes.

Next Steps

Troubleshooting

Debug issues using logs and metrics

Configuration

Tune KubeLB configuration for your environment

Build docs developers (and LLMs) love