Metrics Overview
Pulsar exposes metrics in Prometheus format, making it easy to integrate with popular monitoring and visualization tools.
Metrics Endpoints
Brokers expose metrics at the following HTTP endpoints:
- http://broker:8080/metrics/ - All metrics in Prometheus format
- http://broker:8080/metrics?cluster=<cluster-name> - Filtered by cluster
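As a rough illustration of what a client sees at these endpoints, the sketch below fetches the Prometheus text output and parses it into samples. It uses only the Python standard library. The function names are hypothetical, the default URL is the placeholder host from the list above, and the parser handles only the common `name{labels} value` line shape, so a real deployment would more likely rely on Prometheus itself or an existing client library.

```python
import re
from urllib.request import urlopen

# Matches sample lines such as: metric_name{label="value"} 12.5 [timestamp]
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$')
# Matches one label pair inside the braces; escape sequences are left as-is.
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue  # not a numeric sample, ignore it
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, value))
    return samples


def fetch_metrics(url="http://broker:8080/metrics/", timeout=5.0):
    """Fetch and parse metrics from a broker (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Keeping `parse_metrics` separate from the network call makes it possible to run the parser against a saved scrape when debugging.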
Metric Types
Pulsar tracks several categories of metrics:
- Broker metrics - Resource usage, message rates, connections
- Topic metrics - Per-topic message rates, storage, subscriptions
- Namespace metrics - Aggregated metrics at namespace level
- Subscription metrics - Consumer lag, backlog, acknowledgment rates
- Replication metrics - Cross-cluster replication statistics
- Storage metrics - BookKeeper and tiered storage performance
Key Metrics to Monitor
Broker Health Metrics
CPU and Memory
Connection Metrics
Message Rate Metrics
Publish Rates
Consumption Rates
Storage Metrics
BookKeeper Performance
Storage Size
Subscription Metrics
Consumer Lag
Message Acknowledgment
Replication Metrics
Monitoring Tools Integration
Prometheus
Configure Prometheus to scrape Pulsar metrics from the broker metrics endpoint.
Grafana Dashboards
Pulsar provides pre-built Grafana dashboards for visualization:
- Cluster Overview - High-level cluster health and performance
- Broker Metrics - Detailed broker-level statistics
- Topic Metrics - Per-topic message rates and storage
- Namespace Metrics - Namespace-level aggregations
Health Checks
Broker Health Endpoint
Check if a broker is healthy. The health check returns ok if the broker is healthy.
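A health probe along these lines could be scripted as shown below. This is a minimal sketch: the REST path `/admin/v2/brokers/health` and the host `broker:8080` are assumptions that should be checked against the Pulsar version in use, and both function names are hypothetical.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Assumed admin REST path; verify it against your Pulsar version.
HEALTH_PATH = "/admin/v2/brokers/health"


def is_healthy_response(status, body):
    """Treat a broker as healthy when it answers HTTP 200 with the body 'ok'."""
    return status == 200 and body.strip() == "ok"


def check_broker(base_url="http://broker:8080", timeout=5.0):
    """Return True if the broker health endpoint reports ok, False otherwise."""
    try:
        with urlopen(base_url + HEALTH_PATH, timeout=timeout) as response:
            body = response.read().decode("utf-8")
            return is_healthy_response(response.status, body)
    except (URLError, OSError):
        # Connection failures, timeouts, and HTTP error statuses all count as unhealthy.
        return False
```

Treating every network error as unhealthy is a deliberate simplification. A production probe would usually retry before raising an alert.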
Topic Stats
Get detailed statistics for a topic, including:
- Message rates (in/out)
- Storage size
- Subscription details
- Publisher and consumer information
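The fields above can be condensed into a small summary once the topic statistics are available as JSON. The sketch below assumes the statistics have already been loaded into a Python dictionary and that they use field names such as `msgRateIn`, `msgRateOut`, `storageSize`, `publishers`, and `subscriptions` with a per-subscription `msgBacklog`. Those names are assumptions to confirm against the actual output, and `summarize_topic_stats` is a hypothetical helper.

```python
def summarize_topic_stats(stats):
    """Reduce a topic stats dictionary to the values most often watched."""
    subscriptions = stats.get("subscriptions", {})
    return {
        "msg_rate_in": stats.get("msgRateIn", 0.0),
        "msg_rate_out": stats.get("msgRateOut", 0.0),
        "storage_size_bytes": stats.get("storageSize", 0),
        "publisher_count": len(stats.get("publishers", [])),
        # Consumers are listed per subscription, so add them up.
        "consumer_count": sum(
            len(sub.get("consumers", [])) for sub in subscriptions.values()
        ),
        # Backlog per subscription is the usual proxy for consumer lag.
        "backlog_by_subscription": {
            name: sub.get("msgBacklog", 0) for name, sub in subscriptions.items()
        },
    }
```

Missing fields fall back to zero or empty values, so the function can also be applied to a partial response.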
Subscription Stats
Monitor subscription lag and backlog.
Alerting Guidelines
Critical Alerts
Set up alerts for these conditions:
Broker Down
High Memory Usage
Message Backlog Growing
Replication Lag High
Warning Alerts
High CPU Usage
Slow Storage Operations
Connection Limit Approaching
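To make the alert conditions concrete, the sketch below evaluates a snapshot of broker readings against a few thresholds. Everything here is illustrative: the `Snapshot` fields, the function names, and the threshold values (90% memory, 80% CPU, 80% of the connection limit) are hypothetical and would need tuning per cluster. In practice these rules usually live in the alerting system rather than in application code. Replication lag and slow storage operations are left out, but would follow the same threshold pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One set of readings for a broker (hypothetical structure)."""
    broker_up: bool
    memory_used_ratio: float   # 0.0 to 1.0
    cpu_used_ratio: float      # 0.0 to 1.0
    connections: int
    max_connections: int
    backlog_samples: tuple     # backlog readings, oldest first


# Hypothetical thresholds; tune them for your own cluster.
MEMORY_CRITICAL = 0.90
CPU_WARNING = 0.80
CONNECTION_WARNING = 0.80


def backlog_growing(samples):
    """True when there are at least two samples and each exceeds the previous one."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


def evaluate(snapshot):
    """Return a list of (severity, alert name) pairs for the snapshot."""
    alerts = []
    if not snapshot.broker_up:
        alerts.append(("critical", "Broker Down"))
    if snapshot.memory_used_ratio >= MEMORY_CRITICAL:
        alerts.append(("critical", "High Memory Usage"))
    if backlog_growing(snapshot.backlog_samples):
        alerts.append(("critical", "Message Backlog Growing"))
    if snapshot.cpu_used_ratio >= CPU_WARNING:
        alerts.append(("warning", "High CPU Usage"))
    if (snapshot.max_connections > 0
            and snapshot.connections / snapshot.max_connections >= CONNECTION_WARNING):
        alerts.append(("warning", "Connection Limit Approaching"))
    return alerts
```

Requiring every sample to exceed the previous one is a strict definition of "growing". A slope over a longer window would be less sensitive to short bursts.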
Log Monitoring
Log Locations
Pulsar logs are stored in:
- logs/pulsar-broker-*.log - Broker application logs
- logs/pulsar-gc.log - Garbage collection logs
Important Log Patterns
Monitor logs for these patterns:
- OutOfMemoryError - Memory exhaustion
- Failed to acquire - Resource acquisition failures
- Timeout - Operation timeouts
- Connection refused - Connectivity issues
- Metadata store operation failed - ZooKeeper/metadata issues
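A simple way to watch for the patterns above is to count matching lines in a log file. The sketch below uses plain substring matching on each line. The helper names are hypothetical, and a real setup would more likely forward logs to a log aggregation system and match there.

```python
from collections import Counter

# Patterns from this section, mapped to what each one indicates.
LOG_PATTERNS = {
    "OutOfMemoryError": "Memory exhaustion",
    "Failed to acquire": "Resource acquisition failures",
    "Timeout": "Operation timeouts",
    "Connection refused": "Connectivity issues",
    "Metadata store operation failed": "ZooKeeper/metadata issues",
}


def scan_lines(lines):
    """Count how many lines contain each pattern (case-sensitive substring match)."""
    counts = Counter()
    for line in lines:
        for pattern in LOG_PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts


def scan_file(path):
    """Scan one log file, replacing undecodable bytes instead of failing."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_lines(handle)
```

For example, `scan_file("logs/pulsar-broker-1.log")` would return a `Counter` keyed by pattern. The file name is a placeholder that follows the `logs/pulsar-broker-*.log` naming described earlier.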
Performance Benchmarking
Use the built-in performance testing tool, pulsar-perf, to benchmark the cluster.
Monitoring Configuration
Enable Metrics Collection
Metrics are enabled by default. Configure collection intervals:
- Interval for checking compaction status.
- Minimum interval to update load reports.
- Managed ledger Prometheus stats latency rollover interval.
Best Practices
- Set up comprehensive monitoring - Monitor all layers: brokers, BookKeeper, ZooKeeper
- Configure alerting - Set up alerts for critical conditions before issues occur
- Track trends - Monitor long-term trends in message rates, storage, and latency
- Capacity planning - Use metrics to plan cluster expansion
- Custom metrics - Add application-specific metrics for end-to-end monitoring
- Dashboard visibility - Create role-specific dashboards for operators and developers
- Regular reviews - Periodically review and tune alert thresholds