Monitoring & Observability
Comprehensive monitoring solutions for tracking service health, performance metrics, logs, and uptime.
Available Services
Grafana
Port: 3150 | Memory: 256 MB | Maturity: Stable
Open-source analytics and interactive visualization platform for metrics, logs, and traces, with rich dashboards and alerting.
Features:
- Rich, customizable dashboards
- Multiple data sources
- Alert management
- Template variables
- Plugin ecosystem
- Team collaboration
Environment: GRAFANA_HOST, GRAFANA_PORT
Prometheus
Port: 9090 | Memory: 256 MB | Maturity: Stable
Open-source systems monitoring and alerting toolkit that collects time-series metrics and queries them with PromQL.
Features:
- Time-series database
- Powerful query language (PromQL)
- Service discovery
- Alertmanager integration
- Pull-based metrics
- Exporters ecosystem
Uptime Kuma
Port: 3001 | Memory: 256 MB | Maturity: Stable
Self-hosted monitoring tool for tracking the uptime of websites, APIs, and services, with a clean dashboard and support for many notification services.
Features:
- HTTP/HTTPS monitoring
- TCP/Ping checks
- Status pages
- Multi-language support
- 90+ notification channels
- Certificate monitoring
Loki
Port: 3100 | Memory: 512 MB | Maturity: Stable
Like Prometheus, but for logs: a highly available, multi-tenant log aggregation system.
Features:
- Log aggregation
- Label-based indexing
- LogQL query language
- Grafana integration
- Multi-tenancy
- Cost-effective storage
SigNoz
Port: 3301 | Memory: 1024 MB | Maturity: Stable
Open-source observability platform and alternative to Datadog that combines metrics, traces, and logs in one tool.
Features:
- Metrics, traces, logs
- APM capabilities
- Service maps
- OpenTelemetry native
- Query builder
- Alerting
Usage Examples
DevOps Monitoring Stack
Full Observability Stack
Lightweight Monitoring
Complete Observability Platform
Monitoring Stack Comparison
| Service | Metrics | Logs | Traces | Alerts | Memory |
|---|---|---|---|---|---|
| Grafana | ✅ (via data sources) | ✅ (via data sources) | ✅ (via data sources) | ✅ | 256 MB |
| Prometheus | ✅ | ❌ | ❌ | ✅ | 256 MB |
| Uptime Kuma | ✅ (uptime) | ❌ | ❌ | ✅ | 256 MB |
| Loki | ❌ | ✅ | ❌ | ✅ (via ruler) | 512 MB |
| SigNoz | ✅ | ✅ | ✅ | ✅ | 1024 MB |
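The Memory column gives a quick budget for the architecture patterns. A minimal Python sketch, assuming the classic stack is exactly Grafana + Prometheus + Loki, the all-in-one pattern is SigNoz alone, and uptime monitoring is Uptime Kuma alone; the dictionary keys are illustrative identifiers, not official service names.

```python
"""Memory budget per architecture pattern, using the per-service figures in the table."""

# Memory allocations (MB) from the comparison table.
SERVICE_MEMORY_MB = {
    "grafana": 256,
    "prometheus": 256,
    "uptime-kuma": 256,
    "loki": 512,
    "signoz": 1024,
}

# Assumed service combinations for each pattern.
PATTERNS = {
    "classic": ["grafana", "prometheus", "loki"],
    "all-in-one": ["signoz"],
    "uptime": ["uptime-kuma"],
}


def stack_memory_mb(services: list[str]) -> int:
    """Sum the allocated memory of the given services."""
    return sum(SERVICE_MEMORY_MB[name] for name in services)


def pattern_budgets() -> dict[str, int]:
    """Total memory for every pattern in PATTERNS."""
    return {name: stack_memory_mb(svcs) for name, svcs in PATTERNS.items()}
```

Both the classic stack (256 + 256 + 512) and SigNoz come to 1024 MB; adding Uptime Kuma to either raises the total to 1280 MB.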
Architecture Patterns
Classic Stack (Grafana + Prometheus + Loki)
All-in-One (SigNoz)
Uptime Monitoring
Grafana Configuration
Add Prometheus Data Source
- Navigate to Configuration → Data Sources
- Add Prometheus
- URL: http://prometheus:9090
- Save & Test
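The same data source can be created without the UI through Grafana's HTTP API (`POST /api/datasources`). This stdlib-only sketch assumes `GRAFANA_HOST` and `GRAFANA_PORT` hold Grafana's hostname and port, that Grafana is served over plain HTTP, and that you supply a service account token; the fallback values `localhost` and `3150` are assumptions based on the port listed above.

```python
"""Create the Prometheus data source through Grafana's HTTP API (stdlib only)."""
import json
import os
import urllib.request


def grafana_base_url() -> str:
    # Assumption: GRAFANA_HOST / GRAFANA_PORT hold Grafana's hostname and port.
    host = os.environ.get("GRAFANA_HOST", "localhost")
    port = os.environ.get("GRAFANA_PORT", "3150")
    return f"http://{host}:{port}"


def prometheus_datasource(url: str = "http://prometheus:9090") -> dict:
    """Body for POST /api/datasources; "proxy" means Grafana's backend queries Prometheus."""
    return {
        "name": "Prometheus",
        "type": "prometheus",
        "url": url,
        "access": "proxy",
        "isDefault": True,
    }


def build_request(token: str, body: dict) -> urllib.request.Request:
    """Authenticated POST request; token is a Grafana service account token."""
    return urllib.request.Request(
        grafana_base_url() + "/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def create_datasource(token: str) -> dict:
    """Send the request and return Grafana's JSON response describing the new data source."""
    with urllib.request.urlopen(build_request(token, prometheus_datasource())) as resp:
        return json.load(resp)
```

With `access` set to `"proxy"`, the Grafana server rather than the browser calls Prometheus, so the container hostname `prometheus` resolves as long as both containers share a network.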
Import Dashboards
Create Dashboard
- Create → Dashboard
- Add Panel
- Write PromQL query
- Configure visualization
- Save dashboard
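These dashboard steps can also be scripted through Grafana's `POST /api/dashboards/db` endpoint. A minimal sketch that builds a dashboard with one time-series panel; the metric `app_requests_total` in the example query is hypothetical, the panel omits an explicit data source on the assumption that Prometheus is Grafana's default, and the token is a service account token you supply.

```python
"""Create a one-panel dashboard through Grafana's HTTP API (stdlib only)."""
import json
import urllib.request


def dashboard_payload(title: str, expr: str, panel_title: str = "Request rate") -> dict:
    """Body for POST /api/dashboards/db with a single time-series panel."""
    panel = {
        "type": "timeseries",
        "title": panel_title,
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
        # No "datasource" key: assumes Prometheus is Grafana's default data source.
        "targets": [{"refId": "A", "expr": expr}],
    }
    return {
        # id/uid of None ask Grafana to create a new dashboard.
        "dashboard": {"id": None, "uid": None, "title": title, "panels": [panel]},
        "overwrite": False,
    }


def save_dashboard(base_url: str, token: str, payload: dict) -> dict:
    """POST the dashboard and return Grafana's JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `save_dashboard("http://localhost:3150", token, dashboard_payload("Service Overview", "sum(rate(app_requests_total[5m]))"))` would create the dashboard in one call.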
Prometheus Configuration
Basic prometheus.yml
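A basic `prometheus.yml` needs a `global` block and at least one entry under `scrape_configs`. Since the original file isn't shown here, the sketch below renders a minimal one from Python; the 15-second interval and the targets (Prometheus scraping itself on `localhost:9090`, plus a hypothetical `app:8000` exporter) are assumptions to adjust.

```python
"""Write a minimal prometheus.yml (stdlib only; YAML emitted as plain text)."""
from pathlib import Path


def prometheus_config(scrape_interval: str, jobs: dict[str, list[str]]) -> str:
    """Render global settings plus one static scrape job per entry in `jobs`."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        f"  evaluation_interval: {scrape_interval}",
        "",
        "scrape_configs:",
    ]
    for job_name, targets in jobs.items():
        quoted = ", ".join(f'"{t}"' for t in targets)
        lines += [
            f"  - job_name: {job_name}",
            "    static_configs:",
            f"      - targets: [{quoted}]",
        ]
    return "\n".join(lines) + "\n"


def write_config(path: str = "prometheus.yml") -> None:
    # Hypothetical targets: Prometheus scraping itself plus an app exposing /metrics on 8000.
    text = prometheus_config("15s", {"prometheus": ["localhost:9090"], "app": ["app:8000"]})
    Path(path).write_text(text)
```

Running `write_config()` writes the file to the current directory; Prometheus reads it at startup and on a configuration reload.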
Common PromQL Queries
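PromQL can be run from the Prometheus UI or through its HTTP API at `/api/v1/query`. The sketch below collects a few common query shapes and parses instant-vector results; `up` is a metric Prometheus records for every scrape target, while `app_requests_total` and `app_request_duration_seconds_bucket` are hypothetical application metrics.

```python
"""Run PromQL through Prometheus's HTTP API (/api/v1/query), stdlib only."""
import json
import urllib.parse
import urllib.request

# Example queries; metric names other than `up` are hypothetical.
QUERIES = {
    "targets_down": "up == 0",
    "request_rate": "sum(rate(app_requests_total[5m]))",
    "error_ratio": 'sum(rate(app_requests_total{status=~"5.."}[5m]))'
                   " / sum(rate(app_requests_total[5m]))",
    "p95_latency": "histogram_quantile(0.95, sum by (le) "
                   "(rate(app_request_duration_seconds_bucket[5m])))",
}


def query_url(base_url: str, promql: str) -> str:
    """URL for an instant query evaluated at the current time."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: dict) -> list[tuple[dict, float]]:
    """Turn an instant-vector response into (labels, value) pairs (assumes resultType "vector")."""
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    # Each sample's "value" is [unix_timestamp, "value-as-string"].
    return [(r["metric"], float(r["value"][1])) for r in body["data"]["result"]]


def instant_query(base_url: str, promql: str) -> list[tuple[dict, float]]:
    with urllib.request.urlopen(query_url(base_url, promql)) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query("http://prometheus:9090", QUERIES["targets_down"])` returns one entry per target that failed its last scrape.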
Uptime Kuma Configuration
Add Monitors
- Add New Monitor
- Choose monitor type:
- HTTP(s)
- TCP Port
- Ping
- DNS
- Docker Container
- Configure settings
- Set notification channels
Notification Channels
- Email (SMTP)
- Slack
- Discord
- Telegram
- Webhooks
- PagerDuty
- And many more (90+ channels in total)
Status Pages
- Status Pages → Add Status Page
- Select monitors to include
- Customize appearance
- Share public URL
Loki Configuration
Send Logs to Loki
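Log shippers ultimately send data to Loki's push API, `POST /loki/api/v1/push`. This stdlib-only sketch shows the JSON body that endpoint accepts: each stream is a label set plus `[timestamp, line]` pairs, with nanosecond Unix timestamps encoded as strings. The labels `job` and `env` are hypothetical; keep label values few and stable, because Loki indexes labels rather than log content.

```python
"""Push log lines to Loki's HTTP push API (stdlib only)."""
import json
import time
import urllib.request
from typing import Optional


def push_payload(labels: dict[str, str], lines: list[str], ts_ns: Optional[int] = None) -> dict:
    """Build one stream: a label set plus [timestamp, line] entries."""
    start = time.time_ns() if ts_ns is None else ts_ns
    # Timestamps are Unix nanoseconds as strings; each line is offset by 1 ns to keep its order.
    values = [[str(start + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


def push_logs(base_url: str, labels: dict[str, str], lines: list[str]) -> int:
    """POST the payload; Loki replies 204 No Content on success."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/loki/api/v1/push",
        data=json.dumps(push_payload(labels, lines)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `push_logs("http://loki:3100", {"job": "app", "env": "dev"}, ["service started"])` sends a single line to one stream.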
LogQL Queries
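LogQL starts from a stream selector in braces, can narrow it with line filters such as `|=` or parsers such as `| json`, and can wrap the result in range aggregations like `rate` or `count_over_time`. The sketch below shows one query of each kind and builds a request URL for Loki's `/loki/api/v1/query_range` endpoint; the `{job="app"}` selector and the `level` field in JSON log lines are hypothetical.

```python
"""Build Loki query_range requests for a few LogQL queries (stdlib only)."""
import urllib.parse

# Hypothetical selector {job="app"}; use the labels your logs were pushed with.
LOGQL = {
    # Log query: lines from the stream that contain "error".
    "errors": '{job="app"} |= "error"',
    # Metric query: per-second rate of error lines over 5-minute windows.
    "error_rate": 'sum(rate({job="app"} |= "error" [5m]))',
    # Parse JSON lines, then count entries per extracted "level" label.
    "by_level": 'sum by (level) (count_over_time({job="app"} | json [5m]))',
}


def query_range_url(base_url: str, logql: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """start/end are Unix timestamps in nanoseconds; limit caps returned log lines."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return base_url.rstrip("/") + "/loki/api/v1/query_range?" + urllib.parse.urlencode(params)
```

The returned URL can be fetched with any HTTP client; log queries return streams, while metric queries return matrices suitable for graphing.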
SigNoz Features
Application Monitoring
- Service latency
- Request rate
- Error rate
- Service map
- Database calls
Infrastructure Monitoring
- CPU, memory, disk
- Network metrics
- Container stats
- Host metrics
Log Management
- Structured logs
- Full-text search
- Log aggregation
- Correlation with traces
Alert Configuration
Grafana Alerts
- Edit Panel → Alert
- Define condition
- Set evaluation interval
- Configure notification channel
- Save
Prometheus Alerts
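Prometheus alerting rules live in a rules file that `prometheus.yml` lists under `rule_files`; a rule fires once its `expr` has stayed true for the `for` duration, and Alertmanager then handles notification routing. The original rules file isn't shown, so this Python sketch renders a minimal one; the rule names, thresholds, severity values, and runbook URL are illustrative assumptions.

```python
"""Render a Prometheus alerting-rules file as YAML text (stdlib only)."""
import json
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    expr: str
    for_: str         # how long expr must stay true before the alert fires, e.g. "5m"
    severity: str     # label that Alertmanager routes can match on
    summary: str
    runbook_url: str  # hypothetical runbook link


def render_rules(group: str, rules: list[AlertRule]) -> str:
    # json.dumps yields double-quoted strings, which are also valid YAML scalars.
    lines = ["groups:", f"  - name: {group}", "    rules:"]
    for r in rules:
        lines += [
            f"      - alert: {r.name}",
            f"        expr: {json.dumps(r.expr)}",
            f"        for: {r.for_}",
            "        labels:",
            f"          severity: {r.severity}",
            "        annotations:",
            f"          summary: {json.dumps(r.summary)}",
            f"          runbook_url: {json.dumps(r.runbook_url)}",
        ]
    return "\n".join(lines) + "\n"
```

For example, a rule `AlertRule("InstanceDown", "up == 0", "5m", "critical", "Instance {{ $labels.instance }} is down", "https://runbooks.example.com/instance-down")` fires when a scrape target has been unreachable for five minutes and carries its runbook link with it.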
Uptime Kuma Alerts
- Configure in monitor settings
- Set notification channels
- Define alert conditions:
- Down
- Slow response time
- Certificate expiring
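The three conditions can be pictured as checks applied to each monitor result. This is not Uptime Kuma's implementation, only a stdlib Python sketch of the logic; the 1000 ms slow threshold and the 14-day certificate window are hypothetical values, and `notAfter` strings use the format returned by `ssl.SSLSocket.getpeercert()`.

```python
"""Classify one check result into down / slow / certificate-expiring alerts (stdlib only)."""
import ssl
import time
from typing import Optional

SLOW_MS = 1000        # hypothetical slow-response threshold
CERT_WARN_DAYS = 14   # hypothetical certificate-expiry warning window


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date (e.g. 'Jan  5 09:34:43 2018 GMT')."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def alert_conditions(up: bool, response_ms: float, cert_days_left: Optional[float]) -> list[str]:
    alerts = []
    if not up:
        alerts.append("down")
    elif response_ms > SLOW_MS:
        # Only checked when the monitor is up; a failed check has no useful response time.
        alerts.append("slow")
    if cert_days_left is not None and cert_days_left < CERT_WARN_DAYS:
        alerts.append("certificate_expiring")
    return alerts
```

Treating "down" and "slow" as mutually exclusive is a design choice of this sketch, so a single outage produces one alert rather than two.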
Best Practices
Metrics Collection
- Naming: Use consistent metric naming conventions
- Labels: Add relevant labels for filtering
- Cardinality: Avoid high-cardinality labels
- Retention: Configure appropriate retention periods
- Sampling: Use appropriate scrape intervals
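Two of these rules can be checked mechanically. The sketch below validates names against the traditional Prometheus metric-name syntax and estimates the worst-case series count of one metric, which is the product of each label's number of distinct values; the label value counts used in the note are illustrative. Prometheus naming conventions also suggest base-unit suffixes such as `_seconds` and a `_total` suffix for counters.

```python
"""Check metric names and estimate series cardinality before adding a label (stdlib only)."""
import math
import re

# Traditional Prometheus metric-name syntax: [a-zA-Z_:][a-zA-Z0-9_:]*
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def valid_metric_name(name: str) -> bool:
    return METRIC_NAME.fullmatch(name) is not None


def series_count(label_values: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return math.prod(label_values.values())
```

For a metric with `method` (5 values), `status` (10), and `endpoint` (50), `series_count` gives 5 × 10 × 50 = 2,500 series; adding a `user_id` label with 10,000 values multiplies that to 25,000,000, which is why such labels count as high-cardinality.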
Dashboard Design
- Hierarchy: Organize dashboards by service/team
- Variables: Use template variables for flexibility
- Annotations: Mark deployments and incidents
- Performance: Limit the number of panels per dashboard
- Documentation: Add panel descriptions
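Deployment markers can be added automatically, for example from a CI job, through Grafana's `POST /api/annotations` endpoint. A minimal sketch; the service name, version string, and token are placeholders, and because no dashboard UID is set the annotation is organization-wide, so a dashboard shows it through an annotation query filtered on the `deploy` tag.

```python
"""Mark a deployment as a Grafana annotation via POST /api/annotations (stdlib only)."""
import json
import time
import urllib.request
from typing import Optional


def deployment_annotation(service: str, version: str, time_ms: Optional[int] = None) -> dict:
    """Annotation body; time is epoch milliseconds (defaults to now)."""
    return {
        "time": int(time.time() * 1000) if time_ms is None else time_ms,
        "tags": ["deploy", service],
        "text": f"Deployed {service} {version}",
    }


def post_annotation(base_url: str, token: str, annotation: dict) -> dict:
    """POST the annotation with a service account token and return Grafana's JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/annotations",
        data=json.dumps(annotation).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Tagging every marker with both `deploy` and the service name lets one dashboard show all deployments while a service dashboard filters to its own.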
Alerting
- Thresholds: Set realistic alert thresholds
- Priorities: Classify alerts by severity
- Routing: Route alerts to appropriate teams
- Alert Fatigue: Avoid noisy or non-actionable alerts
- Runbooks: Link to runbooks in alerts
Integration Examples
Full Stack Monitoring
Cloud-Native Observability
Minimal Monitoring
Metrics to Monitor
System Metrics
- CPU usage
- Memory usage
- Disk I/O
- Network traffic
- Open file descriptors
Application Metrics
- Request rate
- Response time
- Error rate
- Active connections
- Queue depth
Database Metrics
- Query latency
- Connection pool utilization
- Cache hit rate
- Replication lag
- Lock contention
Container Metrics
- Container count
- Resource usage against limits
- Restart count
- Image pull time
- Volume usage