Installation
Running the Server
STDIO Transport Mode
HTTP Streaming Transport Mode
Available Tools
The server provides the following tools for monitoring:
| Tool Name | Description |
|---|---|
| list_alarms | List alarms in the tenancy |
| get_metrics_data | Get aggregated metric data |
| get_available_metrics | List the available metrics a user can query on in their tenancy |
Usage Examples
List Alarms
Query Metrics Data
Discover Available Metrics
Understanding OCI Monitoring
What is OCI Monitoring?
OCI Monitoring collects, aggregates, and analyzes metrics from your cloud resources:
- Metrics - Time-series data about resource performance
- Alarms - Notifications when metrics breach thresholds
- Dashboards - Visualize metrics and trends
- Integration - Works with Notifications and Events services
Key Concepts
Metrics
- Emitted by OCI resources automatically
- Organized by namespace (e.g., oci_computeagent, oci_lbaas)
- Include dimensions for filtering (instance OCID, region, etc.)
- Support various aggregation methods (mean, sum, min, max, count)
Common Namespaces
- oci_computeagent - Compute instance metrics (CPU, memory, disk)
- oci_lbaas - Load balancer metrics
- oci_blockstore - Block volume metrics
- oci_database - Database metrics
- oci_autonomous_database - Autonomous Database metrics
- Custom namespaces for application metrics
Dimensions
- resourceId - OCID of the resource
- availabilityDomain - AD where resource exists
- region - OCI region
- Custom dimensions for filtering
Query Parameters
- Time window for data points
- Statistic (mean, max, min, sum, count, p50, p90, p95, p99)
- Resolution (1m, 5m, 1h)
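The pieces listed above (metric name, dimensions, statistic, resolution) are commonly combined into a single Monitoring Query Language (MQL) expression. The following is a minimal sketch of assembling such a string; the helper name build_mql and the example OCID are hypothetical, and this page does not state whether get_metrics_data takes a raw query string or separate fields.

```python
def build_mql(metric, interval="1m", statistic="mean", dimensions=None):
    """Assemble an MQL-style expression: metric[interval]{dims}.statistic()."""
    allowed = {"mean", "sum", "min", "max", "count"}
    if statistic not in allowed:
        raise ValueError(f"unsupported statistic: {statistic}")
    selector = ""
    if dimensions:
        # Sort keys so the generated query is deterministic.
        pairs = ", ".join(f'{k} = "{v}"' for k, v in sorted(dimensions.items()))
        selector = "{" + pairs + "}"
    return f"{metric}[{interval}]{selector}.{statistic}()"


# Hypothetical instance OCID, used for illustration only.
query = build_mql(
    "CpuUtilization", "5m", "max",
    {"resourceId": "ocid1.instance.oc1..example"},
)
print(query)
# CpuUtilization[5m]{resourceId = "ocid1.instance.oc1..example"}.max()
```

The namespace (for example oci_computeagent) is typically supplied alongside the query rather than inside it.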
Alarms
Alarms monitor metrics and trigger notifications:
Alarm States:
- OK - Metric is within threshold
- Firing - Metric has breached threshold
- Suppressed - Alarm is temporarily disabled
Severity Levels:
- Critical - Immediate attention required
- Error - Significant issue
- Warning - Potential issue
- Info - Informational only
Alarm Conditions and Settings:
- Threshold - Value comparison (greater than, less than, greater than or equal, less than or equal, equal)
- Absence - Missing metric data
- Aggregation window and evaluation period
- Suppression rules
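To make the threshold and absence conditions concrete, here is a small sketch of how an alarm state could be derived from the data points in one evaluation period. This is an illustration only, not the service's implementation: the function names are hypothetical, and treating "every point in the period breaches" as the firing rule is an assumption.

```python
import operator

# Comparison operators named in the threshold description.
OPERATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    "eq": operator.eq,
}


def threshold_state(points, op, threshold):
    """Return 'Firing' if every point in the evaluation period breaches."""
    compare = OPERATORS[op]
    if points and all(compare(p, threshold) for p in points):
        return "Firing"
    return "OK"


def absence_state(points):
    """Return 'Firing' if no data arrived during the evaluation period."""
    return "Firing" if not points else "OK"


print(threshold_state([91.0, 95.5, 99.0], "gt", 90))  # Firing
print(threshold_state([91.0, 80.0], "gt", 90))        # OK
print(absence_state([]))                              # Firing
```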
Authentication
The server uses the OCI CLI configuration from ~/.oci/config.
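A quick way to catch authentication problems early is to check that the profile contains the usual API-key entries. The sketch below parses config text with the standard library; the helper name and the sample values are hypothetical, and the list of required keys assumes API-key authentication.

```python
import configparser

# Entries normally present in an API-key profile of ~/.oci/config.
REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def missing_config_keys(config_text, profile="DEFAULT"):
    """Return the required keys that are absent from the given profile."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if profile == "DEFAULT":
        section = parser.defaults()
    elif parser.has_section(profile):
        section = parser[profile]
    else:
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if key not in section]


# Placeholder values for illustration; key_file is deliberately left out.
sample = """
[DEFAULT]
user=ocid1.user.oc1..exampleuser
fingerprint=aa:bb:cc
tenancy=ocid1.tenancy.oc1..exampletenancy
region=us-ashburn-1
"""
print(missing_config_keys(sample))  # ['key_file']
```

To check a real file, read its contents from the expanded path (for example with pathlib.Path("~/.oci/config").expanduser().read_text()) and pass the text to the function.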
Required Permissions
Your OCI user or instance principal needs these IAM permissions:
Read Metrics:
Common Use Cases
Performance Monitoring
- Track CPU, memory, and disk utilization
- Monitor application response times
- Identify performance bottlenecks
- Analyze resource consumption trends
Capacity Planning
- Analyze historical usage patterns
- Forecast resource needs
- Identify over-provisioned resources
- Optimize costs based on utilization
Incident Response
- Monitor alarm states
- Investigate metric anomalies
- Correlate metrics across resources
- Track system health during incidents
Cost Optimization
- Identify idle or underutilized resources
- Track resource usage for chargeback
- Monitor cost-related metrics
- Right-size based on actual usage
SLA Monitoring
- Track uptime and availability
- Monitor service level indicators
- Measure response times
- Verify compliance with SLAs
Metric Examples
Compute Instance Metrics
CPU Utilization
Load Balancer Metrics
Request Rate
Database Metrics
Database Connections
Query Patterns
Time Range Queries
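A time range query needs an explicit start and end. The following is a minimal sketch of computing a "last N hours" window as UTC timestamps; the helper name time_window is hypothetical, and the exact timestamp format expected by get_metrics_data is an assumption.

```python
from datetime import datetime, timedelta, timezone


def time_window(hours, now=None):
    """Return (start, end) as UTC timestamp strings for the last `hours` hours.

    `now` must be timezone-aware when supplied; it defaults to the current time.
    """
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(hours=hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(time_window(6, fixed_now))
# ('2024-01-01T06:00:00Z', '2024-01-01T12:00:00Z')
```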
Aggregation Queries
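Aggregation reduces raw data points to one value per interval. The Monitoring service does this server-side; the sketch below only illustrates what each statistic means, using hypothetical (epoch_seconds, value) tuples as input.

```python
from collections import defaultdict
from statistics import mean

STATISTICS = {"mean": mean, "sum": sum, "min": min, "max": max, "count": len}


def aggregate(points, interval_seconds, statistic="mean"):
    """Group points into fixed-size intervals and apply one statistic to each.

    points: iterable of (epoch_seconds, value) tuples.
    Returns {interval_start_seconds: aggregated_value}, ordered by time.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its interval.
        buckets[ts - ts % interval_seconds].append(value)
    fn = STATISTICS[statistic]
    return {start: fn(values) for start, values in sorted(buckets.items())}


points = [(0, 10), (30, 20), (60, 40), (90, 60)]
print(aggregate(points, 60, "max"))    # {0: 20, 60: 60}
print(aggregate(points, 60, "count"))  # {0: 2, 60: 2}
```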
Multi-Dimensional Queries
Best Practices
Alarm Configuration
- Set meaningful alarm names and descriptions
- Choose appropriate severity levels
- Configure reasonable thresholds
- Avoid alarm fatigue with proper tuning
- Test alarms before production use
Metric Queries
- Use appropriate time ranges to avoid data overload
- Select correct aggregation intervals
- Choose relevant statistics for your use case
- Filter by dimensions to focus on specific resources
Monitoring Strategy
- Monitor key performance indicators (KPIs)
- Set up alarms for critical thresholds
- Create dashboards for at-a-glance health
- Review and tune alarms regularly
- Document expected metric baselines
Performance
- Query smaller time ranges for faster results
- Use coarser resolution for long time periods
- Limit the number of metrics in a single query
- Cache frequently accessed metric data
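The advice to use coarser resolution for longer periods can be sketched as a simple lookup. The cut-off values below are assumptions chosen for illustration, not service limits; only the resolutions 1m, 5m, and 1h come from this page.

```python
from datetime import timedelta


def pick_resolution(span):
    """Choose a resolution for a query covering `span` (a timedelta)."""
    if span <= timedelta(hours=6):
        return "1m"   # short windows: keep full detail
    if span <= timedelta(days=2):
        return "5m"   # medium windows: fewer points per series
    return "1h"       # long windows: coarse trend only


print(pick_resolution(timedelta(hours=1)))  # 1m
print(pick_resolution(timedelta(days=1)))   # 5m
print(pick_resolution(timedelta(days=30)))  # 1h
```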
Troubleshooting
No Metric Data Returned
Possible causes:
- Metric namespace or name is incorrect
- Resource doesn’t emit that metric
- Time range has no data points
- Insufficient permissions
Solutions:
- Use get_available_metrics to verify that the metric exists
- Check that the resource type supports the metric
- Adjust the time range
- Verify IAM permissions
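The first check, confirming that the metric exists, can be sketched as a lookup over the metric listing. The result shape used here (a list of dicts with "namespace" and "name" keys) is an assumption about what get_available_metrics returns, and the helper name is hypothetical.

```python
def check_metric(available, namespace, name):
    """Explain why a namespace/metric pair may return no data."""
    in_namespace = [m for m in available if m["namespace"] == namespace]
    if not in_namespace:
        return f"namespace '{namespace}' not found"
    names = [m["name"] for m in in_namespace]
    if name in names:
        return "ok"
    # Metric names are case-sensitive; suggest a match that differs only in case.
    for candidate in names:
        if candidate.lower() == name.lower():
            return f"did you mean '{candidate}'?"
    return f"metric '{name}' not found in '{namespace}'"


# Hypothetical listing for illustration.
available = [
    {"namespace": "oci_computeagent", "name": "CpuUtilization"},
    {"namespace": "oci_computeagent", "name": "MemoryUtilization"},
]
print(check_metric(available, "oci_computeagent", "cpuutilization"))
# did you mean 'CpuUtilization'?
```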
Alarm Not Firing
Check:
- Alarm is enabled (not suppressed)
- Threshold is correctly configured
- Evaluation period is appropriate
- Metric data is being emitted
- Query syntax is correct
Permission Errors
Error: NotAuthorizedOrNotFound
- Verify that the IAM policy grants read metrics
- Check that the compartment in the policy matches the resource compartment
- Ensure you are using the correct tenancy
Integration with Other Services
Notifications
- Send alarm notifications via email, SMS, PagerDuty, Slack
- Configure notification topics
- Route different alarm severities to different channels
Events
- Trigger actions based on metric events
- Automate responses to threshold breaches
- Integrate with Functions for custom logic
Logging
- Correlate metrics with logs
- Cross-reference metric anomalies with log events
- Unified observability across metrics and logs
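Routing alarm severities to different channels amounts to a mapping from severity to a notification topic. A minimal sketch follows, in which the topic names are hypothetical placeholders and the fallback choice is an assumption.

```python
# Hypothetical topic names; substitute the topics defined in your tenancy.
SEVERITY_TOPICS = {
    "Critical": "oncall-pager",
    "Error": "team-chat",
    "Warning": "team-chat",
    "Info": "email-digest",
}


def topic_for(severity, default="email-digest"):
    """Return the notification topic for an alarm severity."""
    return SEVERITY_TOPICS.get(severity, default)


print(topic_for("Critical"))  # oncall-pager
print(topic_for("Unknown"))   # email-digest
```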
Related Services
- Logging - Collect and analyze logs
- Compute - Monitor instance metrics
- Database - Monitor database performance
- Network Load Balancer - Monitor load balancer metrics
