The OCI Monitoring MCP server provides tools for interacting with the Oracle Cloud Infrastructure (OCI) Monitoring service, enabling metric queries, alarm management, and resource monitoring.

Installation

uvx oracle.oci-monitoring-mcp-server

Running the Server

STDIO Transport Mode

uvx oracle.oci-monitoring-mcp-server
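
In STDIO mode, an MCP client usually starts the server process itself. The sketch below builds a client configuration entry for that purpose. It assumes the client reads the common `mcpServers` JSON layout, which this page does not confirm, and the entry name `oci-monitoring` is a hypothetical label chosen for illustration.

```python
import json

def client_config(server_name: str = "oci-monitoring") -> dict:
    """Build a client config entry (assumed 'mcpServers' layout)."""
    return {
        "mcpServers": {
            server_name: {  # hypothetical entry name
                "command": "uvx",
                "args": ["oracle.oci-monitoring-mcp-server"],
            }
        }
    }

config = client_config()
print(json.dumps(config, indent=2))
```

Check your MCP client's documentation for the file location and exact schema it expects before using this output.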

HTTP Streaming Transport Mode

ORACLE_MCP_HOST=<hostname/IP address> ORACLE_MCP_PORT=<port number> uvx oracle.oci-monitoring-mcp-server
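
As a minimal sketch, the same launch can be prepared from Python. The host and port below are placeholder values for illustration, not defaults of the server. The snippet only assembles the environment and command; it does not start the process.

```python
import os

def build_launch(host: str, port: int) -> tuple[dict[str, str], list[str]]:
    """Return (environment, command) for starting the server in HTTP mode."""
    env = dict(os.environ)  # copy so the current process is not modified
    env["ORACLE_MCP_HOST"] = host
    env["ORACLE_MCP_PORT"] = str(port)
    command = ["uvx", "oracle.oci-monitoring-mcp-server"]
    return env, command

env, command = build_launch("127.0.0.1", 8888)  # placeholder host and port
print(env["ORACLE_MCP_HOST"], env["ORACLE_MCP_PORT"], " ".join(command))
```

The returned pair can be passed to `subprocess.Popen(command, env=env)` to start the server.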

Available Tools

The server provides the following tools for monitoring:
Tool Name              Description
list_alarms            List alarms in the tenancy
get_metrics_data       Get aggregated metric data
get_available_metrics  List the available metrics a user can query in their tenancy

Usage Examples

List Alarms

List all alarms in my compartment
Retrieves all configured alarms including their names, states, severity, and trigger conditions.
Show me all critical alarms that are firing
Filters alarms by severity and state to identify active issues.

Query Metrics Data

Get CPU utilization metrics for instance ocid1.instance.oc1.phx.example over the last hour
Retrieves aggregated metric data with timestamps and values.
Show disk I/O metrics for my compute instances in the last 24 hours
Queries storage performance metrics for analysis.
Get memory usage statistics for the past week
Retrieves historical metric data for trend analysis.

Discover Available Metrics

What metrics are available for compute instances?
Lists all queryable metrics for specific resource types.
Show me all available metrics in my compartment
Retrieves the complete catalog of metrics you can monitor.

Understanding OCI Monitoring

What is OCI Monitoring?

OCI Monitoring collects, aggregates, and analyzes metrics from your cloud resources:
  • Metrics - Time-series data about resource performance
  • Alarms - Notifications when metrics breach thresholds
  • Dashboards - Visualize metrics and trends
  • Integration - Works with Notifications and Events services

Key Concepts

Metrics
  • Emitted by OCI resources automatically
  • Organized by namespace (e.g., oci_computeagent, oci_lbaas)
  • Include dimensions for filtering (instance OCID, region, etc.)
  • Support various aggregation methods (mean, sum, min, max, count)
Metric Namespaces
  • oci_computeagent - Compute instance metrics (CPU, memory, disk)
  • oci_lbaas - Load balancer metrics
  • oci_blockstore - Block volume metrics
  • oci_database - Database metrics
  • oci_autonomous_database - Autonomous Database metrics
  • Custom namespaces for application metrics
Dimensions
  • resourceId - OCID of the resource
  • availabilityDomain - AD where resource exists
  • region - OCI region
  • Custom dimensions for filtering
Aggregation
  • Time window for data points
  • Statistic (mean, max, min, sum, count, p50, p90, p95, p99)
  • Resolution (1m, 5m, 1h)
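
The concepts above (metric name, interval, dimensions, statistic) map onto a Monitoring Query Language (MQL) expression of the form `metric[interval]{dimension = "value"}.statistic()`. The helper below is a hypothetical sketch that assembles such a string. Whether the get_metrics_data tool accepts a raw MQL string is not stated on this page, so treat this as an illustration of how the pieces fit together.

```python
from typing import Optional

def build_mql(metric: str, interval: str, statistic: str,
              dimensions: Optional[dict] = None) -> str:
    """Assemble an MQL expression from its parts (illustrative helper)."""
    selector = ""
    if dimensions:
        pairs = ", ".join(f'{key} = "{value}"' for key, value in dimensions.items())
        selector = "{" + pairs + "}"
    return f"{metric}[{interval}]{selector}.{statistic}()"

query = build_mql(
    "CpuUtilization", "5m", "mean",
    {"resourceId": "ocid1.instance.oc1.phx.example"},
)
print(query)
```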

Alarms

Alarms monitor metrics and trigger notifications.
Alarm States:
  • OK - Metric is within threshold
  • Firing - Metric has breached threshold
  • Suppressed - Alarm notifications are temporarily suspended (for example, during maintenance)
Alarm Severity:
  • Critical - Immediate attention required
  • Error - Significant issue
  • Warning - Potential issue
  • Info - Informational only
Trigger Rules:
  • Threshold - Value comparison (greater than, less than, greater than or equal, less than or equal, equal)
  • Absence - Missing metric data
  • Aggregation window and evaluation period
  • Suppression rules
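
To make the trigger rules concrete, here is a simplified local sketch of how a threshold or absence rule could classify one evaluation window. It is not the Monitoring service's implementation: the function name, the state labels, and the rule that every datapoint in the window must breach are assumptions made for illustration.

```python
import operator
from typing import Sequence

# Comparison operators named in the trigger rules above.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}

def alarm_state(values: Sequence[float], op: str, threshold: float) -> str:
    """Classify one evaluation window of datapoints (simplified rule)."""
    if not values:
        return "ABSENT"  # absence rule: no metric data in the window
    compare = OPERATORS[op]
    # Simplification: fire only if every datapoint in the window breaches.
    if all(compare(value, threshold) for value in values):
        return "FIRING"
    return "OK"

print(alarm_state([91.0, 95.5, 97.2], ">", 90))
print(alarm_state([91.0, 42.0], ">", 90))
print(alarm_state([], ">", 90))
```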

Authentication

The server uses the OCI CLI configuration in ~/.oci/config. If that file does not exist yet, create it with:
oci setup config
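
Before starting the server, it can help to confirm that the chosen profile contains the entries used for API key authentication. The sketch below parses config text with Python's standard `configparser`. The helper name and the sample values are hypothetical, and the key list assumes API key authentication.

```python
import configparser

# Entries an API-key profile normally contains.
REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")

def missing_keys(config_text: str, profile: str = "DEFAULT") -> list:
    """Return the required keys that the given profile does not define."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if profile != "DEFAULT" and not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if key not in section]

# Sample text with placeholder values; a real file lives at ~/.oci/config.
sample = "\n".join([
    "[DEFAULT]",
    "user=ocid1.user.oc1..example",
    "fingerprint=aa:bb:cc",
    "tenancy=ocid1.tenancy.oc1..example",
    "region=us-phoenix-1",
])
print(missing_keys(sample))
```

To check a real file, read it with `pathlib.Path("~/.oci/config").expanduser().read_text()` and pass the result to `missing_keys`.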

Required Permissions

Your OCI user or instance principal needs these IAM permissions.
Read Metrics:
Allow group MonitoringUsers to read metrics in compartment MyCompartment
Manage Alarms:
Allow group MonitoringAdmins to manage alarms in compartment MyCompartment
Allow group MonitoringAdmins to read metrics in compartment MyCompartment
Full Monitoring Access:
Allow group MonitoringAdmins to manage metrics in compartment MyCompartment
Allow group MonitoringAdmins to manage alarms in compartment MyCompartment
Security Notice
All actions are performed with the permissions of the configured OCI CLI profile. We advise:
  • Least-privilege IAM setup
  • Secure credential management
  • Safe network practices
  • Secure logging
  • Never expose secrets in logs or responses

Common Use Cases

Performance Monitoring

  • Track CPU, memory, and disk utilization
  • Monitor application response times
  • Identify performance bottlenecks
  • Analyze resource consumption trends

Capacity Planning

  • Analyze historical usage patterns
  • Forecast resource needs
  • Identify over-provisioned resources
  • Optimize costs based on utilization

Incident Response

  • Monitor alarm states
  • Investigate metric anomalies
  • Correlate metrics across resources
  • Track system health during incidents

Cost Optimization

  • Identify idle or underutilized resources
  • Track resource usage for chargeback
  • Monitor cost-related metrics
  • Right-size based on actual usage

SLA Monitoring

  • Track uptime and availability
  • Monitor service level indicators
  • Measure response times
  • Verify compliance with SLAs

Metric Examples

Compute Instance Metrics

CPU Utilization
Namespace: oci_computeagent
Metric: CpuUtilization
Dimensions: resourceId=<instance-ocid>
Unit: Percent
Memory Utilization
Namespace: oci_computeagent
Metric: MemoryUtilization
Dimensions: resourceId=<instance-ocid>
Unit: Percent
Disk I/O
Namespace: oci_computeagent
Metric: DiskBytesRead, DiskBytesWritten
Dimensions: resourceId=<instance-ocid>
Unit: Bytes

Load Balancer Metrics

Request Rate
Namespace: oci_lbaas
Metric: HttpRequests
Dimensions: resourceId=<lb-ocid>
Unit: Count
Active Connections
Namespace: oci_lbaas
Metric: ActiveConnections
Dimensions: resourceId=<lb-ocid>
Unit: Count

Database Metrics

Database Connections
Namespace: oci_database
Metric: SessionCount
Dimensions: resourceId=<db-ocid>
Unit: Count
Storage Utilization
Namespace: oci_autonomous_database
Metric: StorageUtilization
Dimensions: resourceId=<adb-ocid>
Unit: Percent

Query Patterns

Time Range Queries

Get CPU metrics for the last hour
Use relative time ranges for recent data.
Show metrics from 2024-01-01 to 2024-01-07
Use absolute timestamps for historical analysis.
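
Both styles of time range resolve to a start and end timestamp. The sketch below converts a relative range ("last hour") and an absolute date range into UTC timestamps in RFC 3339 form. The function names are hypothetical, and treating each date as midnight UTC is an assumption; adjust it if the end date should be inclusive.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FMT = "%Y-%m-%dT%H:%M:%SZ"

def relative_range(hours: float, now: Optional[datetime] = None) -> tuple:
    """Return (start, end) covering the last `hours` hours, in UTC."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return start.strftime(FMT), end.strftime(FMT)

def absolute_range(start_date: str, end_date: str) -> tuple:
    """Return (start, end) for YYYY-MM-DD dates, each taken as midnight UTC."""
    start = datetime.strptime(start_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = datetime.strptime(end_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return start.strftime(FMT), end.strftime(FMT)

fixed_now = datetime(2024, 1, 7, 12, 0, tzinfo=timezone.utc)
print(relative_range(1, now=fixed_now))
print(absolute_range("2024-01-01", "2024-01-07"))
```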

Aggregation Queries

Get average CPU utilization with 5-minute intervals
Aggregate data at appropriate resolution for analysis.
Show maximum memory usage per hour
Use statistical functions to identify peaks.
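
The Monitoring service performs aggregation on its side. The local sketch below only illustrates what "average with 5-minute intervals" or "maximum per hour" means: raw datapoints are grouped into fixed intervals and a statistic is applied to each group. The function name and the sample data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

STATISTICS = {"mean": mean, "max": max, "min": min, "sum": sum, "count": len}

def aggregate(points, interval_seconds: int, statistic: str) -> dict:
    """Group (epoch_seconds, value) pairs into intervals and aggregate each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % interval_seconds
        buckets[bucket_start].append(value)
    func = STATISTICS[statistic]
    return {start: func(values) for start, values in sorted(buckets.items())}

# Sample datapoints: (seconds since some origin, CPU percent).
points = [(0, 10.0), (60, 20.0), (300, 40.0), (360, 60.0)]
print(aggregate(points, 300, "mean"))
print(aggregate(points, 300, "max"))
```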

Multi-Dimensional Queries

Get metrics for all instances in availability domain AD-1
Filter by dimensions to analyze subsets of resources.
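
As a sketch of dimension filtering, the snippet below keeps only the metric streams whose dimensions match every requested value. The shape of each stream (a dict with a `dimensions` mapping) and the sample values are assumptions for illustration, not the documented response format of the server.

```python
def filter_streams(streams: list, **wanted: str) -> list:
    """Keep streams whose dimensions match every requested key/value pair."""
    return [
        stream for stream in streams
        if all(stream["dimensions"].get(key) == value for key, value in wanted.items())
    ]

# Placeholder streams; real OCIDs and availability domain names are longer.
streams = [
    {"name": "CpuUtilization", "dimensions": {"resourceId": "instance-a", "availabilityDomain": "AD-1"}},
    {"name": "CpuUtilization", "dimensions": {"resourceId": "instance-b", "availabilityDomain": "AD-2"}},
    {"name": "CpuUtilization", "dimensions": {"resourceId": "instance-c", "availabilityDomain": "AD-1"}},
]
in_ad1 = filter_streams(streams, availabilityDomain="AD-1")
print([stream["dimensions"]["resourceId"] for stream in in_ad1])
```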

Best Practices

Alarm Configuration

  • Set meaningful alarm names and descriptions
  • Choose appropriate severity levels
  • Configure reasonable thresholds
  • Avoid alarm fatigue with proper tuning
  • Test alarms before production use

Metric Queries

  • Use appropriate time ranges to avoid data overload
  • Select correct aggregation intervals
  • Choose relevant statistics for your use case
  • Filter by dimensions to focus on specific resources

Monitoring Strategy

  • Monitor key performance indicators (KPIs)
  • Set up alarms for critical thresholds
  • Create dashboards for at-a-glance health
  • Review and tune alarms regularly
  • Document expected metric baselines

Performance

  • Query smaller time ranges for faster results
  • Use coarser resolution for long time periods
  • Limit the number of metrics in a single query
  • Cache frequently accessed metric data
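
One way to apply the "coarser resolution for long time periods" advice is to derive the resolution from the length of the time range. The cut-off values in this sketch are illustrative assumptions, not limits documented on this page; tune them to the number of datapoints you want back.

```python
from datetime import timedelta

def choose_resolution(span: timedelta) -> str:
    """Pick a resolution (1m, 5m, or 1h) from the length of the time range."""
    if span <= timedelta(hours=6):   # assumed cut-off
        return "1m"
    if span <= timedelta(days=2):    # assumed cut-off
        return "5m"
    return "1h"

print(choose_resolution(timedelta(hours=1)))
print(choose_resolution(timedelta(hours=24)))
print(choose_resolution(timedelta(days=7)))
```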

Troubleshooting

No Metric Data Returned

Possible causes:
  • Metric namespace or name is incorrect
  • Resource doesn’t emit that metric
  • Time range has no data points
  • Insufficient permissions
Solutions:
  1. Use get_available_metrics to verify that the metric exists
  2. Check that the resource type supports the metric
  3. Adjust time range
  4. Verify IAM permissions
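
The first step can be partly automated. The sketch below checks a requested metric name against a list of known names and suggests close matches when it is misspelled. It assumes the names have already been collected into a plain list; the shape of the get_available_metrics response is not specified here, and the helper name is hypothetical.

```python
import difflib

def check_metric(name: str, available: list) -> tuple:
    """Return (found, suggestions) for a metric name."""
    if name in available:
        return True, []
    # Suggest up to three similar names, best match first.
    suggestions = difflib.get_close_matches(name, available, n=3, cutoff=0.6)
    return False, suggestions

available = ["CpuUtilization", "MemoryUtilization", "DiskBytesRead", "DiskBytesWritten"]
found, suggestions = check_metric("CpuUtilisation", available)
print(found, suggestions[:1])
```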

Alarm Not Firing

Check:
  • Alarm is enabled and not suppressed
  • Threshold is correctly configured
  • Evaluation period is appropriate
  • Metric data is being emitted
  • Query syntax is correct

Permission Errors

Error: NotAuthorizedOrNotFound
  • Verify that an IAM policy grants read access to metrics
  • Check that the compartment in the policy matches the resource's compartment
  • Ensure you are using the correct tenancy

Integration with Other Services

Notifications
  • Send alarm notifications via email, SMS, PagerDuty, Slack
  • Configure notification topics
  • Route different alarm severities to different channels
Events
  • Trigger actions based on metric events
  • Automate responses to threshold breaches
  • Integrate with Functions for custom logic
Logging
  • Correlate metrics with logs
  • Cross-reference metric anomalies with log events
  • Unified observability across metrics and logs
