OCI Monitoring MCP Server - Oracle MCP Servers

The OCI Monitoring MCP server provides tools for interacting with the Oracle Cloud Infrastructure (OCI) Monitoring service, enabling metric queries, alarm management, and resource monitoring.

Installation

uvx oracle.oci-monitoring-mcp-server

Running the Server

STDIO Transport Mode

uvx oracle.oci-monitoring-mcp-server

HTTP Streaming Transport Mode

ORACLE_MCP_HOST=<hostname/IP address> ORACLE_MCP_PORT=<port number> uvx oracle.oci-monitoring-mcp-server

Available Tools

The server provides the following tools for monitoring:

Tool Name	Description
`list_alarms`	List alarms in the tenancy
`get_metrics_data`	Get aggregated metric data
`get_available_metrics`	List the metrics available to query in the tenancy

Usage Examples

List Alarms

List all alarms in my compartment

Retrieves all configured alarms, including their names, states, severity, and trigger conditions.

Show me all critical alarms that are firing

Filters alarms by severity and state to identify active issues.
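Filtering by severity and state, as in the prompt above, reduces to a simple predicate over alarm records. This sketch assumes each alarm is a dict with `display_name`, `severity`, and `status` keys holding uppercase values such as `CRITICAL` and `FIRING`. These field names are illustrative assumptions, not the server's documented output.

```python
def filter_alarms(alarms, severity=None, status=None):
    """Return alarms matching the given severity and/or status (case-insensitive)."""
    def matches(alarm):
        # Skip the check for any criterion left as None.
        if severity and alarm.get("severity", "").upper() != severity.upper():
            return False
        if status and alarm.get("status", "").upper() != status.upper():
            return False
        return True
    return [a for a in alarms if matches(a)]

# Hypothetical alarm records for illustration.
alarms = [
    {"display_name": "high-cpu", "severity": "CRITICAL", "status": "FIRING"},
    {"display_name": "disk-full", "severity": "CRITICAL", "status": "OK"},
    {"display_name": "lb-5xx", "severity": "WARNING", "status": "FIRING"},
]
print([a["display_name"] for a in filter_alarms(alarms, "critical", "firing")])
```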

Query Metrics Data

Get CPU utilization metrics for instance ocid1.instance.oc1.phx.example over the last hour

Retrieves aggregated metric data with timestamps and values.

Show disk I/O metrics for my compute instances in the last 24 hours

Queries storage performance metrics for analysis.

Get memory usage statistics for the past week

Retrieves historical metric data for trend analysis.
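Metric query results are time series of timestamp and value pairs. The sketch below summarizes one series into its minimum, mean, maximum, and latest value. It assumes a simplified response shape loosely modeled on the OCI SDK's `MetricData`: a dict with an `aggregated_datapoints` list of `timestamp`/`value` entries. The exact output of `get_metrics_data` may differ.

```python
from statistics import mean

def summarize_series(series):
    """Summarize one metric series given in a simplified, assumed shape."""
    points = series.get("aggregated_datapoints", [])
    if not points:
        # No data in the requested window (see Troubleshooting).
        return {"count": 0}
    # Sort by timestamp string; assumes a uniform UTC ISO 8601 format.
    points = sorted(points, key=lambda p: p["timestamp"])
    values = [p["value"] for p in points]
    return {
        "count": len(values),
        "min": min(values),
        "mean": mean(values),
        "max": max(values),
        "latest": values[-1],
    }

example = {
    "namespace": "oci_computeagent",
    "name": "CpuUtilization",
    "dimensions": {"resourceId": "ocid1.instance.oc1.phx.example"},
    "aggregated_datapoints": [
        {"timestamp": "2024-01-01T00:00:00Z", "value": 12.0},
        {"timestamp": "2024-01-01T00:01:00Z", "value": 30.0},
        {"timestamp": "2024-01-01T00:02:00Z", "value": 18.0},
    ],
}
print(summarize_series(example))
```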

Discover Available Metrics

What metrics are available for compute instances?

Lists all queryable metrics for specific resource types.

Show me all available metrics in my compartment

Retrieves the complete catalog of metrics you can monitor.

Understanding OCI Monitoring

What is OCI Monitoring?

OCI Monitoring collects, aggregates, and analyzes metrics from your cloud resources:

Metrics - Time-series data about resource performance
Alarms - Notifications when metrics breach thresholds
Dashboards - Visualize metrics and trends
Integration - Works with the Notifications and Events services

Key Concepts

Metrics

Emitted by OCI resources automatically
Organized by namespace (e.g., oci_computeagent, oci_lbaas)
Include dimensions for filtering (instance OCID, region, etc.)
Support various aggregation methods (mean, sum, min, max, count)

Metric Namespaces

oci_computeagent - Compute instance metrics (CPU, memory, disk)
oci_lbaas - Load balancer metrics
oci_blockstore - Block volume metrics
oci_database - Database metrics
oci_autonomous_database - Autonomous Database metrics
Custom namespaces for application metrics

Dimensions

resourceId - OCID of the resource
availabilityDomain - Availability domain (AD) where the resource resides
region - OCI region
Custom dimensions for filtering

Aggregation

Interval - Time window over which each data point is aggregated
Statistic - mean, max, min, sum, count, or a percentile (p50, p90, p95, p99)
Resolution - Spacing between aggregation windows (1m, 5m, 1h)
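In the OCI Monitoring Query Language (MQL), the interval, dimension filters, and statistic combine into one expression of the form `metric[interval]{dimension = "value"}.statistic()`. An example is `CpuUtilization[5m]{resourceId = "<instance-ocid>"}.mean()`. The helper below is a hypothetical sketch that assembles such strings and maps the p50 to p99 shorthand onto MQL's `percentile()` function. It only builds text and does not validate anything against the service. Resolution is not part of the expression; in the Monitoring API it is a separate request field.

```python
# Statistics accepted by this sketch; pNN shorthands map to percentile().
_SIMPLE_STATS = {"mean", "max", "min", "sum", "count"}
_PERCENTILES = {"p50": 0.5, "p90": 0.9, "p95": 0.95, "p99": 0.99}

def build_mql(metric, interval="1m", dimensions=None, statistic="mean"):
    """Assemble an MQL expression such as CpuUtilization[5m]{...}.mean()."""
    query = f"{metric}[{interval}]"
    if dimensions:
        # Dimension filters are comma-separated name = "value" pairs in braces.
        filters = ", ".join(f'{k} = "{v}"' for k, v in dimensions.items())
        query += "{" + filters + "}"
    if statistic in _SIMPLE_STATS:
        query += f".{statistic}()"
    elif statistic in _PERCENTILES:
        query += f".percentile({_PERCENTILES[statistic]})"
    else:
        raise ValueError(f"unsupported statistic: {statistic}")
    return query

print(build_mql("CpuUtilization", "5m",
                {"resourceId": "ocid1.instance.oc1.phx.example"}))
print(build_mql("MemoryUtilization", "1h", statistic="p95"))
```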

Alarms

Alarms monitor metrics and trigger notifications.

Alarm States:

OK - Metric is within threshold
Firing - Metric has breached threshold
Suppressed - Notifications for the alarm are temporarily paused

Alarm Severity:

Critical - Immediate attention required
Error - Significant issue
Warning - Potential issue
Info - Informational only

Trigger Rules:

Threshold - Value comparison (greater than, less than, greater than or equal, less than or equal, equal)
Absence - Missing metric data
Aggregation window and evaluation period
Suppression rules
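A threshold or absence rule is ultimately an MQL expression that evaluates to true when the alarm should fire, for example `CpuUtilization[1m].mean() > 80`. The hypothetical helpers below show the shape of such expressions. The helper names and the short comparison keys (`gt`, `le`, and so on) are illustrative assumptions. The evaluation period (the pending duration) is configured on the alarm itself rather than in the query.

```python
# Comparison operators corresponding to the threshold comparisons listed above.
_OPERATORS = {"gt": ">", "lt": "<", "ge": ">=", "le": "<=", "eq": "=="}

def threshold_condition(base_query, comparison, value):
    """Append a threshold comparison, e.g. 'CpuUtilization[1m].mean() > 80'."""
    try:
        op = _OPERATORS[comparison]
    except KeyError:
        raise ValueError(f"unknown comparison: {comparison}") from None
    return f"{base_query} {op} {value}"

def absence_condition(metric, interval="1m", resource_id=None):
    """Build an absence condition that is true when no data points arrive."""
    query = f"{metric}[{interval}]"
    if resource_id:
        query += '{resourceId = "' + resource_id + '"}'
    return query + ".absent()"

print(threshold_condition("CpuUtilization[1m].mean()", "gt", 80))
print(absence_condition("CpuUtilization", resource_id="ocid1.instance.oc1.phx.example"))
```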

Authentication

The server authenticates with the OCI CLI configuration in ~/.oci/config. To create this file, run:

oci setup config
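Before starting the server, it can help to confirm that the profile it will use is complete. This sketch is an illustration, not part of the server. It parses OCI config text with Python's `configparser` and reports which of the API-key authentication entries written by `oci setup config` are missing from a profile.

```python
import configparser

# Keys that `oci setup config` writes for API-key authentication.
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")

def missing_keys(config_text, profile="DEFAULT"):
    """Return the required keys that a profile in OCI config text lacks."""
    # interpolation=None so any '%' characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if profile != "DEFAULT" and not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    # configparser lets named profiles inherit values from [DEFAULT].
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if not section.get(key)]

sample = """\
[DEFAULT]
user=ocid1.user.oc1..example
fingerprint=aa:bb:cc
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..example
"""
print(missing_keys(sample))

# For a real file, pass its text instead, e.g.:
# missing_keys(open(os.path.expanduser("~/.oci/config")).read())
```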

Required Permissions

The IAM group containing your OCI user (or, for an instance principal, the corresponding dynamic group) needs policies like the following.

Read Metrics:

Allow group MonitoringUsers to read metrics in compartment MyCompartment

Manage Alarms:

Allow group MonitoringAdmins to manage alarms in compartment MyCompartment
Allow group MonitoringAdmins to read metrics in compartment MyCompartment

Full Monitoring Access:

Allow group MonitoringAdmins to manage metrics in compartment MyCompartment
Allow group MonitoringAdmins to manage alarms in compartment MyCompartment

Security Notice: All actions are performed with the permissions of the configured OCI CLI profile. We advise:

Least-privilege IAM setup
Secure credential management
Safe network practices
Secure logging
Keeping secrets out of logs and responses

Common Use Cases

Performance Monitoring

Track CPU, memory, and disk utilization
Monitor application response times
Identify performance bottlenecks
Analyze resource consumption trends

Capacity Planning

Analyze historical usage patterns
Forecast resource needs
Identify over-provisioned resources
Optimize costs based on utilization

Incident Response

Monitor alarm states
Investigate metric anomalies
Correlate metrics across resources
Track system health during incidents

Cost Optimization

Identify idle or underutilized resources
Track resource usage for chargeback
Monitor cost-related metrics
Right-size based on actual usage

SLA Monitoring

Track uptime and availability
Monitor service level indicators
Measure response times
Verify compliance with SLAs

Metric Examples

Compute Instance Metrics

CPU Utilization

Namespace: oci_computeagent
Metric: CpuUtilization
Dimensions: resourceId=<instance-ocid>
Unit: Percent

Memory Utilization

Namespace: oci_computeagent
Metric: MemoryUtilization
Dimensions: resourceId=<instance-ocid>
Unit: Percent

Disk I/O

Namespace: oci_computeagent
Metric: DiskBytesRead, DiskBytesWritten
Dimensions: resourceId=<instance-ocid>
Unit: Bytes

Load Balancer Metrics

Request Rate

Namespace: oci_lbaas
Metric: HttpRequests
Dimensions: resourceId=<lb-ocid>
Unit: Count

Active Connections

Namespace: oci_lbaas
Metric: ActiveConnections
Dimensions: resourceId=<lb-ocid>
Unit: Count

Database Metrics

Database Connections

Namespace: oci_database
Metric: SessionCount
Dimensions: resourceId=<db-ocid>
Unit: Count

Storage Utilization

Namespace: oci_autonomous_database
Metric: StorageUtilization
Dimensions: resourceId=<adb-ocid>
Unit: Percent

Query Patterns

Time Range Queries

Get CPU metrics for the last hour

Use relative time ranges for recent data.

Show metrics from 2024-01-01 to 2024-01-07

Use absolute timestamps for historical analysis.
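Metric queries need an explicit start and end time, which the Monitoring API accepts as RFC 3339 timestamps. The sketch below is a hypothetical helper, not the server's own code. It turns a relative range ("last hour") and an absolute one (2024-01-01 to 2024-01-07) into UTC timestamps. It treats the end date as midnight at the start of that day, so extend it by one day if the range should include January 7.

```python
from datetime import datetime, timedelta, timezone

def relative_range(hours, now=None):
    """Return (start, end) covering the last `hours` hours, in UTC."""
    end = now if now is not None else datetime.now(timezone.utc)
    return end - timedelta(hours=hours), end

def absolute_range(start_date, end_date):
    """Return (start, end) for ISO dates such as '2024-01-01', at UTC midnight."""
    start = datetime.fromisoformat(start_date).replace(tzinfo=timezone.utc)
    end = datetime.fromisoformat(end_date).replace(tzinfo=timezone.utc)
    return start, end

def rfc3339(dt):
    """Format a timezone-aware datetime as UTC, e.g. 2024-01-01T00:00:00Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

start, end = absolute_range("2024-01-01", "2024-01-07")
print(rfc3339(start), rfc3339(end))
```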

Aggregation Queries

Get average CPU utilization with 5-minute intervals

Aggregate data at appropriate resolution for analysis.

Show maximum memory usage per hour

Use statistical functions to identify peaks.
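Resolution determines how many points a query returns, which is why long ranges call for coarser resolution. The count is the range divided by the resolution. A week at 1m resolution gives 10,080 points per series, at 5m it gives 2,016, and at 1h it gives 168. The sketch below computes this and picks the finest resolution that fits a point budget. The budget of 500 is an arbitrary illustrative value, not a documented service limit.

```python
from datetime import timedelta

# Resolutions mentioned in this guide, ordered from finest to coarsest.
RESOLUTIONS = {
    "1m": timedelta(minutes=1),
    "5m": timedelta(minutes=5),
    "1h": timedelta(hours=1),
}

def points_per_series(time_range, resolution):
    """Approximate number of data points one series returns for a range."""
    return int(time_range / RESOLUTIONS[resolution])

def finest_resolution(time_range, max_points):
    """Pick the finest resolution whose point count stays within max_points."""
    for name in RESOLUTIONS:  # dicts preserve insertion order
        if points_per_series(time_range, name) <= max_points:
            return name
    return None  # even the coarsest resolution exceeds the budget

week = timedelta(days=7)
for res in RESOLUTIONS:
    print(res, points_per_series(week, res))
print(finest_resolution(week, 500))
```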

Multi-Dimensional Queries

Get metrics for all instances in availability domain AD-1

Filter by dimensions to analyze subsets of resources.
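On the server side, a filter such as `{availabilityDomain = "<AD name>"}` in the query narrows results. Each distinct combination of dimension values then comes back as its own series, which can be filtered or grouped further on the client. The sketch below assumes each series is a dict with `dimensions` and `aggregated_datapoints` keys, a simplified and assumed shape. The availability domain names are placeholders.

```python
def filter_series(series_list, **wanted):
    """Keep series whose dimensions match every name=value pair in `wanted`."""
    return [
        s for s in series_list
        if all(s["dimensions"].get(name) == value for name, value in wanted.items())
    ]

def peak_by(series_list, dimension):
    """Return the highest datapoint value for each value of `dimension`."""
    peaks = {}
    for s in series_list:
        key = s["dimensions"].get(dimension)
        values = [p["value"] for p in s["aggregated_datapoints"]]
        if values and (key not in peaks or max(values) > peaks[key]):
            peaks[key] = max(values)
    return peaks

# Placeholder AD names; real names carry a tenancy-specific prefix.
series = [
    {"dimensions": {"resourceId": "inst-a", "availabilityDomain": "EXAMPLE:PHX-AD-1"},
     "aggregated_datapoints": [{"value": 40.0}, {"value": 75.0}]},
    {"dimensions": {"resourceId": "inst-b", "availabilityDomain": "EXAMPLE:PHX-AD-1"},
     "aggregated_datapoints": [{"value": 90.0}]},
    {"dimensions": {"resourceId": "inst-c", "availabilityDomain": "EXAMPLE:PHX-AD-2"},
     "aggregated_datapoints": [{"value": 20.0}]},
]

ad1 = filter_series(series, availabilityDomain="EXAMPLE:PHX-AD-1")
print(peak_by(ad1, "resourceId"))
print(peak_by(series, "availabilityDomain"))
```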

Best Practices

Alarm Configuration

Set meaningful alarm names and descriptions
Choose appropriate severity levels
Configure reasonable thresholds
Avoid alarm fatigue with proper tuning
Test alarms before production use
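One way to test a threshold before production use is to replay historical values against it. The sketch below is a simplified model, not OCI's actual evaluation logic. It treats each value as one evaluation and fires only after a set number of consecutive breaches. This approximates how a longer pending duration keeps one-off spikes from firing and helps reduce alarm fatigue.

```python
def would_fire(values, threshold, consecutive):
    """Return the index at which the alarm would first fire, or None.

    Simplified model: the alarm fires once `consecutive` values in a row
    exceed `threshold`, standing in for a pending duration.
    """
    streak = 0
    for i, value in enumerate(values):
        # Extend the run of breaches, or reset it when the value recovers.
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return i
    return None

cpu = [55, 82, 60, 85, 88, 91, 70]
print(would_fire(cpu, threshold=80, consecutive=1))  # a single spike fires at index 1
print(would_fire(cpu, threshold=80, consecutive=3))  # a sustained breach fires at index 5
```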

Metric Queries

Use appropriate time ranges to avoid data overload
Select correct aggregation intervals
Choose relevant statistics for your use case
Filter by dimensions to focus on specific resources

Monitoring Strategy

Monitor key performance indicators (KPIs)
Set up alarms for critical thresholds
Create dashboards for at-a-glance health
Review and tune alarms regularly
Document expected metric baselines

Performance

Query smaller time ranges for faster results
Use coarser resolution for long time periods
Limit the number of metrics in a single query
Cache frequently accessed metric data
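Caching pays off most for repeated queries over fixed time ranges that have already closed. The following is a minimal in-memory sketch keyed on the query text and time range. The key layout and the 60-second TTL in the example are arbitrary choices. `fake_fetch` is a hypothetical stand-in for whatever actually calls the Monitoring service.

```python
import time

class TTLCache:
    """Tiny time-based cache for metric query results (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the example can control time
        self._store = {}

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch()` if missing or stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch()
        self._store[key] = (now, value)
        return value

calls = []

def fake_fetch():
    calls.append(1)  # record each real fetch
    return [1.0, 2.0]

t = [0.0]
cache = TTLCache(60, clock=lambda: t[0])
key = ("CpuUtilization[5m].mean()", "2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z")
cache.get_or_fetch(key, fake_fetch)  # miss: fetches
cache.get_or_fetch(key, fake_fetch)  # hit: served from cache
t[0] = 61.0
cache.get_or_fetch(key, fake_fetch)  # stale: fetches again
print(len(calls))
```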

Troubleshooting

No Metric Data Returned

Possible causes:

Metric namespace or name is incorrect
Resource doesn’t emit that metric
Time range has no data points
Insufficient permissions

Solutions:

Use get_available_metrics to verify that the metric exists
Check that the resource type supports the metric
Adjust time range
Verify IAM permissions
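If the metric name may be mistyped, comparing it against the list returned by get_available_metrics catches the problem quickly. This sketch assumes that list has been reduced to dicts with `namespace` and `name` keys, which is an assumed shape. It uses Python's `difflib` to suggest close matches. The check treats names as case-sensitive.

```python
import difflib

def check_metric(name, available, namespace=None):
    """Return (found, suggestions) for a metric name against an assumed metric list."""
    candidates = [
        m["name"] for m in available
        if namespace is None or m["namespace"] == namespace
    ]
    if name in candidates:
        return True, []
    # Up to three similar names, best match first.
    return False, difflib.get_close_matches(name, candidates, n=3, cutoff=0.6)

available = [
    {"namespace": "oci_computeagent", "name": "CpuUtilization"},
    {"namespace": "oci_computeagent", "name": "MemoryUtilization"},
    {"namespace": "oci_computeagent", "name": "DiskBytesRead"},
]
print(check_metric("CPUUtilization", available, "oci_computeagent"))
```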

Alarm Not Firing

Check:

Alarm is enabled and not suppressed
Threshold is correctly configured
Evaluation period is appropriate
Metric data is being emitted
Query syntax is correct

Permission Errors

Error: NotAuthorizedOrNotFound

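Verify that an IAM policy grants read metrics to your group
Check that the compartment named in the policy matches the resource's compartment (or one of its parent compartments)
Ensure you are using the correct tenancy and CLI profile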
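A `NotAuthorizedOrNotFound` error can come from a policy that names the wrong verb, group, or compartment. The sketch below checks simple statements of the form shown in Required Permissions against a group and compartment. It relies on the IAM verb order inspect, read, use, manage, where each verb includes the ones before it. It is deliberately simplified. It ignores `where` clauses and does not model compartment inheritance, where a grant on a parent compartment also covers its child compartments.

```python
import re

# IAM verbs from least to most permissive; each includes the ones before it.
VERBS = ["inspect", "read", "use", "manage"]

POLICY_RE = re.compile(
    r"^allow\s+(?P<kind>dynamic-group|group)\s+(?P<group>\S+)\s+to\s+"
    r"(?P<verb>\w+)\s+(?P<resource>[\w-]+)\s+in\s+"
    r"(?:compartment\s+(?P<compartment>\S+)|(?P<tenancy>tenancy))",
    re.IGNORECASE,
)

def grants(statements, group, resource, compartment, min_verb="read"):
    """Check whether any simple statement gives `group` at least `min_verb`."""
    needed = VERBS.index(min_verb)
    for statement in statements:
        m = POLICY_RE.match(statement.strip())
        if not m or m["group"] != group or m["resource"].lower() != resource:
            continue
        verb = m["verb"].lower()
        if verb not in VERBS or VERBS.index(verb) < needed:
            continue
        # A tenancy-wide grant covers every compartment.
        if m["tenancy"] or m["compartment"] == compartment:
            return True
    return False

statements = [
    "Allow group MonitoringUsers to read metrics in compartment MyCompartment",
    "Allow group MonitoringAdmins to manage alarms in compartment MyCompartment",
]
print(grants(statements, "MonitoringUsers", "metrics", "MyCompartment"))
```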

Integration with Other Services

Notifications

Send alarm notifications via email, SMS, PagerDuty, Slack
Configure notification topics
Route different alarm severities to different channels

Events

Trigger actions based on metric events
Automate responses to threshold breaches
Integrate with Functions for custom logic

Logging

Correlate metrics with logs
Cross-reference metric anomalies with log events
Unified observability across metrics and logs

Related Services

Logging - Collect and analyze logs
Compute - Monitor instance metrics
Database - Monitor database performance
Network Load Balancer - Monitor load balancer metrics
