VM Alert Policies - Multi-Cloud Manager

Set up threshold-based alerts to monitor VM health and performance. Alert policy management is currently available for GCP; Azure support is planned. Get notified when metrics exceed defined thresholds.

GCP Alert Policies

On GCP, alerts are Google Cloud Monitoring alert policies. The API lists, creates, and deletes policies scoped to a single VM instance.

List VM Alerts

Retrieve all alert policies configured for a specific VM.

Endpoint

GET /api/gcp/vms/<project_id>/<instance_id>/alerts

Response

{
  "value": [
    {
      "name": "1234567890123456789",
      "displayName": "High CPU Usage Alert",
      "enabled": true,
      "description": "Alert when CPU utilization exceeds 80%"
    },
    {
      "name": "9876543210987654321",
      "displayName": "Memory Threshold Alert",
      "enabled": true,
      "description": "No description."
    }
  ]
}

Response Fields

Field	Type	Description
name	string	Alert policy identifier (last segment of the full policy name)
displayName	string	Human-readable name of the alert policy
enabled	boolean	Whether the alert policy is currently active
description	string	Alert policy documentation (description text)
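The mapping from a stored alert policy to one item of the "value" array can be sketched as follows. This assumes the policy arrives as a dict using the client library's proto field names (name, display_name, enabled, documentation.content) and that a fixed placeholder is returned when a policy has no documentation; the function names and placeholder constant are hypothetical.

```python
from typing import Any

# Hypothetical placeholder for policies without documentation text.
DEFAULT_DESCRIPTION = "No description."


def to_alert_summary(policy: dict[str, Any]) -> dict[str, Any]:
    """Reduce an alert policy dict (proto field names) to one response item."""
    # "projects/my-project/alertPolicies/123" -> "123"
    short_name = policy["name"].rsplit("/", 1)[-1]
    content = (policy.get("documentation") or {}).get("content", "")
    return {
        "name": short_name,
        "displayName": policy.get("display_name", ""),
        # Treat a missing flag as disabled (assumption).
        "enabled": bool(policy.get("enabled", False)),
        "description": content or DEFAULT_DESCRIPTION,
    }


def to_list_response(policies: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap the summaries in the {"value": [...]} envelope shown above."""
    return {"value": [to_alert_summary(p) for p in policies]}
```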

Filtering Logic

The API keeps only the alert policies that have at least one condition whose filter contains:

f'resource.labels.instance_id = "{instance_id}"'

Supported condition types:

condition_threshold - Metric exceeds a threshold
condition_absent - Metric data is missing for the configured duration
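A minimal sketch of this filtering step, assuming each policy is a dict with a "conditions" list and that condition filters live under the proto field names condition_threshold and condition_absent. The helper names are hypothetical; this is not the backend's code.

```python
from typing import Any

# Condition types the filter inspects (proto field names, as in the client library).
CONDITION_TYPES = ("condition_threshold", "condition_absent")


def targets_instance(policy: dict[str, Any], instance_id: str) -> bool:
    """True if any threshold or absence condition filter mentions the instance."""
    fragment = f'resource.labels.instance_id = "{instance_id}"'
    return any(
        fragment in (condition.get(kind) or {}).get("filter", "")
        for condition in policy.get("conditions", [])
        for kind in CONDITION_TYPES
    )


def policies_for_instance(policies: list[dict[str, Any]], instance_id: str) -> list[dict[str, Any]]:
    """Keep only the policies that target the given VM instance."""
    return [policy for policy in policies if targets_instance(policy, instance_id)]
```

Because the fragment includes the closing quote, instance 4 does not match a filter written for instance 42. A plain substring check does miss filters that write the clause with different spacing (for example instance_id="42").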

Create Alert Policy

Configure a new threshold-based alert for a VM.

Endpoint

POST /api/gcp/vms/<project_id>/<instance_id>/alerts/create

Request Parameters

Parameter	Type	Required	Description
alertName	string	Yes	Display name for the alert policy
metricType	string	Yes	Metric to monitor (e.g., compute.googleapis.com/instance/cpu/utilization)
threshold	number	Yes	Threshold value that triggers the alert, in the metric's own unit

Request Example

{
  "alertName": "High CPU Usage Alert",
  "metricType": "compute.googleapis.com/instance/cpu/utilization",
  "threshold": 0.8
}
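A request that omits any required field is rejected with 400 Bad Request. The check can be sketched as below; only the presence check is documented, while the numeric check on threshold and the function name are assumptions added for illustration.

```python
from typing import Any, Optional

REQUIRED_FIELDS = ("alertName", "metricType", "threshold")


def validate_create_request(body: dict[str, Any]) -> Optional[str]:
    """Return an error message for an invalid create request, or None if valid."""
    missing = [field for field in REQUIRED_FIELDS if body.get(field) in (None, "")]
    if missing:
        # Mirrors the English form of the 400 message, which lists every required field.
        return "Required fields: " + ", ".join(REQUIRED_FIELDS)
    threshold = body["threshold"]
    # Assumption: reject non-numeric thresholds (bool is a subclass of int, so exclude it).
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        return "threshold must be a number"
    return None
```

Note that a threshold of 0 counts as present; only a missing key, null, or empty string is treated as missing.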

Response

{
  "message": "Created alert 'High CPU Usage Alert'. (Note: no notification channels were configured.)",
  "name": "1234567890123456789",
  "displayName": "High CPU Usage Alert"
}

Alert Configuration

The created alert policy includes:

Metric Filter

filter = (
    f'metric.type = "{metric_type}" AND '
    f'resource.type = "gce_instance" AND '
    f'resource.labels.instance_id = "{instance_id}"'
)

This filter restricts the alert to the specified VM instance.

Aggregation

Alignment Period: 60 seconds
Per-series Aligner: ALIGN_MEAN (mean value of each time series within each alignment period)

Threshold Condition

Comparison: COMPARISON_GT (greater than)
Threshold Value: User-specified (e.g., 0.8 for 80% CPU)
Duration: 300 seconds (5 minutes)
Trigger: count = 1 (a single violating time series is enough to trigger the condition)

The alert fires when the aligned metric value stays above the threshold for 5 consecutive minutes.

Combiner

Type: AND
All conditions must be met to trigger the alert (the created policy has a single threshold condition).
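The settings above correspond to a Cloud Monitoring v3 AlertPolicy. As a sketch, the equivalent REST/JSON body for projects.alertPolicies.create is built below. The backend presumably uses the google-cloud-monitoring client listed under Dependencies, so treat the camelCase REST field names, the function name, and the condition display name as illustrative assumptions.

```python
import json
from typing import Any


def build_alert_policy(instance_id: str, alert_name: str,
                       metric_type: str, threshold: float) -> dict[str, Any]:
    """REST/JSON AlertPolicy mirroring the documented defaults (sketch)."""
    metric_filter = (
        f'metric.type = "{metric_type}" AND '
        'resource.type = "gce_instance" AND '
        f'resource.labels.instance_id = "{instance_id}"'
    )
    return {
        "displayName": alert_name,
        "combiner": "AND",
        "conditions": [
            {
                "displayName": f"{alert_name} condition",  # hypothetical condition name
                "conditionThreshold": {
                    "filter": metric_filter,
                    # Mean of each series over 60-second alignment periods.
                    "aggregations": [{"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}],
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": "300s",  # must violate for 5 minutes
                    "trigger": {"count": 1},  # one violating time series is enough
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_alert_policy("1234567890", "High CPU Usage Alert",
                                "compute.googleapis.com/instance/cpu/utilization", 0.8)
    print(json.dumps(policy, indent=2))
```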

Notification channels are not automatically configured. Add channels through the GCP Console or use the Notification Channels API.

Delete Alert Policy

Remove an existing alert policy.

Endpoint

DELETE /api/gcp/vms/<project_id>/<alert_name>/alert

Request

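No request body is required. The alert name in the URL is the policy ID returned in the name field by the list and create endpoints (for example, 1234567890123456789), not the display name.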

Response

{
  "message": "Alert '1234567890123456789' was successfully deleted."
}
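The three GCP endpoints can be called with the Python standard library. The sketch below only builds the requests; BASE_URL and the helper names are hypothetical, and it does not attach the session cookie set by the OAuth login, which a real client would need (for example via http.cookiejar).

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # hypothetical; point this at your deployment


def _path(*segments: str) -> str:
    # Percent-encode each path segment so IDs cannot break the URL structure.
    return "/".join(urllib.parse.quote(s, safe="") for s in segments)


def list_alerts_request(project_id: str, instance_id: str) -> urllib.request.Request:
    url = f"{BASE_URL}/api/gcp/vms/{_path(project_id, instance_id)}/alerts"
    return urllib.request.Request(url, method="GET")


def create_alert_request(project_id: str, instance_id: str, alert_name: str,
                         metric_type: str, threshold: float) -> urllib.request.Request:
    body = json.dumps({"alertName": alert_name, "metricType": metric_type,
                       "threshold": threshold}).encode()
    url = f"{BASE_URL}/api/gcp/vms/{_path(project_id, instance_id)}/alerts/create"
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def delete_alert_request(project_id: str, alert_name: str) -> urllib.request.Request:
    url = f"{BASE_URL}/api/gcp/vms/{_path(project_id, alert_name)}/alert"
    return urllib.request.Request(url, method="DELETE")


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```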

Full Policy Name

The API constructs the full resource name:

f"projects/{project_id}/alertPolicies/{alert_name}"
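A small sketch of the name construction and its inverse. The guard against passing a full path as alert_name is an added assumption, not documented behavior, and both function names are hypothetical.

```python
def full_policy_name(project_id: str, alert_name: str) -> str:
    """Build the Cloud Monitoring resource name from a project ID and policy ID."""
    if "/" in alert_name:
        # Assumption: the endpoint expects the bare ID, not a full resource path.
        raise ValueError(f"expected a bare policy ID, got {alert_name!r}")
    return f"projects/{project_id}/alertPolicies/{alert_name}"


def policy_id(full_name: str) -> str:
    """Inverse: extract the policy ID, as returned in the list response."""
    prefix, sep, ident = full_name.rpartition("/alertPolicies/")
    if not sep or not prefix.startswith("projects/") or not ident:
        raise ValueError(f"not an alert policy name: {full_name!r}")
    return ident
```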

Common Alert Examples

High CPU Utilization


{
  "alertName": "High CPU Alert",
  "metricType": "compute.googleapis.com/instance/cpu/utilization",
  "threshold": 0.8
}

Triggers when CPU usage exceeds 80% for 5 minutes.

Type: compute.googleapis.com/instance/cpu/utilization
Unit: Fraction (0.0 to 1.0, where 0.8 means 80%)
Agent Required: No (platform metric)
Typical Threshold: 0.7 - 0.9

Memory Usage


{
  "alertName": "High Memory Alert",
  "metricType": "agent.googleapis.com/memory/percent_used",
  "threshold": 85
}

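Intended to trigger when memory usage exceeds 85% for 5 minutes. This Ops Agent metric is reported per memory state (a state label such as used or free), and the API's filter does not restrict the state, so any state series above the threshold, including free, can trigger the alert.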

Type: agent.googleapis.com/memory/percent_used
Unit: Percentage (0 to 100)
Agent Required: Yes (Ops Agent)
Typical Threshold: 80 - 90

Disk Usage


{
  "alertName": "Disk Space Alert",
  "metricType": "agent.googleapis.com/disk/percent_used",
  "threshold": 90
}

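Intended to trigger when disk usage exceeds 90% for 5 minutes. This Ops Agent metric is reported per device and state (such as used or free), and the API's filter does not restrict the state, so a disk that is more than 90% free can also trigger the alert.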

Type: agent.googleapis.com/disk/percent_used
Unit: Percentage (0 to 100)
Agent Required: Yes (Ops Agent)
Typical Threshold: 85 - 95
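The percentage metrics above use different scales: the platform CPU metric is a fraction, while the Ops Agent metrics are already 0 to 100. A hypothetical helper that converts a human-friendly percentage into the threshold value each metric expects:

```python
# Divisor that turns a percentage (0-100) into each metric's native unit,
# taken from the Unit lines above.
PERCENT_DIVISOR = {
    "compute.googleapis.com/instance/cpu/utilization": 100,  # fraction 0.0-1.0
    "agent.googleapis.com/memory/percent_used": 1,  # already 0-100
    "agent.googleapis.com/disk/percent_used": 1,  # already 0-100
}


def threshold_for(metric_type: str, percent: float) -> float:
    """Convert a percentage into the threshold value the metric expects."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    if metric_type not in PERCENT_DIVISOR:
        raise ValueError(f"no known percentage scale for {metric_type}")
    return percent / PERCENT_DIVISOR[metric_type]
```

For example, threshold_for("compute.googleapis.com/instance/cpu/utilization", 80) gives the 0.8 used in the CPU example, and the memory metric at 85 stays 85.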

Network Traffic


{
  "alertName": "High Network Input",
  "metricType": "compute.googleapis.com/instance/network/received_bytes_count",
  "threshold": 1000000000
}

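Triggers when more than 1 GB (10^9 bytes) is received in each 60-second alignment period for 5 consecutive minutes, i.e. a sustained inbound rate above roughly 16.7 MB/s.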

{
  "alertName": "High Network Output",
  "metricType": "compute.googleapis.com/instance/network/sent_bytes_count",
  "threshold": 1000000000
}

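Triggers when more than 1 GB (10^9 bytes) is sent in each 60-second alignment period for 5 consecutive minutes, i.e. a sustained outbound rate above roughly 16.7 MB/s.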

Azure Alert Policies

Azure alert policy management is not yet implemented in the current API. Azure Monitor alerts can be configured through:

Azure Portal
Azure CLI
Azure Monitor REST API
ARM templates

Future Implementation

Planned endpoints for Azure alert management:

# List alerts
GET /api/azure/vms/<vm_name>/alerts

# Create alert
POST /api/azure/vms/<vm_name>/alerts/create

# Delete alert
DELETE /api/azure/vms/<vm_name>/alerts/<alert_id>

Azure Monitor Alert Rules

When implementing Azure alerts, use these metric types:

Metric	Namespace	Aggregation
Percentage CPU	`Microsoft.Compute/virtualMachines`	Average
Available Memory Bytes	`Microsoft.Compute/virtualMachines`	Average
Network In Total	`Microsoft.Compute/virtualMachines`	Total
Network Out Total	`Microsoft.Compute/virtualMachines`	Total
Disk Read Bytes	`Microsoft.Compute/virtualMachines`	Total
Disk Write Bytes	`Microsoft.Compute/virtualMachines`	Total
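For orientation only (Azure alerts are not implemented in this API), one row of the table maps to a static-threshold criterion in an Azure Monitor metric alert rule (Microsoft.Insights/metricAlerts). The sketch builds such a request body; field names follow the Azure Monitor REST API, while the function name, criterion name, severity default, and the 1-minute/5-minute schedule are illustrative assumptions. Check the fields against the api-version you target.

```python
from typing import Any

OPERATORS = {"GreaterThan", "GreaterThanOrEqual", "LessThan", "LessThanOrEqual"}
AGGREGATIONS = {"Average", "Minimum", "Maximum", "Total", "Count"}


def azure_metric_alert_body(vm_resource_id: str, metric_name: str, operator: str,
                            threshold: float, time_aggregation: str,
                            severity: int = 2) -> dict[str, Any]:
    """Request body for a metric alert rule on a single VM (sketch)."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if time_aggregation not in AGGREGATIONS:
        raise ValueError(f"unsupported aggregation: {time_aggregation}")
    return {
        "location": "global",  # metric alert rules are global resources
        "properties": {
            "description": f"{metric_name} {operator} {threshold}",
            "severity": severity,  # 0 (critical) to 4 (verbose)
            "enabled": True,
            "scopes": [vm_resource_id],
            "evaluationFrequency": "PT1M",  # evaluate every minute
            "windowSize": "PT5M",  # 5-minute aggregation window (not identical to a GCP duration)
            "criteria": {
                "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
                "allOf": [{
                    "criterionType": "StaticThresholdCriterion",
                    "name": "criterion1",  # hypothetical criterion name
                    "metricName": metric_name,
                    "metricNamespace": "Microsoft.Compute/virtualMachines",
                    "operator": operator,
                    "threshold": threshold,
                    "timeAggregation": time_aggregation,  # the Aggregation column
                }],
            },
            "actions": [],  # add action groups here to send notifications
        },
    }
```

Available Memory Bytes measures free memory, so it would normally use LessThan rather than GreaterThan.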

Error Handling

401 Unauthorized

{
  "error": "No active GCP account found in the session"
}

Solution: Reauthenticate through the OAuth flow.

400 Bad Request

{
  "error": "Required fields: alertName, metricType, threshold"
}

Solution: Include all required parameters in the request.

500 Server Error

{
  "error": "Error while creating the alert: ..."
}

Solution: Check the error details and verify IAM permissions.

Required IAM Permissions (GCP)

To manage alert policies, the authenticated account (the OAuth user or a service account) needs:

# Read permissions
- monitoring.alertPolicies.get
- monitoring.alertPolicies.list

# Write permissions
- monitoring.alertPolicies.create
- monitoring.alertPolicies.delete
- monitoring.alertPolicies.update

# Predefined role
roles/monitoring.alertPolicyEditor
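To check whether an account holds these permissions, you can call Cloud Resource Manager's projects.testIamPermissions, which returns the subset that is granted. A sketch assuming you already have an OAuth access token; the helper names are hypothetical and the request is built but not sent.

```python
import json
import urllib.request

ALERT_PERMISSIONS = [
    "monitoring.alertPolicies.get",
    "monitoring.alertPolicies.list",
    "monitoring.alertPolicies.create",
    "monitoring.alertPolicies.delete",
    "monitoring.alertPolicies.update",
]


def iam_check_request(project_id: str, access_token: str) -> urllib.request.Request:
    """Build a testIamPermissions call for the alert policy permissions."""
    url = f"https://cloudresourcemanager.googleapis.com/v1/projects/{project_id}:testIamPermissions"
    body = json.dumps({"permissions": ALERT_PERMISSIONS}).encode()
    return urllib.request.Request(url, data=body, method="POST", headers={
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    })


def missing_permissions(response_body: dict) -> list[str]:
    """The response lists only granted permissions; the key is absent if none are."""
    granted = set(response_body.get("permissions", []))
    return [p for p in ALERT_PERMISSIONS if p not in granted]
```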

Notification Channels

Alert policies can send notifications through various channels:

Email: Send alerts to email addresses
SMS: Text message notifications
Slack: Post alerts to Slack channels
PagerDuty: Integrate with PagerDuty incidents
Webhooks: Custom HTTP endpoints
Pub/Sub: GCP Pub/Sub topics

Configure Notification Channels

Notification channels must be created separately:

# gcloud CLI example (replace <EMAIL_ADDRESS> with the recipient's address)
gcloud alpha monitoring channels create \
  --display-name="Email Notifications" \
  --type=email \
  --channel-labels=email_address=<EMAIL_ADDRESS>

Then link channels to alert policies through the GCP Console or API.
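Through the API, linking is an update of the policy's notificationChannels field: a PATCH on the Monitoring API v3 alert policy with updateMask=notificationChannels. The sketch below only builds the URL and JSON body (sending it requires an OAuth access token, not shown); the helper name is hypothetical. The mask replaces the whole list, so include any existing channels you want to keep.

```python
import urllib.parse


def link_channels_patch(project_id: str, alert_name: str,
                        channel_ids: list[str]) -> tuple[str, dict]:
    """URL and body for a PATCH that sets the policy's notification channels."""
    policy = f"projects/{project_id}/alertPolicies/{alert_name}"
    query = urllib.parse.urlencode({"updateMask": "notificationChannels"})
    url = f"https://monitoring.googleapis.com/v3/{policy}?{query}"
    body = {
        "notificationChannels": [
            f"projects/{project_id}/notificationChannels/{channel_id}"
            for channel_id in channel_ids
        ]
    }
    return url, body
```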

Code Reference

GCP Alert Implementation

List Alerts: backend/gcp/vmmonitor.py:379-428
Create Alert: backend/gcp/vmmonitor.py:430-502
Delete Alert: backend/gcp/vmmonitor.py:504-528

Dependencies

google-cloud-monitoring>=2.0.0
google-cloud-logging>=3.0.0

Best Practices

Threshold Selection

CPU: 70-90% for sustained load alerts
Memory: 80-90% to prevent OOM errors
Disk: 85-95% to allow cleanup time
Network: Based on bandwidth capacity

Alert Duration

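Use 5-minute durations to avoid false positives from temporary spikes (the create endpoint fixes the duration at 300 seconds):

Short spikes: Usually normal behavior; don't alert
Sustained issues: Likely real problems; alert once the condition has held for the full duration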

Alert Naming

Use descriptive names that include:

Metric being monitored
Threshold value
Severity level

Example: CRITICAL: CPU > 90% for 5min
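A hypothetical helper that produces names in the format of the example above:

```python
def alert_display_name(severity: str, metric: str, threshold_pct: float, minutes: int = 5) -> str:
    """Build a name like "CRITICAL: CPU > 90% for 5min"."""
    # :g drops a trailing ".0" (90.0 -> "90") but keeps real decimals (92.5).
    return f"{severity.upper()}: {metric} > {threshold_pct:g}% for {minutes}min"
```

The result can be passed as alertName when creating the alert.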

Documentation

Include runbook steps in alert descriptions:

Diagnostic commands
Common causes
Remediation steps
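In Cloud Monitoring, this description is the policy's documentation field, which accepts Markdown (mimeType "text/markdown"). The create endpoint does not take a description, so this sketch assumes you set it through the Console or the Monitoring API; the helper name and section titles are illustrative.

```python
def runbook_documentation(diagnostics: list[str], causes: list[str],
                          remediation: list[str]) -> dict[str, str]:
    """Build an AlertPolicy documentation object with runbook sections."""
    sections = [
        ("Diagnostic commands", diagnostics),
        ("Common causes", causes),
        ("Remediation steps", remediation),
    ]
    lines: list[str] = []
    for title, items in sections:
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")  # blank line between sections
    return {"content": "\n".join(lines).rstrip("\n"), "mimeType": "text/markdown"}
```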

Next Steps

VM Monitoring: Configure metrics and agents

GCP VMs: Manage GCP VM instances
