GCP Alert Policies
Google Cloud Monitoring provides comprehensive alert policy management for VM instances.
List VM Alerts
Retrieve all alert policies configured for a specific VM.
Endpoint
Response
Response Fields
Alert policy identifier (last segment of the full policy name)
Human-readable name for the alert policy
Whether the alert policy is currently active
Alert policy documentation/description
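As a rough illustration of how these four fields could be derived from a Cloud Monitoring policy, the sketch below works on the policy's JSON form. The function name and the output key names (id, display_name, enabled, documentation) are assumptions made for this example, not the API's confirmed response schema.

```python
def summarize_policy(policy: dict) -> dict:
    """Reduce an alert policy (REST JSON form) to the four fields listed above."""
    # Full policy names look like "projects/<project>/alertPolicies/<id>";
    # the identifier is the last path segment.
    policy_id = policy["name"].rsplit("/", 1)[-1]
    return {
        # Output key names are hypothetical, chosen for this sketch.
        "id": policy_id,
        "display_name": policy.get("displayName", ""),
        "enabled": bool(policy.get("enabled", False)),
        "documentation": policy.get("documentation", {}).get("content", ""),
    }


example = {
    "name": "projects/demo-project/alertPolicies/1234567890",
    "displayName": "High CPU",
    "enabled": True,
    "documentation": {"content": "Check running processes."},
}
summary = summarize_policy(example)
```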
Filtering Logic
The API filters alert policies by checking if conditions contain:
- condition_threshold - Metric exceeds threshold
- condition_absent - Metric data is missing
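The check described above can be sketched in a few lines. This is a minimal illustration that assumes each policy is a plain dict whose conditions carry the condition_threshold or condition_absent key named above; the helper names are hypothetical and the real implementation may add further matching on the VM.

```python
SUPPORTED_CONDITION_KINDS = ("condition_threshold", "condition_absent")


def has_supported_condition(policy: dict) -> bool:
    """True if any condition is a threshold or absence condition."""
    return any(
        kind in condition
        for condition in policy.get("conditions", [])
        for kind in SUPPORTED_CONDITION_KINDS
    )


def filter_policies(policies: list) -> list:
    """Keep only the policies with at least one supported condition."""
    return [p for p in policies if has_supported_condition(p)]


policies = [
    {"display_name": "cpu", "conditions": [{"condition_threshold": {"threshold_value": 0.8}}]},
    {"display_name": "heartbeat", "conditions": [{"condition_absent": {"duration": "300s"}}]},
    {"display_name": "other", "conditions": [{"condition_matched_log": {}}]},
]
kept = [p["display_name"] for p in filter_policies(policies)]
```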
Create Alert Policy
Configure a new threshold-based alert for a VM.
Endpoint
Request Parameters
Display name for the alert policy
Metric to monitor (e.g., compute.googleapis.com/instance/cpu/utilization)
Threshold value that triggers the alert
Request Example
Response
Alert Configuration
The created alert policy includes:
Metric Filter
Aggregation
- Alignment Period: 60 seconds
- Aligner: ALIGN_MEAN (average value)
- Per-series aggregation: Mean value over the alignment period
Threshold Condition
- Comparison: COMPARISON_GT (greater than)
- Threshold Value: User-specified (e.g., 0.8 for 80%)
- Duration: 300 seconds (5 minutes)
- Trigger: 1 violation required
Combiner
- Type: AND
- All conditions must be met to trigger the alert
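Put together, the settings above correspond to a policy body along the following lines. This is a sketch in the camelCase JSON form used by the Cloud Monitoring REST API, not the code from vmmonitor.py. The function name is hypothetical, and the filter that scopes the metric to one VM through resource.labels.instance_id is an assumption, since the exact filter string is not shown here.

```python
def build_threshold_policy(display_name: str, metric_type: str,
                           threshold: float, instance_id: str) -> dict:
    """Assemble an alert policy body using the configuration listed above."""
    # Assumption: the policy is scoped to a single VM by its instance ID.
    metric_filter = (
        f'metric.type = "{metric_type}" AND '
        'resource.type = "gce_instance" AND '
        f'resource.labels.instance_id = "{instance_id}"'
    )
    return {
        "displayName": display_name,
        "combiner": "AND",  # all conditions must be met
        "conditions": [{
            "displayName": f"{display_name} condition",
            "conditionThreshold": {
                "filter": metric_filter,
                "aggregations": [{
                    "alignmentPeriod": "60s",          # 60-second alignment period
                    "perSeriesAligner": "ALIGN_MEAN",  # mean over each period
                }],
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,
                "duration": "300s",                    # 5 minutes
                "trigger": {"count": 1},               # 1 violation required
            },
        }],
    }


policy = build_threshold_policy(
    "High CPU",
    "compute.googleapis.com/instance/cpu/utilization",
    0.8,
    "1234567890",
)
```

Durations are written as strings with an "s" suffix because that is how the JSON form encodes them; a client library would take duration objects instead.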
Delete Alert Policy
Remove an existing alert policy.
Endpoint
Request
No request body required. The alert name is specified in the URL.
Response
Full Policy Name
The API constructs the full resource name:
Common Alert Examples
High CPU Utilization
- GCP
- Metric Details
Memory Usage
- GCP
- Metric Details
Disk Usage
- GCP
- Metric Details
Network Traffic
- GCP - Received Bytes
- GCP - Sent Bytes
Azure Alert Policies
Azure alert policy management is not yet implemented in the current API. Azure Monitor alerts can be configured through:
- Azure Portal
- Azure CLI
- Azure Monitor REST API
- ARM templates
Future Implementation
Planned endpoints for Azure alert management:
Azure Monitor Alert Rules
When implementing Azure alerts, use these metric types:
| Metric | Namespace | Aggregation |
|---|---|---|
| Percentage CPU | Microsoft.Compute/virtualMachines | Average |
| Available Memory Bytes | Microsoft.Compute/virtualMachines | Average |
| Network In Total | Microsoft.Compute/virtualMachines | Total |
| Network Out Total | Microsoft.Compute/virtualMachines | Total |
| Disk Read Bytes | Microsoft.Compute/virtualMachines | Total |
| Disk Write Bytes | Microsoft.Compute/virtualMachines | Total |
Error Handling
- 400 Bad Request
- 500 Server Error
Required IAM Permissions (GCP)
To manage alert policies, the service account needs:
Notification Channels
Alert policies can send notifications through various channels:
Email
Send alerts to email addresses
SMS
Text message notifications
Slack
Post alerts to Slack channels
PagerDuty
Integrate with PagerDuty incidents
Webhooks
Custom HTTP endpoints
Pub/Sub
GCP Pub/Sub topics
Configure Notification Channels
Notification channels must be created separately:
Code Reference
GCP Alert Implementation
- List Alerts: backend/gcp/vmmonitor.py:379-428
- Create Alert: backend/gcp/vmmonitor.py:430-502
- Delete Alert: backend/gcp/vmmonitor.py:504-528
Dependencies
google-cloud-monitoring>=2.0.0
google-cloud-logging>=3.0.0
Best Practices
Threshold Selection
- CPU: 70-90% for sustained load alerts
- Memory: 80-90% to prevent OOM errors
- Disk: 85-95% to allow cleanup time
- Network: Based on bandwidth capacity
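One way to keep thresholds inside these ranges is a small guard applied before a policy is created. The helper below is hypothetical and assumes the target metric reports utilization as a fraction between 0 and 1, as the CPU utilization metric does (0.8 for 80%); metrics that report 0 to 100 would not need the conversion.

```python
# Recommended ranges from the list above, in percent.
RECOMMENDED_PERCENT = {
    "cpu": (70, 90),
    "memory": (80, 90),
    "disk": (85, 95),
}


def threshold_fraction(kind: str, percent: float) -> float:
    """Validate a percentage against the recommended range and return it as a fraction."""
    low, high = RECOMMENDED_PERCENT[kind]
    if not low <= percent <= high:
        raise ValueError(
            f"{kind} threshold {percent}% is outside the recommended {low}-{high}% range"
        )
    return percent / 100


cpu_threshold = threshold_fraction("cpu", 80)
```

Network is left out on purpose, since its threshold depends on the bandwidth capacity of the instance rather than on a fixed percentage.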
Alert Duration
Use 5-minute durations to avoid false positives from temporary spikes:
- Short spikes: normal behavior, don’t alert
- Sustained issues: real problems, alert once the condition has held for the full duration
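The effect of the duration window can be seen in a simplified model. The sketch below assumes one aligned data point per 60-second period and treats the alert as firing only when every point across a 300-second window exceeds the threshold; it is an illustration of the idea, not a reproduction of Cloud Monitoring's exact evaluation.

```python
def sustained_violation(samples, threshold, duration_s=300, period_s=60):
    """Return True if the threshold is exceeded for a full duration window."""
    needed = duration_s // period_s  # consecutive aligned points required
    streak = 0
    for value in samples:
        # Any point at or below the threshold resets the window.
        streak = streak + 1 if value > threshold else 0
        if streak >= needed:
            return True
    return False


spike = [0.40, 0.95, 0.97, 0.45, 0.50, 0.42, 0.41]      # two hot minutes
sustained = [0.40, 0.85, 0.88, 0.91, 0.90, 0.87, 0.86]  # six hot minutes

spike_fires = sustained_violation(spike, threshold=0.8)
sustained_fires = sustained_violation(sustained, threshold=0.8)
```

With these inputs the short spike never accumulates five consecutive violations, while the sustained series does.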
Alert Naming
Use descriptive names that include:
- Metric being monitored
- Threshold value
- Severity level
Example: CRITICAL: CPU > 90% for 5min
Documentation
Include runbook steps in alert descriptions:
- Diagnostic commands
- Common causes
- Remediation steps
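A runbook with these three parts can be assembled as Markdown and attached to the policy's documentation field. The sketch below is a hypothetical helper that returns the documentation object in its JSON form (content plus mimeType); the section titles and sample entries are placeholders.

```python
def build_runbook(diagnostics, causes, remediation) -> dict:
    """Build an alert policy documentation object with runbook sections."""

    def section(title, items):
        # One Markdown heading followed by a bullet per item.
        return "\n".join([f"## {title}"] + [f"- {item}" for item in items])

    content = "\n\n".join([
        section("Diagnostic commands", diagnostics),
        section("Common causes", causes),
        section("Remediation steps", remediation),
    ])
    return {"content": content, "mimeType": "text/markdown"}


documentation = build_runbook(
    diagnostics=["top -b -n 1 | head -20"],
    causes=["Runaway process", "Undersized machine type"],
    remediation=["Restart the offending service", "Resize the instance"],
)
```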
Next Steps
VM Monitoring
Configure metrics and agents
GCP VMs
Manage GCP VM instances