
Container Alerts

Set up automated alerts to monitor container health, performance, and resource usage. Create alerts based on metrics thresholds and receive notifications when conditions are met.

GCP Cloud Run Alerts

Cloud Run alerts use Google Cloud Monitoring to trigger notifications based on metric thresholds.

List Alerts

Retrieve all alert policies for a specific Cloud Run service.

Endpoint

GET /api/gcp/containers/{projectId}/{containerName}/alerts

Parameters

projectId
string
required
GCP project ID
containerName
string
required
Cloud Run service name

Response

{
  "value": [
    {
      "name": "1234567890123456789",
      "displayName": "High Request Latency Alert",
      "enabled": true,
      "description": "Alert when P95 latency exceeds 1000ms for 5 minutes"
    },
    {
      "name": "9876543210987654321",
      "displayName": "High Request Count",
      "enabled": true,
      "description": "Brak opisu."
    }
  ]
}
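Calling this endpoint from a client reduces to interpolating the two path parameters. A minimal sketch; the base URL and `alerts_path` helper are hypothetical, not part of the API:

```python
# Sketch: building the List Alerts request path. BASE_URL is an
# assumption -- substitute your deployment's host.
BASE_URL = "https://api.example.com"  # hypothetical host

def alerts_path(project_id: str, container_name: str) -> str:
    """Build the List Alerts endpoint path for a Cloud Run service."""
    return f"/api/gcp/containers/{project_id}/{container_name}/alerts"

# e.g. requests.get(BASE_URL + alerts_path("my-project", "my-service"))
path = alerts_path("my-project", "my-service")
```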

Implementation

from flask import jsonify, session
from google.cloud import monitoring_v3

def list_gcp_container_alerts(project_id, container_name):
    accounts = session.get("accounts", [])
    gcp_account = next(
        (acc for acc in accounts if acc.get("provider") == "gcp"),
        None
    )
    if gcp_account is None:
        return jsonify({"error": "No active GCP account found in session"}), 401

    # SessionCredentials is an application helper that builds
    # google.auth credentials from the stored account tokens
    credentials = SessionCredentials(gcp_account)
    client = monitoring_v3.AlertPolicyServiceClient(credentials=credentials)
    project_name = f"projects/{project_id}"

    request = monitoring_v3.ListAlertPoliciesRequest(name=project_name)
    policies = client.list_alert_policies(request=request)

    container_alerts = []
    service_filter = f'resource.labels.service_name = "{container_name}"'
    resource_filter = 'resource.type = "cloud_run_revision"'

    # Keep only policies whose condition filter targets this container
    for policy in policies:
        found = False
        for condition in policy.conditions:
            filter_text = ""
            if condition.condition_threshold and condition.condition_threshold.filter:
                filter_text = condition.condition_threshold.filter
            elif condition.condition_absent and condition.condition_absent.filter:
                filter_text = condition.condition_absent.filter

            if service_filter in filter_text and resource_filter in filter_text:
                found = True
                break

        if found:
            container_alerts.append({
                "name": policy.name.split('/')[-1],
                "displayName": policy.display_name,
                "enabled": policy.enabled,
                "description": policy.documentation.content if policy.documentation else "No description."
            })

    return jsonify({"value": container_alerts}), 200
Only alert policies that specifically filter for the Cloud Run service and resource type are returned. Project-wide alerts are excluded.
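The matching step above is a plain substring check against each condition's filter. The standalone sketch below isolates that check; `policy_targets_service` is a hypothetical name for illustration:

```python
def policy_targets_service(filter_text: str, service_name: str) -> bool:
    """Return True when a condition filter pins both the Cloud Run
    resource type and the given service name -- the same substring
    test the List Alerts endpoint applies."""
    return (
        f'resource.labels.service_name = "{service_name}"' in filter_text
        and 'resource.type = "cloud_run_revision"' in filter_text
    )

# A container-scoped filter matches; a project-wide one does not.
matching = (
    'metric.type = "run.googleapis.com/request_count" AND '
    'resource.type = "cloud_run_revision" AND '
    'resource.labels.service_name = "my-service"'
)
project_wide = 'metric.type = "run.googleapis.com/request_count"'
```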

Create Alert

Create a new alert policy for a Cloud Run service.

Endpoint

POST /api/gcp/containers/{projectId}/{region}/{containerName}/alerts

Request Body

{
  "alertName": "High Request Latency Alert",
  "metricType": "run.googleapis.com/request_latencies",
  "threshold": 1000
}

Parameters

projectId
string
required
GCP project ID
region
string
required
Cloud Run service region (e.g., europe-west1)
containerName
string
required
Cloud Run service name
alertName
string
required
Display name for the alert policy
metricType
string
required
Metric type to monitor:
  • run.googleapis.com/request_count - Request count
  • run.googleapis.com/request_latencies - Request latency (P95)
  • run.googleapis.com/container/instance_count - Instance count
threshold
number
required
Threshold value that triggers the alert

Response

{
  "message": "Utworzono alert 'High Request Latency Alert'. (Uwaga: nie skonfigurowano kanałów notyfikacji).",
  "name": "1234567890123456789",
  "displayName": "High Request Latency Alert"
}
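Before building the policy, the three body fields should be validated; missing ones produce the 400 response shown under Error Handling. A minimal validation sketch with a hypothetical `missing_alert_fields` helper:

```python
REQUIRED_FIELDS = ("alertName", "metricType", "threshold")

def missing_alert_fields(data: dict) -> list:
    """Return the required request-body fields absent from data,
    mirroring the 400 error under Error Handling."""
    return [f for f in REQUIRED_FIELDS if data.get(f) is None]
```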

Implementation

from flask import jsonify, request, session
from google.cloud import monitoring_v3

def create_gcp_container_alert(project_id, region, container_name):
    accounts = session.get("accounts", [])
    gcp_account = next(
        (acc for acc in accounts if acc.get("provider") == "gcp"),
        None
    )
    if gcp_account is None:
        return jsonify({"error": "No active GCP account found in session"}), 401

    data = request.get_json()
    display_name = data.get("alertName")
    metric_type = data.get("metricType")
    threshold = data.get("threshold")
    if display_name is None or metric_type is None or threshold is None:
        return jsonify({"error": "Required fields: alertName, metricType, threshold"}), 400

    # SessionCredentials is an application helper that builds
    # google.auth credentials from the stored account tokens
    credentials = SessionCredentials(gcp_account)
    client = monitoring_v3.AlertPolicyServiceClient(credentials=credentials)
    project_name = f"projects/{project_id}"

    # Condition: metric above threshold for 5 consecutive minutes
    condition = monitoring_v3.AlertPolicy.Condition(
        display_name=f"{metric_type} > {threshold} for 5 minutes",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=(
                f'metric.type = "{metric_type}" AND '
                f'resource.type = "cloud_run_revision" AND '
                f'resource.labels.service_name = "{container_name}" AND '
                f'resource.labels.location = "{region}"'
            ),
            aggregations=[
                monitoring_v3.Aggregation(
                    alignment_period={"seconds": 60},
                    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
                )
            ],
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=float(threshold),
            duration={"seconds": 300},  # 5 minutes
            trigger=monitoring_v3.AlertPolicy.Condition.Trigger(count=1),
        ),
    )

    # Create the alert policy (no notification channels attached)
    policy = monitoring_v3.AlertPolicy(
        display_name=display_name,
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
        conditions=[condition],
    )

    request_data = monitoring_v3.CreateAlertPolicyRequest(
        name=project_name,
        alert_policy=policy
    )
    created_policy = client.create_alert_policy(request=request_data)

    return jsonify({
        "message": f"Created alert '{created_policy.display_name}'. (Note: no notification channels configured.)",
        "name": created_policy.name.split('/')[-1],
        "displayName": created_policy.display_name
    }), 201

Alert Configuration Details

Duration:
  • Alerts trigger after the condition persists for 5 minutes (300 seconds)
  • This prevents false positives from temporary spikes
  • Modify the duration parameter to change the evaluation window
Aggregation:
  • Metrics are aggregated over 60-second intervals
  • ALIGN_MEAN calculates the average value per interval
  • Use ALIGN_SUM for counters or ALIGN_MAX for peak values
Comparison:
  • COMPARISON_GT: Greater than threshold
  • COMPARISON_GE: Greater than or equal
  • COMPARISON_LT: Less than
  • COMPARISON_LE: Less than or equal
Trigger:
  • count=1: The alert fires as soon as the condition is met
  • Increase count to require multiple consecutive violations
This implementation creates alert policies without notification channels. Configure notification channels in the GCP Console or via API to receive alerts via email, SMS, Slack, PagerDuty, etc.
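The duration and alignment settings are protobuf-style dicts keyed by seconds. A small sketch converting human-friendly windows into that shape; `evaluation_windows` is a hypothetical helper, with defaults matching the values used above:

```python
def evaluation_windows(eval_minutes: int = 5, align_seconds: int = 60) -> dict:
    """Translate evaluation windows into the protobuf-style dicts
    used by the condition above (defaults: 5-minute duration to
    suppress transient spikes, 60-second aggregation alignment)."""
    return {
        "duration": {"seconds": eval_minutes * 60},
        "alignment_period": {"seconds": align_seconds},
    }
```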

Delete Alert

Delete an existing alert policy.

Endpoint

DELETE /api/gcp/containers/{projectId}/alerts/{alertName}

Parameters

projectId
string
required
GCP project ID
alertName
string
required
Alert policy name (numeric ID)

Response

{
  "message": "Alert '1234567890123456789' został pomyślnie usunięty."
}

Implementation

from flask import jsonify, session
from google.cloud import monitoring_v3

def delete_gcp_container_alert(project_id, alert_name):
    accounts = session.get("accounts", [])
    gcp_account = next(
        (acc for acc in accounts if acc.get("provider") == "gcp"),
        None
    )
    if gcp_account is None:
        return jsonify({"error": "No active GCP account found in session"}), 401

    # SessionCredentials is an application helper that builds
    # google.auth credentials from the stored account tokens
    credentials = SessionCredentials(gcp_account)
    client = monitoring_v3.AlertPolicyServiceClient(credentials=credentials)
    policy_full_name = f"projects/{project_id}/alertPolicies/{alert_name}"

    request_data = monitoring_v3.DeleteAlertPolicyRequest(name=policy_full_name)
    client.delete_alert_policy(request=request_data)

    return jsonify({
        "message": f"Alert '{alert_name}' was deleted successfully."
    }), 200
Deleting an alert policy is permanent and cannot be undone. The alert will immediately stop monitoring the service.

Common Alert Scenarios

High Request Latency

Alert when P95 request latency exceeds acceptable limits.
{
  "alertName": "High Request Latency",
  "metricType": "run.googleapis.com/request_latencies",
  "threshold": 1000
}
Use case: Detect performance degradation before it impacts users.
Threshold guidance:
  • Web applications: 200-500ms
  • APIs: 100-200ms
  • Background services: 1000-5000ms
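The guidance above can be encoded as a lookup when provisioning alerts across many services. The table and helper below are hypothetical; the ranges are the document's suggestions, not hard rules:

```python
# Suggested latency ranges in milliseconds, per service type
LATENCY_GUIDANCE_MS = {
    "web": (200, 500),
    "api": (100, 200),
    "background": (1000, 5000),
}

def suggested_latency_threshold(service_kind: str) -> int:
    """Return the upper bound of the suggested range as a
    conservative starting threshold."""
    low, high = LATENCY_GUIDANCE_MS[service_kind]
    return high
```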

Request Count Spike

Alert when request count exceeds normal traffic patterns.
{
  "alertName": "Unusual Request Volume",
  "metricType": "run.googleapis.com/request_count",
  "threshold": 10000
}
Use case: Detect traffic spikes from marketing campaigns, DDoS attacks, or viral content.
Threshold guidance:
  • Calculate baseline from historical data
  • Set threshold at 2-3x normal peak traffic
  • Adjust based on scaling capacity
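Deriving the threshold from a measured baseline is simple arithmetic. A sketch with a hypothetical `traffic_spike_threshold` helper; the 2.5 default is an arbitrary midpoint of the 2-3x range suggested above:

```python
def traffic_spike_threshold(peak_baseline: float, multiplier: float = 2.5) -> float:
    """Set the alert threshold at a multiple of normal peak traffic,
    per the 2-3x guidance above."""
    return peak_baseline * multiplier
```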

Instance Count Alert

Alert when container instance count indicates scaling issues.
{
  "alertName": "High Instance Count",
  "metricType": "run.googleapis.com/container/instance_count",
  "threshold": 50
}
Use case: Detect unexpected scaling events or runaway containers.
Threshold guidance:
  • Set below configured max_instance_count
  • Consider cost implications of sustained high instance counts
  • Alert when approaching quota limits
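Placing the threshold below the configured ceiling can also be parameterized. A sketch with a hypothetical helper; the 0.8 headroom ratio is an assumed default, not a recommendation from the platform:

```python
def instance_alert_threshold(max_instance_count: int, headroom: float = 0.8) -> int:
    """Place the threshold below the configured max_instance_count so
    the alert fires before scaling hits its ceiling."""
    return int(max_instance_count * headroom)
```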

Low Request Count

Alert when request count drops unexpectedly (service health check).
{
  "alertName": "Service Availability Issue",
  "metricType": "run.googleapis.com/request_count",
  "threshold": 10
}
Configuration: Change the comparison to COMPARISON_LT (less than). Note that the Create Alert implementation above hardcodes COMPARISON_GT, so a low-traffic alert requires adjusting the implementation or creating the policy directly.
Use case: Detect service outages, DNS issues, or upstream failures.

Alert Best Practices

Threshold Selection

  • Baseline metrics first: Collect 1-2 weeks of data before setting thresholds
  • Avoid false positives: Set thresholds with buffer above normal variance
  • Consider time of day: Use different thresholds for peak vs off-peak hours
  • Test alerts: Trigger test conditions to validate notification delivery

Alert Fatigue Prevention

  • Actionable alerts only: Each alert should require human action
  • Appropriate duration: Use 5+ minute windows to avoid transient noise
  • Consolidate conditions: Combine related metrics into single alerts
  • Regular review: Disable or adjust alerts that trigger frequently without issues

Notification Channels

Configure notification channels for different severity levels.
Critical alerts (immediate action required):
  • PagerDuty for on-call rotation
  • SMS for urgent notifications
  • Phone calls for P0 incidents
Warning alerts (monitor closely):
  • Slack channels for team visibility
  • Email for documentation trail
  • Webhooks for automated responses
Info alerts (awareness):
  • Email digests
  • Dashboard visualization
  • Log aggregation
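The routing above can be represented as a severity-to-channel map in the alerting layer. The mapping and `channels_for` helper below are hypothetical, mirroring the three tiers listed:

```python
# Hypothetical severity-to-channel routing, following the tiers above
CHANNELS_BY_SEVERITY = {
    "critical": ["pagerduty", "sms", "phone"],
    "warning": ["slack", "email", "webhook"],
    "info": ["email_digest", "dashboard", "logs"],
}

def channels_for(severity: str) -> list:
    """Return the channels for a severity, falling back to email
    for unrecognized levels."""
    return CHANNELS_BY_SEVERITY.get(severity, ["email"])
```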

Multi-Condition Alerts

Create sophisticated alerts by combining multiple conditions:
# Alert when both latency is high AND the error rate increases;
# latency_condition and error_rate_condition are built the same way
# as the condition in the Create Alert implementation above
policy = monitoring_v3.AlertPolicy(
    display_name="Service Degradation",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
    conditions=[latency_condition, error_rate_condition],
)

Documentation

Include clear documentation in alert descriptions:
from textwrap import dedent

policy = monitoring_v3.AlertPolicy(
    display_name="High Request Latency",
    documentation=monitoring_v3.AlertPolicy.Documentation(
        # dedent strips the source indentation so the rendered
        # Markdown is flush-left
        content=dedent("""\
            ## Runbook: High Request Latency

            **Severity:** P2
            **Impact:** User experience degradation

            **Investigation steps:**
            1. Check Cloud Run logs for errors
            2. Review recent deployments
            3. Verify database performance
            4. Check external API response times

            **Mitigation:**
            - Increase max_instance_count if at limit
            - Roll back recent deployment if applicable
            - Scale up instance resources

            **Escalation:** Contact backend team lead after 15 minutes
            """),
        mime_type="text/markdown"
    ),
    conditions=[condition],
)

Azure Container Instances Alerts

Azure Container Instances alerts are configured through Azure Monitor. While the current implementation focuses on GCP Cloud Run, similar patterns apply:

Metric Alerts

Create alerts based on CPU and memory metrics:
  • CpuUsage > threshold for X minutes
  • MemoryUsage > threshold for X minutes
  • Container state changes (Running → Failed)

Log Query Alerts

Create alerts based on Log Analytics queries:
ContainerInstanceLog_CL
| where ContainerGroup_s == 'my-container'
| where Message contains 'ERROR'
| summarize ErrorCount=count() by bin(TimeGenerated, 5m)
| where ErrorCount > 10

Configuration via Azure Portal

  1. Navigate to Container Instance in Azure Portal
  2. Select “Alerts” from left menu
  3. Click “New alert rule”
  4. Configure signal, condition, and action group
  5. Save alert rule
Azure alerts support action groups for notifications (email, SMS, webhook, Logic Apps, Azure Functions) and automated remediation.

Error Handling

Common Errors

Unauthorized (401):
{
  "error": "No active GCP account found in session"
}
Solution: Ensure the GCP account is authenticated with a valid refresh token.

Bad Request (400):
{
  "error": "Required fields: alertName, metricType, threshold"
}
Solution: Provide all required parameters in the request body.

Forbidden (403):
{
  "error": "Permission denied on resource project my-project"
}
Solution: Grant the Monitoring Admin or Monitoring Alert Policy Editor role.

Server Error (500):
{
  "error": "Error creating alert: Invalid metric type"
}
Solution: Verify the metric type is valid for Cloud Run.

Monitoring Alert Health

Alert Testing

Test alerts before relying on them in production:
  1. Trigger threshold artificially: Generate load to exceed threshold
  2. Verify notification delivery: Confirm all channels receive alerts
  3. Test escalation paths: Ensure on-call rotations work correctly
  4. Measure response time: Track time from alert to mitigation

Alert Metrics

Track alert effectiveness:
  • True positive rate: Alerts that identified real issues
  • False positive rate: Alerts without actual problems
  • Mean time to acknowledge (MTTA): How quickly alerts are noticed
  • Mean time to resolve (MTTR): How quickly issues are fixed
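These effectiveness metrics are straightforward to compute from incident records. A sketch with a hypothetical `alert_effectiveness` helper, taking raw counts and per-alert acknowledgement times:

```python
def alert_effectiveness(true_positives: int, false_positives: int,
                        ack_minutes: list) -> dict:
    """Compute the tracking metrics listed above: positive rates
    from alert counts, and MTTA from acknowledgement times."""
    total = true_positives + false_positives
    return {
        "true_positive_rate": true_positives / total if total else 0.0,
        "false_positive_rate": false_positives / total if total else 0.0,
        "mtta_minutes": sum(ack_minutes) / len(ack_minutes) if ack_minutes else 0.0,
    }
```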

Regular Maintenance

  • Weekly: Review triggered alerts and response actions
  • Monthly: Adjust thresholds based on traffic patterns
  • Quarterly: Audit all alert policies for relevance
  • After incidents: Update alerts to catch similar issues earlier
