Aiven provides built-in monitoring and logging for all services, along with integrations that export metrics and logs to external platforms. Use these observability tools to monitor service health and performance and to troubleshoot issues.

Monitoring overview

Aiven offers multiple levels of monitoring:

Built-in Metrics

  • Real-time service metrics
  • CPU, memory, disk, network
  • Service-specific metrics
  • Available in Console

Service Logs

  • Service operation logs
  • Error and debug messages
  • Connection logs
  • 4-day retention

Audit Logs

  • Organization events
  • Project events
  • User actions
  • Configuration changes

Service metrics

View real-time metrics for all your services:

Built-in metrics

Available for every service without additional configuration:
Infrastructure-level metrics:

  • CPU Usage (percentage): CPU resources consumed by the service
  • Memory Usage (percentage): memory utilized by the service
  • Disk Space Usage (percentage): disk space used
  • Load Average (number): 5-minute average CPU load, indicating overall system computational load
  • Disk IOPS, reads (operations/sec): input/output operations per second for disk reads
  • Disk IOPS, writes (operations/sec): input/output operations per second for disk writes
  • Network Received (bytes/sec): network traffic received by the service
  • Network Transmitted (bytes/sec): network traffic transmitted by the service

Viewing metrics

1. Open the service in the Console: navigate to Projects → select project → select service.

2. View the metrics tab: click the Metrics tab to see real-time charts.

3. Adjust the time range: select 1 hour, 6 hours, 1 day, 7 days, or 30 days.

4. Export via the CLI:

# Get metrics via CLI
avn service metrics \
  --project my-project \
  --service postgres-1 \
  --period hour

Advanced metrics integration

For detailed service-specific metrics, set up metrics integration:
1. Create a PostgreSQL service to store the metrics data:

avn service create metrics-db \
  --project my-project \
  --service-type pg \
  --plan business-4 \
  --cloud aws-us-east-1

2. Create a Grafana service to visualize the metrics with dashboards:

avn service create metrics-grafana \
  --project my-project \
  --service-type grafana \
  --plan startup-4 \
  --cloud aws-us-east-1

3. Integrate the services:

# Connect Grafana to PostgreSQL
avn service integration-create \
  --project my-project \
  --source-service metrics-db \
  --dest-service metrics-grafana \
  --integration-type dashboard

# Send service metrics to PostgreSQL
avn service integration-create \
  --project my-project \
  --source-service postgres-1 \
  --dest-service metrics-db \
  --integration-type metrics

4. View dashboards: access Grafana using the service URI from the Console connection information.
Metrics integration requires separate PostgreSQL and Grafana services (additional cost). Predefined dashboards are automatically created and maintained.

Service logs

View logs for troubleshooting and monitoring:

Accessing service logs

1. Navigate to the service: select project → select service.

2. View the logs tab: click the Logs tab.

3. Filter logs:
  • Filter by severity (error, warning, info)
  • Search for specific text
  • Adjust the time range

Log retention

Service logs are retained for 4 days by default. For longer retention, set up log integration with OpenSearch.
Default Log Retention:
  Service Logs: 4 days
  Organization Audit Logs: 90 days
  Project Event Logs: 90 days
  
Extended Retention:
  Method: Log integration with OpenSearch
  Duration: Limited by OpenSearch disk space
  Cost: OpenSearch service cost

Log integration

Send logs to OpenSearch for long-term storage and analysis:
1. Create an OpenSearch service:

avn service create logs-opensearch \
  --project my-project \
  --service-type opensearch \
  --plan business-8 \
  --cloud aws-us-east-1

2. Enable the log integration:

avn service integration-create \
  --project my-project \
  --source-service postgres-1 \
  --dest-service logs-opensearch \
  --integration-type logs

3. Access OpenSearch Dashboards: use the service URI to open OpenSearch Dashboards and analyze the logs.

4. Configure retention: set index lifecycle management in OpenSearch for the desired retention period.
All services in the project can send logs to the same OpenSearch service. Create one OpenSearch service for centralized logging.
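Step 4 above leaves the retention policy to you. A minimal OpenSearch ISM (Index State Management) policy along the following lines deletes log indices after 30 days. The `logs-*` index pattern and the 30-day window are assumptions; adjust them to match the index names your log integration actually creates, and create the policy via the ISM API (e.g., `PUT _plugins/_ism/policies/delete-old-logs`):

```json
{
  "policy": {
    "description": "Delete log indices after 30 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ],
        "transitions": []
      }
    ],
    "ism_template": [
      { "index_patterns": ["logs-*"], "priority": 100 }
    ]
  }
}
```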

Audit logs

Track administrative actions and changes:

Organization audit logs

View organization-level events:
1. Navigate to organization logs: Admin → Organization → Events log.

2. Review events. Events include:
  • User invitations and removals
  • Permission changes
  • Domain verification
  • IdP configuration
  • Billing group changes
  • Organizational unit creation/deletion

Project event logs

View project-level events:
1. Navigate to project logs: select project → Event log.

2. Review events. Events include:
  • Service creation/deletion/updates
  • Configuration changes
  • User access changes
  • VPC and peering changes
  • Integration changes
  • Backup operations

API access to audit logs

# Get organization audit logs
curl -H "Authorization: Bearer $TOKEN" \
  https://api.aiven.io/v1/organization/{org_id}/audit

# Get project event logs  
curl -H "Authorization: Bearer $TOKEN" \
  https://api.aiven.io/v1/project/{project}/events

# Filter by date range
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.aiven.io/v1/project/{project}/events?from=2024-01-01&to=2024-01-31"
Organization audit logs require organization:audit_logs:read permission. Project logs require project:audit_logs:read permission.
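For scripted access, the endpoints above are easy to wrap. A minimal stdlib-only sketch that builds the project event-log URL with the optional from/to date filters shown above; `project_events_url` is a hypothetical helper name, not part of any Aiven SDK:

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.aiven.io/v1"

def project_events_url(project: str,
                       date_from: Optional[str] = None,
                       date_to: Optional[str] = None) -> str:
    """Build the project event-log URL, optionally filtered by date range.

    The path matches the endpoint shown above; the `from`/`to` parameter
    names follow the date-range example.
    """
    url = f"{API_BASE}/project/{project}/events"
    params = {}
    if date_from:
        params["from"] = date_from
    if date_to:
        params["to"] = date_to
    return f"{url}?{urlencode(params)}" if params else url
```

Pass the resulting URL to any HTTP client together with the `Authorization: Bearer $TOKEN` header.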

Prometheus integration

Expose metrics in Prometheus format for scraping:

Enabling Prometheus

1. Create a Prometheus endpoint: navigate to project → Integration endpoints → Add new endpoint → Prometheus.

2. Enable it for the service: Service → Overview → Service integrations → Manage integrations → Prometheus → Enable.

3. Get the metrics endpoint: Service → Overview → Connection information → Prometheus tab, then copy the Service URI and credentials.

4. Configure the Prometheus scraper:

# prometheus.yml
scrape_configs:
  - job_name: 'aiven-postgres'
    scheme: https
    basic_auth:
      username: <PROMETHEUS_USER>
      password: <PROMETHEUS_PASSWORD>
    tls_config:
      ca_file: ca.pem
    static_configs:
      - targets: ['<SERVICE_HOST>:<PROMETHEUS_PORT>']

Prometheus in VPC

If using VPC peering, enable public Prometheus access:
# Enable public access to Prometheus endpoint
avn service update \
  --project my-project \
  --service postgres-1 \
  -c public_access.prometheus=true

Prometheus metrics

Available metrics include:
  • System metrics: CPU, memory, disk, network
  • Service metrics: Connections, queries, cache hits
  • Custom metrics: Application-specific (service dependent)
# Example metrics query
curl -u <USER>:<PASS> https://<SERVICE_HOST>:<PORT>/metrics

# Output:
# cpu_usage_percent{host="node1"} 45.2
# memory_usage_percent{host="node1"} 67.8
# pg_stat_database_numbackends{datname="defaultdb"} 15
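The /metrics output uses the Prometheus text exposition format. As a rough illustration of its shape, here is a minimal parser for simple lines like those above; a real scraper should use a Prometheus client library (this sketch ignores HELP/TYPE comments and does not handle commas inside quoted label values):

```python
import re

# metric_name{label="value",...} numeric_value
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                     r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_metrics(text: str) -> dict:
    """Map (metric name, sorted label items) -> float value."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        m = LINE_RE.match(line)
        if not m:
            continue
        labels = {}
        if m.group("labels"):
            for pair in m.group("labels").split(","):
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        metrics[(m.group("name"), tuple(sorted(labels.items())))] = float(m.group("value"))
    return metrics

sample = '''cpu_usage_percent{host="node1"} 45.2
memory_usage_percent{host="node1"} 67.8
pg_stat_database_numbackends{datname="defaultdb"} 15'''

parsed = parse_metrics(sample)
```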

External integrations

Integrate Aiven metrics and logs with external platforms:

Datadog integration

Send metrics to Datadog:
1. Create a Datadog endpoint: Project → Integration endpoints → Datadog, then enter your Datadog API key and site (US/EU).

2. Enable it for the service: Service → Integrations → Datadog → Enable.

3. View in Datadog: metrics appear in Datadog with the aiven. prefix.
# Create Datadog integration via CLI
avn service integration-endpoint-create \
  --project my-project \
  --endpoint-name datadog-us \
  --endpoint-type datadog \
  --user-config '{"datadog_api_key": "<API_KEY>", "site": "datadoghq.com"}'

avn service integration-create \
  --project my-project \
  --source-service postgres-1 \
  --endpoint-id <ENDPOINT_ID> \
  --integration-type datadog

Jolokia (JMX) integration

Access JMX metrics for Kafka and other Java services:
1. Enable Jolokia: available for Kafka, Kafka Connect, and other Java-based services.

2. Get the Jolokia URL: Service → Connection information → Jolokia tab.

3. Query JMX metrics:

curl -u <USER>:<PASS> \
  https://<SERVICE_HOST>:<JOLOKIA_PORT>/jolokia/read/kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec

Rsyslog integration

Send logs to external syslog servers:
# Create rsyslog endpoint
avn service integration-endpoint-create \
  --project my-project \
  --endpoint-name syslog-server \
  --endpoint-type rsyslog \
  --user-config '{"server": "syslog.company.com", "port": 514, "format": "rfc5424"}'

# Enable for service  
avn service integration-create \
  --project my-project \
  --source-service postgres-1 \
  --endpoint-id <ENDPOINT_ID> \
  --integration-type rsyslog

Alerts and notifications

Set up alerts for service issues:

Email notifications

Manage project and service notifications:
1. Configure notification emails: Project settings → Notifications.

2. Add email addresses: add technical contacts who should receive alerts.

3. Select notification types:
  • Service maintenance announcements
  • Service issues and outages
  • Backup failures
  • High resource usage warnings

Service contacts

Add contacts per service:
# Add service contact
avn service contact-create \
  --project my-project \
  --service postgres-1 \
  --email [email protected]

# List contacts  
avn service contact-list \
  --project my-project \
  --service postgres-1

Grafana alerts

Set up alerts in Grafana for metrics:
1. Create an alert in Grafana: open the Grafana dashboard → Edit panel → Alert tab.

2. Configure the alert rule:
  • Set a threshold (e.g., CPU > 80%)
  • Set the evaluation interval
  • Configure the "for" clause (how long the condition must hold)

3. Add a notification channel:
  • Email, Slack, PagerDuty, etc.
  • Configure the webhook URL or credentials

4. Test the alert: trigger a test notification to verify the setup.
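Conceptually, the threshold plus "for" clause means the alert fires only when the condition holds for the entire window. A minimal sketch of that evaluation logic; this is an illustrative model, not Grafana's implementation, and `alert_fires` is a hypothetical helper:

```python
def alert_fires(samples, threshold=80.0, for_seconds=600):
    """Return True if every sample in the trailing `for_seconds` window
    exceeds `threshold`. `samples` is a list of (unix_timestamp, value)
    pairs in ascending time order."""
    if not samples:
        return False
    window_start = samples[-1][0] - for_seconds
    in_window = [v for t, v in samples if t >= window_start]
    return bool(in_window) and all(v > threshold for v in in_window)

# CPU at 85% for the last 10 minutes: the alert fires.
hot = [(t, 85.0) for t in range(0, 601, 60)]
# A single dip below the threshold inside the window resets the "for"
# clause, so no alert fires.
dip = hot[:5] + [(300, 70.0)] + hot[6:]
```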

Common alert scenarios

Alert: CPU usage > 80% for 10 minutes
Actions:
  1. Review slow queries or processes
  2. Check for unusual traffic patterns
  3. Consider upgrading the service plan

Alert: Disk usage > 85%
Actions:
  1. Review disk usage by table/index
  2. Clean up unnecessary data
  3. Enable the disk autoscaler
  4. Upgrade to a larger plan

Alert: Connections > 80% of max
Actions:
  1. Check for connection leaks in applications
  2. Implement connection pooling
  3. Review connection limits
  4. Upgrade the service plan

Alert: Replication lag > 60 seconds
Actions:
  1. Check network connectivity
  2. Review write load on the primary
  3. Check replica performance
  4. Contact support if the lag persists

Monitoring best practices

1. Set up metrics integration early: deploy PostgreSQL + Grafana for detailed metrics from day one.

2. Configure log integration: send logs to OpenSearch for longer retention and analysis.

3. Create baseline metrics: monitor normal performance patterns so you can identify anomalies.

4. Set appropriate alert thresholds: avoid alert fatigue by setting thresholds that indicate real issues.

5. Monitor key service metrics. Focus on:
  • Resource utilization (CPU, memory, disk)
  • Connection count and errors
  • Query performance and slow queries
  • Replication lag (if applicable)

6. Review logs regularly: check error logs weekly to catch issues before they become critical.

7. Use audit logs for security: monitor them for suspicious activity or unauthorized changes.

8. Test alerting: regularly verify that alert notifications are being delivered.

9. Document runbooks: create procedures for responding to common alerts.

10. Automate where possible: use the disk autoscaler and other automation to reduce manual intervention.

Troubleshooting

No metrics appearing in Grafana
Cause: Integration not properly configured, or it needs time to populate.
Solution:
  1. Verify the metrics integration is active
  2. Wait 1-2 minutes for initial data
  3. Check that the PostgreSQL service is running and has free disk space
  4. Verify network connectivity if using a VPC

Prometheus scraper cannot reach the endpoint
Cause: Service in a VPC without public Prometheus access.
Solution:
  1. Enable public access: public_access.prometheus=true
  2. Or access the endpoint from a peered VPC
  3. Check that the IP allowlist includes your scraper's IP

Logs not arriving in OpenSearch
Cause: Integration not enabled, or OpenSearch is full.
Solution:
  1. Verify the log integration is active
  2. Check OpenSearch disk space
  3. Review the index lifecycle management settings
  4. Check for ingestion errors in OpenSearch

Notification emails not received
Cause: Email addresses not configured or notifications disabled.
Solution:
  1. Verify the email addresses in project notifications
  2. Check the spam folder
  3. Verify the notification types are enabled
  4. Test with a manual service restart

API reference

curl -H "Authorization: Bearer $TOKEN" \
  https://api.aiven.io/v1/project/{project}/service/{service}/metrics

Next steps

Service Integrations

Set up metrics and log integrations

Security

Review security audit logs

VPC & Networking

Configure network for Prometheus access

Users & Permissions

Grant audit log access permissions
