Monitoring and Logs

Aiven provides built-in monitoring and logging for all services, along with integrations that export metrics and logs to external platforms. Use these tools to monitor service health and performance and to troubleshoot issues.

Monitoring overview

Aiven offers multiple levels of monitoring:

Built-in Metrics

Real-time service metrics
CPU, memory, disk, network
Service-specific metrics
Available in Console

Service Logs

Service operation logs
Error and debug messages
Connection logs
4-day retention

Audit Logs

Organization events
Project events
User actions
Configuration changes

Service metrics

View real-time metrics for all your services:

Built-in metrics

Available for every service without additional configuration:

Host Metrics
Service-Specific

Infrastructure-level metrics

Metric                 Unit            Description
CPU Usage              Percentage      Percentage of CPU resources consumed by the service
Memory Usage           Percentage      Percentage of memory utilized by the service
Disk Space Usage       Percentage      Percentage of disk space used
Load Average           Number          5-minute average CPU load indicating system computational load
Disk IOPS (reads)      Operations/sec  Input/output operations per second for disk reads
Disk IOPS (writes)     Operations/sec  Input/output operations per second for disk writes
Network Received       Bytes/sec       Network traffic received by the service
Network Transmitted    Bytes/sec       Network traffic transmitted by the service

Viewing metrics

Open service in Console

Navigate to Projects → Select project → Select service

View metrics tab

Click the Metrics tab to see real-time charts

Adjust time range

Select time range: 1 hour, 6 hours, 1 day, 7 days, 30 days

Export via CLI

# Get metrics via CLI (the service name is a positional argument)
avn service metrics postgres-1 \
  --project my-project \
  --period hour

Advanced metrics integration

For detailed service-specific metrics, set up metrics integration:

Create PostgreSQL service

Store metrics data in PostgreSQL database

avn service create metrics-db \
  --project my-project \
  --service-type pg \
  --plan business-4 \
  --cloud aws-us-east-1

Create Grafana service

Visualize metrics with Grafana dashboards

avn service create metrics-grafana \
  --project my-project \
  --service-type grafana \
  --plan startup-4 \
  --cloud aws-us-east-1

Integrate services

# Connect Grafana to PostgreSQL
avn service integration-create \
  --project my-project \
  --source-service metrics-db \
  --dest-service metrics-grafana \
  --integration-type dashboard

# Send service metrics to PostgreSQL  
avn service integration-create \
  --project my-project \
  --source-service postgres-1 \
  --dest-service metrics-db \
  --integration-type metrics

View dashboards

Access Grafana using the service URI from Console connection information

Metrics integration requires separate PostgreSQL and Grafana services (additional cost). Predefined dashboards are automatically created and maintained.

Service logs

View logs for troubleshooting and monitoring:

Accessing service logs

Aiven Console
Aiven CLI

Navigate to service

Select project → Select service

View logs tab

Click Logs tab

Filter logs

Filter by severity (error, warning, info)
Search for specific text
Adjust time range

# View recent logs (the service name is a positional argument)
avn service logs postgres-1 \
  --project my-project

# Follow logs in real-time
avn service logs postgres-1 \
  --project my-project \
  --follow

# Filter by severity
avn service logs postgres-1 \
  --project my-project \
  --severity error

Log retention

Service logs are retained for 4 days by default. For longer retention, set up log integration with OpenSearch.

Default Log Retention:
  Service Logs: 4 days
  Organization Audit Logs: 90 days
  Project Event Logs: 90 days
  
Extended Retention:
  Method: Log integration with OpenSearch
  Duration: Limited by OpenSearch disk space
  Cost: OpenSearch service cost
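
Because extended retention is limited by OpenSearch disk space, a rough capacity estimate helps when choosing a plan. The sketch below is only arithmetic: the usable disk size and the daily log volume are figures you supply yourself (both are assumptions, not values reported by Aiven), and the function name is hypothetical.

```shell
# Estimate how many full days of logs fit on an OpenSearch service.
#   $1 = usable disk space in GB (assumption: what remains after replicas and overhead)
#   $2 = average log volume per day in GB
estimate_retention_days() {
  awk -v disk="$1" -v daily="$2" 'BEGIN {
    if (daily <= 0) { print "daily volume must be positive" > "/dev/stderr"; exit 1 }
    # Round down: a partial day of headroom does not count as a retained day
    printf "%d\n", int(disk / daily)
  }'
}

# Example: 350 GB usable disk and 5 GB of logs per day
estimate_retention_days 350 5   # prints 70
```

Treat the result as an upper bound, since index overhead and uneven daily volume reduce the retention you get in practice.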

Log integration

Send logs to OpenSearch for long-term storage and analysis:

Create OpenSearch service

avn service create logs-opensearch \
  --project my-project \
  --service-type opensearch \
  --plan business-8 \
  --cloud aws-us-east-1

Enable log integration

avn service integration-create \
  --project my-project \
  --source-service postgres-1 \
  --dest-service logs-opensearch \
  --integration-type logs

Access OpenSearch Dashboards

Use the service URI to open OpenSearch Dashboards and analyze logs

Configure retention

Set up index lifecycle management in OpenSearch (Index State Management policies) for the desired retention period

All services in the project can send logs to the same OpenSearch service. Create one OpenSearch service for centralized logging.
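
For the retention step, one possible approach is an OpenSearch Index State Management (ISM) policy that deletes indices after a fixed age. The sketch below only prints such a policy as JSON. The function name and the default index pattern `logs-*` are assumptions, so check the index prefix your log integration actually uses.

```shell
# Print an ISM policy that deletes indices older than a given number of days.
#   $1 = retention in days
#   $2 = index pattern (assumed default "logs-*"; adjust to your index prefix)
ism_retention_policy() {
  days="$1"
  pattern="${2:-logs-*}"
  cat <<EOF
{
  "policy": {
    "description": "Delete log indices older than ${days} days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "${days}d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ],
        "transitions": []
      }
    ],
    "ism_template": { "index_patterns": ["${pattern}"], "priority": 100 }
  }
}
EOF
}

# Example: keep 30 days of logs
ism_retention_policy 30
```

The output could then be sent to the OpenSearch `_plugins/_ism/policies/<policy_id>` endpoint with an HTTP PUT, using the service URI and credentials from the Console.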

Audit logs

Track administrative actions and changes:

Organization audit logs

View organization-level events:

Navigate to organization logs

Admin → Organization → Events Log

Review events

Events include:

User invitations and removals
Permission changes
Domain verification
IdP configuration
Billing group changes
Organizational unit creation/deletion

Project event logs

View project-level events:

Navigate to project logs

Select project → Event log

Review events

Events include:

Service creation/deletion/updates
Configuration changes
User access changes
VPC and peering changes
Integration changes
Backup operations

API access to audit logs

# Get organization audit logs
curl -H "Authorization: Bearer $TOKEN" \
  https://api.aiven.io/v1/organization/{org_id}/audit

# Get project event logs  
curl -H "Authorization: Bearer $TOKEN" \
  https://api.aiven.io/v1/project/{project}/events

# Filter by date range
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.aiven.io/v1/project/{project}/events?from=2024-01-01&to=2024-01-31"

Organization audit logs require organization:audit_logs:read permission. Project logs require project:audit_logs:read permission.
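
When scripting these calls, it can help to build the events URL in one place and reject malformed dates before sending a request. This is a minimal sketch with hypothetical helper names. The `from` and `to` parameters simply mirror the curl example above, and the date check validates only the YYYY-MM-DD shape, not the calendar.

```shell
# Return success when $1 looks like YYYY-MM-DD (shape only, not calendar validity)
is_iso_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Build the project events URL, optionally with a date range.
#   $1 = project name, $2 = from date, $3 = to date
project_events_url() {
  url="https://api.aiven.io/v1/project/$1/events"
  if [ -n "$2" ] || [ -n "$3" ]; then
    # Require both bounds and reject anything that is not YYYY-MM-DD
    is_iso_date "$2" && is_iso_date "$3" || return 1
    url="${url}?from=$2&to=$3"
  fi
  printf '%s\n' "$url"
}

# Example: events for January 2024
project_events_url my-project 2024-01-01 2024-01-31
```

The result can be passed to curl in quotes, for example `curl -H "Authorization: Bearer $TOKEN" "$(project_events_url my-project 2024-01-01 2024-01-31)"`.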

Prometheus integration

Expose metrics in Prometheus format for scraping:

Enabling Prometheus

Create Prometheus endpoint

Navigate to project → Integration endpoints → Add new endpoint → Prometheus

Enable for service

Service → Overview → Service integrations → Manage integrations → Prometheus → Enable

Get metrics endpoint

Service → Overview → Connection information → Prometheus tab

Copy the Service URI and credentials

Configure Prometheus scraper

# prometheus.yml
scrape_configs:
  - job_name: 'aiven-postgres'
    scheme: https
    basic_auth:
      username: <PROMETHEUS_USER>
      password: <PROMETHEUS_PASSWORD>
    tls_config:
      ca_file: ca.pem
    static_configs:
      - targets: ['<SERVICE_HOST>:<PROMETHEUS_PORT>']

Prometheus in VPC

If the service is in a VPC and your Prometheus server scrapes it from outside that VPC, enable public Prometheus access:

# Enable public access to Prometheus endpoint
avn service update postgres-1 \
  --project my-project \
  -c public_access.prometheus=true

Prometheus metrics

Available metrics include:

System metrics: CPU, memory, disk, network
Service metrics: Connections, queries, cache hits
Custom metrics: Application-specific (service dependent)

# Example metrics query
curl -u <USER>:<PASS> https://<SERVICE_HOST>:<PORT>/metrics

# Output:
# cpu_usage_percent{host="node1"} 45.2
# memory_usage_percent{host="node1"} 67.8
# pg_stat_database_numbackends{datname="defaultdb"} 15
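
To pull a single value out of the endpoint's text output, for example in a quick health-check script, a small awk filter is usually enough. The sketch below uses a hypothetical helper, `metric_value`, and runs it against the sample output above rather than a live endpoint. It assumes samples carry no trailing timestamp.

```shell
# Print the value of every sample whose metric name equals $1.
# Reads Prometheus text exposition format on stdin.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                      # skip comments and HELP/TYPE metadata
    {
      metric = $1
      sub(/\{.*/, "", metric)          # drop the {label="..."} part
      if (metric == name) print $NF    # value is the last field (no timestamp assumed)
    }
  '
}

# Sample lines copied from the example output above
sample='cpu_usage_percent{host="node1"} 45.2
memory_usage_percent{host="node1"} 67.8
pg_stat_database_numbackends{datname="defaultdb"} 15'

printf '%s\n' "$sample" | metric_value cpu_usage_percent   # prints 45.2
```

Against a live service, the same function can read from the curl command shown above, for example `curl -s -u <USER>:<PASS> https://<SERVICE_HOST>:<PORT>/metrics | metric_value cpu_usage_percent`.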

External integrations

Integrate Aiven metrics and logs with external platforms:

Datadog integration

Send metrics to Datadog:

Create Datadog endpoint

Project → Integration endpoints → Datadog

Enter your Datadog API key and site (US/EU)

Enable for service

Service → Integrations → Datadog → Enable

View in Datadog

Metrics appear in Datadog with the aiven. prefix

# Create Datadog integration via CLI
avn service integration-endpoint-create \
  --project my-project \
  --endpoint-name datadog-us \
  --endpoint-type datadog \
  --user-config-json '{"datadog_api_key": "<API_KEY>", "site": "datadoghq.com"}'

# Attach the endpoint to the service, using the ID of the endpoint created above
avn service integration-create \
  --project my-project \
  --source-service postgres-1 \
  --dest-endpoint-id <ENDPOINT_ID> \
  --integration-type datadog

Jolokia (JMX) integration

Access JMX metrics for Kafka and other Java services:

Enable Jolokia

Available for Kafka, Kafka Connect, and other Java-based services

Get Jolokia URL

Service → Connection information → Jolokia tab

Query JMX metrics

curl -u <USER>:<PASS> \
  https://<SERVICE_HOST>:<JOLOKIA_PORT>/jolokia/read/kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec

Rsyslog integration

Send logs to external syslog servers:

# Create rsyslog endpoint
avn service integration-endpoint-create \
  --project my-project \
  --endpoint-name syslog-server \
  --endpoint-type rsyslog \
  --user-config-json '{"server": "syslog.company.com", "port": 514, "format": "rfc5424"}'

# Enable for service, using the ID of the endpoint created above
avn service integration-create \
  --project my-project \
  --source-service postgres-1 \
  --dest-endpoint-id <ENDPOINT_ID> \
  --integration-type rsyslog

Alerts and notifications

Set up alerts for service issues:

Email notifications

Manage project and service notifications:

Configure notification emails

Project settings → Notifications

Add email addresses

Add technical contacts for alerts

Select notification types

Service maintenance announcements
Service issues and outages
Backup failures
High resource usage warnings

Service contacts

Add contacts per service:

# Add service contact
avn service contact-create \
  --project my-project \
  --service postgres-1 \
  --email <CONTACT_EMAIL>

# List contacts  
avn service contact-list \
  --project my-project \
  --service postgres-1

Grafana alerts

Set up alerts in Grafana for metrics:

Create alert in Grafana

Open Grafana dashboard → Edit panel → Alert tab

Configure alert rule

Set threshold (e.g., CPU > 80%)
Set evaluation interval
Configure for clause (duration)

Add notification channel

Email, Slack, PagerDuty, etc.
Configure webhook URL or credentials

Test alert

Trigger test notification to verify setup

Common alert scenarios

High CPU usage

Alert: CPU usage > 80% for 10 minutes

Actions:

Review slow queries or processes
Check for unusual traffic patterns
Consider upgrading service plan

Low disk space

Alert: Disk usage > 85%

Actions:

Review disk usage by table/index
Clean up unnecessary data
Enable disk autoscaler
Upgrade to larger plan

High connection count

Alert: Connections > 80% of max

Actions:

Check for connection leaks in applications
Implement connection pooling
Review connection limits
Upgrade service plan

Replication lag

Alert: Replication lag > 60 seconds

Actions:

Check network connectivity
Review write load on primary
Check replica performance
Contact support if the lag persists
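
The thresholds in these scenarios can be expressed as a few small checks, which is useful for ad hoc scripts or for reasoning about alert rules before configuring them in Grafana. This is a sketch only: the helper names are hypothetical, the sample values are made up, and `sustained` approximates "for 10 minutes" by requiring every supplied sample to exceed the limit.

```shell
# Thresholds taken from the scenarios above
CPU_LIMIT=80; DISK_LIMIT=85; CONN_LIMIT=80; LAG_LIMIT=60

# exceeds VALUE LIMIT: succeeds when VALUE > LIMIT (awk handles decimals)
exceeds() {
  awk -v v="$1" -v l="$2" 'BEGIN { exit !(v > l) }'
}

# connection_pct CURRENT MAX: prints connections in use as a percentage of the maximum
connection_pct() {
  awk -v c="$1" -v m="$2" 'BEGIN { if (m <= 0) exit 1; printf "%.1f\n", c * 100 / m }'
}

# sustained LIMIT SAMPLE...: succeeds only if every sample exceeds LIMIT
sustained() {
  limit="$1"; shift
  [ "$#" -gt 0 ] || return 1
  for s in "$@"; do
    exceeds "$s" "$limit" || return 1
  done
  return 0
}

# Example with made-up samples covering the evaluation window
if sustained "$CPU_LIMIT" 91.5 88 84.2; then echo "ALERT: high CPU usage"; fi
if exceeds "$(connection_pct 170 200)" "$CONN_LIMIT"; then echo "ALERT: high connection count"; fi
```

Requiring the whole window to exceed the limit, rather than a single sample, is what keeps short spikes from firing the alert.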

Monitoring best practices

Set up metrics integration early

Deploy PostgreSQL + Grafana for detailed metrics from day one

Configure log integration

Send logs to OpenSearch for longer retention and analysis

Create baseline metrics

Monitor normal performance patterns to identify anomalies

Set appropriate alert thresholds

Avoid alert fatigue: set thresholds that indicate real issues

Monitor key service metrics

Focus on:

Resource utilization (CPU, memory, disk)
Connection count and errors
Query performance and slow queries
Replication lag (if applicable)

Review logs regularly

Check error logs weekly to catch issues before they become critical

Use audit logs for security

Monitor audit logs for suspicious activity or unauthorized changes

Test alerting

Regularly verify alert notifications are being delivered

Document runbooks

Create procedures for responding to common alerts

Automate where possible

Use disk autoscaler and other automation to reduce manual intervention
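
As an illustration of the baseline practice above, the sketch below computes a mean and standard deviation from historical samples and flags values far above that band. The helper names are hypothetical, and the three-standard-deviation cutoff is a common heuristic chosen here for illustration, not an Aiven recommendation.

```shell
# baseline: read one numeric sample per line on stdin, print "mean stddev"
baseline() {
  awk '
    NF { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n == 0) exit 1
      mean = sum / n
      var = sumsq / n - mean * mean    # population variance
      if (var < 0) var = 0             # guard against rounding error
      printf "%.2f %.2f\n", mean, sqrt(var)
    }
  '
}

# is_anomaly VALUE MEAN STDDEV [K]: succeeds when VALUE is more than K stddevs above the mean
is_anomaly() {
  awk -v v="$1" -v m="$2" -v s="$3" -v k="${4:-3}" 'BEGIN { exit !(v > m + k * s) }'
}

# Example with made-up CPU percentages
stats=$(printf '40\n42\n38\n41\n39\n' | baseline)   # "40.00 1.41"
set -- $stats
if is_anomaly 45 "$1" "$2"; then echo "45 is above the baseline band"; fi
```

A band derived from your own history tends to produce fewer false alarms than a fixed threshold copied from another service.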

Troubleshooting

Metrics not appearing in Grafana

Cause: Integration not properly configured or needs time to populate

Solution:

Verify metrics integration is active
Wait 1-2 minutes for initial data
Check PostgreSQL has space and is running
Verify network connectivity if using VPC

Cannot access Prometheus endpoint

Cause: Service in VPC without public Prometheus access

Solution:

Enable public access: public_access.prometheus=true
Or access from peered VPC
Check IP allowlist includes your scraper’s IP

Logs not appearing in OpenSearch

Cause: Integration not enabled or OpenSearch full

Solution:

Verify log integration is active
Check OpenSearch disk space
Review index lifecycle management settings
Check for ingestion errors in OpenSearch

Not receiving alert emails

Cause: Email addresses not configured or notifications disabled

Solution:

Verify email addresses in project notifications
Check spam folder
Verify notification types are enabled
Test with manual service restart

API reference

curl -H "Authorization: Bearer $TOKEN" \
  https://api.aiven.io/v1/project/{project}/service/{service}/metrics

Next steps

Service Integrations

Set up metrics and log integrations

Security

Review security audit logs

VPC & Networking

Configure network for Prometheus access

Users & Permissions

Grant audit log access permissions
