Aiven for Metrics

Aiven for Metrics, powered by Thanos, simplifies the management and analysis of large volumes of metrics data. This fully managed service provides scalable, reliable, and efficient metrics collection, storage, and querying suitable for organizations of all sizes.

Overview

Aiven for Metrics is built on Thanos, an open-source project that extends Prometheus with unlimited storage capabilities and global query views across multiple Prometheus instances. Store unlimited metrics for any duration with cost-effective object storage.

Why Choose Aiven for Metrics

Unlimited Retention

Store metrics data for as long as needed with scalable object storage

Prometheus Compatible

Use existing Prometheus exporters, queries (PromQL), and tools like Grafana

Global Query View

Query metrics from multiple Prometheus servers through a unified interface

Cost-Effective

Downsampling and compaction reduce storage costs while improving query performance

Key Components

Aiven for Metrics includes several Thanos components working together:

Thanos Metrics Receiver

Ingests metrics into the system:

Accepts Prometheus remote write requests
Real-time metrics collection
High-throughput ingestion
Automatic scaling
Data validation

Thanos Metrics Query

Query interface for metrics:

PromQL query support
Aggregates data from multiple sources
Real-time and historical data
Deduplication of samples
Compatible with Grafana

Thanos Metrics Store

Long-term storage interface:

Interfaces with object storage
Historical data access
Efficient data retrieval
Scalable storage
Automatic data management

Thanos Metrics Compact

Storage optimization:

Data compaction
Downsampling for efficiency
Reduces storage costs
Improves query performance
Background processing

Thanos Query Frontend

Query optimization layer:

Caches query results
Splits large queries
Load distribution
Improved performance
Reduced latency
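
To illustrate query splitting, here is a minimal Python sketch that breaks a long range query into interval-aligned sub-ranges, each of which could be cached and executed independently. The 24-hour interval and the function name `split_range` are assumptions for illustration; the interval used by the managed service is not stated in this document.

```python
from typing import List, Tuple

DAY_SECONDS = 24 * 60 * 60  # assumed split interval (illustrative only)


def split_range(start: int, end: int, interval: int = DAY_SECONDS) -> List[Tuple[int, int]]:
    """Split [start, end) (Unix seconds) into sub-ranges aligned to `interval`."""
    if end <= start:
        return []
    parts = []
    cursor = start
    while cursor < end:
        # Next boundary that is a multiple of the interval.
        boundary = (cursor // interval + 1) * interval
        stop = min(boundary, end)
        parts.append((cursor, stop))
        cursor = stop
    return parts


# A 36-hour query starting at noon is split at each midnight boundary.
sub_ranges = split_range(12 * 3600, 48 * 3600)
```

Aligning the sub-ranges to fixed boundaries means that overlapping queries from different dashboards produce identical sub-queries, which is what makes result caching effective.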

Getting Started

Create Metrics Service

Deploy an Aiven for Metrics service:

avn service create my-metrics \
  --service-type thanos \
  --cloud aws-us-east-1 \
  --plan startup-4

Configure Prometheus Remote Write

Point your Prometheus instances to Aiven for Metrics.

Get the remote write URL:

avn service get my-metrics --format '{service_uri}'

Configure Prometheus:

# prometheus.yml
remote_write:
  - url: https://thanos-service.aivencloud.com:443/api/v1/receive
    basic_auth:
      username: avnadmin
      password: your-password
    queue_config:
      max_samples_per_send: 1000
      batch_send_deadline: 5s
      max_shards: 200
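
The following Python sketch shows one way to derive the `remote_write` fields from a service URI. It assumes the URI has the form `https://user:password@host:port`, which this document does not confirm, and the helper name `remote_write_settings` is hypothetical. The `/api/v1/receive` path is taken from the configuration above.

```python
from urllib.parse import unquote, urlsplit


def remote_write_settings(service_uri: str) -> dict:
    """Derive remote_write fields from a service URI (assumed format: https://user:pass@host:port)."""
    parts = urlsplit(service_uri)
    if not parts.hostname:
        raise ValueError("service URI has no host")
    port = parts.port or 443  # fall back to the HTTPS default
    return {
        "url": f"{parts.scheme}://{parts.hostname}:{port}/api/v1/receive",
        # Credentials embedded in a URI are percent-encoded, so decode them.
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
    }


settings = remote_write_settings("https://avnadmin:your-password@thanos-service.aivencloud.com:443")
```

If the service URI does not embed credentials, take the username and password from the service details instead.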

Integrate with Grafana

Connect Grafana to query metrics:

avn service integration-create \
  --integration-type metrics \
  --source-service my-metrics \
  --dest-service my-grafana

Or add manually in Grafana:

Type: Prometheus
URL: https://thanos-service.aivencloud.com:443
Auth: Basic auth with service credentials

Query Metrics

Use PromQL to query your metrics in Grafana or directly via API.
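
As a sketch of querying directly via API, the Python snippet below builds an instant-query request against the Prometheus-style `/api/v1/query` endpoint with basic auth. It assumes the service exposes that endpoint at the same base URL used for the Grafana data source; the host, credentials, and the helper name `build_instant_query` are placeholders. The request is only constructed, not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_instant_query(base_url: str, username: str, password: str, promql: str) -> Request:
    """Build (but do not send) a POST request for the /api/v1/query endpoint."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    body = urlencode({"query": promql}).encode()
    return Request(
        url=f"{base_url.rstrip('/')}/api/v1/query",
        data=body,  # a form-encoded body makes this a POST request
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = build_instant_query(
    "https://thanos-service.aivencloud.com:443",  # placeholder host
    "avnadmin",
    "your-password",  # placeholder credential
    'up{job="my-service"}',
)
# To execute: pass `req` to urllib.request.urlopen() and parse the JSON response body.
```

Using POST keeps long PromQL expressions out of the URL, which avoids length limits on query strings.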

Architecture and Data Flow

Data Collection

Prometheus instances send metrics via remote write to Thanos Metrics Receiver

Real-Time Storage

Receivers store metrics in Time Series Database (TSDB) blocks

Long-Term Storage

After 2 hours, TSDB blocks are uploaded to object storage
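
To make the two-hour block boundary concrete, this short Python sketch computes which block window a sample timestamp falls into. It assumes windows are aligned to multiples of the block length since the Unix epoch, as in the Prometheus TSDB; actual upload timing in the managed service may differ.

```python
from typing import Tuple

BLOCK_SECONDS = 2 * 60 * 60  # two-hour block window, as described above


def block_window(timestamp: int, block_seconds: int = BLOCK_SECONDS) -> Tuple[int, int]:
    """Return the [start, end) window (Unix seconds) that contains `timestamp`."""
    start = timestamp - (timestamp % block_seconds)
    return (start, start + block_seconds)


# A sample written 90 minutes into a window belongs to the block that started at 0.
window = block_window(90 * 60)
```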

Query Processing

Query Frontend receives requests
Distributes to Thanos Query
Query fetches from Receivers (recent data) and Store (historical data)
Deduplicates samples from multiple sources
Returns results with caching
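
The merge and deduplication step can be sketched in Python as follows. This is a deliberate simplification for illustration: it keeps one sample per timestamp and lets earlier sources win, whereas Thanos deduplicates across replica labels with more sophisticated logic. The names `Sample` and `merge_deduplicated` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[int, float]  # (timestamp in milliseconds, value)


def merge_deduplicated(sources: Iterable[List[Sample]]) -> List[Sample]:
    """Merge samples for one series from several sources, one sample per timestamp.

    Earlier sources win when two sources report the same timestamp.
    """
    merged: Dict[int, float] = {}
    for samples in sources:
        for ts, value in samples:
            merged.setdefault(ts, value)  # keep the first value seen
    return sorted(merged.items())


recent = [(3000, 3.0), (4000, 4.0)]  # e.g. from Receivers
historical = [(1000, 1.0), (2000, 2.0), (3000, 3.5)]  # e.g. from Store
series = merge_deduplicated([recent, historical])
```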

Data Optimization

Thanos Compact continuously:

Compacts small blocks into larger ones
Downsamples old data (5m, 1h resolutions)
Reduces storage costs
Improves query performance

Query Examples

Basic Queries

# Current CPU usage
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

# HTTP request rate
rate(http_requests_total[5m])

# Error rate
rate(http_requests_total{status=~"5.."}[5m])

Aggregations

# Total requests per service
sum by (service) (rate(http_requests_total[5m]))

# Average response time (histogram sum divided by count)
sum by (endpoint) (rate(http_request_duration_seconds_sum[5m]))
/ sum by (endpoint) (rate(http_request_duration_seconds_count[5m]))

# 95th percentile latency
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)

# Top 5 endpoints by request count
topk(5, sum by (endpoint) (rate(http_requests_total[5m])))

Time-Based Queries

# Day-over-day comparison
rate(requests_total[5m]) / rate(requests_total[5m] offset 24h)

# Week-over-week growth
(sum(rate(requests_total[7d])) - sum(rate(requests_total[7d] offset 7d)))
/ sum(rate(requests_total[7d] offset 7d)) * 100

# Query historical data (6 months ago)
avg_over_time(cpu_usage[1h] offset 4320h)

Alerts

# High error rate alert (more than 0.05 5xx responses per second per series)
rate(http_requests_total{status=~"5.."}[5m]) > 0.05

# High memory usage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
/ node_memory_MemTotal_bytes > 0.90

# Service down
up{job="my-service"} == 0

# Disk space low
(node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.10
Benefits of Aiven for Metrics

Centralized Monitoring

Query and analyze metrics from multiple Prometheus servers and clusters through a unified view

Unlimited Retention

Store unlimited metric data for any duration with scalable object storage

Cost-Effective

Downsampling and compaction reduce storage needs and costs while improving query performance

Simplified Operations

A pre-configured Thanos setup eliminates the complexity of managing metrics infrastructure

High Availability

Distributed architecture ensures metrics availability and query reliability

Grafana Compatible

Seamlessly integrate with Grafana for visualization and dashboards

Downsampling

Automatic downsampling reduces storage and improves query performance. Data is kept at three resolutions: raw, 5-minute, and 1-hour.

Raw Data

Retention: Recent data

Original resolution (15s, 30s, 1m)
Used for recent time ranges
Highest accuracy
Larger storage footprint
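
To show what downsampling does to raw samples, the Python sketch below groups them into fixed 5-minute windows and keeps a few aggregates per window. The choice of aggregates (count, sum, min, max) and the function name `downsample` are assumptions for illustration; the sketch does not reproduce the Thanos storage format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def downsample(samples: List[Sample], window: int = 300) -> Dict[int, Dict[str, float]]:
    """Aggregate raw samples into fixed windows keyed by window start time."""
    grouped = defaultdict(list)
    for ts, value in samples:
        grouped[ts - ts % window].append(value)  # align to the window start
    return {
        start: {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
        }
        for start, values in sorted(grouped.items())
    }


raw = [(0, 1.0), (15, 3.0), (299, 2.0), (300, 10.0)]
five_minute = downsample(raw)
```

Keeping the sum and count lets a query compute an average for the window, while min and max preserve spikes that a plain average would hide.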

Use Cases

Typical use cases include multi-cluster monitoring, long-term storage, multi-region monitoring, and cost optimization.

Multi-Cluster Monitoring

Monitor metrics across multiple Kubernetes clusters:

Central metrics aggregation
Cross-cluster queries
Unified alerting
Global service health

Configuration Examples

Prometheus Remote Write

# prometheus.yml
global:
  external_labels:
    cluster: 'production-us-east'
    environment: 'production'

remote_write:
  - url: https://thanos-service.aivencloud.com/api/v1/receive
    basic_auth:
      username: avnadmin
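      # Prometheus does not expand environment variables in this file;
      # substitute the value at deploy time or use password_file instead
      password: ${THANOS_PASSWORD}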
    queue_config:
      capacity: 10000
      max_samples_per_send: 5000
      batch_send_deadline: 5s
      max_shards: 200
      min_shards: 1
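      # max_retries is not accepted by current Prometheus releases;
      # retry pacing is controlled by the backoff settings instead
      min_backoff: 30ms
      max_backoff: 5s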
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'expensive_metric_.*'
        action: drop

Grafana Data Source

apiVersion: 1
datasources:
  - name: Aiven-Metrics
    type: prometheus
    access: proxy
    url: https://thanos-service.aivencloud.com:443
    basicAuth: true
    basicAuthUser: avnadmin
    secureJsonData:
      basicAuthPassword: ${THANOS_PASSWORD}
    jsonData:
      timeInterval: 30s
      queryTimeout: 60s
      httpMethod: POST

Limitations

No Direct Thanos Access: All access must go through Aiven service integrations
Cloud Availability: Not currently available on Azure or Google Cloud Marketplace
Query Limits: Queries over very large time ranges may time out

Best Practices

Metrics Retention

Configure appropriate retention in Prometheus
Use external labels for multi-cluster identification
Clean up unused metrics regularly
Monitor ingestion rate

Query Optimization

Use appropriate time ranges
Leverage downsampled data for long ranges
Use recording rules for expensive queries
Add filters early in queries

Cost Management

Remove unused metrics at source
Use relabel configs to drop metrics
Monitor storage growth
Leverage downsampling

Monitoring

Key Metrics to Track

Ingestion Rate: Samples per second
Query Latency: P50, P95, P99 query times
Storage Usage: Object storage consumption
TSDB Blocks: Number and size of blocks
Query Cache: Hit rate and efficiency
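
As a small worked example of two of these metrics, the Python sketch below computes P50, P95, and P99 from a list of query latencies using the nearest-rank method, plus a cache hit rate. The sample values and function names are made up for illustration; in practice these figures would usually come from PromQL queries rather than client-side code.

```python
import math
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


latencies = [0.12, 0.08, 0.30, 0.25, 0.10, 0.09, 0.11, 0.95, 0.14, 0.13]  # seconds
p50, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))
hit_rate = cache_hit_rate(hits=90, misses=10)
```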

Integration with Grafana

# Create integration
avn service integration-create \
  --integration-type metrics \
  --source-service my-metrics \
  --dest-service my-grafana

# Monitor your metrics service
# Use pre-built Thanos dashboards in Grafana

Related Services

Grafana

Visualize metrics with dashboards

Apache Kafka

Monitor Kafka metrics with Prometheus

PostgreSQL

Track database metrics over time

ClickHouse

Store metrics in ClickHouse for analysis

Resources

Prometheus Compatibility: Aiven for Metrics is fully compatible with Prometheus, allowing you to use existing exporters, queries, and tools like Grafana seamlessly.
