Aiven for Metrics, powered by Thanos, simplifies the management and analysis of large volumes of metrics data. This fully managed service provides scalable, reliable, and efficient metrics collection, storage, and querying suitable for organizations of all sizes.

Overview

Aiven for Metrics is built on Thanos, an open-source project that extends Prometheus with unlimited storage capacity and a global query view across multiple Prometheus instances. Metrics are kept in cost-effective object storage, so you can retain them for any duration.

Why Choose Aiven for Metrics

Unlimited Retention

Store metrics data for as long as needed with scalable object storage

Prometheus Compatible

Use existing Prometheus exporters, queries (PromQL), and tools like Grafana

Global Query View

Query metrics from multiple Prometheus servers through a unified interface

Cost-Effective

Downsampling and compaction reduce storage costs while improving query performance

Key Components

Aiven for Metrics includes several Thanos components working together:
Thanos Metrics Receiver (ingests metrics into the system):
  • Accepts Prometheus remote write requests
  • Real-time metrics collection
  • High-throughput ingestion
  • Automatic scaling
  • Data validation
Thanos Query (query interface for metrics):
  • PromQL query support
  • Aggregates data from multiple sources
  • Real-time and historical data
  • Deduplication of samples
  • Compatible with Grafana
Thanos Store (long-term storage interface):
  • Interfaces with object storage
  • Historical data access
  • Efficient data retrieval
  • Scalable storage
  • Automatic data management
Thanos Compact (storage optimization):
  • Data compaction
  • Downsampling for efficiency
  • Reduces storage costs
  • Improves query performance
  • Background processing
Thanos Query Frontend (query optimization layer):
  • Caches query results
  • Splits large queries
  • Load distribution
  • Improved performance
  • Reduced latency
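Splitting large queries can be pictured with the sketch below, which cuts a long time range into day-aligned pieces that can be evaluated and cached independently. It assumes a 24-hour split interval aligned to UTC midnight; the real interval is a Query Frontend setting and may differ on your service, and `split_range` is a hypothetical helper, not a Thanos API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 24-hour split interval, aligned to multiples of the interval since the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_range(start, end, interval=timedelta(hours=24)):
    """Split [start, end) into aligned sub-ranges. Inputs must be timezone-aware (UTC)."""
    if end <= start:
        return []
    parts = []
    cur = start
    while cur < end:
        # First aligned boundary strictly after `cur`.
        boundary = EPOCH + ((cur - EPOCH) // interval + 1) * interval
        nxt = min(boundary, end)
        parts.append((cur, nxt))
        cur = nxt
    return parts


if __name__ == "__main__":
    s = datetime(2024, 1, 1, 18, tzinfo=timezone.utc)
    e = datetime(2024, 1, 3, 6, tzinfo=timezone.utc)
    for lo, hi in split_range(s, e):
        print(lo.isoformat(), "->", hi.isoformat())
```

For a query from 18:00 on one day to 06:00 two days later, this yields three sub-ranges: the partial first day, one full day, and the partial last day. Aligning to fixed boundaries is one reason repeated dashboard queries can reuse cached results: the same sub-ranges come up every time.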

Getting Started

1. Create Metrics Service

Deploy an Aiven for Metrics service:
avn service create my-metrics \
  --service-type thanos \
  --cloud aws-us-east-1 \
  --plan startup-4

2. Configure Prometheus Remote Write

Point your Prometheus instances to Aiven for Metrics. First, get the remote write URL:
avn service get my-metrics --format '{service_uri}'
Configure Prometheus:
# prometheus.yml
remote_write:
  - url: https://thanos-service.aivencloud.com:443/api/v1/receive
    basic_auth:
      username: avnadmin
      password: your-password
    queue_config:
      max_samples_per_send: 1000
      batch_send_deadline: 5s
      max_shards: 200

3. Integrate with Grafana

Connect Grafana to query metrics:
avn service integration-create \
  --integration-type metrics \
  --source-service my-metrics \
  --dest-service my-grafana
Or add the data source manually in Grafana:
  • Type: Prometheus
  • URL: https://thanos-service.aivencloud.com:443
  • Auth: Basic auth with service credentials

4. Query Metrics

Use PromQL to query your metrics in Grafana or directly through the Prometheus-compatible HTTP API.
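For direct API access, the sketch below builds a PromQL instant query against the Prometheus-style `/api/v1/query` endpoint using only the Python standard library. The host name is the placeholder used elsewhere on this page, and `build_instant_query` and `run_query` are hypothetical helpers; substitute your own service URI and credentials.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_instant_query(base_url, username, password, promql):
    """Build (but do not send) a GET request for a PromQL instant query with basic auth."""
    params = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def run_query(req):
    """Send the request and return the `data.result` list from the JSON response."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


if __name__ == "__main__":
    req = build_instant_query(
        "https://thanos-service.aivencloud.com:443",  # placeholder host from this page
        "avnadmin",
        "your-password",
        "rate(http_requests_total[5m])",
    )
    print(req.full_url)
    # Uncomment to send the query to a real service:
    # for series in run_query(req):
    #     print(series["metric"], series["value"])
```

Separating request construction from sending keeps the credential and URL handling easy to check without network access.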

Architecture and Data Flow

1. Data Collection

Prometheus instances send metrics via remote write to the Thanos Metrics Receiver

2. Real-Time Storage

Receivers write incoming samples to a local time series database (TSDB), which organizes them into blocks

3. Long-Term Storage

Each TSDB block covers 2 hours of data; once a block is complete, it is uploaded to object storage

4. Query Processing

  • Query Frontend receives requests
  • Distributes to Thanos Query
  • Query fetches from Receivers (recent data) and Store (historical data)
  • Deduplicates samples from multiple sources
  • Returns results, which Query Frontend caches for repeated queries

5. Data Optimization

Thanos Compact continuously:
  • Compacts small blocks into larger ones
  • Downsamples old data (5m, 1h resolutions)
  • Reduces storage costs
  • Improves query performance
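The deduplication step can be sketched as follows, assuming each replica attaches a distinguishing label (called `replica` here, a hypothetical name) and that replicas write identical timestamps, as happens when receivers replicate the same remote write samples. This is a simplified exact-timestamp merge; Thanos Query's own deduplication uses a penalty-based algorithm that also copes with replicas scraping at slightly different times.

```python
from collections import defaultdict


def dedup(series, replica_label="replica"):
    """Merge series that differ only by `replica_label`.

    `series` maps a sorted tuple of (label, value) pairs to a list of
    (timestamp_ms, value) samples. Samples sharing a timestamp are
    collapsed, keeping the first one seen.
    """
    merged = defaultdict(dict)
    for labels, samples in series.items():
        # Series identity once the replica label is ignored.
        key = tuple((name, val) for name, val in labels if name != replica_label)
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    # Return each merged series with samples in timestamp order.
    return {key: sorted(points.items()) for key, points in merged.items()}


if __name__ == "__main__":
    a = (("__name__", "up"), ("job", "api"), ("replica", "a"))
    b = (("__name__", "up"), ("job", "api"), ("replica", "b"))
    print(dedup({a: [(1000, 1.0), (2000, 1.0)], b: [(2000, 1.0), (3000, 0.0)]}))
```

The two replicas collapse into one `up{job="api"}` series, and the sample both replicas reported at timestamp 2000 appears only once.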

Query Examples

# CPU usage percentage per instance (averaged over 5m)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

# HTTP request rate
rate(http_requests_total[5m])

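# 5xx error rate (requests per second)
rate(http_requests_total{status=~"5.."}[5m])

# Error ratio (fraction of requests returning 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))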
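To see what the CPU query computes, the sketch below re-implements a simplified `rate()` over a handful of made-up samples and applies the same `100 - avg(idle rate) * 100` arithmetic. `simple_rate` is a hypothetical helper: it handles counter resets as PromQL does but skips `rate()`'s extrapolation to the window edges, so real results can differ slightly.

```python
def simple_rate(samples):
    """Per-second increase of a counter given [(t_seconds, value), ...] sorted by time.

    Like PromQL rate(), a drop in value is treated as a counter reset.
    Unlike rate(), there is no extrapolation to the edges of the range window.
    """
    if len(samples) < 2:
        return None  # rate() also needs at least two samples in the window
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the whole new value counts.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


# Made-up idle-CPU counters for two cores, scraped every 15 s for one minute.
core0 = [(0, 100.0), (15, 112.0), (30, 124.0), (45, 136.0), (60, 148.0)]  # 48 idle s in 60 s
core1 = [(0, 200.0), (15, 210.5), (30, 221.0), (45, 231.5), (60, 242.0)]  # 42 idle s in 60 s

# Mirrors 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[...])) * 100) for one instance.
avg_idle = (simple_rate(core0) + simple_rate(core1)) / 2
cpu_usage_pct = 100 - avg_idle * 100
print(f"CPU usage: {cpu_usage_pct:.1f}%")
```

With these values the average idle fraction is 0.75, so the script prints `CPU usage: 25.0%`.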

Benefits of Aiven for Metrics

Centralized Monitoring

Query and analyze metrics from multiple Prometheus servers and clusters through a unified view

Unlimited Retention

Store unlimited metric data for any duration with scalable object storage

Cost-Effective

Downsampling and compaction reduce storage needs and costs while improving query performance

Simplified Operations

A pre-configured Thanos setup removes the complexity of running and maintaining metrics infrastructure yourself

High Availability

Distributed architecture ensures metrics availability and query reliability

Grafana Compatible

Seamlessly integrate with Grafana for visualization and dashboards

Downsampling

Automatic downsampling reduces storage and improves query performance:
Raw resolution (recent data):
  • Original resolution (15s, 30s, 1m)
  • Used for recent time ranges
  • Highest accuracy
  • Larger storage footprint
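The sketch below shows the idea behind downsampling: raw samples are grouped into fixed windows and replaced by a few aggregates per window. It assumes 5-minute windows aligned to multiples of the resolution and keeps count, sum, min, and max; Thanos also stores a counter aggregate so that `rate()` stays correct on downsampled data, which this sketch omits. `downsample` is a hypothetical helper.

```python
FIVE_MINUTES_MS = 5 * 60 * 1000


def downsample(samples, resolution_ms=FIVE_MINUTES_MS):
    """Aggregate [(timestamp_ms, value), ...] into fixed, aligned windows.

    Returns {window_start_ms: {"count", "sum", "min", "max"}} with windows
    in time order (dicts keep insertion order and input is sorted first).
    """
    windows = {}
    for ts, value in sorted(samples):
        start = ts - ts % resolution_ms  # align to a multiple of the resolution
        agg = windows.get(start)
        if agg is None:
            windows[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return windows


# Ten minutes of made-up 15-second samples with values 0, 1, 2, ...
raw = [(i * 15_000, float(i)) for i in range(40)]
for start, agg in downsample(raw).items():
    print(start, agg, "avg =", agg["sum"] / agg["count"])
```

Forty raw samples collapse into two windows of four aggregates each, and the per-window average is still recoverable as `sum / count`.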

Use Cases

Monitor metrics across multiple Kubernetes clusters:
  • Central metrics aggregation
  • Cross-cluster queries
  • Unified alerting
  • Global service health

Configuration Examples

Prometheus Remote Write

# prometheus.yml
global:
  external_labels:
    cluster: 'production-us-east'
    environment: 'production'

remote_write:
  - url: https://thanos-service.aivencloud.com/api/v1/receive
    basic_auth:
      username: avnadmin
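      # Prometheus does not expand environment variables in prometheus.yml;
      # read the password from a file instead (example path).
      password_file: /etc/prometheus/thanos_password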
    queue_config:
      capacity: 10000
      max_samples_per_send: 5000
      batch_send_deadline: 5s
      max_shards: 200
      min_shards: 1
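      # Current Prometheus versions reject max_retries; recoverable send
      # errors are retried with exponential backoff between these bounds.
      min_backoff: 30ms
      max_backoff: 5s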
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'expensive_metric_.*'
        action: drop
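The `write_relabel_configs` rule above drops any series whose metric name matches `expensive_metric_.*`. The sketch below mimics that check in Python; Prometheus anchors relabel regexes at both ends, so it uses `re.fullmatch`. Python's `re` syntax differs from the RE2 syntax Prometheus uses in some corners, and `apply_drop_rule` is a hypothetical helper, so treat it only as a way to reason about which names a pattern would drop.

```python
import re


def apply_drop_rule(metric_names, regex=r"expensive_metric_.*"):
    """Keep only metric names that do NOT fully match `regex` (a `drop` action on __name__)."""
    # Prometheus relabel regexes are anchored, so fullmatch rather than search.
    pattern = re.compile(regex)
    return [name for name in metric_names if not pattern.fullmatch(name)]


if __name__ == "__main__":
    names = ["http_requests_total", "expensive_metric_latency", "my_expensive_metric_x"]
    print(apply_drop_rule(names))
```

Because of the anchoring, `my_expensive_metric_x` is kept: the pattern must match the whole name, not just a substring.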

Grafana Data Source

apiVersion: 1
datasources:
  - name: Aiven-Metrics
    type: prometheus
    access: proxy
    url: https://thanos-service.aivencloud.com:443
    basicAuth: true
    basicAuthUser: avnadmin
    secureJsonData:
      basicAuthPassword: ${THANOS_PASSWORD}
    jsonData:
      timeInterval: 30s
      queryTimeout: 60s
      httpMethod: POST

Limitations

  • No Direct Thanos Access: All access must go through Aiven service integrations
  • Cloud Availability: Not currently available on Azure or Google Cloud Marketplace
  • Query Limits: Queries over very large time ranges may time out

Best Practices

Data ingestion:
  • Configure appropriate retention in Prometheus
  • Use external labels for multi-cluster identification
  • Clean up unused metrics regularly
  • Monitor ingestion rate
Query performance:
  • Use appropriate time ranges
  • Leverage downsampled data for long ranges
  • Use recording rules for expensive queries
  • Add label matchers early in queries to limit the number of series scanned
Storage and cost:
  • Remove unused metrics at source
  • Use relabel configs to drop metrics
  • Monitor storage growth
  • Leverage downsampling

Monitoring

Key Metrics to Track

  • Ingestion Rate: Samples per second
  • Query Latency: P50, P95, P99 query times
  • Storage Usage: Object storage consumption
  • TSDB Blocks: Number and size of blocks
  • Query Cache: Hit rate and efficiency
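The latency and cache metrics above reduce to simple arithmetic. The sketch below computes P50/P95/P99 from a list of observed query latencies (for example, timings you collected yourself) and a cache hit rate from hit and miss counts. The helper names are hypothetical; against live data you would more typically compute percentiles with PromQL's `histogram_quantile()` over a latency histogram.

```python
import statistics


def latency_percentiles(latencies_ms):
    """P50/P95/P99 of observed query latencies (needs at least two values)."""
    # With n=100, quantiles() returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def cache_hit_rate(hits, misses):
    """Fraction of query-cache lookups served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


# Made-up latencies of 1..100 ms and cache counters.
print(latency_percentiles(range(1, 101)))
print(cache_hit_rate(hits=750, misses=250))
```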

Integration with Grafana

# Create integration
avn service integration-create \
  --integration-type metrics \
  --source-service my-metrics \
  --dest-service my-grafana

# Monitor your metrics service
# Use pre-built Thanos dashboards in Grafana

Related Services

Grafana

Visualize metrics with dashboards

Apache Kafka

Monitor Kafka metrics with Prometheus

PostgreSQL

Track database metrics over time

ClickHouse

Store metrics in ClickHouse for analysis

Resources

Prometheus Compatibility: Aiven for Metrics accepts Prometheus remote write and supports PromQL, so you can keep using existing exporters, queries, and tools such as Grafana.
