Aiven for Apache Flink is a fully managed service for distributed, stateful stream processing. Process and analyze streaming data in real-time using standard SQL, with built-in integrations to Kafka and PostgreSQL.

Overview

Apache Flink is the leading open-source stream processing framework for building real-time data pipelines and streaming applications. Aiven for Apache Flink provides a managed platform with a built-in SQL editor, making it easy to develop, test, and deploy streaming applications without managing infrastructure.

SQL-Based Development

Write streaming applications using standard SQL with a built-in editor in Aiven Console

Stateful Processing

Maintain state across stream events for complex event processing and aggregations

Built-in Kafka Integration

Native integration with Aiven for Apache Kafka for seamless data flow

Exactly-Once Semantics

Guarantee data accuracy with exactly-once processing semantics

Key Features

Interactive queries: preview data without creating sink tables:
  • Test transformations quickly
  • Debug streaming logic
  • Explore data schemas
  • Validate joins and aggregations
-- Preview Kafka topic data
SELECT * FROM kafka_orders LIMIT 10;

-- Test transformation
SELECT 
    order_id,
    customer_id,
    order_total,
    order_total * 1.1 AS total_with_tax
FROM kafka_orders
WHERE order_status = 'completed'
LIMIT 100;
Apache Kafka Connector:
  • Auto-complete for Kafka topics
  • Multiple formats: JSON, Avro, Confluent Avro, Debezium CDC
  • Upsert Kafka for changelog streams
  • Schema Registry integration
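For changelog streams, the upsert Kafka connector keys records on a primary key. A minimal sketch, assuming an illustrative `customer-state` topic and placeholder bootstrap servers:

```sql
-- Hypothetical upsert table keyed on customer_id (topic and servers are placeholders)
CREATE TABLE customer_state (
    customer_id BIGINT,
    customer_tier STRING,
    updated_at TIMESTAMP(3),
    PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
    'connector' = 'upsert-kafka',
    'topic' = 'customer-state',
    'properties.bootstrap.servers' = 'kafka:9092',
    'key.format' = 'json',
    'value.format' = 'json'
);
```

Each key's latest value wins, so the table behaves as a continuously updated view of the topic.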
PostgreSQL Connector:
  • Read from PostgreSQL tables
  • Write results back to PostgreSQL
  • Auto-complete for databases and tables
  • Support for JDBC connections
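A PostgreSQL-backed table can be sketched with the JDBC connector; the URL, credentials, and table name below are placeholders:

```sql
-- Hypothetical JDBC-backed lookup table (connection details are placeholders)
CREATE TABLE postgres_customers (
    customer_id BIGINT,
    customer_name STRING,
    customer_tier STRING,
    PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://pg-host:5432/defaultdb',
    'table-name' = 'customers',
    'username' = 'avnadmin',
    'password' = '<password>'
);
```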
OpenSearch Connector:
  • Sink streaming results to OpenSearch
  • Full-text search integration
  • Dynamic index creation
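An OpenSearch sink table might look like the following sketch; the host URL and index name are illustrative:

```sql
-- Hypothetical OpenSearch sink (host and index are placeholders)
CREATE TABLE search_results (
    order_id STRING,
    customer_name STRING,
    order_total DOUBLE
) WITH (
    'connector' = 'opensearch',
    'hosts' = 'https://opensearch-host:9200',
    'index' = 'orders'
);
```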
Exactly-once semantics guarantee data accuracy:
  • Checkpointing for fault tolerance
  • Automatic state recovery
  • Transactional sinks
  • No data loss or duplication

Getting Started

Step 1: Create Flink Service

Deploy an Apache Flink service:
avn service create my-flink \
  --service-type flink \
  --cloud aws-us-east-1 \
  --plan business-4
Service creation may be limited based on your subscription. Check with Aiven support for access.
Step 2: Create Integration with Kafka

Connect Flink to your Kafka service:
avn service integration-create \
  --integration-type flink \
  --source-service my-kafka \
  --dest-service my-flink
This enables Flink to read from and write to Kafka topics.
Step 3: Create a Flink Application

Use the Aiven Console wizard to:
  1. Create source tables from Kafka topics
  2. Write transformation SQL
  3. Create sink tables for results
  4. Deploy the application
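The wizard steps above correspond to Flink SQL like the following sketch; the topic and table names are illustrative:

```sql
-- 1. Source table over a Kafka topic (names are illustrative)
CREATE TABLE kafka_orders (
    order_id STRING,
    order_total DOUBLE,
    order_status STRING
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders',
    'format' = 'json',
    'scan.startup.mode' = 'earliest-offset'
);

-- 3. Sink table for results
CREATE TABLE completed_orders (
    order_id STRING,
    order_total DOUBLE
) WITH (
    'connector' = 'kafka',
    'topic' = 'completed-orders',
    'format' = 'json'
);

-- 2. + 4. Transformation deployed as the application's statement
INSERT INTO completed_orders
SELECT order_id, order_total
FROM kafka_orders
WHERE order_status = 'completed';
```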
Step 4: Test with Interactive Queries

Run queries directly in the SQL editor to test before deploying.

Stream Processing Patterns

Enrich a stream with reference data using a temporal join:
-- Filter high-value orders and enrich with customer data
CREATE TABLE enriched_orders AS
SELECT 
    o.order_id,
    o.order_time,
    o.order_total,
    c.customer_name,
    c.customer_tier,
    c.customer_email
FROM kafka_orders o
JOIN postgres_customers FOR SYSTEM_TIME AS OF o.order_time AS c
    ON o.customer_id = c.customer_id
WHERE o.order_total > 100;

Window Types

Tumbling windows are fixed-size and non-overlapping:
-- Events grouped into 10-minute buckets
SELECT
    TUMBLE_START(event_time, INTERVAL '10' MINUTE) AS window_start,
    COUNT(*) AS event_count
FROM events
GROUP BY TUMBLE(event_time, INTERVAL '10' MINUTE);
Hopping windows are fixed-size and overlapping:
-- 10-minute windows sliding every 5 minutes
SELECT
    HOP_START(event_time, INTERVAL '5' MINUTE, INTERVAL '10' MINUTE) AS window_start,
    COUNT(*) AS event_count
FROM events
GROUP BY HOP(event_time, INTERVAL '5' MINUTE, INTERVAL '10' MINUTE);
Session windows are dynamic, closing after a period of inactivity:
-- Group events with max 30-minute gap
SELECT
    SESSION_START(event_time, INTERVAL '30' MINUTE) AS session_start,
    SESSION_END(event_time, INTERVAL '30' MINUTE) AS session_end,
    user_id,
    COUNT(*) AS event_count
FROM events
GROUP BY 
    SESSION(event_time, INTERVAL '30' MINUTE),
    user_id;

Table Formats and Connectors

Kafka Table Formats

Define a Kafka-backed source table with a JSON format and an event-time watermark:
CREATE TABLE kafka_events (
    event_id STRING,
    event_time TIMESTAMP(3),
    user_id BIGINT,
    event_type STRING,
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic' = 'events',
    'properties.bootstrap.servers' = 'kafka:9092',
    'format' = 'json',
    'json.timestamp-format.standard' = 'ISO-8601'
);
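The same table could instead use Confluent Avro with Schema Registry integration; the registry URL below is a placeholder:

```sql
-- Variant using Confluent Avro and Schema Registry (URL is a placeholder)
CREATE TABLE kafka_events_avro (
    event_id STRING,
    event_time TIMESTAMP(3),
    user_id BIGINT,
    event_type STRING
) WITH (
    'connector' = 'kafka',
    'topic' = 'events',
    'properties.bootstrap.servers' = 'kafka:9092',
    'format' = 'avro-confluent',
    'avro-confluent.url' = 'https://schema-registry:8082'
);
```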

Cluster Management

  • Scale up: Increase CPU and memory per TaskManager
  • Scale out: Add more nodes to the cluster
  • Configure task slots per TaskManager
  • Adjust parallelism for jobs
Adjusting task slots requires a cluster restart.
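Job parallelism can also be set per session in Flink SQL; this is a generic Flink configuration option, not an Aiven-specific one, and the value is only an example:

```sql
-- Set default parallelism for subsequent statements (generic Flink SQL)
SET 'parallelism.default' = '4';
```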
Automatic fault tolerance:
  • Periodic checkpoints to object storage
  • State recovery on failure
  • Exactly-once guarantees
  • Configurable checkpoint interval
Checkpoints are automatically configured for your cluster.
Multiple jobs on the same cluster:
  • Share cluster resources
  • Deploy multiple applications
  • Maximize resource utilization
  • Isolated job execution

Monitoring and Operations

Key Metrics

Job Metrics

  • Records processed per second
  • Job uptime and restarts
  • Checkpoint duration
  • Backpressure indicators

Resource Usage

  • TaskManager CPU/memory
  • JobManager status
  • Network I/O
  • State size

Integration with Observability

# Send logs to OpenSearch
avn service integration-create \
  --integration-type logs \
  --source-service my-flink \
  --dest-service my-opensearch

# Send metrics to Grafana
avn service integration-create \
  --integration-type metrics \
  --source-service my-flink \
  --dest-service my-grafana

Use Cases

  • Live dashboards
  • Streaming aggregations
  • Metric computation
  • KPI monitoring

Best Practices

State management:
  • Use proper key partitioning
  • Implement state TTL for growing state
  • Monitor state size
  • Use RocksDB for large state
Event-time processing:
  • Define watermarks for event-time processing
  • Account for late events
  • Balance latency against completeness
  • Use allowed lateness for critical data
Performance:
  • Tune checkpoint intervals
  • Adjust parallelism appropriately
  • Use proper join strategies
  • Monitor backpressure
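The state TTL recommended above can be enabled per job in Flink SQL; the one-hour value is only an example:

```sql
-- Expire idle state after one hour of inactivity (generic Flink SQL configuration)
SET 'table.exec.state.ttl' = '1 h';
```

Shorter TTLs keep state small at the cost of forgetting slow-arriving keys sooner.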

Apache Kafka

Stream processing on Kafka data

PostgreSQL

Enrich streams with PostgreSQL data

OpenSearch

Sink processed results to OpenSearch

ClickHouse

Load streaming results to ClickHouse


SQL-Based Development: No Java or Scala knowledge required. Build streaming applications entirely with SQL using the Aiven Console.
