Source and sink model
Every connector implements either the Source interface, the Sink interface, or both. You add them to a job with env.fromSource(...) and stream.sinkTo(...), respectively.
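As a minimal sketch of this wiring, the snippet below reads from Kafka via fromSource(...) and writes files via sinkTo(...). It assumes the Kafka and file connector dependencies are on the classpath; the broker address, topic name, and output path are placeholders.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SourceSinkWiring {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A Source implementation (here: the Kafka connector).
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")   // placeholder broker
                .setTopics("input-topic")                // placeholder topic
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // fromSource(...) attaches any Source to the job graph.
        DataStream<String> stream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source");

        // sinkTo(...) attaches any Sink (here: a row-format FileSink).
        FileSink<String> sink = FileSink
                .forRowFormat(new Path("/tmp/out"), new SimpleStringEncoder<String>("UTF-8"))
                .build();
        stream.sinkTo(sink);

        env.execute("source-sink-wiring");
    }
}
```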
Delivery guarantees
Flink connectors provide one of three processing guarantees:

| Guarantee | Description |
|---|---|
| At-most-once | Records may be lost on failure. No re-delivery. |
| At-least-once | Records are never lost but may be delivered more than once. The source re-reads from the last checkpoint position. |
| Exactly-once | Every record is processed exactly once end-to-end. Requires the source to participate in checkpointing and the sink to use transactions or idempotent writes. |
Source guarantees
| Source | Guarantee | Notes |
|---|---|---|
| Apache Kafka | Exactly-once | Offsets committed on checkpoint |
| AWS Kinesis Streams | Exactly-once | |
| RabbitMQ | At-most-once (v0.10) / Exactly-once (v1.0) | |
| Google PubSub | At-least-once | |
| Files (FileSource) | Exactly-once | |
| Collections | Exactly-once | |
| Sockets | At-most-once | |
Sink guarantees
| Sink | Guarantee | Notes |
|---|---|---|
| Apache Kafka | At-least-once / Exactly-once | Exactly-once with transactional producers (v0.11+) |
| Elasticsearch / Opensearch | At-least-once | |
| Cassandra | At-least-once / Exactly-once | Exactly-once for idempotent updates only |
| Amazon DynamoDB | At-least-once | |
| Amazon Kinesis Data Streams | At-least-once | |
| Amazon Kinesis Data Firehose | At-least-once | |
| File sinks (FileSink) | Exactly-once | Requires checkpointing in streaming mode |
| Socket sinks | At-least-once | |
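As an example of choosing a guarantee level, the sketch below configures a Kafka sink for exactly-once delivery using transactional producers. It assumes the flink-connector-kafka dependency; the broker address, topic, and transactional-ID prefix are placeholders.

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExactlyOnceKafkaSink {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Exactly-once sinks commit on checkpoints, so checkpointing must be on.
        env.enableCheckpointing(60_000);

        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")        // placeholder broker
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("output-topic")              // placeholder topic
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                // Requires Kafka 0.11+ transactional producers.
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                // Each application needs a unique transactional-ID prefix.
                .setTransactionalIdPrefix("my-app")
                .build();
    }
}
```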
Built-in connectors
The following connectors are included in the flink-connectors module and ship with Flink source releases. They are not included in the binary distribution JARs—you must add the relevant dependency to your project.
Filesystem
Read and write files on POSIX, HDFS, S3, OSS, and other supported filesystems. Supports CSV, Avro, Parquet, ORC, and custom formats.
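A hedged sketch of reading text lines with FileSource (class names follow recent Flink releases; the paths are placeholders):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FileSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read each file in the directory as a stream of text lines.
        // The same API works for local FS, HDFS, S3, OSS, etc.
        FileSource<String> source = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(),
                        new Path("/path/to/input"))   // placeholder path
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source")
           .print();

        env.execute("file-source-example");
    }
}
```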
DataGen
Generate synthetic data streams for testing and development without an external system.
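A small sketch of the DataGen connector, assuming the flink-connector-datagen dependency; the record count and rate are arbitrary example values:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.connector.source.util.ratelimit.RateLimiterStrategy;
import org.apache.flink.connector.datagen.source.DataGeneratorSource;
import org.apache.flink.connector.datagen.source.GeneratorFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DataGenExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Map each sequence index to a synthetic record.
        GeneratorFunction<Long, String> generator = index -> "event-" + index;

        // 1000 records total (bounded), throttled to 10 records/second.
        DataGeneratorSource<String> source = new DataGeneratorSource<>(
                generator, 1000, RateLimiterStrategy.perSecond(10), Types.STRING);

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "datagen")
           .print();

        env.execute("datagen-example");
    }
}
```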
Hybrid Source
Chain multiple sources in sequence. Read bounded historical data then switch automatically to an unbounded real-time source.
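The historical-then-live pattern can be sketched with HybridSource as follows (a hedged example assuming the file and Kafka connector dependencies; paths, broker, and topic are placeholders):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Bounded historical data from files ...
        FileSource<String> history = FileSource
                .forRecordStreamFormat(new TextLineInputFormat(),
                        new Path("/path/to/archive"))  // placeholder path
                .build();

        // ... then switch automatically to an unbounded Kafka source.
        KafkaSource<String> live = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")  // placeholder broker
                .setTopics("events")                    // placeholder topic
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        HybridSource<String> hybrid = HybridSource.builder(history)
                .addSource(live)
                .build();

        env.fromSource(hybrid, WatermarkStrategy.noWatermarks(), "hybrid-source")
           .print();

        env.execute("hybrid-source-example");
    }
}
```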
External connectors
The following connectors are maintained in separate repositories outside the main Flink repository. Add the appropriate Maven dependency for each.

| Connector | Direction | Repository |
|---|---|---|
| Apache Kafka | Source / Sink | apache/flink-connector-kafka |
| Apache Cassandra | Source / Sink | apache/flink-connector-cassandra |
| Amazon DynamoDB | Sink | apache/flink-connector-aws |
| Amazon Kinesis Data Streams | Source / Sink | apache/flink-connector-aws |
| Amazon Kinesis Data Firehose | Sink | apache/flink-connector-aws |
| Elasticsearch | Sink | apache/flink-connector-elasticsearch |
| Opensearch | Sink | apache/flink-connector-opensearch |
| RabbitMQ | Source / Sink | apache/flink-connector-rabbitmq |
| Google PubSub | Source / Sink | apache/flink-connector-gcp-pubsub |
| Apache Pulsar | Source | apache/flink-connector-pulsar |
| JDBC | Sink | apache/flink-connector-jdbc |
| MongoDB | Source / Sink | apache/flink-connector-mongodb |
| Prometheus | Sink | apache/flink-connector-prometheus |
Connectors in Apache Bahir
Apache Bahir hosts additional streaming connectors that are released independently:

- Apache ActiveMQ (source/sink)
- Apache Flume (sink)
- Redis (sink)
- Akka (sink)
- Netty (source)
Other ways to connect to Flink
Async I/O for data enrichment
Instead of using a connector, you can query an external database or web service inside a Map or FlatMap operator. Flink’s Asynchronous I/O API lets you do this efficiently by overlapping multiple outstanding requests.
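The pattern can be sketched as follows. This is a hedged example: EnrichFunction is a hypothetical enrichment operator, and a CompletableFuture stands in for a real async database or HTTP client.

```java
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AsyncEnrichmentExample {

    // Hypothetical enrichment: look up a value for each key asynchronously.
    static class EnrichFunction extends RichAsyncFunction<String, String> {
        @Override
        public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
            // A real implementation would issue a non-blocking client call here.
            CompletableFuture
                    .supplyAsync(() -> key + ":enriched")
                    .thenAccept(v -> resultFuture.complete(Collections.singleton(v)));
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> input = env.fromElements("a", "b", "c");

        // Allow up to 100 in-flight requests; each times out after 1 second.
        DataStream<String> enriched = AsyncDataStream.unorderedWait(
                input, new EnrichFunction(), 1000, TimeUnit.MILLISECONDS, 100);

        enriched.print();
        env.execute("async-io-enrichment");
    }
}
```

unorderedWait emits results as soon as they complete; use orderedWait if downstream operators need the original record order.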
Because flink-connector-base is bundled in flink-dist, externalized connectors no longer bundle it as a transitive dependency. If you run examples locally outside of a Flink cluster, make sure flink-connector-base is present on your classpath.
