WITH clause.
How formats work
Formats implement either a DeserializationSchema (for reading), a SerializationSchema (for writing), or both. Flink discovers format factories via the same SPI mechanism as connectors: the format key in the WITH clause maps to a registered format factory.
Formats that support both reading and writing are labeled:
- Serialization Schema – can be used for sinks
- Deserialization Schema – can be used for sources
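As a sketch of how the format key selects a factory, consider this hypothetical table definition (the table name, columns, topic, and broker address are illustrative):

```sql
-- Hypothetical table: the 'connector' and 'format' keys each map to a
-- factory that Flink discovers via SPI at planning time.
CREATE TABLE orders (
  order_id BIGINT,
  amount   DOUBLE,
  ts       TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',                           -- connector factory
  'topic' = 'orders',                              -- illustrative topic
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'                                -- format factory
);
```

Because the Kafka connector both reads and writes, the JSON format here is used as a DeserializationSchema when the table is a source and as a SerializationSchema when it is a sink.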
Available formats
| Format | Read | Write | Supported connectors |
|---|---|---|---|
| JSON | Yes | Yes | Kafka, Upsert Kafka, Kinesis, Firehose, Filesystem, Elasticsearch |
| Avro | Yes | Yes | Kafka, Upsert Kafka, Kinesis, Firehose, Filesystem |
| Parquet | Yes | Yes | Filesystem |
| ORC | Yes | Yes | Filesystem |
| CSV | Yes | Yes | Kafka, Upsert Kafka, Kinesis, Firehose, Filesystem |
| Confluent Avro | Yes | Yes | Kafka, Upsert Kafka |
| Protobuf | Yes | Yes | Kafka |
| Debezium CDC | Yes | Yes | Kafka, Filesystem |
| Canal CDC | Yes | Yes | Kafka, Filesystem |
| Maxwell CDC | Yes | Yes | Kafka, Filesystem |
| OGG CDC | Yes | Yes | Kafka, Filesystem |
| Raw | Yes | Yes | Kafka, Upsert Kafka, Kinesis, Firehose, Filesystem |
JSON
Human-readable text format. The schema is derived from the table DDL. Error handling and timestamp formats are configurable.
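For instance, error handling and timestamp parsing can be tuned with format-scoped options (the table definition is illustrative; the option keys shown are the JSON format's standard options):

```sql
CREATE TABLE events (
  id BIGINT,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',                              -- illustrative topic
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  -- Skip rows that fail to parse instead of failing the job.
  'json.ignore-parse-errors' = 'true',
  -- Parse timestamps as ISO-8601 rather than SQL format.
  'json.timestamp-format.standard' = 'ISO-8601'
);
```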
Avro
Compact binary format with schema evolution support. Schema is derived from the table schema.
Parquet
Columnar binary format. High compression and read performance. Compatible with Hive and Spark.
ORC
Columnar binary format. High compression and read performance. Compatible with Hive.
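A columnar format is typically paired with the Filesystem connector. A minimal sketch, assuming an illustrative local path:

```sql
-- Hypothetical Parquet sink/source table on the Filesystem connector;
-- the same shape works with 'format' = 'orc'.
CREATE TABLE daily_metrics (
  metric_name STRING,
  metric_value DOUBLE,
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/metrics',   -- illustrative path
  'format' = 'parquet'
);
```

Partitioning by a column such as dt maps each partition value to its own directory, which is the layout Hive and Spark also expect.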
Formats and the DataStream API
Formats also integrate with the DataStream API through FileSource and FileSink:
- StreamFormat – reads records from an input stream one at a time (e.g., TextLineInputFormat, CsvReaderFormat)
- BulkFormat – reads batches of records at a time (e.g., Parquet and ORC readers)
- BulkWriter.Factory – writes batches of records (e.g., AvroWriterFactory, ParquetWriterFactory, OrcBulkWriterFactory)
CDC formats
Flink supports several Change Data Capture (CDC) formats that represent database change events as Flink rows with a changelog mode:

| Format | Source system | Changelog mode |
|---|---|---|
| Debezium-JSON | MySQL, PostgreSQL, MongoDB, SQL Server, and others | Insert, Update, Delete |
| Canal-JSON | MySQL (via Alibaba Canal) | Insert, Update, Delete |
| Maxwell-JSON | MySQL (via Maxwell) | Insert, Update, Delete |
| OGG-JSON | Oracle GoldenGate | Insert, Update, Delete |
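A table backed by a CDC format emits a changelog stream rather than append-only rows. A minimal sketch for Debezium-encoded Kafka messages (topic and broker address are illustrative):

```sql
-- Hypothetical table reading Debezium change events from Kafka.
-- INSERT/UPDATE/DELETE events in the topic become changelog rows.
CREATE TABLE products_cdc (
  id BIGINT,
  name STRING,
  price DECIMAL(10, 2)
) WITH (
  'connector' = 'kafka',
  'topic' = 'dbserver.inventory.products',         -- illustrative topic
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'debezium-json'
);
```

The same pattern applies to canal-json, maxwell-json, and ogg-json by swapping the format value.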
Resolving format dependency conflicts
If you package multiple formats in an uber-JAR, their META-INF/services files may conflict. See the Table / SQL Connectors Overview for instructions on using Maven Shade’s ServicesResourceTransformer to merge them correctly.
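The essential piece of that Maven Shade configuration is the transformer that concatenates, rather than overwrites, the SPI service files. A sketch of the relevant pom.xml fragment (version and other plugin settings omitted):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Merges META-INF/services entries from all shaded JARs -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```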
