A DataStream is an immutable, distributed collection of data. You cannot inspect its elements directly or mutate them in place. Instead, you chain API operations that describe a computation graph, then trigger execution with env.execute().
Program structure
Every DataStream program follows the same five-step pattern:

Obtain a StreamExecutionEnvironment

The StreamExecutionEnvironment is the entry point for every Flink program. It holds configuration and creates data sources. Use getExecutionEnvironment() in almost all cases. It detects its context automatically: it creates a local, single-JVM environment when you run from an IDE, and connects to a cluster when you submit a JAR with bin/flink run. Two other factory methods, createLocalEnvironment() and createRemoteEnvironment(), exist for cases where you need to force one behavior or the other.

Load or create initial data
Attach a source to the environment to get a DataStream. Flink provides built-in sources for collections, files, sockets, and sequences, plus a connector ecosystem for Kafka, Kinesis, and more.

Apply transformations
Transformations produce new DataStream instances from existing ones. They are lazy: nothing executes until you call execute().

Write results to a sink
Sinks consume a DataStream and write records to external systems. For production use, prefer sinkTo() with the Sink API over the older addSink().

Trigger execution

Call env.execute() to submit the dataflow graph and run the program. None of the previous steps processes any data until this call.

Complete example: streaming word count

The following program reads words from a socket, counts them in 5-second tumbling windows, and prints results to stdout.

WindowWordCount.java
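The original listing is missing here; the following is a sketch matching the description, based on the standard WindowWordCount example that ships with Flink. The host and port (localhost:9999) are illustrative; feed the socket with a tool such as `nc -lk 9999`.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WindowWordCount {

    public static void main(String[] args) throws Exception {
        // 1. Obtain the execution environment.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 2. Create a DataStream from a socket source.
        DataStream<Tuple2<String, Integer>> counts = env
                .socketTextStream("localhost", 9999)
                // 3. Transformations: split lines into (word, 1) pairs, key by word,
                //    and sum counts over 5-second tumbling processing-time windows.
                .flatMap(new Splitter())
                .keyBy(value -> value.f0)
                .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                .sum(1);

        // 4. Sink: print results to stdout.
        counts.print();

        // 5. Trigger execution; nothing runs before this call.
        env.execute("Window WordCount");
    }

    // Splits each line on spaces and emits a (word, 1) pair per word.
    public static class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) {
            for (String word : sentence.split(" ")) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
```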
Key concepts
Lazy evaluation: Calling .map(), .filter(), or .keyBy() builds a dataflow graph in memory. No data moves until execute() is called. This lets Flink optimize the full graph before execution begins.
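A minimal sketch of this behavior (the element values and job name are illustrative):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LazyEvaluationDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // These calls only add nodes to the dataflow graph; no records move yet.
        DataStream<String> shouted = env
                .fromElements("flink", "is", "lazy")
                .map(String::toUpperCase)
                .filter(s -> s.length() > 2);

        shouted.print();

        // Only now is the graph built so far deployed and executed.
        env.execute("lazy-evaluation-demo");
    }
}
```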
Parallelism: Each operator runs as one or more parallel instances. Set job-level parallelism with env.setParallelism(n) or per-operator with .map(...).setParallelism(n).
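For example, a job-level default with one per-operator override might look like this (the parallelism values are illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Job-level default: every operator runs with 4 parallel instances
        // unless overridden below.
        env.setParallelism(4);

        env.fromElements(1, 2, 3, 4, 5)
                .map(n -> n * n)
                .setParallelism(2) // this map alone runs with 2 parallel instances
                .print();

        env.execute("parallelism-demo");
    }
}
```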
Operator chaining: Flink automatically chains adjacent operators that can share a thread (for example, consecutive map() calls) to reduce serialization overhead. You can disable chaining globally or per operator.
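The knobs involved are env.disableOperatorChaining() for the global switch and .startNewChain() / .disableChaining() on individual operators; a sketch with illustrative operators:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Global switch: uncomment to prevent chaining anywhere in this job.
        // env.disableOperatorChaining();

        env.fromElements("a", "bb", "ccc")
                .map(String::trim)
                .startNewChain()        // begin a new chain here; do not chain back to the source
                .filter(s -> s.length() > 1)
                .map(s -> s + "!")
                .disableChaining()      // this map is never chained to its neighbors
                .print();

        env.execute("chaining-demo");
    }
}
```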
Buffer timeout: By default, Flink buffers records for up to 100 ms before flushing to downstream operators. Lower this for latency-sensitive pipelines:
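The snippet that belongs here presumably used setBufferTimeout(); a sketch with an illustrative 5 ms value:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Flush network buffers after at most 5 ms instead of the default 100 ms.
        // A timeout of 0 flushes after every record (lowest latency, most overhead);
        // -1 flushes only when a buffer is full (highest throughput).
        env.setBufferTimeout(5);

        env.fromElements("low", "latency").print();
        env.execute("buffer-timeout-demo");
    }
}
```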
Explore further
Execution Mode
Switch between STREAMING and BATCH execution modes for bounded data.
Data Sources
Built-in sources, the FLIP-27 Source API, and FileSource.
Operators
map, flatMap, filter, keyBy, window, connect, and more.
Data Sinks
FileSink, print, custom sinks, and exactly-once delivery.
Event Time & Watermarks
WatermarkStrategy, TimestampAssigner, and windowing with event time.
Fault Tolerance
Checkpointing, restart strategies, and exactly-once guarantees.
Working with State
ValueState, ListState, MapState, and state TTL.
Side Outputs
Route records to multiple output streams from a single operator.

