What you’ll build
In this tutorial you will build a fraud detection system that processes a stream of credit card transactions and emits an alert whenever it detects a suspicious pattern: a small transaction (under $1.00) immediately followed by a large one (over $500.00) on the same account.
Along the way, you will:
- Set up a StreamExecutionEnvironment
- Partition a stream by key with keyBy
- Implement stateful logic with KeyedProcessFunction
- Use ValueState for per-key fault-tolerant state
- Run a Flink job from an IDE or the command line
Prerequisites
- Java 11, 17, or 21
- Maven 3.8.6 or later
- An IDE (IntelliJ IDEA recommended)
Steps
Create the project
Generate a skeleton project using the Flink Maven archetype.
Maven creates a frauddetection/ directory containing FraudDetectionJob.java, FraudDetector.java, and a pre-configured pom.xml with the flink-streaming-java and flink-walkthrough-common dependencies.
If you are running the latest SNAPSHOT version of Flink, use the corresponding snapshot version number in -DarchetypeVersion. The archetype is only published for released versions.
IntelliJ IDEA note: If you see java.lang.NoClassDefFoundError when running in the IDE, go to Run → Edit Configurations → Modify options and select Include dependencies with ‘Provided’ scope.
Set up the execution environment
The StreamExecutionEnvironment is the entry point for every DataStream application. It lets you configure execution properties and create sources.
FraudDetectionJob.java
Create a source
Sources ingest data from external systems into Flink. The walkthrough uses generated transaction data so you can run the example without any external infrastructure.
The archetype includes a TransactionSource backed by a DataGeneratorSource that produces an infinite stream of credit card transactions. Each Transaction has an accountId, timestamp, and amount.
FraudDetectionJob.java
fromSource takes three arguments: the source connector, a watermark strategy (this job uses processing time, so watermarks are not needed), and a display name used in the Web UI.
Partition by key and apply the fraud detector
To detect fraud per account, all transactions for the same accountId must be routed to the same operator instance. keyBy achieves this by partitioning the stream. The process() call applies your FraudDetector function in a keyed context, meaning each invocation sees state scoped to the current account.
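To build intuition for why keying guarantees that one account always reaches the same operator instance, here is a minimal plain-Java sketch. It does not use the Flink API: instanceFor is a hypothetical helper, and the simple hash-modulo scheme is an assumption made for illustration, not Flink's actual key-group assignment.

```java
// Simplified model of key-based partitioning.
// Assumption: plain hash-modulo routing; Flink's real assignment differs.
class KeyPartitionSketch {

    // Hypothetical helper: maps an account ID to one of `parallelism` instances.
    static int instanceFor(long accountId, int parallelism) {
        // floorMod keeps the result in [0, parallelism) even for negative hashes.
        return Math.floorMod(Long.hashCode(accountId), parallelism);
    }

    public static void main(String[] args) {
        long[] accountIds = {1L, 2L, 1L, 3L, 2L, 1L};
        int parallelism = 4;
        for (long id : accountIds) {
            // Every occurrence of the same account prints the same instance index.
            System.out.println("account " + id + " -> instance " + instanceFor(id, parallelism));
        }
    }
}
```

Because the target instance is a pure function of the key, every transaction for a given account lands in the same place, which is what makes per-account state possible.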
FraudDetectionJob.java
Write results to a sink
A sink writes results to an external system. For this tutorial, print the alerts to the console.
FraudDetectionJob.java
AlertSink (provided by the archetype) logs each Alert at INFO level. env.execute() triggers actual job execution; without this call the job graph is built but never run.
Implement the FraudDetector
The FraudDetector extends KeyedProcessFunction, which provides per-key state and timer access. The business rule: flag an account when a small transaction (< $1.00) is immediately followed by a large one (> $500.00).
The flag must be stored in managed state, not a plain Java field, because Flink processes transactions from many accounts with a single FraudDetector instance. A plain field would be shared across all keys.
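The difference between a shared field and a per-account flag can be seen in a small plain-Java sketch. Neither class below is part of Flink: SharedFieldDetector and PerKeyDetector are hypothetical names, the HashMap only imitates what keyed state provides, and the 1.00 and 500.00 thresholds are assumed values for the small and large amounts.

```java
import java.util.HashMap;
import java.util.Map;

// Buggy variant: one field is shared by every account the instance sees.
class SharedFieldDetector {
    static final double SMALL_AMOUNT = 1.00;   // assumed threshold
    static final double LARGE_AMOUNT = 500.00; // assumed threshold

    private boolean lastWasSmall = false;

    // Returns true when an alert would be emitted.
    boolean process(long accountId, double amount) {
        boolean alert = lastWasSmall && amount > LARGE_AMOUNT;
        lastWasSmall = amount < SMALL_AMOUNT;
        return alert;
    }
}

// Per-key variant: one flag per account, imitating keyed state with a map.
class PerKeyDetector {
    private final Map<Long, Boolean> lastWasSmallByAccount = new HashMap<>();

    boolean process(long accountId, double amount) {
        boolean flagged = lastWasSmallByAccount.getOrDefault(accountId, false);
        boolean alert = flagged && amount > SharedFieldDetector.LARGE_AMOUNT;
        lastWasSmallByAccount.put(accountId, amount < SharedFieldDetector.SMALL_AMOUNT);
        return alert;
    }
}

class SharedFieldDemo {
    public static void main(String[] args) {
        SharedFieldDetector shared = new SharedFieldDetector();
        PerKeyDetector perKey = new PerKeyDetector();

        // Account 1 makes a small purchase, then account 2 makes a large one.
        shared.process(1L, 0.50);
        perKey.process(1L, 0.50);
        System.out.println("shared field alerts: " + shared.process(2L, 600.00)); // true (false alarm)
        System.out.println("per-key flag alerts: " + perKey.process(2L, 600.00)); // false
    }
}
```

The shared field raises a false alarm because the small purchase on account 1 leaks into the decision for account 2. In a real job you would not maintain such a map yourself; Flink's managed keyed state does the scoping and also makes the flag fault tolerant.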
FraudDetector.java
ValueState is registered in open(), which runs once before processing begins. ValueState is always scoped to the current key (account ID), so each account has its own independent flag. The three ValueState methods are:
- value(): read the current value; returns null if not set
- update(value): write a new value
- clear(): delete the value for the current key
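The semantics of these three methods can be modelled with a small in-memory class. InMemoryValueState is a hypothetical stand-in written for this illustration, not a Flink class. In a real job the Flink runtime sets the current key before each invocation and the state backend handles fault tolerance, whereas here setCurrentKey is called by hand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mimics the value/update/clear contract per key.
class InMemoryValueState<T> {
    private final Map<Object, T> values = new HashMap<>();
    private Object currentKey;

    // Assumption: done manually here; Flink does this for you in a keyed context.
    void setCurrentKey(Object key) {
        currentKey = key;
    }

    // Returns the value for the current key, or null if none was set.
    T value() {
        return values.get(currentKey);
    }

    // Writes a new value for the current key only.
    void update(T newValue) {
        values.put(currentKey, newValue);
    }

    // Removes the value for the current key; other keys are untouched.
    void clear() {
        values.remove(currentKey);
    }
}

class ValueStateDemo {
    public static void main(String[] args) {
        InMemoryValueState<Boolean> flag = new InMemoryValueState<>();

        flag.setCurrentKey(1L);
        System.out.println(flag.value()); // null: nothing set yet for account 1
        flag.update(true);

        flag.setCurrentKey(2L);
        System.out.println(flag.value()); // null: account 2 has its own slot

        flag.setCurrentKey(1L);
        System.out.println(flag.value()); // true
        flag.clear();
        System.out.println(flag.value()); // null again after clear()
    }
}
```

Checking value() against null before reading the flag is the pattern to carry over: an account that has never set the flag and one whose flag was cleared look the same.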
Complete programs
What’s next
- Add timers — The current detector does not enforce a time constraint between the two transactions. See Event-Driven Applications to learn how to add a 1-minute timer so the fraud pattern expires.
- DataStream API reference — Explore all operators, state backends, and connector APIs in the DataStream API documentation.
- Table API Quickstart — Try the higher-level Table API for aggregation pipelines.
- SQL Quickstart — Query streaming data interactively in the SQL Client.

