StreamMetrics
Average processing latency per row, in milliseconds
Number of rows processed per second
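The source does not say how these two fields are measured. If both are derived from the same total elapsed time T (in seconds) over N processed rows, which is an assumption, they are reciprocals up to a unit factor:

```latex
\text{latency}_{\text{ms/row}} = \frac{1000\,T}{N}, \qquad
\text{throughput}_{\text{rows/s}} = \frac{N}{T}, \qquad
\text{latency}_{\text{ms/row}} = \frac{1000}{\text{throughput}_{\text{rows/s}}}
```

For example, N = 10,000 rows in T = 2 s gives 1000 × 2 / 10,000 = 0.2 ms per row and 10,000 / 2 = 5,000 rows per second, and 1000 / 5,000 = 0.2 agrees.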
stream_dataframe
The DataFrame to stream in chunks
Number of rows per chunk
Generator yielding DataFrame chunks of size chunk_size
process_stream
The DataFrame to process
Number of rows per chunk for streaming
Function to apply to each chunk; it should accept a DataFrame and return a DataFrame
A tuple containing:
- Concatenated DataFrame with all processed results
- StreamMetrics object with performance statistics
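A minimal sketch of stream_dataframe and process_stream as described above, assuming pandas DataFrames and wall-clock timing with time.perf_counter. The parameter names df and func, and the StreamMetrics field names latency_ms_per_row and throughput_rows_per_s, are hypothetical; the descriptions above do not give them. Latency is averaged over the total elapsed time, which is one plausible reading of "average latency per row".

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple

import pandas as pd


@dataclass
class StreamMetrics:
    # Hypothetical field names; the source only describes their meaning.
    latency_ms_per_row: float     # average processing latency per row, in ms
    throughput_rows_per_s: float  # rows processed per second


def stream_dataframe(df: pd.DataFrame, chunk_size: int) -> Iterator[pd.DataFrame]:
    """Yield consecutive row slices of chunk_size rows (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]


def process_stream(
    df: pd.DataFrame,
    chunk_size: int,
    func: Callable[[pd.DataFrame], pd.DataFrame],
) -> Tuple[pd.DataFrame, StreamMetrics]:
    """Apply func to each chunk, concatenate the results, and time the whole run."""
    start = time.perf_counter()
    results = [func(chunk) for chunk in stream_dataframe(df, chunk_size)]
    out = pd.concat(results) if results else df.iloc[0:0]  # pd.concat rejects []
    elapsed = time.perf_counter() - start

    n_rows = len(df)
    # Guard both divisions against zero rows or a zero timer reading.
    latency_ms = elapsed * 1000.0 / n_rows if n_rows else 0.0
    throughput = n_rows / elapsed if elapsed > 0 else 0.0
    return out, StreamMetrics(latency_ms, throughput)


if __name__ == "__main__":
    frame = pd.DataFrame({"x": range(10)})
    result, metrics = process_stream(frame, 3, lambda c: c.assign(y=c["x"] * 2))
    print(result["y"].tolist())
    print(metrics)
```

Because iloc slices keep their original index labels, concatenating the processed chunks restores the input's row order and index. For empty input the sketch returns an empty slice of df without calling func, so the result's columns reflect the input rather than func's output.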
compare_batch_vs_streaming
The DataFrame to process in both modes
Function to apply in both batch and streaming modes
Chunk size for streaming mode
Dictionary containing performance comparison metrics:
- batch_time_s: Total time for batch processing, in seconds
- stream_time_s: Total time for streaming processing, in seconds
- stream_latency_ms_per_row: Average latency per row, in milliseconds
- stream_throughput_rows_per_s: Rows processed per second in streaming mode
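A minimal sketch of compare_batch_vs_streaming that returns the four keys listed above, assuming pandas DataFrames and a single timed run per mode with time.perf_counter. The parameter names df and func are hypothetical, and the streaming pass inlines the chunking loop so the example stands alone.

```python
import time
from typing import Callable, Dict

import pandas as pd


def compare_batch_vs_streaming(
    df: pd.DataFrame,
    func: Callable[[pd.DataFrame], pd.DataFrame],
    chunk_size: int,
) -> Dict[str, float]:
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    n_rows = len(df)

    # Batch mode: apply func to the whole DataFrame in one call.
    t0 = time.perf_counter()
    func(df)
    batch_time = time.perf_counter() - t0

    # Streaming mode: apply func chunk by chunk, then concatenate the results.
    t0 = time.perf_counter()
    parts = [func(df.iloc[i:i + chunk_size]) for i in range(0, n_rows, chunk_size)]
    if parts:
        pd.concat(parts)
    stream_time = time.perf_counter() - t0

    return {
        "batch_time_s": batch_time,
        "stream_time_s": stream_time,
        # Guard both divisions against zero rows or a zero timer reading.
        "stream_latency_ms_per_row": stream_time * 1000.0 / n_rows if n_rows else 0.0,
        "stream_throughput_rows_per_s": n_rows / stream_time if stream_time > 0 else 0.0,
    }


if __name__ == "__main__":
    frame = pd.DataFrame({"x": range(100_000)})
    print(compare_batch_vs_streaming(frame, lambda d: d * 2, 10_000))
```

A single timed run is noisy, especially for small inputs; for a fairer comparison, repeat each mode several times (for example with the standard timeit module) and compare medians.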