PyTorch Image Classification
Run inference on images using PyTorch models. Based on sdks/python/apache_beam/examples/inference/pytorch_image_classification.py:96-166.
- Basic Example
- With Pre/Post Processing
- Automatic batching for efficient inference
- Support for CPU and GPU devices
- Pre/post processing hooks
- Keyed inputs for tracking
Scikit-learn Classification
Run inference with scikit-learn models. Based on sdks/python/apache_beam/examples/inference/sklearn_mnist_classification.py:88-133.
- `ModelFileType.PICKLE` - Pickled models
- `ModelFileType.JOBLIB` - Joblib-serialized models
- Set `large_model=True` for models that cause memory pressure; this reduces concurrent model copies in workers
TensorFlow Inference
Run TensorFlow SavedModel inference.
Multi-Model Inference
Run inference with different models based on keys.
Inference with Side Inputs
Use side inputs for dynamic model configuration.
Batching Configuration
Optimize inference performance with batching.
- `min_batch_size`: Wait for at least this many elements
- `max_batch_size`: Process at most this many elements together
- `max_batch_duration_secs`: Maximum time to wait for a batch to fill
Model Loading Strategies
Different ways to load models.
- From File Path
- From State Dict (PyTorch)
- SavedModel (TensorFlow)
- Custom Loader
Streaming ML Pipeline
Combine streaming with ML inference.
Feature Engineering
Preprocess features before inference.
Model Monitoring
Track inference metrics and performance.
A/B Testing Models
Compare multiple model versions.
Best Practices
Optimize Batching
- Configure batch sizes for your hardware
- Balance latency vs. throughput
- Monitor batch utilization
Handle Large Models
- Use `large_model=True` for memory efficiency
- Consider model quantization
- Use GPU workers for large models
Version Your Models
- Include version info in model paths
- Track model metadata
- Support A/B testing
Monitor Performance
- Track inference latency
- Monitor prediction quality
- Set up alerting for anomalies
Supported Frameworks
PyTorch
`PytorchModelHandlerTensor`
- State dict or entire model
- CPU and GPU support
Scikit-learn
`SklearnModelHandlerNumpy`
- Pickle or Joblib format
- All sklearn estimators
TensorFlow
`TFModelHandlerNumpy`
- SavedModel format
- TF 2.x models
ONNX
`ONNXModelHandler`
- Cross-framework models
- Optimized runtime
XGBoost
`XGBoostModelHandler`
- Boosted trees
- Native format
Vertex AI
`VertexAIModelHandler`
- Managed endpoints
- Auto-scaling
Related Resources
RunInference API
Complete API reference
Model Handlers
Framework-specific handlers
ML Transforms
Feature engineering transforms
ML Pipeline Patterns
Common ML pipeline architectures