Function Runtime Configuration

Pulsar Functions support multiple runtime modes with different isolation, performance, and resource characteristics. This guide covers runtime configuration and optimization.

Runtime Modes

Pulsar Functions can run in three different modes:

Thread Mode

Functions run as threads within the Function Worker process. Characteristics:

Lowest overhead and fastest startup
Shared JVM with Function Worker
No isolation between functions
Best for lightweight, trusted functions

Use Cases:

Development and testing
Simple transformations
Low-latency requirements

bin/pulsar-admin functions create \
    --name thread-function \
    --jar my-function.jar \
    --classname com.example.MyFunction \
    --inputs persistent://public/default/input \
    --runtime THREAD

Process Mode

Functions run as separate processes on the same host. Characteristics:

Process-level isolation
Better resource control
Independent failure domains
Moderate overhead

Use Cases:

Production deployments
Functions requiring isolation
Multi-language support

bin/pulsar-admin functions create \
    --name process-function \
    --jar my-function.jar \
    --classname com.example.MyFunction \
    --inputs persistent://public/default/input \
    --runtime PROCESS

Kubernetes Mode

Functions run as Kubernetes pods. Characteristics:

Full container isolation
Kubernetes resource management
Auto-scaling support
Cloud-native deployment

Use Cases:

Large-scale production deployments
Multi-tenant environments
Functions requiring specific container images
Integration with Kubernetes ecosystem

bin/pulsar-admin functions create \
    --name k8s-function \
    --jar my-function.jar \
    --classname com.example.MyFunction \
    --inputs persistent://public/default/input \
    --runtime KUBERNETES

Resource Configuration

CPU Allocation

Specify CPU resources in cores:

bin/pulsar-admin functions create \
    --name my-function \
    --jar my-function.jar \
    --classname com.example.MyFunction \
    --inputs persistent://public/default/input \
    --cpu 0.5  # Request 0.5 CPU cores

For Kubernetes runtime:

Translates to Kubernetes CPU requests
Can specify limits separately in function worker config

Memory Allocation

Specify RAM in bytes:

bin/pulsar-admin functions create \
    --name my-function \
    --jar my-function.jar \
    --classname com.example.MyFunction \
    --inputs persistent://public/default/input \
    --ram 536870912  # 512 MB in bytes

Memory considerations:

Include overhead for runtime and libraries
Monitor actual usage and adjust
Set appropriate JVM heap size for Java functions

Disk Allocation

Specify disk space in bytes:

bin/pulsar-admin functions create \
    --name my-function \
    --jar my-function.jar \
    --classname com.example.MyFunction \
    --inputs persistent://public/default/input \
    --disk 1073741824  # 1 GB in bytes

Disk is used for:

Function package and dependencies
Local state storage (if enabled)
Temporary files

Complete Resource Configuration

function-config.yaml

tenant: "public"
namespace: "default"
name: "resource-optimized-function"
className: "com.example.OptimizedFunction"
inputs: ["persistent://public/default/input"]
output: "persistent://public/default/output"

# Resource allocation
resources:
  cpu: 1.0
  ram: 1073741824  # 1 GB
  disk: 2147483648  # 2 GB

# Runtime configuration
runtime: KUBERNETES
parallelism: 3

Parallelism and Scaling

Instance Parallelism

Run multiple instances for horizontal scaling:

bin/pulsar-admin functions create \
    --name parallel-function \
    --jar my-function.jar \
    --classname com.example.ParallelFunction \
    --inputs persistent://public/default/input \
    --parallelism 5

How It Works:

Each instance processes a subset of input partitions
Load balanced automatically by Pulsar
Instances can run on different workers

Scaling Guidelines:

Start with parallelism = number of input topic partitions
Increase if instances are CPU-bound
Monitor per-instance throughput

Dynamic Scaling

Update parallelism without redeployment:

bin/pulsar-admin functions update \
    --tenant public \
    --namespace default \
    --name my-function \
    --parallelism 10

Subscription Configuration

Subscription Type

Control how messages are distributed across function instances:

bin/pulsar-admin functions create \
    --name my-function \
    --jar my-function.jar \
    --classname com.example.MyFunction \
    --inputs persistent://public/default/input \
    --subscription-type SHARED  # SHARED, FAILOVER, or KEY_SHARED

Subscription Types:

SHARED (default): Messages distributed round-robin across instances
FAILOVER: Only one instance receives messages; others are standby
KEY_SHARED: Messages with same key go to same instance

Subscription Position

Set where to start consuming:

bin/pulsar-admin functions create \
    --name my-function \
    --jar my-function.jar \
    --classname com.example.MyFunction \
    --inputs persistent://public/default/input \
    --subscription-position EARLIEST  # EARLIEST or LATEST

Processing Guarantees

At-Most-Once

Fastest processing, no retries:

bin/pulsar-admin functions create \
    --name fast-function \
    --jar my-function.jar \
    --classname com.example.FastFunction \
    --inputs persistent://public/default/input \
    --processing-guarantees ATMOST_ONCE \
    --auto-ack true

Characteristics:

Message acknowledged immediately before processing
No redelivery on failure
Lowest latency

At-Least-Once

Default mode with automatic retries:

bin/pulsar-admin functions create \
    --name reliable-function \
    --jar my-function.jar \
    --classname com.example.ReliableFunction \
    --inputs persistent://public/default/input \
    --processing-guarantees ATLEAST_ONCE \
    --max-message-retries 3 \
    --dead-letter-topic persistent://public/default/dlq

Characteristics:

Message acknowledged after successful processing
Automatic retries on failure
Requires idempotent function logic

Effectively-Once

Exactly-once semantics using state:

bin/pulsar-admin functions create \
    --name exactly-once-function \
    --jar my-function.jar \
    --classname com.example.ExactlyOnceFunction \
    --inputs persistent://public/default/input \
    --processing-guarantees EFFECTIVELY_ONCE

Characteristics:

Message processing and state updates are atomic
Highest consistency guarantees
Higher latency due to coordination

Requirements:

Function must use Pulsar state storage
State updates committed with message acknowledgment

Timeout Configuration

Message Timeout

Set maximum processing time per message:

bin/pulsar-admin functions create \
    --name timeout-function \
    --jar my-function.jar \
    --classname com.example.TimeoutFunction \
    --inputs persistent://public/default/input \
    --timeout-ms 30000  # 30 seconds

If processing exceeds timeout:

Message is considered failed
Retried based on retry configuration

Retry Configuration

Configure retry behavior:

function-config.yaml

name: "retry-function"
className: "com.example.RetryFunction"
inputs: ["persistent://public/default/input"]

# Retry configuration
maxMessageRetries: 5
deadLetterTopic: "persistent://public/default/function-dlq"
timeoutMs: 30000

State Storage Configuration

Enable State Storage

Configure state backend:

stateful-config.yaml

name: "stateful-function"
className: "com.example.StatefulFunction"
inputs: ["persistent://public/default/input"]

# State storage configuration
statefulConfig:
  pulsar:
    # BookKeeper state storage
    stateStorageServiceUrl: "bk://localhost:4181"

State Storage Options

BookKeeper (default):

Integrated with Pulsar cluster
High performance, low latency
Automatic replication

External State Store:

Connect to external databases
Use for shared state across functions
Custom implementation via StateStore interface

Custom Runtime Options

JVM Options (Java)

Configure JVM for Java functions:

java-function-config.yaml

name: "java-function"
className: "com.example.JavaFunction"
inputs: ["persistent://public/default/input"]

# Custom JVM options
customRuntimeOptions: |
  -XX:+UseG1GC
  -XX:MaxGCPauseMillis=100
  -Xms512m
  -Xmx1024m

Environment Variables

Set environment variables for function runtime:

bin/pulsar-admin functions create \
    --name env-function \
    --jar my-function.jar \
    --classname com.example.EnvFunction \
    --inputs persistent://public/default/input \
    --env '{"LOG_LEVEL":"DEBUG","REGION":"us-west-2"}'

Kubernetes-Specific Configuration

Custom Container Image

Use custom Docker image:

bin/pulsar-admin functions create \
    --name k8s-custom-function \
    --jar my-function.jar \
    --classname com.example.CustomFunction \
    --inputs persistent://public/default/input \
    --runtime KUBERNETES \
    --custom-runtime-options '{"image":"myregistry/custom-pulsar-function:1.0"}'

Service Account

Specify Kubernetes service account:

k8s-function-config.yaml

name: "k8s-function"
runtime: KUBERNETES
className: "com.example.K8sFunction"
inputs: ["persistent://public/default/input"]

customRuntimeOptions: |
  {
    "serviceAccount": "pulsar-function-sa",
    "labels": {
      "app": "pulsar-function",
      "environment": "production"
    },
    "annotations": {
      "prometheus.io/scrape": "true"
    }
  }

Monitoring and Debugging

Enable Debug Logging

bin/pulsar-admin functions create \
    --name debug-function \
    --jar my-function.jar \
    --classname com.example.DebugFunction \
    --inputs persistent://public/default/input \
    --log-topic persistent://public/default/function-logs

View debug logs:

bin/pulsar-client consume \
    persistent://public/default/function-logs \
    --subscription-name log-viewer \
    --num-messages 0

Profiling Functions

For Java functions, enable profiling:

profiling-config.yaml

name: "profiled-function"
className: "com.example.ProfiledFunction"
inputs: ["persistent://public/default/input"]

customRuntimeOptions: |
  -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
  -XX:+FlightRecorder
  -XX:StartFlightRecording=duration=60s,filename=/tmp/recording.jfr

Best Practices

Choose the Right Runtime

Use Thread mode for development, Process mode for production, and Kubernetes mode for large-scale or multi-tenant deployments.

Right-Size Resources

Monitor actual resource usage and adjust CPU/RAM allocations. Over-provisioning wastes resources; under-provisioning causes failures.

Match Parallelism to Load

Start with parallelism equal to the number of input partitions. Increase based on per-instance CPU utilization.

Configure Appropriate Guarantees

Use EFFECTIVELY_ONCE only when necessary. ATLEAST_ONCE with idempotent logic is often sufficient and more performant.

Set Reasonable Timeouts

Set timeouts based on expected processing time plus buffer. Monitor timeout-related failures.

Use Dead Letter Queues

Always configure DLQ for functions with retry limits to prevent message loss.

Performance Tuning

Optimize Throughput

name: "high-throughput-function"
parallelism: 10
processingGuarantees: ATLEAST_ONCE
resources:
  cpu: 2.0
  ram: 2147483648
autoAck: true
subscriptionType: SHARED

Troubleshooting

OOM Errors

Symptoms: Function instances crash frequently Solutions:

Increase --ram allocation
Optimize function memory usage
Check for memory leaks

High Latency

Symptoms: Messages processed slowly Solutions:

Increase --parallelism
Use THREAD or PROCESS runtime instead of KUBERNETES
Optimize function code
Use async operations

Function Won’t Scale

Symptoms: Adding instances doesn’t increase throughput Solutions:

Increase input topic partitions
Check for bottlenecks in function code
Verify resource availability on workers

Next Steps

Function Development

Learn advanced development patterns

Deployment Guide

Explore deployment options

Getting Started

Core Concepts

Client Libraries

Pulsar Functions

Pulsar IO

Deployment

Operations

Security

​Runtime Modes

​Thread Mode

​Process Mode

​Kubernetes Mode

​Resource Configuration

​CPU Allocation

​Memory Allocation

​Disk Allocation

​Complete Resource Configuration

​Parallelism and Scaling

​Instance Parallelism

​Dynamic Scaling

​Subscription Configuration

​Subscription Type

​Subscription Position

​Processing Guarantees

​At-Most-Once

​At-Least-Once

​Effectively-Once

​Timeout Configuration

​Message Timeout

​Retry Configuration

​State Storage Configuration

​Enable State Storage

​State Storage Options

​Custom Runtime Options

​JVM Options (Java)

​Environment Variables

​Kubernetes-Specific Configuration

​Custom Container Image

​Service Account

​Monitoring and Debugging

​Enable Debug Logging

​Profiling Functions

​Best Practices

​Performance Tuning

​Optimize Throughput

​Troubleshooting

​OOM Errors

​High Latency

​Function Won’t Scale

​Next Steps

Function Development

Deployment Guide

Build docs developers (and LLMs) love

Runtime Modes

Thread Mode

Process Mode

Kubernetes Mode

Resource Configuration

CPU Allocation

Memory Allocation

Disk Allocation

Complete Resource Configuration

Parallelism and Scaling

Instance Parallelism

Dynamic Scaling

Subscription Configuration

Subscription Type

Subscription Position

Processing Guarantees

At-Most-Once

At-Least-Once

Effectively-Once

Timeout Configuration

Message Timeout

Retry Configuration

State Storage Configuration

Enable State Storage

State Storage Options

Custom Runtime Options

JVM Options (Java)

Environment Variables

Kubernetes-Specific Configuration

Custom Container Image

Service Account

Monitoring and Debugging

Enable Debug Logging

Profiling Functions

Best Practices

Performance Tuning

Optimize Throughput

Troubleshooting

OOM Errors

High Latency

Function Won’t Scale

Next Steps