What is Pulsar IO?
Pulsar IO is a framework for building connectors that move data between Apache Pulsar and external systems. It provides a simple interface for integrating Pulsar with databases, messaging systems, and other data sources and sinks.
Connector Types
Pulsar IO supports two types of connectors:
Source Connectors
Source connectors read data from external systems and write it to Pulsar topics. They implement the Source interface and continuously pull or receive data from external systems.
Key characteristics:
- Implement the org.apache.pulsar.io.core.Source<T> interface
- Read data from external systems
- Push messages into Pulsar topics
- Support configurable polling and batching
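The source behavior listed above can be sketched in a few lines. This is a minimal, self-contained illustration and not the Pulsar API: `SketchSource`, `CounterSource`, and the `limit` key are hypothetical names. In the real `org.apache.pulsar.io.core.Source<T>` interface, `open` also receives a `SourceContext` and `read` returns a `Record<T>` rather than a bare value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the shape of a source connector (not the Pulsar API).
interface SketchSource<T> extends AutoCloseable {
    void open(Map<String, Object> config);
    T read();                 // next value, or null when this sketch has nothing left
    @Override void close();   // narrowed so that it throws no checked exception
}

// Emits "event-0", "event-1", ... up to the configured "limit" (an assumed key).
class CounterSource implements SketchSource<String> {
    private int limit;
    private int next;
    private boolean opened;

    @Override
    public void open(Map<String, Object> config) {
        Object raw = config.getOrDefault("limit", 3);
        limit = ((Number) raw).intValue();
        next = 0;
        opened = true;
    }

    @Override
    public String read() {
        if (!opened) {
            throw new IllegalStateException("source is not open");
        }
        return next < limit ? "event-" + next++ : null;
    }

    @Override
    public void close() {
        opened = false;
    }
}

class SourceDemo {
    // Drains the source the way a runtime loop might, collecting the values
    // that would otherwise be published to a Pulsar topic.
    static List<String> drain(int limit) {
        List<String> published = new ArrayList<>();
        try (CounterSource source = new CounterSource()) {
            source.open(Map.<String, Object>of("limit", limit));
            for (String v = source.read(); v != null; v = source.read()) {
                published.add(v);
            }
        }
        return published;
    }

    public static void main(String[] args) {
        System.out.println(drain(3)); // prints [event-0, event-1, event-2]
    }
}
```

Returning `null` to signal exhaustion is a simplification chosen for this sketch; a long-running source would normally block in `read` until the external system has more data.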
Sink Connectors
Sink connectors read data from Pulsar topics and write it to external systems. They implement the Sink interface and process messages as they arrive.
Key characteristics:
- Implement the org.apache.pulsar.io.core.Sink<T> interface
- Read messages from Pulsar topics
- Write data to external systems
- Support delivery guarantees and error handling
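A sink can be sketched the same way. The example below is an illustration under stated assumptions, not the Pulsar API: `SketchRecord` and `InMemorySink` are hypothetical, and an in-memory list stands in for the external system. It mirrors the pattern of the real `Sink<T>.write(Record<T>)` method, where a record is acknowledged after a successful write and marked as failed otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a consumed message (not the Pulsar Record type).
class SketchRecord {
    final String value;
    boolean acked;
    boolean failed;

    SketchRecord(String value) {
        this.value = value;
    }

    void ack() { acked = true; }
    void fail() { failed = true; }
}

// Hypothetical sink that "writes" to a list standing in for an external system.
class InMemorySink implements AutoCloseable {
    final List<String> external = new ArrayList<>();

    void write(SketchRecord record) {
        try {
            if (record.value.isEmpty()) {
                throw new IllegalArgumentException("empty payload rejected");
            }
            external.add(record.value);
            record.ack();   // acknowledge only after the write succeeded
        } catch (RuntimeException e) {
            record.fail();  // negative acknowledgment so the message can be redelivered
        }
    }

    @Override
    public void close() {
        // A real sink would flush buffers and release connections here.
    }
}

class SinkDemo {
    public static void main(String[] args) {
        SketchRecord good = new SketchRecord("row-1");
        SketchRecord bad = new SketchRecord("");
        try (InMemorySink sink = new InMemorySink()) {
            sink.write(good);
            sink.write(bad);
            System.out.println("written: " + sink.external);
        }
        System.out.println("good acked: " + good.acked + ", bad failed: " + bad.failed);
    }
}
```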
Core Concepts
Connector Lifecycle
- Open: Initialize the connector with configuration
- Process: Read or write data (depending on connector type)
- Close: Clean up resources when shutting down
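The three lifecycle phases can be traced with a small sketch. `TracingConnector` is a hypothetical class that only records which phase it is in. The one grounded detail is that the real `Source` and `Sink` interfaces extend `AutoCloseable`, which is why a try-with-resources block is a natural way to make sure the close phase runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical connector that records each lifecycle phase it goes through.
class TracingConnector implements AutoCloseable {
    private final List<String> trace;

    TracingConnector(List<String> trace) {
        this.trace = trace;
    }

    void open(Map<String, Object> config) {
        trace.add("open");
    }

    void process(String item) {
        trace.add("process:" + item);
    }

    @Override
    public void close() {
        trace.add("close");
    }
}

class LifecycleDemo {
    static List<String> run(List<String> items) {
        List<String> trace = new ArrayList<>();
        // try-with-resources calls close() even if processing throws.
        try (TracingConnector connector = new TracingConnector(trace)) {
            connector.open(Map.<String, Object>of());
            for (String item : items) {
                connector.process(item);
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b"))); // prints [open, process:a, process:b, close]
    }
}
```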
Configuration
Connectors are configured using YAML files or configuration maps. Each connector has a specific configuration class that defines required and optional parameters.
Runtime Modes
Pulsar IO connectors can run in different modes:
- Standalone: Run as part of the Pulsar Functions runtime
- Cluster: Deploy across multiple Pulsar brokers for high availability
- Kubernetes: Run as containerized workloads
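Whichever mode is used, a connector receives its settings as a configuration map when it is opened. The sketch below shows one way a connector-specific configuration class might validate required and optional parameters from such a map. `SketchSinkConfig` and the `url` and `batchSize` keys are hypothetical and belong to no real connector.

```java
import java.util.Map;

// Hypothetical configuration class for an imaginary sink connector.
class SketchSinkConfig {
    final String url;      // required
    final int batchSize;   // optional, defaults to 100

    private SketchSinkConfig(String url, int batchSize) {
        this.url = url;
        this.batchSize = batchSize;
    }

    // Builds a typed configuration from the raw map, failing fast on bad input.
    static SketchSinkConfig load(Map<String, Object> raw) {
        Object url = raw.get("url");
        if (!(url instanceof String) || ((String) url).isEmpty()) {
            throw new IllegalArgumentException("'url' is required");
        }
        Object batch = raw.getOrDefault("batchSize", 100);
        if (!(batch instanceof Number)) {
            throw new IllegalArgumentException("'batchSize' must be a number");
        }
        int batchSize = ((Number) batch).intValue();
        if (batchSize <= 0) {
            throw new IllegalArgumentException("'batchSize' must be positive");
        }
        return new SketchSinkConfig((String) url, batchSize);
    }
}

class ConfigDemo {
    public static void main(String[] args) {
        SketchSinkConfig config =
                SketchSinkConfig.load(Map.<String, Object>of("url", "jdbc:example://db"));
        System.out.println(config.url + " batchSize=" + config.batchSize);
    }
}
```

Validating at load time means a misconfigured connector fails during the open phase instead of partway through processing.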
Architecture
Source Connector Flow
Sink Connector Flow
Key Features
Built-in Connectors
Pulsar includes a rich set of built-in connectors for popular systems. See the Connectors page for a complete list.
Processing Guarantees
- At-most-once: Messages may be lost but never redelivered
- At-least-once: Messages are never lost but may be redelivered
- Effectively-once: Each message has exactly one output associated with it, even if it is redelivered
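The difference between these guarantees comes down to when a message is acknowledged relative to the write, and whether redelivered messages are deduplicated. The following toy simulation is a sketch under loud assumptions: `GuaranteeDemo` is hypothetical, a single simulated crash stands in for a failure, and the message value doubles as its identifier for deduplication. It does not show how Pulsar implements these guarantees internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GuaranteeDemo {
    enum Mode { AT_MOST_ONCE, AT_LEAST_ONCE, EFFECTIVELY_ONCE }

    // Processes the messages in order and simulates one crash on crashOn.
    // Returns what ends up in the (simulated) external system.
    static List<String> run(List<String> messages, String crashOn, Mode mode) {
        Deque<String> pending = new ArrayDeque<>(messages);
        List<String> output = new ArrayList<>();
        Set<String> seen = new HashSet<>();   // assumed dedup key: the message value
        boolean crashed = false;
        while (!pending.isEmpty()) {
            String msg = pending.poll();
            boolean crashNow = msg.equals(crashOn) && !crashed;
            if (mode == Mode.AT_MOST_ONCE) {
                // Acknowledge first: a crash before the write loses the message.
                if (crashNow) {
                    crashed = true;
                    continue;
                }
                output.add(msg);
            } else {
                // Write first, acknowledge afterwards: a crash before the
                // acknowledgment causes the message to be redelivered.
                boolean duplicate = mode == Mode.EFFECTIVELY_ONCE && !seen.add(msg);
                if (!duplicate) {
                    output.add(msg);
                }
                if (crashNow) {
                    crashed = true;
                    pending.addFirst(msg);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> in = List.of("a", "b", "c");
        System.out.println(run(in, "b", Mode.AT_MOST_ONCE));     // prints [a, c]
        System.out.println(run(in, "b", Mode.AT_LEAST_ONCE));    // prints [a, b, b, c]
        System.out.println(run(in, "b", Mode.EFFECTIVELY_ONCE)); // prints [a, b, c]
    }
}
```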
Schema Support
Connectors support Pulsar’s schema registry, allowing automatic schema evolution and type safety.
Error Handling
Built-in support for:
- Dead letter queues for failed messages
- Retry policies with exponential backoff
- Custom error handlers
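To make the retry and dead letter ideas concrete, here is a minimal sketch of a write loop with exponential backoff that routes a message to a dead letter list once its attempts are exhausted. `RetryingWriter` is a hypothetical helper, not part of Pulsar. It records the delays it would wait instead of sleeping, so the example runs instantly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper illustrating retry with exponential backoff and a dead letter list.
class RetryingWriter {
    final List<String> deadLetters = new ArrayList<>();
    final List<Long> delaysMs = new ArrayList<>(); // delays a real connector would sleep for
    private final int maxAttempts;
    private final long baseDelayMs;

    RetryingWriter(int maxAttempts, long baseDelayMs) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMs = baseDelayMs;
    }

    // Tries the write up to maxAttempts times, doubling the delay after each failure.
    // Returns true on success; otherwise the message goes to the dead letter list.
    boolean write(String message, Consumer<String> target) {
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                target.accept(message);
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break;            // out of attempts, fall through to dead-lettering
                }
                delaysMs.add(delay);  // a real connector would wait here
                delay *= 2;
            }
        }
        deadLetters.add(message);
        return false;
    }
}

class RetryDemo {
    public static void main(String[] args) {
        RetryingWriter writer = new RetryingWriter(3, 100);
        int[] calls = {0};
        // This target fails twice and succeeds on the third attempt.
        boolean delivered = writer.write("order-1", m -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("target unavailable");
            }
        });
        System.out.println("delivered=" + delivered + " delays=" + writer.delaysMs);
        // A target that always fails ends up in the dead letter list.
        writer.write("order-2", m -> {
            throw new IllegalStateException("target unavailable");
        });
        System.out.println("dead letters: " + writer.deadLetters);
    }
}
```

A production retry policy would usually also cap the maximum delay and add jitter; both are left out here to keep the sketch short.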
Monitoring
Connectors expose metrics for:
- Message throughput
- Processing latency
- Error rates
- Connection health
Managing Connectors
Using pulsar-admin CLI
When to Use Pulsar IO
Pulsar IO is ideal when you need to:
- Ingest data from external systems into Pulsar
- Export Pulsar data to external systems
- Bridge between Pulsar and legacy systems
- Implement change data capture (CDC) pipelines
- Build data integration workflows
Next Steps
- Explore available connectors
- Learn about developing custom connectors
- Review connector-specific configuration guides