Overview
Extension types provide:
- Custom semantics: Attach application-specific meaning to Arrow data
- Storage transparency: Use existing Arrow types for physical storage
- Serialization support: Automatically serialize/deserialize metadata
- IPC compatibility: Exchange extension types across processes and languages
- Type safety: Strong typing for domain-specific data
Core Concepts
An extension type consists of:
- Storage type: The underlying Arrow type used for physical storage
- Extension name: A unique identifier for the extension type
- Metadata: Serialized parameters that define the type instance
- Array wrapper: Optional custom array class for the extension data
Defining an Extension Type
Basic Extension Type
Create a custom extension type by subclassing ExtensionType.
Extension Array
A custom array class can optionally be defined for the extension data.
Parametric Extension Types
Extension types can be created with parameters.
Registering Extension Types
Global Registration
Extension types can be registered globally for automatic deserialization.
Unregistering Types
Querying Registered Types
Using Extension Types
Creating Arrays
Working with Schemas
Nested Extension Types
Extension types can wrap complex storage types.
IPC and Serialization
Extension types are automatically handled in IPC.
Storage Type Considerations
Choosing Storage Types
Best Practices
Unique Extension Names
Backward Compatibility
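One common way to keep metadata backward compatible is to version it and supply defaults for parameters that older writers did not emit. The sketch below is library-independent and purely illustrative: the JSON layout, the `version` key, and the `scale` parameter are assumptions, not part of any Arrow specification.

```python
import json

CURRENT_VERSION = 2  # hypothetical: version 2 added the "scale" parameter


def serialize_params(unit, scale=1.0):
    """Encode type parameters as versioned metadata bytes."""
    payload = {"version": CURRENT_VERSION, "unit": unit, "scale": scale}
    return json.dumps(payload).encode("utf-8")


def deserialize_params(serialized):
    """Decode metadata written by this version or an older one."""
    params = json.loads(serialized.decode("utf-8"))
    # Version 1 writers did not include a version key.
    version = params.get("version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported metadata version {version}")
    # Parameters added later fall back to a default for old metadata.
    return {"unit": params["unit"], "scale": params.get("scale", 1.0)}


old = deserialize_params(b'{"unit": "m"}')
new = deserialize_params(serialize_params("km", scale=1000.0))
```

Functions like these would typically be called from the type's serialize and deserialize hooks.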
Error Handling
When to Use Extension Types
Extension types are ideal for:
- Domain-specific semantics: Geospatial coordinates, UUIDs, custom units
- Type safety: Prevent mixing incompatible data at compile time
- Validation: Enforce constraints on data values
- Metadata preservation: Maintain type information across IPC boundaries
- Interoperability: Share custom types between applications
Extension types are less suitable for:
- Simple data: No special semantics needed
- Performance critical: Extension type overhead matters
- Wide compatibility: Consumers don’t support extensions