How Data Connectors Work
Data connectors in Spice allow you to:
- Federate queries across multiple data sources using standard SQL
- Accelerate data by materializing datasets locally with Arrow, DuckDB, SQLite, or PostgreSQL
- Push down operations such as filters and projections to source systems, so less data is transferred and queries run faster
- Connect securely using environment variables, secrets, or configuration parameters
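As a minimal sketch of federation, the `spicepod.yaml` fragment below declares two datasets backed by different connectors. Once both are loaded, they can be joined in a single SQL statement. The host, database, bucket path, table names, secret names, and column names are hypothetical placeholders. The exact parameters each connector accepts should be checked on that connector's own page.

```yaml
# Hypothetical example: federating PostgreSQL and S3 in one Spicepod.
datasets:
  # The prefix of `from` selects the connector (here: postgres).
  - from: postgres:customers
    name: customers
    params:
      pg_host: localhost              # placeholder host
      pg_port: 5432
      pg_db: shop                     # placeholder database
      pg_user: ${secrets:PG_USER}     # placeholder secret names
      pg_pass: ${secrets:PG_PASS}

  # Parquet files in S3 (placeholder bucket and prefix).
  - from: s3://my-bucket/orders/
    name: orders
    params:
      file_format: parquet

# Both datasets can then be queried together, for example:
#   SELECT c.name, SUM(o.amount) AS total
#   FROM orders o JOIN customers c ON o.customer_id = c.id
#   GROUP BY c.name;
```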
Supported Connectors
Spice supports 30+ data connectors with varying levels of maturity:

| Name | Description | Status | Protocol/Format |
|---|---|---|---|
| databricks (mode: delta_lake) | Databricks | Stable | S3/Delta Lake |
| delta_lake | Delta Lake | Stable | Delta Lake |
| dremio | Dremio | Stable | Arrow Flight |
| duckdb | DuckDB | Stable | Embedded |
| file | File | Stable | Parquet, CSV |
| github | GitHub | Stable | GitHub API |
| postgres | PostgreSQL | Stable | |
| s3 | S3 | Stable | Parquet, CSV |
| mysql | MySQL | Stable | |
| spice.ai | Spice.ai | Stable | Arrow Flight |
| graphql | GraphQL | Release Candidate | JSON |
| dynamodb | Amazon DynamoDB | Release Candidate | |
| databricks (mode: spark_connect) | Databricks | Beta | Spark Connect |
| flightsql | FlightSQL | Beta | Arrow Flight SQL |
| iceberg | Apache Iceberg | Beta | Parquet |
| mssql | Microsoft SQL Server | Beta | Tabular Data Stream (TDS) |
| odbc | ODBC | Beta | ODBC |
| snowflake | Snowflake | Beta | Arrow |
| spark | Spark | Beta | Spark Connect |
| oracle | Oracle | Alpha | Oracle ODPI-C |
| abfs | Azure BlobFS | Alpha | Parquet, CSV |
| clickhouse | ClickHouse | Alpha | |
| debezium | Debezium CDC | Alpha | Kafka + JSON |
| gcs, gs | Google Cloud Storage | Alpha | Parquet, CSV, JSON |
| kafka | Kafka | Alpha | Kafka + JSON |
| ftp, sftp | FTP/SFTP | Alpha | Parquet, CSV |
| glue | AWS Glue | Alpha | Iceberg, Parquet, CSV |
| http, https | HTTP(S) | Alpha | Parquet, CSV, JSON |
| imap | IMAP | Alpha | IMAP Emails |
| localpod | Local dataset replication | Alpha | |
| mongodb | MongoDB | Alpha | |
| sharepoint | Microsoft SharePoint | Alpha | Unstructured UTF-8 documents |
| scylladb | ScyllaDB | Alpha | |
| smb | SMB (Server Message Block) | Alpha | SMB |
| elasticsearch | Elasticsearch | Roadmap | |
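The table lists Databricks twice because the connector has two modes with different maturity levels. The mode is chosen per dataset. The sketch below assumes that `mode` is set under `params`, as the table's "mode: …" notation suggests. The catalog, schema, and table names are hypothetical. Connection and authentication settings are deliberately left out because they differ between modes and are documented on the Databricks connector page.

```yaml
datasets:
  # Hypothetical Unity Catalog table: catalog.schema.table
  - from: databricks:my_catalog.my_schema.my_table
    name: my_table
    params:
      mode: spark_connect   # Beta; use delta_lake for the Stable mode
      # Endpoint, token, and other connection settings omitted here;
      # see the Databricks connector documentation for the required keys.
```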
Status Definitions
- Stable: Production-ready with comprehensive testing and documentation
- Release Candidate: Feature-complete with ongoing testing
- Beta: Functional with some limitations; feedback welcome
- Alpha: Early access; expect changes and potential issues
- Roadmap: Planned for future development
Configuration Basics
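All connectors are configured in the `spicepod.yaml` file. Each dataset entry selects its connector through the prefix of its `from` value, which corresponds to a Name in the table above (for example, `from: s3://bucket/path/` uses the `s3` connector).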
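A minimal `spicepod.yaml` might look like the sketch below. The top-level `version`, `kind`, and `name` keys identify the Spicepod, and each entry under `datasets` declares one table backed by a connector. The app name, bucket path, and dataset name are hypothetical placeholders.

```yaml
# Minimal Spicepod (placeholder app name and S3 path).
version: v1
kind: Spicepod
name: my_app

datasets:
  - from: s3://my-bucket/taxi_trips/   # "s3" prefix selects the S3 connector
    name: taxi_trips                    # table name used in SQL queries
    description: Taxi trip records stored as Parquet in S3
    params:
      file_format: parquet              # connector-specific settings go under params
```

With this file in place, the dataset is queried by its `name`, for example `SELECT COUNT(*) FROM taxi_trips`.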
Common Features
Data Acceleration
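Most connectors support local data acceleration. The dataset is materialized in a local engine (Arrow, DuckDB, SQLite, or PostgreSQL), so queries no longer round-trip to the source system.
Query Push-down
Spice automatically pushes down filters, projections, and aggregations to source systems when the connector supports it.
Secret Management
Store sensitive credentials in environment variables or a secret store and reference them from the configuration, rather than writing them into `spicepod.yaml` in plain text.
Performance Considerations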
- Federated Queries: When querying data directly from sources, performance depends on the source system and network latency
- Acceleration: Enable acceleration for frequently accessed datasets so queries are served from a local materialized copy, which can bring query times under a second
- Push-down: Spice optimizes queries by pushing operations to the source when possible
- Partitioning: Some connectors support partitioning for parallel data loading
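Acceleration and secret management are both configured per dataset. The sketch below assumes a hypothetical PostgreSQL `orders` table and hypothetical secret names. It materializes the table into a local DuckDB file and resolves credentials with `${secrets:...}` references instead of hard-coding them. The assumption here is that the default secret store reads environment variables, so `PG_USER` and `PG_PASS` would be read from the environment; check the secret-store documentation for the stores actually configured. The 10-second refresh interval is an illustrative value, not a recommendation.

```yaml
datasets:
  - from: postgres:orders
    name: orders
    params:
      pg_host: db.example.internal    # placeholder host
      pg_port: 5432
      pg_db: shop                     # placeholder database
      pg_user: ${secrets:PG_USER}     # resolved from the secret store at runtime
      pg_pass: ${secrets:PG_PASS}     # the password itself never appears in this file
    acceleration:
      enabled: true                   # materialize a local copy of the dataset
      engine: duckdb                  # local engine (Arrow, DuckDB, SQLite, or PostgreSQL)
      mode: file                      # persist to disk instead of keeping it in memory
      refresh_check_interval: 10s     # how often to refresh the local copy from the source
```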
Next Steps
Explore detailed documentation for specific connectors:
- S3 Connector - Query Parquet and CSV files in S3
- PostgreSQL Connector - Connect to PostgreSQL databases
- MySQL Connector - Connect to MySQL databases
- Databricks Connector - Access Databricks Delta Lake and Spark
- Snowflake Connector - Query Snowflake data warehouses
- DuckDB Connector - Query DuckDB databases
- ClickHouse Connector - Connect to ClickHouse
- MongoDB Connector - Query MongoDB collections