Spice provides a comprehensive set of data connectors that enable federated SQL queries across databases, data warehouses, data lakes, and file systems. Each connector supports reading data from external sources, with many also supporting data acceleration and materialization.

How Data Connectors Work

Data connectors in Spice allow you to:
  • Federate queries across multiple data sources using standard SQL
  • Accelerate data by materializing datasets locally with Arrow, DuckDB, SQLite, or PostgreSQL
  • Push down operations like filters and projections to source systems for optimal performance
  • Connect securely using environment variables, secrets, or configuration parameters
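Federation means a single SQL query can join datasets served by different connectors. As an illustrative sketch (dataset names, bucket paths, and connection parameters here are hypothetical), one spicepod can expose a PostgreSQL table alongside Parquet files in S3:

```yaml
datasets:
  # PostgreSQL table, queried over the Postgres wire protocol
  - from: postgres:public.customers
    name: customers
    params:
      pg_host: localhost
      pg_db: mydb
  # Parquet files in S3, read directly from object storage
  - from: s3://my-bucket/orders/
    name: orders
    params:
      file_format: parquet
```

Both datasets are then addressable from one SQL statement, e.g. joining `customers` to `orders`, with Spice handling each source's native protocol behind the scenes.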

Supported Connectors

Spice supports 30+ data connectors with varying levels of maturity:
| Name | Description | Status | Protocol/Format |
| --- | --- | --- | --- |
| databricks (mode: delta_lake) | Databricks | Stable | S3/Delta Lake |
| delta_lake | Delta Lake | Stable | Delta Lake |
| dremio | Dremio | Stable | Arrow Flight |
| duckdb | DuckDB | Stable | Embedded |
| file | File | Stable | Parquet, CSV |
| github | GitHub | Stable | GitHub API |
| postgres | PostgreSQL | Stable | |
| s3 | S3 | Stable | Parquet, CSV |
| mysql | MySQL | Stable | |
| spice.ai | Spice.ai | Stable | Arrow Flight |
| graphql | GraphQL | Release Candidate | JSON |
| dynamodb | Amazon DynamoDB | Release Candidate | |
| databricks (mode: spark_connect) | Databricks | Beta | Spark Connect |
| flightsql | FlightSQL | Beta | Arrow Flight SQL |
| iceberg | Apache Iceberg | Beta | Parquet |
| mssql | Microsoft SQL Server | Beta | Tabular Data Stream (TDS) |
| odbc | ODBC | Beta | ODBC |
| snowflake | Snowflake | Beta | Arrow |
| spark | Spark | Beta | Spark Connect |
| oracle | Oracle | Alpha | Oracle ODPI-C |
| abfs | Azure BlobFS | Alpha | Parquet, CSV |
| clickhouse | ClickHouse | Alpha | |
| debezium | Debezium CDC | Alpha | Kafka + JSON |
| gcs, gs | Google Cloud Storage | Alpha | Parquet, CSV, JSON |
| kafka | Kafka | Alpha | Kafka + JSON |
| ftp, sftp | FTP/SFTP | Alpha | Parquet, CSV |
| glue | AWS Glue | Alpha | Iceberg, Parquet, CSV |
| http, https | HTTP(s) | Alpha | Parquet, CSV, JSON |
| imap | IMAP | Alpha | IMAP Emails |
| localpod | Local dataset replication | Alpha | |
| mongodb | MongoDB | Alpha | |
| sharepoint | Microsoft SharePoint | Alpha | Unstructured UTF-8 documents |
| scylladb | ScyllaDB | Alpha | |
| smb | SMB (Server Message Block) | Alpha | SMB |
| elasticsearch | Elasticsearch | Roadmap | |

Status Definitions

  • Stable: Production-ready with comprehensive testing and documentation
  • Release Candidate: Feature-complete with ongoing testing
  • Beta: Functional with some limitations; feedback welcome
  • Alpha: Early access; expect changes and potential issues
  • Roadmap: Planned for future development

Configuration Basics

All connectors are configured in the spicepod.yaml file:
version: v1
kind: Spicepod
name: my-app

datasets:
  - from: postgres:public.users
    name: users
    params:
      pg_host: localhost
      pg_port: 5432
      pg_user: admin
      pg_pass: ${secrets:pg_password}
      pg_db: mydb
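With this configuration in place, the remote table is queryable through Spice's SQL interface like any local table. For example:

```sql
-- Runs against the federated `users` dataset defined in spicepod.yaml
SELECT id, name
FROM users
LIMIT 10;
```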

Common Features

Data Acceleration

Most connectors support local data acceleration for faster queries:
datasets:
  - from: postgres:public.orders
    name: orders
    params:
      pg_host: localhost
      pg_db: mydb
    acceleration:
      enabled: true
      engine: arrow  # arrow, duckdb, sqlite, or postgres
      refresh_interval: 10s
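The arrow engine holds materialized data in memory; for datasets larger than available memory, a persistent engine can be used instead. A minimal sketch, assuming the DuckDB engine's file mode and the `duckdb_file` parameter work as described in the accelerator documentation:

```yaml
datasets:
  - from: postgres:public.orders
    name: orders
    acceleration:
      enabled: true
      engine: duckdb        # persistent engine instead of in-memory arrow
      mode: file            # assumed: persist accelerated data to disk
      params:
        duckdb_file: /data/orders.db  # assumed parameter name
      refresh_interval: 10s
```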

Query Push-down

Spice automatically pushes down filters, projections, and aggregations to source systems when supported:
-- Spice pushes the WHERE clause to PostgreSQL
SELECT * FROM users WHERE age > 25;

Secret Management

Use environment variables or the secrets store for sensitive credentials:
params:
  pg_pass: ${secrets:pg_password}  # From secret store
  pg_host: ${env:DB_HOST}          # From environment
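The same substitution syntax applies to any connector parameter, not just PostgreSQL. As an illustrative sketch for an S3 dataset (parameter names assumed, values hypothetical):

```yaml
params:
  s3_key: ${secrets:aws_access_key_id}      # assumed parameter name
  s3_secret: ${secrets:aws_secret_access_key}
  s3_endpoint: ${env:S3_ENDPOINT}
```

Keeping credentials out of spicepod.yaml lets the file be committed to version control safely.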

Performance Considerations

  1. Federated Queries: When querying data directly from sources, performance depends on the source system and network latency
  2. Acceleration: Enable acceleration for frequently accessed datasets to achieve sub-second query times
  3. Push-down: Spice optimizes queries by pushing operations to the source when possible
  4. Partitioning: Some connectors support partitioning for parallel data loading

Next Steps

Explore the detailed documentation for each connector. For more examples, see the Spice Cookbook.
