Skip to main content
The Snowflake connector enables Spice to query Snowflake data warehouses with query push-down support using Arrow-based data transfer for high performance.

Status

Beta - Functional with ongoing testing and improvements

Supported Features

  • SQL query push-down (WHERE, JOIN, aggregations)
  • Arrow-based data transfer
  • Multiple warehouse support
  • Role-based access control
  • Data acceleration
  • Schema introspection
  • Connection pooling

Configuration

Basic Configuration

version: v1
kind: Spicepod
name: snowflake-demo

datasets:
  - from: snowflake:DATABASE.SCHEMA.TABLE
    name: my_table
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: COMPUTE_WH
      snowflake_role: accountadmin

With Data Acceleration

datasets:
  - from: snowflake:ANALYTICS.PUBLIC.SALES
    name: sales
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: ANALYTICS_WH
      snowflake_role: analyst
    acceleration:
      enabled: true
      engine: arrow
      refresh_interval: 5m

Multiple Tables with Shared Config

datasets:
  - from: snowflake:BENCHMARKS.TPCH_SF1.CUSTOMER
    name: customer
    params: &snowflake_params
      snowflake_warehouse: COMPUTE_WH
      snowflake_role: accountadmin
      snowflake_account: ${secrets:snowflake_account}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_username: ${secrets:snowflake_username}

  - from: snowflake:BENCHMARKS.TPCH_SF1.ORDERS
    name: orders
    params: *snowflake_params

  - from: snowflake:BENCHMARKS.TPCH_SF1.LINEITEM
    name: lineitem
    params: *snowflake_params

With Custom Database and Schema

datasets:
  - from: snowflake:MY_DB.PUBLIC.EVENTS
    name: events
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: EVENT_WH
      snowflake_database: MY_DB
      snowflake_schema: PUBLIC

Parameters

snowflake_account
string
required
Snowflake account identifier (e.g., xy12345.us-east-1)
snowflake_username
string
required
Snowflake username
snowflake_password
string
required
Snowflake password
snowflake_warehouse
string
required
Snowflake virtual warehouse name
snowflake_role
string
default:"PUBLIC"
Snowflake role to use for queries
snowflake_database
string
Default database (can be overridden in table reference)
snowflake_schema
string
Default schema (can be overridden in table reference)
client_timeout
duration
default:"30s"
Timeout for Snowflake operations (e.g., 60s, 5m)

Authentication

Username and Password

Standard authentication with username and password:
params:
  snowflake_account: xy12345.us-east-1
  snowflake_username: ${secrets:snowflake_username}
  snowflake_password: ${secrets:snowflake_password}

Environment Variables

params:
  snowflake_account: ${env:SNOWFLAKE_ACCOUNT}
  snowflake_username: ${env:SNOWFLAKE_USERNAME}
  snowflake_password: ${env:SNOWFLAKE_PASSWORD}
  snowflake_warehouse: ${env:SNOWFLAKE_WAREHOUSE}

Use Cases

Cloud Data Warehouse Analytics

datasets:
  - from: snowflake:WAREHOUSE.PROD.SALES_DATA
    name: sales
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: ANALYTICS_WH
      snowflake_role: analyst
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_interval: 1h
Query accelerated data:
SELECT 
  region,
  product_category,
  SUM(revenue) as total_revenue,
  COUNT(DISTINCT customer_id) as unique_customers
FROM sales
WHERE order_date >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY region, product_category
ORDER BY total_revenue DESC;

Multi-Warehouse Query

datasets:
  - from: snowflake:PROD.PUBLIC.CUSTOMERS
    name: customers
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: CUSTOMER_WH
      snowflake_role: reader

  - from: snowflake:PROD.PUBLIC.TRANSACTIONS
    name: transactions
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: TRANSACTION_WH  # Different warehouse
      snowflake_role: reader

Federated Query with Operational Database

datasets:
  - from: snowflake:EDW.PUBLIC.HISTORICAL_ORDERS
    name: historical_orders
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: EDW_WH

  - from: postgres:public.recent_orders
    name: recent_orders
    params:
      pg_host: operational-db.internal
      pg_db: orders
      pg_user: readonly
      pg_pass: ${secrets:pg_password}
Query across both:
SELECT 
  'historical' as source,
  DATE_TRUNC('month', order_date) as month,
  COUNT(*) as order_count
FROM historical_orders
WHERE order_date < CURRENT_DATE - INTERVAL '90' DAY
GROUP BY month

UNION ALL

SELECT 
  'recent' as source,
  DATE_TRUNC('month', order_date) as month,
  COUNT(*) as order_count
FROM recent_orders
WHERE order_date >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY month

ORDER BY month;

Cost Optimization with Acceleration

datasets:
  - from: snowflake:COST_CENTER.PUBLIC.LARGE_DATASET
    name: large_dataset
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: XSMALL_WH  # Use smallest warehouse
      snowflake_role: cost_optimizer
    acceleration:
      enabled: true
      engine: arrow
      refresh_interval: 24h  # Infrequent refresh to minimize warehouse usage

Performance Tips

  1. Use Acceleration: For frequently queried data, acceleration dramatically reduces Snowflake compute costs
  2. Right-Size Warehouses: Match warehouse size to query needs
  3. Query Push-down: Spice pushes WHERE, JOIN, and aggregations to Snowflake
  4. Refresh Intervals: Balance data freshness with cost by adjusting refresh intervals
  5. Partition Tables: Snowflake clustering keys improve query performance
  6. Arrow Format: Connector uses Arrow for efficient data transfer

Warehouse Management

Auto-Suspend and Auto-Resume

Snowflake warehouses auto-suspend when idle. Spice queries will auto-resume warehouses:
params:
  snowflake_warehouse: AUTO_SUSPEND_WH
  client_timeout: 120s  # Allow time for warehouse startup

Dedicated Warehouse for Spice

Create a dedicated warehouse for Spice queries:
-- In Snowflake
CREATE WAREHOUSE SPICE_WH WITH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

GRANT USAGE ON WAREHOUSE SPICE_WH TO ROLE spice_role;
params:
  snowflake_warehouse: SPICE_WH
  snowflake_role: spice_role

Data Type Mapping

Snowflake types are automatically mapped to Arrow types:
  • NUMBER, INTInt64
  • FLOAT, DOUBLEFloat64
  • VARCHAR, TEXTUtf8
  • BOOLEANBoolean
  • DATEDate32
  • TIMESTAMPTimestamp
  • VARIANTUtf8 (JSON string)

Limitations

  • Write operations not supported (read-only)
  • Semi-structured data (VARIANT, ARRAY, OBJECT) returned as JSON strings
  • Query result size limited by available memory
  • Warehouse must be running or auto-resume enabled

Build docs developers (and LLMs) love