Snowflake Data Connector

The Snowflake connector enables Spice to query Snowflake data warehouses with query push-down support using Arrow-based data transfer for high performance.

Status

Beta - Functional with ongoing testing and improvements

Supported Features

SQL query push-down (WHERE, JOIN, aggregations)
Arrow-based data transfer
Multiple warehouse support
Role-based access control
Data acceleration
Schema introspection
Connection pooling

Configuration

Basic Configuration

version: v1
kind: Spicepod
name: snowflake-demo

datasets:
  - from: snowflake:DATABASE.SCHEMA.TABLE
    name: my_table
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: COMPUTE_WH
      snowflake_role: accountadmin

With Data Acceleration

datasets:
  - from: snowflake:ANALYTICS.PUBLIC.SALES
    name: sales
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: ANALYTICS_WH
      snowflake_role: analyst
    acceleration:
      enabled: true
      engine: arrow
      refresh_interval: 5m

Multiple Tables with Shared Config

datasets:
  - from: snowflake:BENCHMARKS.TPCH_SF1.CUSTOMER
    name: customer
    params: &snowflake_params
      snowflake_warehouse: COMPUTE_WH
      snowflake_role: accountadmin
      snowflake_account: ${secrets:snowflake_account}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_username: ${secrets:snowflake_username}

  - from: snowflake:BENCHMARKS.TPCH_SF1.ORDERS
    name: orders
    params: *snowflake_params

  - from: snowflake:BENCHMARKS.TPCH_SF1.LINEITEM
    name: lineitem
    params: *snowflake_params

With Custom Database and Schema

datasets:
  - from: snowflake:MY_DB.PUBLIC.EVENTS
    name: events
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: EVENT_WH
      snowflake_database: MY_DB
      snowflake_schema: PUBLIC

Parameters

snowflake_account

string

required

Snowflake account identifier (e.g., xy12345.us-east-1)

snowflake_username

string

required

Snowflake username

snowflake_password

string

required

Snowflake password

snowflake_warehouse

string

required

Snowflake virtual warehouse name

snowflake_role

string

default:"PUBLIC"

Snowflake role to use for queries

snowflake_database

string

Default database (can be overridden in table reference)

snowflake_schema

string

Default schema (can be overridden in table reference)

client_timeout

duration

default:"30s"

Timeout for Snowflake operations (e.g., 60s, 5m)

Authentication

Username and Password

Standard authentication with username and password:

params:
  snowflake_account: xy12345.us-east-1
  snowflake_username: ${secrets:snowflake_username}
  snowflake_password: ${secrets:snowflake_password}

Environment Variables

params:
  snowflake_account: ${env:SNOWFLAKE_ACCOUNT}
  snowflake_username: ${env:SNOWFLAKE_USERNAME}
  snowflake_password: ${env:SNOWFLAKE_PASSWORD}
  snowflake_warehouse: ${env:SNOWFLAKE_WAREHOUSE}

Use Cases

Cloud Data Warehouse Analytics

datasets:
  - from: snowflake:WAREHOUSE.PROD.SALES_DATA
    name: sales
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: ANALYTICS_WH
      snowflake_role: analyst
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      refresh_interval: 1h

Query accelerated data:

SELECT 
  region,
  product_category,
  SUM(revenue) as total_revenue,
  COUNT(DISTINCT customer_id) as unique_customers
FROM sales
WHERE order_date >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY region, product_category
ORDER BY total_revenue DESC;

Multi-Warehouse Query

datasets:
  - from: snowflake:PROD.PUBLIC.CUSTOMERS
    name: customers
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: CUSTOMER_WH
      snowflake_role: reader

  - from: snowflake:PROD.PUBLIC.TRANSACTIONS
    name: transactions
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: TRANSACTION_WH  # Different warehouse
      snowflake_role: reader

Federated Query with Operational Database

datasets:
  - from: snowflake:EDW.PUBLIC.HISTORICAL_ORDERS
    name: historical_orders
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: EDW_WH

  - from: postgres:public.recent_orders
    name: recent_orders
    params:
      pg_host: operational-db.internal
      pg_db: orders
      pg_user: readonly
      pg_pass: ${secrets:pg_password}

Query across both:

SELECT 
  'historical' as source,
  DATE_TRUNC('month', order_date) as month,
  COUNT(*) as order_count
FROM historical_orders
WHERE order_date < CURRENT_DATE - INTERVAL '90' DAY
GROUP BY month

UNION ALL

SELECT 
  'recent' as source,
  DATE_TRUNC('month', order_date) as month,
  COUNT(*) as order_count
FROM recent_orders
WHERE order_date >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY month

ORDER BY month;

Cost Optimization with Acceleration

datasets:
  - from: snowflake:COST_CENTER.PUBLIC.LARGE_DATASET
    name: large_dataset
    params:
      snowflake_account: ${secrets:snowflake_account}
      snowflake_username: ${secrets:snowflake_username}
      snowflake_password: ${secrets:snowflake_password}
      snowflake_warehouse: XSMALL_WH  # Use smallest warehouse
      snowflake_role: cost_optimizer
    acceleration:
      enabled: true
      engine: arrow
      refresh_interval: 24h  # Infrequent refresh to minimize warehouse usage

Performance Tips

Use Acceleration: For frequently queried data, acceleration dramatically reduces Snowflake compute costs
Right-Size Warehouses: Match warehouse size to query needs
Query Push-down: Spice pushes WHERE, JOIN, and aggregations to Snowflake
Refresh Intervals: Balance data freshness with cost by adjusting refresh intervals
Partition Tables: Snowflake clustering keys improve query performance
Arrow Format: Connector uses Arrow for efficient data transfer

Warehouse Management

Auto-Suspend and Auto-Resume

Snowflake warehouses auto-suspend when idle. Spice queries will auto-resume warehouses:

params:
  snowflake_warehouse: AUTO_SUSPEND_WH
  client_timeout: 120s  # Allow time for warehouse startup

Dedicated Warehouse for Spice

Create a dedicated warehouse for Spice queries:

-- In Snowflake
CREATE WAREHOUSE SPICE_WH WITH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

GRANT USAGE ON WAREHOUSE SPICE_WH TO ROLE spice_role;

params:
  snowflake_warehouse: SPICE_WH
  snowflake_role: spice_role

Data Type Mapping

Snowflake types are automatically mapped to Arrow types:

NUMBER, INT → Int64
FLOAT, DOUBLE → Float64
VARCHAR, TEXT → Utf8
BOOLEAN → Boolean
DATE → Date32
TIMESTAMP → Timestamp
VARIANT → Utf8 (JSON string)

Limitations

Write operations not supported (read-only)
Semi-structured data (VARIANT, ARRAY, OBJECT) returned as JSON strings
Query result size limited by available memory
Warehouse must be running or auto-resume enabled

Get Started

Core Concepts

Data Connectors

Data Accelerators

Search

AI & ML

Deployment

Snowflake Data Connector

Status

Supported Features

Configuration

Basic Configuration

With Data Acceleration

Multiple Tables with Shared Config

With Custom Database and Schema

Parameters

Authentication

Username and Password

Environment Variables

Use Cases

Cloud Data Warehouse Analytics

Multi-Warehouse Query

Federated Query with Operational Database

Cost Optimization with Acceleration

Performance Tips

Warehouse Management

Auto-Suspend and Auto-Resume

Dedicated Warehouse for Spice

Data Type Mapping

Limitations

Build docs developers (and LLMs) love

Get Started

Core Concepts

Data Connectors

Data Accelerators

Search

AI & ML

Deployment

​Status

​Supported Features

​Configuration

​Basic Configuration

​With Data Acceleration

​Multiple Tables with Shared Config

​With Custom Database and Schema

​Parameters

​Authentication

​Username and Password

​Environment Variables

​Use Cases

​Cloud Data Warehouse Analytics

​Multi-Warehouse Query

​Federated Query with Operational Database

​Cost Optimization with Acceleration

​Performance Tips

​Warehouse Management

​Auto-Suspend and Auto-Resume

​Dedicated Warehouse for Spice

​Data Type Mapping

​Limitations

​Related Resources

Build docs developers (and LLMs) love

Status

Supported Features

Configuration

Basic Configuration

With Data Acceleration

Multiple Tables with Shared Config

With Custom Database and Schema

Parameters

Authentication

Username and Password

Environment Variables

Use Cases

Cloud Data Warehouse Analytics

Multi-Warehouse Query

Federated Query with Operational Database

Cost Optimization with Acceleration

Performance Tips

Warehouse Management

Auto-Suspend and Auto-Resume

Dedicated Warehouse for Spice

Data Type Mapping

Limitations

Related Resources