
Configuration File Overview

The Delta Sharing Reference Server uses a YAML configuration file to define:
  • Shares, schemas, and tables to expose to recipients
  • Server settings like host, port, and endpoints
  • Performance tuning parameters
  • Security settings for authentication

Creating Your Configuration File

1. Copy the Template

Navigate to your server installation directory and copy the template:
cd /opt/delta-sharing-server
cp conf/delta-sharing-server.yaml.template conf/delta-sharing-server.yaml

2. Edit the Configuration

Open the file in your preferred text editor:
nano conf/delta-sharing-server.yaml
Or use vim, emacs, or any editor of your choice.

3. Customize Settings

Modify the configuration to suit your requirements, as described in the sections below.

Configuration File Structure

The YAML configuration file follows this structure:
# Config version
version: 1

# Define shares, schemas, and tables
shares:
  - name: "share_name"
    schemas:
      - name: "schema_name"
        tables:
          - name: "table_name"
            location: "storage_location"
            id: "unique_table_id"

# Server settings
host: "localhost"
port: 8080
endpoint: "/delta-sharing"

# Optional: Authorization
authorization:
  bearerToken: "your_secret_token"

Defining Shares, Schemas, and Tables

Delta Sharing uses a three-level hierarchy: Share > Schema > Table.
  • Share: A top-level container representing a collection of data you’re sharing with a recipient or group
  • Schema: A logical grouping of related tables within a share (similar to a database schema)
  • Table: An individual Delta Lake or Parquet table
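To make the hierarchy concrete, the following minimal Python sketch walks a configuration that has already been parsed into plain dictionaries (for example with PyYAML's yaml.safe_load) and yields each table as share.schema.table. Recipients combine this three-part name with a profile file, for example profile.share#share.schema.table in the delta-sharing Python connector. The function name iter_table_coordinates and the sample values are illustrative only.

```python
from typing import Any, Iterator


def iter_table_coordinates(config: dict[str, Any]) -> Iterator[str]:
    """Yield 'share.schema.table' for every table in a parsed server config."""
    for share in config.get("shares", []):
        for schema in share.get("schemas", []):
            for table in schema.get("tables", []):
                yield f"{share['name']}.{schema['name']}.{table['name']}"


# Hypothetical parsed config: one share, two schemas, three tables.
config = {
    "version": 1,
    "shares": [
        {
            "name": "partner_sales",
            "schemas": [
                {"name": "transactions",
                 "tables": [{"name": "orders"}, {"name": "line_items"}]},
                {"name": "customers", "tables": [{"name": "accounts"}]},
            ],
        }
    ],
}

for coordinate in iter_table_coordinates(config):
    print(coordinate)
# partner_sales.transactions.orders
# partner_sales.transactions.line_items
# partner_sales.customers.accounts
```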

Basic Example

Here’s a simple configuration with one share containing one table:
version: 1

shares:
  - name: "sales_data"
    schemas:
      - name: "production"
        tables:
          - name: "orders"
            location: "s3a://my-bucket/delta-tables/orders"
            id: "ed5fa005-12ab-4c3f-92e0-c51f4c930c04"

Complete Example with Multiple Shares

version: 1

shares:
  # Share 1: Sales data for external partners
  - name: "partner_sales"
    schemas:
      - name: "transactions"
        tables:
          - name: "orders"
            location: "s3a://company-data/sales/orders"
            id: "00000000-0000-0000-0000-000000000001"
          - name: "line_items"
            location: "s3a://company-data/sales/line_items"
            id: "00000000-0000-0000-0000-000000000002"
      - name: "customers"
        tables:
          - name: "accounts"
            location: "s3a://company-data/crm/accounts"
            id: "00000000-0000-0000-0000-000000000003"
  
  # Share 2: Analytics data for data science team
  - name: "analytics"
    schemas:
      - name: "metrics"
        tables:
          - name: "daily_summary"
            location: "s3a://analytics-bucket/summaries/daily"
            historyShared: true
            id: "00000000-0000-0000-0000-000000000004"

Table Configuration Options

Each table in your configuration supports these properties:
name (string, required)
The name of the table as it will appear to recipients.

location (string, required)
The cloud storage path to the Delta Lake or Parquet table. Use the URI scheme that matches your storage provider:
  • S3: s3a://bucket-name/path/to/table
  • Azure Blob Storage: wasbs://container-name@account-name.blob.core.windows.net/path/to/table
  • ADLS Gen2: abfss://container-name@account-name.dfs.core.windows.net/path/to/table
  • GCS: gs://bucket-name/path/to/table
  • Cloudflare R2: s3a://bucket-name/path/to/table

id (string, required)
A unique identifier (UUID) for the table. Generate one using:
uuidgen  # Linux/macOS
# or
python3 -c "import uuid; print(uuid.uuid4())"

historyShared (boolean, default: false)
Enables sharing of table history and Change Data Feed (CDF). When true, recipients can query incremental changes to the table.
The underlying Delta Lake table must have Change Data Feed enabled (delta.enableChangeDataFeed=true).
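Before you set historyShared: true, you may want to confirm that the table really has Change Data Feed enabled. The sketch below assumes the table sits on a local or mounted filesystem. It reads the JSON commit files in the table's _delta_log directory and keeps the configuration of the most recent metaData action. cdf_enabled is a hypothetical helper, not part of the server. It ignores Parquet checkpoints, so it can return None for tables whose early commits have been cleaned up. For cloud paths, inspect the table properties with your Delta Lake engine instead.

```python
import json
from pathlib import Path
from typing import Optional


def cdf_enabled(table_dir: str) -> Optional[bool]:
    """Return whether the latest metaData action enables CDF, or None if none was found.

    Sketch only: reads JSON commits of a *local* Delta table and ignores checkpoints.
    """
    log_dir = Path(table_dir) / "_delta_log"
    latest_config = None
    # Commit files are zero-padded version numbers, so lexical order is version order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "metaData" in action:
                # A later metaData action replaces the earlier table configuration.
                latest_config = action["metaData"].get("configuration", {})
    if latest_config is None:
        return None
    # Table properties are stored as strings in the log.
    return latest_config.get("delta.enableChangeDataFeed", "false").lower() == "true"


# Example (hypothetical local path):
# print(cdf_enabled("/mnt/data-lake/events"))
```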

Example with History Sharing

tables:
  - name: "events_with_history"
    location: "s3a://data-lake/events"
    historyShared: true
    id: "12345678-1234-1234-1234-123456789abc"

Server Settings

Configure how the server listens and responds to requests:
# Server binding
host: "0.0.0.0"  # Listen on all interfaces
port: 8080        # Port to listen on

# API endpoint prefix
endpoint: "/delta-sharing"

# Pre-signed URL timeout (in seconds)
preSignedUrlTimeoutSeconds: 3600  # 1 hour

# Table metadata cache size
deltaTableCacheSize: 10

# Allow stale table versions (useful for static tables)
stalenessAcceptable: false

# Query pagination settings
queryTablePageSizeLimit: 10000
queryTablePageTokenTtlMs: 259200000  # 3 days
refreshTokenTtlMs: 3600000           # 1 hour

host (string, default: "localhost")
The hostname or IP address the server binds to:
  • localhost: Only accessible from the same machine
  • 0.0.0.0: Accessible from any network interface
  • Specific IP: Bind to a specific network interface

port (integer, default: 8080)
The TCP port the server listens on. Ports below 1024 may require root/administrator privileges.

endpoint (string, default: "/delta-sharing")
URL path prefix for all Delta Sharing API endpoints. The full API URL is http://<host>:<port><endpoint>. With the default settings, that is http://localhost:8080/delta-sharing.
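The following sketch shows how the host, port, and endpoint settings combine into the URL prefix a client calls. It also prepares a request to the protocol's list-shares route (GET {prefix}/shares) with an optional bearer token. The helper names are hypothetical. The sketch maps 0.0.0.0 to localhost on the assumption that the client runs on the same machine, because 0.0.0.0 is a bind address rather than an address clients connect to.

```python
import urllib.request
from typing import Optional


def sharing_base_url(host: str, port: int, endpoint: str, scheme: str = "http") -> str:
    """Combine the host, port, and endpoint settings into an API prefix."""
    if host == "0.0.0.0":
        # 0.0.0.0 is a bind address, not a destination; assume a client on the same machine.
        host = "localhost"
    path = endpoint.strip("/")
    base = f"{scheme}://{host}:{port}"
    return f"{base}/{path}" if path else base


def list_shares_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a GET {prefix}/shares request."""
    request = urllib.request.Request(f"{base_url}/shares", method="GET")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


base_url = sharing_base_url("0.0.0.0", 8080, "/delta-sharing")
print(base_url)  # http://localhost:8080/delta-sharing
req = list_shares_request(base_url, token="your_secret_token")
print(req.full_url)  # http://localhost:8080/delta-sharing/shares
# With the server running: urllib.request.urlopen(req).read()
```

Preparing the request separately from sending it keeps the URL logic testable without a running server.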

preSignedUrlTimeoutSeconds (integer, default: 3600)
How long (in seconds) pre-signed URLs remain valid. Clients must download data files within this time window.
Longer timeouts widen the window in which a leaked URL can be used to read data. Shorter timeouts may cause large file downloads to fail before they complete.

deltaTableCacheSize (integer, default: 10)
Number of Delta Lake table metadata objects to cache in memory to improve performance.

stalenessAcceptable (boolean, default: false)
When true, the server may work with a potentially stale version of a table instead of always loading the latest one. This is useful for static tables that never change.

Predicate Pushdown Configuration

Enable filtering to reduce data transfer:
# Enable predicate hints evaluation
evaluatePredicateHints: false

# Enable JSON predicate hints (recommended)
evaluateJsonPredicateHints: true

# Enable V2 JSON predicate hints
evaluateJsonPredicateHintsV2: true
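
JSON predicate hints allow clients to push down filters (WHERE clauses) to the server. The server can use them to skip data files that cannot match, which can significantly reduce the amount of data transferred. Hints are best effort: the server may return more files than strictly needed, so clients still apply the filter themselves.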
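As an illustration, the sketch below builds a JSON predicate hint equivalent to WHERE order_date >= '2024-01-01' AND region = 'EMEA'. The column names are hypothetical. The op, children, name, value, and valueType fields, and the choice to send the predicate as a JSON-encoded string under jsonPredicateHints in the query request body, are assumed to follow the Delta Sharing protocol's JSON predicate format. Verify them against the protocol specification before relying on them.

```python
import json


def column(name: str, value_type: str) -> dict:
    """Leaf node referencing a table column (hypothetical helper)."""
    return {"op": "column", "name": name, "valueType": value_type}


def literal(value: str, value_type: str) -> dict:
    """Leaf node holding a constant; values are encoded as strings."""
    return {"op": "literal", "value": value, "valueType": value_type}


# WHERE order_date >= '2024-01-01' AND region = 'EMEA' (hypothetical columns)
predicate = {
    "op": "and",
    "children": [
        {"op": "greaterThanOrEqual",
         "children": [column("order_date", "date"), literal("2024-01-01", "date")]},
        {"op": "equal",
         "children": [column("region", "string"), literal("EMEA", "string")]},
    ],
}

# Assumption: the query request body carries the predicate as a JSON-encoded string.
query_body = {"jsonPredicateHints": json.dumps(predicate)}
print(json.dumps(query_body, indent=2))
```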

Pagination Settings

# Maximum number of data files returned per page
queryTablePageSizeLimit: 10000

# Page token time-to-live (milliseconds)
queryTablePageTokenTtlMs: 259200000  # 3 days

# Refresh token time-to-live (milliseconds)
refreshTokenTtlMs: 3600000  # 1 hour

queryTablePageSizeLimit (integer, default: 10000)
Maximum number of data files returned in a single page when querying tables. Page size is counted in files, not rows.

queryTablePageTokenTtlMs (integer, default: 259200000)
How long page tokens remain valid (in milliseconds). Default is 3 days.

refreshTokenTtlMs (integer, default: 3600000)
How long refresh tokens remain valid (in milliseconds). Default is 1 hour.
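Because the TTL settings are expressed in milliseconds, it is easy to drop or add a zero. A small Python check using the standard datetime.timedelta type can confirm the defaults listed above and convert a human-readable duration back into milliseconds. The helper names are illustrative.

```python
from datetime import timedelta

# Default values from the configuration reference above.
defaults_ms = {
    "queryTablePageTokenTtlMs": 259_200_000,
    "refreshTokenTtlMs": 3_600_000,
}


def ms_to_timedelta(ms: int) -> timedelta:
    return timedelta(milliseconds=ms)


def timedelta_to_ms(td: timedelta) -> int:
    # timedelta stores days, seconds, and microseconds; integer math avoids float rounding.
    return (td.days * 86_400 + td.seconds) * 1000 + td.microseconds // 1000


for key, ms in defaults_ms.items():
    print(f"{key}: {ms} ms = {ms_to_timedelta(ms)}")

# Value to put in the YAML for a 12-hour refresh-token TTL:
print(timedelta_to_ms(timedelta(hours=12)))  # 43200000
```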

Complete Configuration Example

Here’s a more complete configuration that combines the settings covered above:
version: 1

# Define shares and tables
shares:
  - name: "customer_analytics"
    schemas:
      - name: "user_events"
        tables:
          - name: "clickstream"
            location: "s3a://data-lake/events/clickstream"
            historyShared: true
            id: "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
          - name: "conversions"
            location: "s3a://data-lake/events/conversions"
            id: "b2c3d4e5-f6a7-4b8c-9d0e-1f2a3b4c5d6e"
      - name: "user_profiles"
        tables:
          - name: "demographics"
            location: "s3a://data-lake/users/demographics"
            id: "c3d4e5f6-a7b8-4c9d-0e1f-2a3b4c5d6e7f"

# Server configuration
host: "0.0.0.0"
port: 8080
endpoint: "/delta-sharing"

# Security
authorization:
  bearerToken: "dapi1234567890abcdefghijklmnopqrstuvwxyz"

# Performance tuning
preSignedUrlTimeoutSeconds: 3600
deltaTableCacheSize: 20
stalenessAcceptable: false

# Query settings
evaluatePredicateHints: false
evaluateJsonPredicateHints: true
evaluateJsonPredicateHintsV2: true
queryTablePageSizeLimit: 10000
queryTablePageTokenTtlMs: 259200000
refreshTokenTtlMs: 3600000

Validating Your Configuration

Before starting the server, validate your configuration:

1. Check YAML Syntax

Ensure your YAML is valid using a linter:
# Using yamllint (install with: pip install yamllint)
yamllint conf/delta-sharing-server.yaml

# Using Python
python3 -c "import yaml; yaml.safe_load(open('conf/delta-sharing-server.yaml'))"

2. Verify Storage Paths

Ensure that each table location is accessible and uses the correct URI scheme.

3. Test Unique IDs

Confirm that every table ID is a unique UUID.

4. Start the Server

Start the server and check for configuration errors:
./bin/delta-sharing-server -- --config conf/delta-sharing-server.yaml

Common configuration mistakes:
  • Using s3:// instead of s3a:// for S3 paths
  • Duplicate table IDs
  • Invalid YAML indentation
  • Missing required fields (name, location, id)
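These checks can also be scripted. The following sketch validates a configuration that has already been parsed into dictionaries (for example with yaml.safe_load, as in the syntax check above). It looks for missing required fields, s3:// paths, unrecognized URI schemes, malformed UUIDs, and duplicate IDs. The function name and the list of accepted schemes are assumptions based on the location reference on this page. They are not documented checks performed by the server itself.

```python
import uuid
from typing import Any

REQUIRED_TABLE_FIELDS = ("name", "location", "id")
# Schemes from the table `location` reference (R2 also uses s3a://); extend as needed.
KNOWN_SCHEMES = ("s3a://", "wasbs://", "abfss://", "gs://")


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed server config (empty if none)."""
    problems: list[str] = []
    seen_ids: dict[str, str] = {}  # table id -> first table that used it
    for share in config.get("shares", []):
        for schema in share.get("schemas", []):
            for table in schema.get("tables", []):
                where = f"{share.get('name')}.{schema.get('name')}.{table.get('name')}"
                for field in REQUIRED_TABLE_FIELDS:
                    if not table.get(field):
                        problems.append(f"{where}: missing required field '{field}'")
                location = str(table.get("location", ""))
                if location.startswith("s3://"):
                    problems.append(f"{where}: use s3a:// instead of s3://")
                elif location and not location.startswith(KNOWN_SCHEMES):
                    problems.append(f"{where}: unrecognized URI scheme in '{location}'")
                table_id = table.get("id")
                if table_id:
                    try:
                        uuid.UUID(str(table_id))
                    except ValueError:
                        problems.append(f"{where}: id '{table_id}' is not a valid UUID")
                    if table_id in seen_ids:
                        problems.append(f"{where}: duplicate id also used by {seen_ids[table_id]}")
                    else:
                        seen_ids[table_id] = where
    return problems


# Hypothetical config containing three of the common mistakes listed above.
config = {"shares": [{"name": "s", "schemas": [{"name": "db", "tables": [
    {"name": "a", "location": "s3://bucket/a", "id": "00000000-0000-0000-0000-000000000001"},
    {"name": "b", "location": "s3a://bucket/b", "id": "00000000-0000-0000-0000-000000000001"},
    {"name": "c", "location": "gs://bucket/c"},
]}]}]}

for problem in validate_config(config):
    print(problem)
```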

Environment-Specific Configurations

The following server settings suit a local development environment:

host: "localhost"
port: 8080
stalenessAcceptable: true  # Allow stale data
deltaTableCacheSize: 5     # Small cache
# No authorization for easier testing

Next Steps

  • Cloud Storage Authentication: Configure access to S3, Azure, GCS, or R2
  • Authorization Setup: Set up bearer tokens and security
  • Start the Server: Learn how to start and run the server
  • Create Profile Files: Generate profile files for recipients
