Skip to main content

Creating Table References

The first step to using Delta Sharing with SQL is creating a table reference using the deltaSharing format.

Basic Syntax

CREATE TABLE <table-name> 
USING deltaSharing 
LOCATION '<profile-file-path>#<share-name>.<schema-name>.<table-name>';
The LOCATION clause combines your profile file path with the fully qualified table name using # as a separator.

Example

-- Create a reference to a shared customer table
CREATE TABLE customers 
USING deltaSharing 
LOCATION '/opt/profiles/production.share#sales.crm.customers';

-- Now query it like any other table
SELECT * FROM customers LIMIT 10;

Querying Shared Tables

Once you’ve created a table reference, you can use standard SQL operations.

SELECT Queries

-- Select all columns
SELECT * FROM customers;

-- Select specific columns
SELECT customer_id, name, email FROM customers;

-- Select with aliases
SELECT 
  customer_id AS id,
  name AS customer_name,
  email AS contact_email
FROM customers;

Table Operations

Inspecting Table Structure

-- View table schema
DESCRIBE customers;

-- Detailed table information
DESCRIBE EXTENDED customers;

-- Show table properties
SHOW TBLPROPERTIES customers;

Working with Multiple Tables

-- Create multiple table references
CREATE TABLE customers 
USING deltaSharing 
LOCATION '/opt/profiles/prod.share#sales.crm.customers';

CREATE TABLE orders 
USING deltaSharing 
LOCATION '/opt/profiles/prod.share#sales.transactions.orders';

CREATE TABLE products 
USING deltaSharing 
LOCATION '/opt/profiles/prod.share#inventory.catalog.products';

Joins

Delta Sharing tables can be joined with other tables, including local tables and other shared tables.
-- Join two shared tables
SELECT 
  c.customer_id,
  c.name,
  o.order_id,
  o.order_date,
  o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';

Advanced SQL Patterns

Subqueries

-- Subquery in WHERE clause
SELECT * FROM customers
WHERE customer_id IN (
  SELECT customer_id 
  FROM orders 
  WHERE total_amount > 1000
);

-- Subquery in FROM clause
SELECT 
  country,
  AVG(customer_value) as avg_value
FROM (
  SELECT 
    country,
    customer_id,
    SUM(total_amount) as customer_value
  FROM customers c
  JOIN orders o ON c.customer_id = o.customer_id
  GROUP BY country, customer_id
) customer_totals
GROUP BY country;

Common Table Expressions (CTEs)

-- Single CTE
WITH active_customers AS (
  SELECT * FROM customers
  WHERE status = 'active'
    AND last_order_date >= CURRENT_DATE - INTERVAL 90 DAYS
)
SELECT country, COUNT(*) as active_count
FROM active_customers
GROUP BY country;

-- Multiple CTEs
WITH 
customer_stats AS (
  SELECT 
    customer_id,
    COUNT(*) as order_count,
    SUM(total_amount) as total_spent
  FROM orders
  GROUP BY customer_id
),
high_value_customers AS (
  SELECT customer_id
  FROM customer_stats
  WHERE total_spent > 10000
)
SELECT 
  c.name,
  c.email,
  cs.order_count,
  cs.total_spent
FROM customers c
JOIN high_value_customers hvc ON c.customer_id = hvc.customer_id
JOIN customer_stats cs ON c.customer_id = cs.customer_id
ORDER BY cs.total_spent DESC;

Window Functions

-- Rank customers by order value within each country
SELECT 
  customer_id,
  name,
  country,
  total_purchases,
  RANK() OVER (PARTITION BY country ORDER BY total_purchases DESC) as country_rank
FROM customers
QUALIFY country_rank <= 10;

-- Running totals
SELECT 
  order_date,
  total_amount,
  SUM(total_amount) OVER (ORDER BY order_date) as running_total
FROM orders
ORDER BY order_date;

Table Path Variations

Delta Sharing supports various storage systems for profile files:
CREATE TABLE sales_data 
USING deltaSharing 
LOCATION '/home/user/profiles/sales.share#revenue.analytics.daily_sales';
When using cloud storage paths, ensure your Spark cluster has the necessary credentials configured to access the profile files.

Creating Views

Create views on top of shared tables for reusable query logic:
-- Create a view for active customers
CREATE VIEW active_customers AS
SELECT * FROM customers
WHERE status = 'active'
  AND last_order_date >= CURRENT_DATE - INTERVAL 90 DAYS;

-- Query the view
SELECT country, COUNT(*) FROM active_customers GROUP BY country;

-- Create a materialized result
CREATE TABLE customer_summary AS
SELECT 
  country,
  COUNT(*) as customer_count,
  AVG(total_purchases) as avg_purchases
FROM customers
GROUP BY country;

Performance Tips

-- Good: Select specific columns
SELECT customer_id, name, email FROM customers;

-- Avoid: SELECT * when you only need a few columns
SELECT * FROM customers;
Selecting only necessary columns reduces data transfer and improves query performance.
-- Apply filters as early as possible
SELECT COUNT(*) FROM customers
WHERE created_date >= '2024-01-01'
  AND country = 'USA';
Filtering reduces the amount of data processed in subsequent operations.
-- Cache a table for multiple queries
CACHE TABLE customers;

-- Run multiple queries (will use cached data)
SELECT COUNT(*) FROM customers;
SELECT country, COUNT(*) FROM customers GROUP BY country;

-- Clear cache when done
UNCACHE TABLE customers;
-- Use LIMIT when exploring data
SELECT * FROM customers LIMIT 100;

-- Sample data for analysis
SELECT * FROM customers TABLESAMPLE (10 PERCENT);

Temporary Tables

Create temporary table references that only exist for your session:
-- Create a temporary table reference
CREATE TEMP VIEW temp_customers 
USING deltaSharing 
AS SELECT * FROM deltaSharing.`/opt/profiles/prod.share#sales.crm.customers`;

-- Use in queries
SELECT * FROM temp_customers WHERE country = 'USA';

-- Temporary table is automatically dropped when session ends

Next Steps

Streaming

Stream changes from shared tables in real-time

Change Data Feed

Query historical changes to shared tables

Build docs developers (and LLMs) love