Skip to main content
Delta Sharing supports filtering data at the server side to reduce data transfer and improve query performance. Clients send filtering hints that servers apply on a best-effort basis.
Best-Effort Filtering: Servers may return files that don’t satisfy the predicates. Clients must always apply predicates to filter the data returned by the server.

Overview

Filtering in Delta Sharing operates as hints rather than guarantees:
  • Server behavior: Attempts to filter files using predicates, but may return additional files
  • Client responsibility: Must always apply predicates to returned data
  • Failure handling: If the server fails to parse or evaluate a predicate, it may skip it and return all files
Filtering is particularly effective for partitioned tables where predicates match partition columns, enabling efficient file pruning.

SQL Expressions for Filtering

SQL expressions provide a familiar way to express filtering predicates. They are sent as an array of strings in the request body.

Request Format

{
  "predicateHints": [
    "date >= '2021-01-01'",
    "date <= '2021-01-31'",
    "country = 'US'"
  ],
  "limitHint": 1000
}
predicateHints
array[string]
Array of SQL boolean expressions. Expressions are AND-ed together (conjunctive).
limitHint
integer
Hint indicating how many rows the client plans to read. Server may use this with file statistics to return fewer files.
Deprecation Notice: predicateHints is being deprecated in favor of jsonPredicateHints once all implementations migrate to JSON-based filtering.

Supported Operators

The following comparison operators are supported:
OperatorDescriptionExample
=Equalitycol = 123
<>Inequalitycol <> 'foo'
>Greater thancol > 'foo'
<Less than'foo' < col
>=Greater or equalcol >= 123
<=Less or equal123 <= col
IS NULLNull checkcol IS NULL
IS NOT NULLNot null checkcol IS NOT NULL

Expression Examples

{
  "predicateHints": [
    "date >= '2021-01-01'",
    "date <= '2021-12-31'"
  ]
}
Limitations:
  • Only simple column-to-constant comparisons are supported
  • No support for OR, IN, LIKE, or complex expressions
  • Columns must be compared to literal values, not other columns
  • More operators will be added incrementally in future releases

JSON Predicates for Filtering

JSON predicates provide a structured, type-safe format for expressing filtering logic. This is the preferred method for partition column predicates.

Request Format

{
  "jsonPredicateHints": "{\"op\":\"and\",\"children\":[{\"op\":\"equal\",\"children\":[{\"op\":\"column\",\"name\":\"date\",\"valueType\":\"date\"},{\"op\":\"literal\",\"value\":\"2021-04-29\",\"valueType\":\"date\"}]},{\"op\":\"lessThan\",\"children\":[{\"op\":\"column\",\"name\":\"id\",\"valueType\":\"int\"},{\"op\":\"literal\",\"value\":\"25\",\"valueType\":\"int\"}]}]}"
}
jsonPredicateHints
string
Serialized JSON string representing a predicate tree. Parse to get the predicate object structure.

Predicate Structure

Each predicate is a JSON object with these fields:
FieldTypeDescription
opstringOperation name (see Supported Operations)
childrenarrayChild operations (empty for leaf operations)
namestringColumn name (for column operations only)
valuestringLiteral value (for literal operations only)
valueTypestringData type (for column and literal operations)

Supported Operations

columnRepresents a column reference.
{
  "op": "column",
  "name": "hireDate",
  "valueType": "date"
}
literalRepresents a constant value.
{
  "op": "literal",
  "value": "2021-04-29",
  "valueType": "date"
}
isNullChecks if a column is null.
{
  "op": "isNull",
  "children": [
    {"op": "column", "name": "email", "valueType": "string"}
  ]
}
notLogical NOT operation.
{
  "op": "not",
  "children": [
    {
      "op": "isNull",
      "children": [{"op": "column", "name": "id", "valueType": "int"}]
    }
  ]
}
equalEquality check (=).
{
  "op": "equal",
  "children": [
    {"op": "column", "name": "country", "valueType": "string"},
    {"op": "literal", "value": "US", "valueType": "string"}
  ]
}
lessThanLess than check (<).
{
  "op": "lessThan",
  "children": [
    {"op": "column", "name": "age", "valueType": "int"},
    {"op": "literal", "value": "65", "valueType": "int"}
  ]
}
lessThanOrEqualLess than or equal check (<=).greaterThanGreater than check (>).greaterThanOrEqualGreater than or equal check (>=).
andLogical AND operation (requires at least 2 children).
{
  "op": "and",
  "children": [
    {
      "op": "equal",
      "children": [
        {"op": "column", "name": "status", "valueType": "string"},
        {"op": "literal", "value": "active", "valueType": "string"}
      ]
    },
    {
      "op": "greaterThan",
      "children": [
        {"op": "column", "name": "balance", "valueType": "double"},
        {"op": "literal", "value": "0", "valueType": "double"}
      ]
    }
  ]
}
orLogical OR operation (requires at least 2 children).
{
  "op": "or",
  "children": [
    {
      "op": "equal",
      "children": [
        {"op": "column", "name": "priority", "valueType": "string"},
        {"op": "literal", "value": "high", "valueType": "string"}
      ]
    },
    {
      "op": "equal",
      "children": [
        {"op": "column", "name": "priority", "valueType": "string"},
        {"op": "literal", "value": "urgent", "valueType": "string"}
      ]
    }
  ]
}

Supported Value Types

These value types are supported in valueType fields:
TypeDescriptionExample Value
boolBoolean"true" or "false"
intInteger"42"
longLong integer"1234567890"
stringString"hello"
dateDate"2021-04-29" (yyyy-mm-dd)
floatFloat"3.14"
doubleDouble"3.14159"
timestampTimestamp"2022-01-01T00:00:00Z" (ISO 8601)
All value types are case-insensitive when processed by the server.

Complete Examples

Example 1: Simple Equality

Find records where hire date equals a specific date.
{
  "op": "equal",
  "children": [
    {"op": "column", "name": "hireDate", "valueType": "date"},
    {"op": "literal", "value": "2021-04-29", "valueType": "date"}
  ]
}

Example 2: Compound AND Predicate

Find records where hire date equals a date AND id is less than 25.
{
  "op": "and",
  "children": [
    {
      "op": "equal",
      "children": [
        {"op": "column", "name": "hireDate", "valueType": "date"},
        {"op": "literal", "value": "2021-04-29", "valueType": "date"}
      ]
    },
    {
      "op": "lessThan",
      "children": [
        {"op": "column", "name": "id", "valueType": "int"},
        {"op": "literal", "value": "25", "valueType": "int"}
      ]
    }
  ]
}

Example 3: NOT NULL Check

Find records where id is not null.
{
  "op": "not",
  "children": [
    {
      "op": "isNull",
      "children": [
        {"op": "column", "name": "id", "valueType": "int"}
      ]
    }
  ]
}

Example 4: Date Range with OR

Find records from Q1 2021 or Q4 2021.
{
  "op": "or",
  "children": [
    {
      "op": "and",
      "children": [
        {
          "op": "greaterThanOrEqual",
          "children": [
            {"op": "column", "name": "date", "valueType": "date"},
            {"op": "literal", "value": "2021-01-01", "valueType": "date"}
          ]
        },
        {
          "op": "lessThan",
          "children": [
            {"op": "column", "name": "date", "valueType": "date"},
            {"op": "literal", "value": "2021-04-01", "valueType": "date"}
          ]
        }
      ]
    },
    {
      "op": "and",
      "children": [
        {
          "op": "greaterThanOrEqual",
          "children": [
            {"op": "column", "name": "date", "valueType": "date"},
            {"op": "literal", "value": "2021-10-01", "valueType": "date"}
          ]
        },
        {
          "op": "lessThan",
          "children": [
            {"op": "column", "name": "date", "valueType": "date"},
            {"op": "literal", "value": "2022-01-01", "valueType": "date"}
          ]
        }
      ]
    }
  ]
}

Combining with Limit Hints

When both predicateHints/jsonPredicateHints and limitHint are present:
  1. Server applies predicate filtering first
  2. Server then uses limit hint with file statistics to return fewer files
  3. Client must still apply both predicates and limit to the returned data
{
  "jsonPredicateHints": "{\"op\":\"equal\",\"children\":[{\"op\":\"column\",\"name\":\"status\",\"valueType\":\"string\"},{\"op\":\"literal\",\"value\":\"active\",\"valueType\":\"string\"}]}",
  "limitHint": 1000
}
This combination is particularly effective for queries like SELECT * FROM table WHERE status='active' LIMIT 1000, where the server can use both hints for optimal file selection.

Best Practices

Use JSON Predicates

Prefer JSON predicates over SQL expressions for type safety and easier server implementation.

Filter on Partitions

Predicates on partition columns are most effective, enabling efficient file pruning.

Always Filter Client-Side

Never rely solely on server filtering. Always apply predicates to returned data.

Use Limit Hints

Combine filtering with limit hints to minimize data transfer for top-N queries.

Migration Path

1

Current State

Using predicateHints with SQL expressions
2

Transition

Support both predicateHints and jsonPredicateHints for backward compatibility
3

Future State

Use only jsonPredicateHints once all implementations migrate
The predicateHints field will be deprecated once all Delta Sharing clients and servers have migrated to JSON-based filtering.

Next Steps

REST APIs

Learn about Query Table API

Response Format

Understand file statistics

Build docs developers (and LLMs) love