String Functions - Materialize

Materialize provides a comprehensive set of string functions for text manipulation, pattern matching, and analysis.

String Concatenation

Concatenation Operator (||)

Concatenate strings using the || operator:

SELECT 'Hello' || ' ' || 'World' AS greeting;
-- Result: Hello World

SELECT first_name || ' ' || last_name AS full_name
FROM users;

CONCAT Function

Concatenate multiple values, treating NULL as empty string:

SELECT CONCAT(first_name, ' ', middle_name, ' ', last_name) AS full_name
FROM users;
-- NULL values are treated as empty strings

Case Conversion

UPPER

Convert a string to uppercase:

SELECT UPPER('hello world') AS upper_text;
-- Result: HELLO WORLD

SELECT UPPER(email) FROM users;

LOWER

Convert a string to lowercase:

SELECT LOWER('HELLO WORLD') AS lower_text;
-- Result: hello world

SELECT LOWER(product_name) FROM products;

Substring and Extraction

SUBSTRING

Extract a substring from a string. Signatures:

substring(str, start_pos)
substring(str, start_pos, length)
substring(str FROM start_pos)
substring(str FOR length)
substring(str FROM start_pos FOR length)

Parameter	Type	Description
`str`	`text`	The source string
`start_pos`	`int`	Starting position (1-indexed)
`length`	`int`	Number of characters to extract

Return Type: text Examples:

SELECT substring('abcdefg', 3) AS substr;
-- Result: cdefg

SELECT substring('abcdefg', 3, 3) AS substr;
-- Result: cde

SELECT substring(email FROM '@(.*)' FOR '#') AS domain
FROM users;

String Length

LENGTH

Return the number of characters (code points) in a string.

Parameter	Type	Description
`str`	`text` or `bytea`	The string to measure
`encoding_name`	`text`	Optional encoding (defaults to UTF-8)

Return Type: int Examples:

SELECT length('hello') AS len;
-- Result: 5

SELECT length('你好') AS len;
-- Result: 2

SELECT length('你好', 'big5') AS len;
-- Result: 3

Character vs Byte Length

-- Character length (code points)
SELECT length('hello') AS char_len;
-- Result: 5

-- Byte length
SELECT octet_length('hello') AS byte_len;
-- Result: 5

SELECT length('你好') AS char_len, octet_length('你好') AS byte_len;
-- char_len: 2, byte_len: 6

String Trimming

TRIM, LTRIM, RTRIM

Remove leading and/or trailing characters:

-- Remove spaces from both ends
SELECT trim('  hello  ') AS trimmed;
-- Result: hello

-- Remove from left (leading)
SELECT ltrim('  hello  ') AS left_trimmed;
-- Result: 'hello  '

-- Remove from right (trailing)
SELECT rtrim('  hello  ') AS right_trimmed;
-- Result: '  hello'

-- Remove specific characters
SELECT trim(BOTH 'x' FROM 'xxxhelloxxx') AS trimmed;
-- Result: hello

Pattern Matching

LIKE and ILIKE

Pattern matching with wildcards:

% matches any sequence of characters
_ matches any single character

-- Case-sensitive LIKE
SELECT * FROM products
WHERE name LIKE 'Apple%';

-- Case-insensitive ILIKE
SELECT * FROM products
WHERE name ILIKE '%apple%';

-- Escape special characters
SELECT * FROM files
WHERE filename LIKE '%\_%' ESCAPE '\\';

Operators:

LIKE / ~~: Case-sensitive matching
ILIKE / ~~*: Case-insensitive matching
NOT LIKE / !~~: Negated case-sensitive
NOT ILIKE / !~~*: Negated case-insensitive

Regular Expressions

Materialize supports regular expressions using Rust regex crate syntax. Operators:

~: Case-sensitive regex match
~*: Case-insensitive regex match
!~: Case-sensitive regex non-match
!~*: Case-insensitive regex non-match

Examples:

-- Match email addresses
SELECT * FROM users
WHERE email ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

-- Case-insensitive match
SELECT * FROM products
WHERE description ~* 'organic|natural';

-- Extract with regex (requires regex functions)
SELECT regexp_match(text, 'pattern') FROM logs;

Materialize regular expressions follow Rust regex syntax, which is similar to but not identical to PostgreSQL regex syntax.

Limitations:

Maximum regex size: 1 MiB raw, 10 MiB compiled
No backreferences
No look-ahead or look-behind assertions

String Replacement

REPLACE

Replace all occurrences of a substring:

SELECT replace('Hello World', 'World', 'Universe') AS replaced;
-- Result: Hello Universe

SELECT replace(description, 'old_term', 'new_term')
FROM articles;

TRANSLATE

Replace characters based on a mapping:

SELECT translate('hello', 'el', 'ip') AS translated;
-- Result: hippo
-- 'e' -> 'i', 'l' -> 'p'

String Positioning

POSITION

Find the position of a substring:

SELECT position('World' IN 'Hello World') AS pos;
-- Result: 7 (1-indexed)

SELECT position('@' IN email) AS at_position
FROM users;

STRPOS

Alternative syntax for finding substring position:

SELECT strpos('Hello World', 'World') AS pos;
-- Result: 7

String Splitting and Parsing

SPLIT_PART

Split a string and return a specific part:

SELECT split_part('one,two,three', ',', 2) AS part;
-- Result: two

SELECT 
    split_part(email, '@', 1) AS username,
    split_part(email, '@', 2) AS domain
FROM users;

REGEXP_SPLIT_TO_ARRAY

Split a string into an array using a regex pattern:

SELECT regexp_split_to_array('one,two,,three', ',+') AS parts;
-- Result: ['one', 'two', 'three']

Unicode Normalization

normalize

Convert a string to a specified Unicode normalization form.

Parameter	Type	Description
`str`	`text`	The string to normalize
`form`	keyword	Normalization form: `NFC`, `NFD`, `NFKC`, or `NFKD` (default: `NFC`)

Return Type: text Normalization Forms:

NFC: Canonical Composition (default, most common)
NFD: Canonical Decomposition
NFKC: Compatibility Composition
NFKD: Compatibility Decomposition

Examples:

-- Default NFC normalization
SELECT normalize('café') AS normalized;

-- Explicit NFD form
SELECT normalize('café', NFD) AS normalized;

-- Compatibility composition
SELECT normalize('①②③', NFKC) AS normalized;
-- Converts special number forms to standard digits

Encoding and Decoding

ENCODE and DECODE

Convert between binary data and text representations:

-- Encode binary data as base64
SELECT encode('hello'::bytea, 'base64') AS encoded;
-- Result: aGVsbG8=

-- Decode base64 to binary
SELECT decode('aGVsbG8=', 'base64')::text AS decoded;
-- Result: hello

Supported formats:

base64
hex
escape

String Operators

Operator	Description	Example
`\|\|`	Concatenation	`'Hello' \|\| ' World'`
`~`	Regex match (case-sensitive)	`text ~ 'pattern'`
`~*`	Regex match (case-insensitive)	`text ~* 'pattern'`
`!~`	Regex non-match (case-sensitive)	`text !~ 'pattern'`
`!~*`	Regex non-match (case-insensitive)	`text !~* 'pattern'`
`~~`	LIKE (case-sensitive)	`text ~~ 'pat%'`
`~~*`	ILIKE (case-insensitive)	`text ~~* 'pat%'`
`!~~`	NOT LIKE	`text !~~ 'pat%'`
`!~~*`	NOT ILIKE	`text !~~* 'pat%'`

Common String Patterns

Email Validation

SELECT email
FROM users
WHERE email ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

URL Parsing

SELECT 
    url,
    split_part(url, '://', 1) AS protocol,
    split_part(split_part(url, '://', 2), '/', 1) AS domain,
    substring(url FROM position('://' IN url) + 3) AS path
FROM links;

Name Formatting

SELECT 
    UPPER(substring(first_name, 1, 1)) || 
    LOWER(substring(first_name, 2)) || ' ' ||
    UPPER(substring(last_name, 1, 1)) || 
    LOWER(substring(last_name, 2)) AS formatted_name
FROM users;

SQL Overview

DDL

DML

Functions

System Catalog

​String Concatenation

​Concatenation Operator (||)

​CONCAT Function

​Case Conversion

​UPPER

​LOWER

​Substring and Extraction

​SUBSTRING

​String Length

​LENGTH

​Character vs Byte Length

​String Trimming

​TRIM, LTRIM, RTRIM

​Pattern Matching

​LIKE and ILIKE

​Regular Expressions

​String Replacement

​REPLACE

​TRANSLATE

​String Positioning

​POSITION

​STRPOS

​String Splitting and Parsing

​SPLIT_PART

​REGEXP_SPLIT_TO_ARRAY

​Unicode Normalization

​normalize

​Encoding and Decoding

​ENCODE and DECODE

​String Operators

​Common String Patterns

​Email Validation

​URL Parsing

​Name Formatting

​Related Resources

Build docs developers (and LLMs) love

String Concatenation

Concatenation Operator (||)

CONCAT Function

Case Conversion

UPPER

LOWER

Substring and Extraction

SUBSTRING

String Length

LENGTH

Character vs Byte Length

String Trimming

TRIM, LTRIM, RTRIM

Pattern Matching

LIKE and ILIKE

Regular Expressions

String Replacement

REPLACE

TRANSLATE

String Positioning

POSITION

STRPOS

String Splitting and Parsing

SPLIT_PART

REGEXP_SPLIT_TO_ARRAY

Unicode Normalization

normalize

Encoding and Decoding

ENCODE and DECODE

String Operators

Common String Patterns

Email Validation

URL Parsing

Name Formatting

Related Resources