Skip to main content

Overview

This guide provides recommendations for working with common data types in Metadb. Following these practices ensures consistency, accuracy, and compatibility across your analytics environment.

Text and Character Data

Use TEXT for Strings

The TEXT data type is recommended for strings of characters without length constraints.
CREATE TABLE maria.books (
    title TEXT,
    author TEXT,
    description TEXT
);
PostgreSQL’s TEXT type has no performance penalty compared to VARCHAR and provides maximum flexibility.

VARCHAR for Legacy Compatibility

Use VARCHAR(n) only when working with older database clients that require explicit length specifications:
CREATE TABLE legacy_app.users (
    username VARCHAR(50),
    email VARCHAR(255)
);

Comparison

TypeUse CaseLength Limit
TEXTRecommended for all new developmentUnlimited (up to 1GB)
VARCHAR(n)Legacy applications requiring fixed lengthSpecified by n
VARCHAREquivalent to TEXTUnlimited
CHAR(n)Fixed-length fields (avoid if possible)Exactly n characters
TEXT, VARCHAR, and CHAR all use the same internal storage mechanism in PostgreSQL, so there’s no performance difference.

Numeric and Monetary Data

Monetary Amounts

For monetary values, use NUMERIC(19, 4) to avoid floating-point rounding errors:
CREATE TABLE finance.transactions (
    transaction_id UUID PRIMARY KEY,
    amount NUMERIC(19, 4),
    currency_code CHAR(3)
);
Why NUMERIC(19, 4)?
  • 19 total digits (sufficient for large amounts)
  • 4 decimal places (handles sub-cent precision)
  • Exact decimal representation (no rounding errors)

Exchange Rates

For currency exchange rates, use NUMERIC(19, 14) for higher precision:
CREATE TABLE finance.exchange_rates (
    from_currency CHAR(3),
    to_currency CHAR(3),
    rate NUMERIC(19, 14),
    effective_date DATE
);

Example: Calculating with Monetary Data

SELECT 
    transaction_id,
    amount,
    amount * 1.0825 AS amount_with_tax,
    ROUND(amount * 1.0825, 2) AS amount_with_tax_rounded
FROM finance.transactions;
Never use REAL, FLOAT, or DOUBLE PRECISION for monetary data. These floating-point types introduce rounding errors that are unacceptable for financial calculations.

Numeric Type Comparison

TypePrecisionUse Case
NUMERIC(19, 4)Exact, 4 decimal placesMonetary amounts
NUMERIC(19, 14)Exact, 14 decimal placesExchange rates
INTEGERExact, no decimalsCounts, IDs
BIGINTExact, larger rangeLarge counts
REALApproximate, 6 decimal digits❌ Avoid for money
DOUBLE PRECISIONApproximate, 15 decimal digits❌ Avoid for money

Date and Time Data

Querying Date Ranges

The most reliable method for querying within date ranges uses the inequality pattern start <= t < end:
SELECT *
FROM folio_circulation.loan__t
WHERE '2023-01-01' <= loan_date AND loan_date < '2024-01-01';
1

Use inclusive start with <=

The start date is included: '2023-01-01' <= loan_date
2

Use exclusive end with <

The end date is excluded: loan_date < '2024-01-01'
3

Combine with AND

Both conditions together: WHERE start <= date AND date < end
Why this pattern?
  • Handles timestamps precisely without worrying about time components
  • Prevents off-by-one errors at boundaries
  • Works correctly with DATE and TIMESTAMP columns
  • Aligns with interval semantics [start, end)

Date Range Examples

-- All loans in 2023
SELECT *
FROM folio_circulation.loan__t
WHERE '2023-01-01' <= loan_date AND loan_date < '2024-01-01';

Avoid Common Date Pitfalls

Don’t use BETWEEN for date ranges with timestamps:
-- ❌ Problematic: Includes all of 2024-01-01
WHERE loan_date BETWEEN '2023-01-01' AND '2024-01-01'

-- ✅ Correct: Excludes 2024-01-01
WHERE '2023-01-01' <= loan_date AND loan_date < '2024-01-01'
BETWEEN is inclusive on both ends, which can include timestamps from the excluded day.

Date Type Recommendations

TypeUse CaseExample
DATECalendar dates without time2023-01-01
TIMESTAMPDate and time without timezone2023-01-01 14:30:00
TIMESTAMPTZDate and time with timezone (recommended)2023-01-01 14:30:00+00
TIMETime of day without date14:30:00
INTERVALDuration or time span'2 days 3 hours'
Prefer TIMESTAMPTZ (timestamp with time zone) over TIMESTAMP for columns that track when events occurred. This ensures correct handling across time zones.

UUID (Universally Unique Identifiers)

Metadb uses UUIDs extensively for identifiers.

Working with UUIDs

-- UUIDs in WHERE clauses
SELECT *
FROM library.patron
WHERE id = 'a1b2c3d4-e5f6-4789-0abc-def123456789'::UUID;

-- Generating UUIDs
INSERT INTO maria.custom_items (id, name)
VALUES (gen_random_uuid(), 'New Item');
The __id column in Metadb tables is a BIGINT surrogate key, not a UUID. Don’t confuse it with id columns from source data, which may be UUIDs.

Boolean Data

PostgreSQL supports native boolean types:
-- Querying boolean columns
SELECT *
FROM library.patrongroup__
WHERE __current;  -- Equivalent to: WHERE __current = true

-- Boolean expressions
SELECT 
    id,
    groupname,
    __current AS is_current,
    NOT __current AS is_historical
FROM library.patrongroup__;

Boolean Values

LiteralTypeNotes
TRUE, true, 't', 'true', 'y', 'yes', '1'True
FALSE, false, 'f', 'false', 'n', 'no', '0'False
NULLUnknownThree-valued logic

JSON and JSONB

Metadb often works with JSON data from source systems.

JSONB vs JSON

Always use JSONB (binary JSON) rather than JSON:
FeatureJSONBJSON
StorageBinary, decomposedText, exact copy
IndexingSupportedLimited
Query performanceFastSlower
Whitespace preservationNoYes
Key order preservationNoYes
Use JSONB unless you specifically need to preserve exact formatting (rare in analytics).

Querying JSON Data

-- Extract text value
SELECT jsonb_extract_path_text(jsondata, 'groupname') AS groupname
FROM library.patrongroup;

-- Extract JSON object
SELECT jsondata -> 'address' AS address
FROM library.patron;

-- Extract and cast
SELECT (jsondata ->> 'age')::INTEGER AS age
FROM library.patron;

-- Query nested values
SELECT jsondata -> 'address' ->> 'city' AS city
FROM library.patron;

Consider Transformed Tables

Instead of repeatedly extracting JSON fields, use transformed tables for better performance and simpler queries.

Array Types

PostgreSQL supports array types for any data type:
CREATE TABLE maria.reading_lists (
    list_id INTEGER,
    patron_id UUID,
    book_ids UUID[],
    tags TEXT[]
);

-- Querying arrays
SELECT *
FROM maria.reading_lists
WHERE 'fiction' = ANY(tags);

-- Array aggregation
SELECT 
    patron_id,
    array_agg(book_id) AS all_books
FROM maria.reading_history
GROUP BY patron_id;

Best Practices Summary

Text Data

Use TEXT for strings; VARCHAR(n) only for legacy compatibility

Money

Use NUMERIC(19, 4) for amounts; NUMERIC(19, 14) for exchange rates

Dates

Use start <= date AND date < end pattern for ranges

JSON

Prefer JSONB over JSON; consider transformed tables
Never use floating-point types (REAL, FLOAT, DOUBLE PRECISION) for monetary data. Always use NUMERIC with appropriate precision.

Additional Resources

Build docs developers (and LLMs) love