Overview

Meridian provides powerful utilities for creating and managing DuckDB tables from various data sources. The schema management system automatically infers data types, handles nested structures, and integrates with cloud storage.

Table Creation Methods

Create Table from CSV

Import CSV files from cloud storage (Cloudflare R2) and automatically create DuckDB tables with inferred schemas.

Function Signature

createTableFromCSV({ csvUrl, tableName })
Parameters:
  • csvUrl (string): URL to the CSV file (e.g., Cloudflare R2 URL)
  • tableName (string): Desired table name (will be sanitized)

Response

{
  success: true,
  tableName: "sanitized_table_name",
  rowCount: 1000
}

Usage Example

import { createTableFromCSV } from '~/utils/duckdb'

const result = await createTableFromCSV({
  csvUrl: 'https://r2.example.com/data/customers.csv',
  tableName: 'customers'
})

console.log(`Created table ${result.tableName} with ${result.rowCount} rows`)

Implementation Details

Source: src/utils/duckdb.ts:80
The function:
  1. Fetches CSV content from the provided URL
  2. Writes content to a temporary file in /tmp
  3. Uses DuckDB’s read_csv_auto() for automatic schema detection
  4. Creates the table and imports all data
  5. Cleans up temporary files automatically
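The steps above can be sketched in TypeScript roughly as follows. This is an illustration only, not the actual implementation in src/utils/duckdb.ts: the helper names (`buildCsvImportSQL`, `importCsv`) and the exact SQL are assumptions.

```typescript
// Illustrative sketch -- helper names and SQL are assumptions,
// not the actual code in src/utils/duckdb.ts.
import { writeFileSync, unlinkSync } from 'node:fs'
import { join } from 'node:path'

// read_csv_auto() lets DuckDB detect the header, delimiter, and column types.
function buildCsvImportSQL(tableName: string, tempFilePath: string): string {
  return `CREATE TABLE ${tableName} AS SELECT * FROM read_csv_auto('${tempFilePath}')`
}

async function importCsv(csvUrl: string, tableName: string): Promise<string> {
  // 1. Fetch CSV content from the provided URL
  const res = await fetch(csvUrl)
  if (!res.ok) throw new Error(`Failed to fetch CSV: ${res.status} ${res.statusText}`)

  // 2. Write content to a temporary file in /tmp
  const tempFilePath = join('/tmp', `${tableName}_${Date.now()}.csv`)
  writeFileSync(tempFilePath, await res.text())
  try {
    // 3-4. The generated statement would be run on a DuckDB connection (elided here)
    return buildCsvImportSQL(tableName, tempFilePath)
  } finally {
    // 5. Clean up the temporary file
    unlinkSync(tempFilePath)
  }
}
```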

CSV Schema Detection

DuckDB’s read_csv_auto() automatically detects:
  • Column names from header row
  • Data types (INTEGER, DOUBLE, VARCHAR, DATE, etc.)
  • Delimiter characters (comma, tab, pipe, etc.)
  • Quote characters and escape sequences
  • Null value representations

Create Table from JSON

Import JSON data directly into DuckDB with automatic type inference and nested structure flattening.

Function Signature

createTableFromJSON({ data, tableName })
Parameters:
  • data (array): Array of JSON objects to import
  • tableName (string): Desired table name (will be sanitized)

Response

{
  success: true,
  tableName: "sanitized_table_name",
  rowCount: 500,
  columnCount: 12
}

Usage Example

import { createTableFromJSON } from '~/utils/duckdb'

const userData = [
  { id: 1, name: 'Alice', email: '[email protected]', active: true },
  { id: 2, name: 'Bob', email: '[email protected]', active: false }
]

const result = await createTableFromJSON({
  data: userData,
  tableName: 'users'
})

Nested JSON Flattening

Source: src/utils/duckdb.ts:281
The function automatically flattens nested JSON structures.

Input:
[
  {
    "order_id": 1,
    "customer": "John Doe",
    "items": [
      { "product": "Widget", "price": 10.99 },
      { "product": "Gadget", "price": 24.99 }
    ]
  }
]
Output (2 rows):
[
  { "order_id": 1, "customer": "John Doe", "product": "Widget", "price": 10.99 },
  { "order_id": 1, "customer": "John Doe", "product": "Gadget", "price": 24.99 }
]
Arrays of objects are expanded into separate rows, preserving parent-level fields.
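The expansion can be sketched as follows. This is a simplified illustration that handles a single array-of-objects field per row; the type alias and function name are assumptions, not the actual implementation in src/utils/duckdb.ts.

```typescript
// Simplified sketch of nested-array expansion (names are assumed).
type Row = Record<string, unknown>

function flattenRows(rows: Row[]): Row[] {
  const out: Row[] = []
  for (const row of rows) {
    // Find the first field whose value is an array of objects
    const arrayKey = Object.keys(row).find(
      (k) =>
        Array.isArray(row[k]) &&
        (row[k] as unknown[]).every((v) => typeof v === 'object' && v !== null)
    )
    if (!arrayKey) {
      out.push(row)
      continue
    }
    // Expand each array element into its own row,
    // copying the parent-level fields onto it
    const { [arrayKey]: items, ...parent } = row
    for (const item of items as Row[]) {
      out.push({ ...parent, ...item })
    }
  }
  return out
}
```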

Type Inference

Automatic Type Detection

Source: src/utils/duckdb.ts:201
The system automatically infers DuckDB types from JavaScript values:

  JavaScript Type      DuckDB Type   Notes
  boolean              BOOLEAN       Direct mapping
  number (integer)     BIGINT        For whole numbers
  number (float)       DOUBLE        For decimal numbers
  string               VARCHAR       Default text type
  Date                 DATE          Date objects
  object               VARCHAR       Stored as JSON string
  null / undefined     VARCHAR       Default for nullable columns

Type Determination Algorithm

Source: src/utils/duckdb.ts:229
For each column, the system:
  1. Collects all non-null values
  2. Checks if all values are the same JavaScript type
  3. If mixed types, defaults to VARCHAR
  4. For consistent types, infers the most appropriate DuckDB type
  5. Objects and arrays are stored as JSON strings in VARCHAR columns
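The algorithm above can be sketched as follows. The function name is assumed, and the real implementation (src/utils/duckdb.ts:229) may differ in details.

```typescript
// Sketch of per-column type inference (function name is assumed).
function inferDuckDBType(values: unknown[]): string {
  // 1. Collect all non-null values
  const nonNull = values.filter((v) => v !== null && v !== undefined)
  if (nonNull.length === 0) return 'VARCHAR' // default for all-null columns

  // 2. Check whether all values share one JavaScript type
  const types = new Set(nonNull.map((v) => (v instanceof Date ? 'date' : typeof v)))
  if (types.size > 1) return 'VARCHAR' // 3. mixed types fall back to VARCHAR

  // 4. Map the consistent JS type to the most appropriate DuckDB type
  const v = nonNull[0]
  if (typeof v === 'boolean') return 'BOOLEAN'
  if (typeof v === 'number') {
    return nonNull.every((n) => Number.isInteger(n as number)) ? 'BIGINT' : 'DOUBLE'
  }
  if (v instanceof Date) return 'DATE'
  // 5. Strings, objects, and arrays end up in VARCHAR columns
  return 'VARCHAR'
}
```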

Table Name Sanitization

Source: src/utils/duckdb.ts:104, src/utils/duckdb.ts:380
Table names are automatically sanitized to ensure SQL compatibility:
// Invalid characters replaced with underscore
'my-table-name' → 'my_table_name'
'users@2024' → 'users_2024'
'table#1' → 'table_1'

Sanitization Rules

  • Only alphanumeric characters and underscores are allowed
  • All other characters are replaced with _
  • Pattern: /[^a-zA-Z0-9_]/g
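A minimal sketch of this rule (the function name is an assumption; the regex is the documented pattern):

```typescript
// Replace any character outside [a-zA-Z0-9_] with an underscore.
function sanitizeTableName(name: string): string {
  return name.replace(/[^a-zA-Z0-9_]/g, '_')
}
```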

Column Name Sanitization

Source: src/utils/duckdb.ts:255
Column names are sanitized using similar rules:
'user-id' → 'user_id'
'email@domain' → 'email_domain'
'123column' → '_123column' // leading digit gets an underscore prefix

Sanitization Rules

  • Invalid characters replaced with underscore
  • Column names starting with numbers get _ prefix
  • Pattern: /[^a-zA-Z0-9_]/g and /^[0-9]/
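A minimal sketch combining both column rules (the function name is an assumption; the regexes are the documented patterns):

```typescript
// Replace invalid characters, then prefix names that start with a digit.
function sanitizeColumnName(name: string): string {
  const cleaned = name.replace(/[^a-zA-Z0-9_]/g, '_')
  return /^[0-9]/.test(cleaned) ? `_${cleaned}` : cleaned
}
```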

SQL Value Escaping

Source: src/utils/duckdb.ts:261
The system safely escapes values for SQL insertion:

Boolean Values

true → TRUE
false → FALSE

Numeric Values

42 → 42
3.14159 → 3.14159

String Values

"Hello"'Hello'
"O'Brien"'O''Brien' -- Single quotes escaped

Null Values

null → NULL
undefined → NULL

Object Values

{ "key": "value" } → '{"key":"value"}'

Error Handling

Common Errors

Empty Data Array

throw new Error('JSON data must be a non-empty array')

No Columns Found

throw new Error('No columns found in JSON data')

CSV Fetch Failure

throw new Error('Failed to fetch CSV: 404 Not Found')

Cleanup and Resource Management

Source: src/utils/duckdb.ts:138
Temporary files are automatically cleaned up:
finally {
  if (tempFilePath) {
    try {
      unlinkSync(tempFilePath)
    } catch (cleanupError) {
      console.error('Error cleaning up temp file:', cleanupError)
    }
  }
}

Table Operations

Drop and Recreate

Both functions automatically drop existing tables before creation:
DROP TABLE IF EXISTS table_name
This ensures:
  • Fresh schema on each import
  • No conflicts with existing tables
  • Clean data without duplicates

Row Count Verification

After table creation, the system verifies the import:
SELECT COUNT(*) as row_count FROM table_name
This confirms:
  • All data was imported successfully
  • Table is queryable
  • No errors during import

Integration with MotherDuck

Source: src/utils/duckdb.ts:18
All table operations use the shared DuckDB instance:
const db = await getDuckDB()
const connection = await db.connect()

Connection String

`md:my_db?token=${encodedToken}`
  • md: - MotherDuck protocol
  • my_db - Database name
  • token - Encoded access token from MD_ACCESS_TOKEN environment variable
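A sketch of how such a connection string might be assembled; the helper name and the use of encodeURIComponent are assumptions, not necessarily how src/utils/duckdb.ts encodes the token.

```typescript
// Build a MotherDuck connection string (helper name is assumed).
function buildConnectionString(dbName: string, token: string): string {
  // Token is URL-encoded so special characters survive the query string
  return `md:${dbName}?token=${encodeURIComponent(token)}`
}

// The token typically comes from the MD_ACCESS_TOKEN environment variable:
// const conn = buildConnectionString('my_db', process.env.MD_ACCESS_TOKEN ?? '')
```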

Environment Setup

Source: src/utils/duckdb.ts:14
process.env.HOME = '/tmp'
Required in serverless environments where the default home directory may not be writable.

Best Practices

CSV Imports

  • Use stable, publicly accessible URLs
  • Ensure CSV files have header rows
  • Validate CSV structure before import
  • Monitor file sizes for performance

JSON Imports

  • Validate JSON structure before import
  • Be aware of nested array expansion
  • Consider pre-flattening complex structures
  • Monitor memory usage for large datasets

Schema Design

  • Use descriptive table names
  • Avoid special characters in column names
  • Consider type casting for mixed-type columns
  • Plan for null values in your schema

Performance

  • Batch multiple inserts when possible
  • Create indexes on frequently queried columns
  • Use appropriate data types to minimize storage
  • Consider partitioning for very large tables
