Installation

Install the CSV connector:
npm install @evidence-dev/csv

Configuration

connection.yaml
name: csv
type: csv
options:
  options: ""
The CSV connector uses DuckDB’s read_csv function under the hood, which automatically detects file structure and types.

Configuration Parameters

options
string
DuckDB CSV options string (see DuckDB CSV documentation)

How It Works

The CSV connector:
  1. Looks for .csv files in your sources directory
  2. Automatically loads them using DuckDB’s CSV reader
  3. Applies auto-detection for delimiters, headers, and data types
  4. Makes the data queryable via SQL

File Placement

Place CSV files in your sources directory:
your-project/
  sources/
    my_connection/
      connection.yaml
      sales_data.csv
      customers.csv
      queries/
        *.sql
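
The layout above, and step 1 of "How It Works" (file discovery), can be sketched with Python's stdlib. The snippet builds the example directory structure in a temp folder and then globs for the .csv files the connector would pick up; all file names are the illustrative ones from this page, not required names.

```python
from pathlib import Path
import tempfile

# Build the example layout from the docs in a temporary directory
root = Path(tempfile.mkdtemp()) / "your-project" / "sources" / "my_connection"
(root / "queries").mkdir(parents=True)
(root / "connection.yaml").write_text("name: csv\ntype: csv\n")
(root / "sales_data.csv").write_text("order_id,amount\n1001,149.99\n")
(root / "customers.csv").write_text("customer_id,customer_name\nC001,Acme\n")

# Step 1 of "How It Works": discover .csv files in the source directory
# (connection.yaml and the queries/ subdirectory are not matched)
csv_files = sorted(p.name for p in root.glob("*.csv"))
print(csv_files)  # ['customers.csv', 'sales_data.csv']
```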

Example Queries

Query a CSV File

Create a .sql file that references your CSV:
queries/sales_summary.sql
SELECT 
  DATE_TRUNC('month', order_date) as month,
  SUM(amount) as total_revenue,
  COUNT(*) as order_count
FROM 'sales_data.csv'
GROUP BY 1
ORDER BY 1 DESC

Join Multiple CSV Files

queries/customer_orders.sql
SELECT 
  c.customer_name,
  c.segment,
  COUNT(o.order_id) as order_count,
  SUM(o.amount) as total_spent
FROM 'customers.csv' c
LEFT JOIN 'orders.csv' o
  ON c.customer_id = o.customer_id
GROUP BY 1, 2
ORDER BY 4 DESC

Filter and Transform

queries/recent_orders.sql
SELECT 
  order_id,
  customer_id,
  order_date,
  amount,
  CASE 
    WHEN amount >= 1000 THEN 'High Value'
    WHEN amount >= 500 THEN 'Medium Value'
    ELSE 'Low Value'
  END as order_category
FROM 'orders.csv'
WHERE order_date >= CURRENT_DATE - INTERVAL 30 DAYS
ORDER BY order_date DESC

CSV Options

Customize CSV parsing using the options parameter:
connection.yaml
name: csv
type: csv
options:
  options: "delim=';', header=true, quote='\"'"

Common Options

  • delim=',' - Column delimiter (default: auto-detect)
  • header=true - First row contains headers (default: auto-detect)
  • quote='"' - Quote character (default: '"')
  • escape='"' - Escape character (default: '"')
  • skip=0 - Number of lines to skip at the beginning
  • dateformat='%Y-%m-%d' - Date format string
  • timestampformat='%Y-%m-%d %H:%M:%S' - Timestamp format
  • nullstr='NULL' - String representing NULL values
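
If you are unsure which delimiter a file actually uses, Python's stdlib csv.Sniffer can detect it before you write the options string. This is a quick diagnostic sketch run outside Evidence, not part of the connector itself; the sample data is made up.

```python
import csv

# A few sample lines from a file whose delimiter is unknown
sample = "order_id;amount;region\n1001;149.99;EMEA\n1002;79.98;APAC\n"

# Restrict candidates to the common delimiters listed above
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
print(repr(dialect.delimiter))  # ';'

# Feed the result into the connector's options string
options = f"delim='{dialect.delimiter}'"
print(options)  # delim=';'
```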

Example with Custom Options

connection.yaml
name: csv_custom
type: csv
options:
  options: "delim='|', header=true, nullstr='N/A', dateformat='%d/%m/%Y'"
In most cases, you can leave options empty. DuckDB’s auto-detection is very good at identifying the correct format.

Advanced Features

Wildcard Patterns

Query multiple files at once:
queries/all_regions.sql
SELECT 
  region,
  SUM(sales) as total_sales
FROM 'sales_*.csv'
GROUP BY region

Glob Patterns

Use glob patterns for flexible file matching. With DuckDB's Hive partitioning enabled (add hive_partitioning=true to the options string), directory keys such as year=*/month=* become queryable columns:
queries/yearly_data.sql
SELECT *
FROM 'data/year=*/month=*/*.csv'
WHERE year >= 2023

Column Selection

Select specific columns:
queries/subset.sql
SELECT 
  order_id,
  customer_id,
  amount
FROM 'large_dataset.csv'
LIMIT 1000

Type Casting

Explicitly cast column types:
queries/with_types.sql
SELECT 
  order_id::INTEGER,
  order_date::DATE,
  amount::DECIMAL(10,2)
FROM 'orders.csv'

CSV File Best Practices

File Format

  • Use UTF-8 encoding
  • Include headers in the first row
  • Use consistent date formats (ISO 8601 recommended: YYYY-MM-DD)
  • Quote fields containing delimiters or newlines

Example Well-Formatted CSV

sales_data.csv
order_id,order_date,customer_id,product,quantity,amount
1001,2024-01-15,C001,Widget A,5,149.99
1002,2024-01-15,C002,Widget B,2,79.98
1003,2024-01-16,C001,Widget C,1,299.99
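
A short stdlib script can confirm a file follows these conventions — header row present, ISO 8601 dates, parseable decimal amounts. The column names mirror the example above; adapt them to your own files.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# The well-formatted example from above, inlined for the sketch
data = """order_id,order_date,customer_id,product,quantity,amount
1001,2024-01-15,C001,Widget A,5,149.99
1002,2024-01-15,C002,Widget B,2,79.98
1003,2024-01-16,C001,Widget C,1,299.99
"""

rows = list(csv.DictReader(io.StringIO(data)))
total = Decimal("0")
for row in rows:
    date.fromisoformat(row["order_date"])  # raises ValueError if not ISO 8601
    total += Decimal(row["amount"])        # raises if not a valid decimal
print(len(rows), total)  # 3 529.96
```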

Optimize for Performance

  • Keep files under 100MB for fastest loading
  • For larger datasets, split into multiple files
  • Use consistent column types across files
  • Remove unnecessary columns
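
Splitting a large file can be done with the stdlib csv module. The sketch below writes every rows_per_file rows to a new part file, repeating the header in each so every part remains independently loadable; the part_NNN.csv naming is illustrative.

```python
import csv
import io
import tempfile
from pathlib import Path

def split_csv(text, rows_per_file, out_dir):
    """Split CSV text into part files, repeating the header in each part."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    paths = []
    for i in range(0, len(rows), rows_per_file):
        path = Path(out_dir) / f"part_{i // rows_per_file:03d}.csv"
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows[i:i + rows_per_file])
        paths.append(path)
    return paths

# 5 data rows split into chunks of 2 -> 3 part files
text = "id,amount\n" + "".join(f"{n},{n * 10}\n" for n in range(5))
parts = split_csv(text, rows_per_file=2, out_dir=tempfile.mkdtemp())
print([p.name for p in parts])  # ['part_000.csv', 'part_001.csv', 'part_002.csv']
```

Because each part keeps the header, the connector can load them all at once with a wildcard such as 'part_*.csv'.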

Type Detection

DuckDB automatically detects column types:
  • Numbers: Integers and decimals
  • Dates: ISO 8601 formats (YYYY-MM-DD)
  • Timestamps: ISO 8601 with time (YYYY-MM-DD HH:MM:SS)
  • Strings: Any non-numeric data
  • Booleans: true/false, t/f, yes/no, 1/0
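
The detection rules above can be approximated in a few lines of stdlib Python. This is a simplified sketch of the idea, not DuckDB's actual sniffer — for instance, '1'/'0' fall through to INTEGER here rather than BOOLEAN, and real detection samples many rows per column.

```python
from datetime import date, datetime

def infer_type(value: str) -> str:
    """Rough mimic of CSV type sniffing: try each type in order, fall back to string."""
    if value.lower() in {"true", "false", "t", "f", "yes", "no"}:
        return "BOOLEAN"
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "DOUBLE"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)   # ISO 8601 date only
        return "DATE"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(value)  # ISO 8601 date + time
        return "TIMESTAMP"
    except ValueError:
        return "VARCHAR"

sample = ["42", "3.14", "2024-01-15", "2024-01-15 10:30:00", "true", "Widget A"]
print([infer_type(v) for v in sample])
# ['INTEGER', 'DOUBLE', 'DATE', 'TIMESTAMP', 'BOOLEAN', 'VARCHAR']
```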

Override Type Detection

If auto-detection fails, cast explicitly in queries:
SELECT 
  id::INTEGER,
  created_date::DATE,
  revenue::DECIMAL(10,2)
FROM 'data.csv'

Troubleshooting

File not found
  • Ensure CSV files are in your sources directory
  • Check that file names match exactly (case-sensitive)
  • Use relative paths from the sources directory

Type errors
  • Check for inconsistent data types in columns
  • Look for non-numeric values in numeric columns
  • Use explicit type casting in queries:
    SELECT column_name::INTEGER FROM 'file.csv'

Wrong delimiter
  • Specify the delimiter explicitly:
    options: "delim=';'"
  • Common delimiters: , (comma), ; (semicolon), \t (tab), | (pipe)

Date parsing errors
  • Use ISO 8601 format (YYYY-MM-DD) when possible
  • Specify the date format if your files use a different one:
    options: "dateformat='%d/%m/%Y'"

Slow queries
  • Split large files into smaller chunks
  • Select only the columns you need:
    SELECT col1, col2 FROM 'large.csv'
  • Filter to reduce the data scanned:
    SELECT * FROM 'large.csv' WHERE date >= '2024-01-01'

Encoding problems
  • Ensure files are UTF-8 encoded
  • Convert files to UTF-8 if needed:
    iconv -f ISO-8859-1 -t UTF-8 input.csv > output.csv

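If iconv is not available, the same ISO-8859-1 to UTF-8 conversion can be done with Python's stdlib. The file names are placeholders, and the snippet writes its own Latin-1 test file so it runs standalone.

```python
import tempfile
from pathlib import Path

# Stand-in for input.csv: a Latin-1 encoded file with a non-ASCII character
src = Path(tempfile.mkdtemp()) / "input.csv"
src.write_bytes("id,name\n1,Café\n".encode("latin-1"))

# Re-encode: read as ISO-8859-1, write as UTF-8
dst = src.with_name("output.csv")
dst.write_text(src.read_text(encoding="iso-8859-1"), encoding="utf-8")
print(dst.read_text(encoding="utf-8"))  # id,name / 1,Café
```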

Converting Other Formats

Excel to CSV

Export Excel files to CSV format before using with Evidence.

TSV (Tab-Separated Values)

connection.yaml
name: tsv
type: csv
options:
  options: "delim='\t'"

Pipe-Delimited Files

connection.yaml
name: pipe_delimited
type: csv
options:
  options: "delim='|'"
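
Alternatively, instead of pointing the connector at the pipe-delimited file, you can rewrite it as a standard comma-separated file with the stdlib csv module. A minimal sketch with inlined sample data; real usage would read from and write to files.

```python
import csv
import io

# Pipe-delimited input (stand-in for a real file)
pipe_text = "order_id|product|amount\n1001|Widget A|149.99\n"

reader = csv.reader(io.StringIO(pipe_text), delimiter="|")
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")  # default delimiter is ','
writer.writerows(reader)
print(out.getvalue())  # order_id,product,amount
                       # 1001,Widget A,149.99
```

Quoting is handled automatically: a field containing a comma would be wrapped in double quotes on output.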

Migrating to Parquet

For better performance, convert CSV files to Parquet:
-- Export CSV to Parquet
COPY (SELECT * FROM 'data.csv') 
TO 'data.parquet' (FORMAT PARQUET);
Then use the DuckDB connector to query Parquet files directly:
SELECT * FROM 'data.parquet'
