Installation
Install the CSV connector:Configuration
connection.yaml
The CSV connector uses DuckDB’s
read_csv function under the hood, which automatically detects file structure and types.Configuration Parameters
DuckDB CSV options string (see DuckDB CSV documentation)
How It Works
The CSV connector:- Looks for
.csvfiles in your sources directory - Automatically loads them using DuckDB’s CSV reader
- Applies auto-detection for delimiters, headers, and data types
- Makes the data queryable via SQL
File Placement
Place CSV files in your sources directory:Example Queries
Query a CSV File
Create a.sql file that references your CSV:
queries/sales_summary.sql
Join Multiple CSV Files
queries/customer_orders.sql
Filter and Transform
queries/recent_orders.sql
CSV Options
Customize CSV parsing using theoptions parameter:
connection.yaml
Common Options
delim=','- Column delimiter (default: auto-detect)header=true- First row contains headers (default: auto-detect)quote='"'- Quote character (default:")escape='"'- Escape character (default:")skip=0- Number of lines to skip at the beginningdateformat='%Y-%m-%d'- Date format stringtimestampformat='%Y-%m-%d %H:%M:%S'- Timestamp formatnullstr='NULL'- String representing NULL values
Example with Custom Options
connection.yaml
Advanced Features
Wildcard Patterns
Query multiple files at once:queries/all_regions.sql
Glob Patterns
Use glob patterns for flexible file matching:queries/yearly_data.sql
Column Selection
Select specific columns:queries/subset.sql
Type Casting
Explicitly cast column types:queries/with_types.sql
CSV File Best Practices
File Format
- Use UTF-8 encoding
- Include headers in the first row
- Use consistent date formats (ISO 8601 recommended: YYYY-MM-DD)
- Quote fields containing delimiters or newlines
Example Well-Formatted CSV
sales_data.csv
Optimize for Performance
- Keep files under 100MB for fastest loading
- For larger datasets, split into multiple files
- Use consistent column types across files
- Remove unnecessary columns
Type Detection
DuckDB automatically detects column types:- Numbers: Integers and decimals
- Dates: ISO 8601 formats (YYYY-MM-DD)
- Timestamps: ISO 8601 with time (YYYY-MM-DD HH:MM:SS)
- Strings: Any non-numeric data
- Booleans: true/false, t/f, yes/no, 1/0
Override Type Detection
If auto-detection fails, cast explicitly in queries:Troubleshooting
File not found errors
File not found errors
- Ensure CSV files are in your sources directory
- Check file names match exactly (case-sensitive)
- Use relative paths from the sources directory
Type detection errors
Type detection errors
- Check for inconsistent data types in columns
- Look for non-numeric values in numeric columns
- Use explicit type casting in queries:
Delimiter detection issues
Delimiter detection issues
- Specify the delimiter explicitly:
- Common delimiters:
,(comma),;(semicolon),\t(tab),|(pipe)
Date parsing errors
Date parsing errors
- Use ISO 8601 format (YYYY-MM-DD) when possible
- Specify date format if using different format:
Memory errors with large files
Memory errors with large files
- Split large files into smaller chunks
- Select only needed columns:
- Use filtering to reduce data:
Encoding issues
Encoding issues
- Ensure files are UTF-8 encoded
- Convert files to UTF-8 if needed:
Converting Other Formats
Excel to CSV
Export Excel files to CSV format before using with Evidence.TSV (Tab-Separated Values)
connection.yaml
Pipe-Delimited Files
connection.yaml