This guide covers the pg_scanner extension (querying Postgres from DuckDB) and the pg_duckdb extension (querying MotherDuck from Postgres).
Overview
This integration enables:
- Querying PostgreSQL data from DuckDB
- Querying MotherDuck data from PostgreSQL
- Running hybrid queries across both databases
- Replicating data between Postgres and MotherDuck using CTAS
For detailed configuration guidelines, see the MotherDuck documentation on loading data from Postgres.
Prerequisites
- Docker installed on your system
- DuckDB CLI (installation guide)
- A MotherDuck token
Local PostgreSQL Setup
Start the pg_duckdb container
Launch a PostgreSQL container with DuckDB integration and MotherDuck support.

Parameters:
- -d: Run in detached mode
- -p 5432:5432: Map the PostgreSQL port
- -e POSTGRES_PASSWORD: Set the postgres user password
- -e MOTHERDUCK_TOKEN: Your MotherDuck authentication token
- -c duckdb.motherduck_enabled=true: Enable MotherDuck connectivity
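Assembling those flags, the launch command might look like the following sketch; the image name pgduckdb/pgduckdb:16-main and the password value are assumptions to adapt:

```bash
docker run -d \
  --name pgduckdb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=duckdb \
  -e MOTHERDUCK_TOKEN="<your_motherduck_token>" \
  pgduckdb/pgduckdb:16-main \
  -c duckdb.motherduck_enabled=true
```

Arguments after the image name are passed through to the postgres server process, which is how the -c setting enables MotherDuck connectivity.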
Install DuckDB CLI
If you haven’t already, install the DuckDB CLI: on macOS via Homebrew, or on Linux via the official install script. See the DuckDB installation guide for other platforms.
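The install commands are roughly as follows; verify the Linux install-script URL against the DuckDB installation guide:

```bash
# macOS (Homebrew)
brew install duckdb

# Linux (official install script)
curl https://install.duckdb.org | sh
```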
Using pg_scanner: Query Postgres from DuckDB
The pg_scanner extension allows DuckDB to query PostgreSQL databases directly.
Attach to PostgreSQL
Connect to your local PostgreSQL instance. This creates a database alias pg that references your PostgreSQL instance.
Verify the connection
List all tables across both DuckDB and PostgreSQL. You should see tables from both the local DuckDB instance and the attached PostgreSQL database.
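Put together, the attach and verification steps might look like this sketch; the connection string assumes the password chosen when starting the container:

```sql
-- Attach the local Postgres instance under the alias pg
ATTACH 'host=localhost port=5432 dbname=postgres user=postgres password=duckdb'
    AS pg (TYPE postgres);

-- List tables from both DuckDB and the attached Postgres database
SHOW ALL TABLES;
```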
Exercise: Replicate Data from Postgres to DuckDB
Use Create Table As Select (CTAS) to copy data from PostgreSQL into DuckDB. CTAS is efficient for one-time data replication; for incremental updates, consider using INSERT INTO ... SELECT with appropriate filters.
Example Queries with pg_scanner
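A hedged sketch of the CTAS exercise and a couple of follow-up queries; the orders table and its columns are assumptions to adapt to your schema:

```sql
-- One-time replication: copy a Postgres table into DuckDB
CREATE TABLE orders_local AS
SELECT * FROM pg.public.orders;

-- Incremental updates: append only rows newer than what we already have
INSERT INTO orders_local
SELECT * FROM pg.public.orders
WHERE created_at > (SELECT max(created_at) FROM orders_local);

-- Analytical query against the live Postgres table
SELECT customer_id, sum(amount) AS total_spent
FROM pg.public.orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
```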
Using pg_duckdb: Query MotherDuck from Postgres
The pg_duckdb extension allows PostgreSQL to query MotherDuck databases.
Connect to PostgreSQL
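Assuming the container is named pgduckdb (an assumption carried over from the setup step), you can open a psql session with:

```bash
docker exec -it pgduckdb psql -U postgres
```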
From inside the container, connect to the database. This connects you to the PostgreSQL instance as the postgres user.
Exercise: Replicate Data from MotherDuck to Postgres
Use CTAS to copy data from MotherDuck into PostgreSQL.
Exercise: Hybrid Query
Write a query that combines data from both PostgreSQL and MotherDuck. Hybrid queries allow you to leverage PostgreSQL’s transactional capabilities with MotherDuck’s analytical performance.
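Hedged sketches of the two exercises above; the table names, and the way pg_duckdb exposes MotherDuck tables into the Postgres search path, are assumptions to adapt to your schema:

```sql
-- Exercise 1: replicate a MotherDuck table into a local Postgres table
CREATE TABLE sales_local AS
SELECT * FROM md_sales;   -- md_sales: hypothetical MotherDuck-backed table

-- Exercise 2: hybrid query joining a local Postgres table with MotherDuck data
SELECT c.customer_name, s.total_amount
FROM customers c                      -- local Postgres table (assumed)
JOIN md_sales s ON s.customer_id = c.customer_id
ORDER BY s.total_amount DESC;
```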
Architecture Patterns
Pattern 1: Postgres as Source, MotherDuck as Analytics
Pattern 2: MotherDuck as Source, Postgres as Serving Layer
Pattern 3: Bidirectional Sync
Performance Considerations
Query pushdown
Both pg_scanner and pg_duckdb support query pushdown, meaning filters and aggregations are executed on the source database when possible.
Data transfer optimization
Minimize data transfer by:
- Filtering early in the query
- Selecting only needed columns
- Aggregating on the source database
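For example, aggregating on the source transfers only the summary rows rather than the full table (table name assumed):

```sql
SELECT status, count(*) AS n          -- only needed columns
FROM pg.public.orders
WHERE created_at >= DATE '2024-01-01' -- filter early; pushed down when possible
GROUP BY status;                      -- aggregate on the source
```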
Connection pooling
For production workloads, consider:
- Using connection pooling (pgBouncer)
- Limiting concurrent cross-database queries
- Monitoring network latency
Common Use Cases
Real-time analytics on operational data
Query live Postgres data with DuckDB’s analytical capabilities:
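For instance, a dashboard-style query over the attached Postgres database might look like (alias and table are assumptions):

```sql
SELECT date_trunc('hour', created_at) AS hour,
       count(*)                       AS orders,
       sum(amount)                    AS revenue
FROM pg.public.orders
WHERE created_at >= now() - INTERVAL 1 DAY
GROUP BY hour
ORDER BY hour;
```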
Troubleshooting
Connection refused to Postgres
If DuckDB cannot connect to PostgreSQL:
- Verify the container is running: docker ps
- Check port mapping: docker port <container_name>
- Verify the password in the ATTACH statement
- Ensure no firewall is blocking port 5432
MotherDuck token not found
If pg_duckdb cannot connect to MotherDuck:
- Verify the token was set when starting the container:
- Restart the container with the correct token
- Verify the token is valid by testing in DuckDB CLI:
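The first and third checks above might look like the following; the container name is an assumption:

```bash
# Confirm the token env var was set inside the container
docker exec pgduckdb env | grep MOTHERDUCK_TOKEN

# Test the token directly from the DuckDB CLI by connecting to MotherDuck
MOTHERDUCK_TOKEN="<your_motherduck_token>" duckdb "md:"
```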
Query performance issues
If queries are slow:
- Check if filters are being pushed down (use EXPLAIN)
- Reduce data transfer with column selection and filtering
- Consider materializing frequently accessed data
- Monitor network latency between systems
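To check pushdown, inspect the plan and look for the filter inside the Postgres scan node (table name assumed):

```sql
EXPLAIN SELECT * FROM pg.public.orders WHERE order_id = 42;
```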
Permission denied
If you get permission errors:
- Verify the Postgres user has necessary permissions
- Check schema access: SELECT * FROM information_schema.tables
- Ensure the database exists and is accessible
Best Practices
- Use CTAS for bulk transfers: More efficient than row-by-row inserts
- Filter early: Apply WHERE clauses before cross-database joins
- Monitor data transfer: Large queries can impact network and performance
- Secure credentials: Use environment variables, never hardcode tokens
- Test queries locally first: Validate logic before running on production data
- Document hybrid queries: Clearly indicate which system owns which data