This guide demonstrates how to connect PostgreSQL with MotherDuck, enabling hybrid queries that combine data from both systems. You’ll learn to use both the postgres_scanner extension (querying Postgres from DuckDB) and pg_duckdb (querying MotherDuck from Postgres).

Overview

This integration enables:
  • Querying PostgreSQL data from DuckDB
  • Querying MotherDuck data from PostgreSQL
  • Running hybrid queries across both databases
  • Replicating data between Postgres and MotherDuck using CTAS
For detailed configuration guidelines, see the MotherDuck documentation on loading data from Postgres.

Prerequisites

Local PostgreSQL Setup

1

Start the pg_duckdb container

Launch a PostgreSQL container with DuckDB integration and MotherDuck support:
docker run -d -p 5432:5432 \
  -e POSTGRES_PASSWORD="very_secur3_pw" \
  -e MOTHERDUCK_TOKEN="your_token" \
  pgduckdb/pgduckdb:17-main \
  -c duckdb.motherduck_enabled=true
Parameters:
  • -d: Run in detached mode
  • -p 5432:5432: Map PostgreSQL port
  • -e POSTGRES_PASSWORD: Set the postgres user password
  • -e MOTHERDUCK_TOKEN: Your MotherDuck authentication token
  • -c duckdb.motherduck_enabled=true: Enable MotherDuck connectivity
Replace your_token with your actual MotherDuck token. Never commit tokens to version control.
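To keep the token out of command history and scripts, you can pass it through from an environment variable instead of typing it inline (a sketch; assumes the token is already exported in your shell session):
```shell
# set once per shell session (or source from a .env file that is gitignored)
export MOTHERDUCK_TOKEN="your_token"

docker run -d -p 5432:5432 \
  -e POSTGRES_PASSWORD="very_secur3_pw" \
  -e MOTHERDUCK_TOKEN="$MOTHERDUCK_TOKEN" \
  pgduckdb/pgduckdb:17-main \
  -c duckdb.motherduck_enabled=true
```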
2

Install DuckDB CLI

If you haven’t already, install the DuckDB CLI.
macOS (Homebrew):
brew install duckdb
Linux:
# Ubuntu/Debian
wget https://github.com/duckdb/duckdb/releases/download/v1.0.0/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
chmod +x duckdb
See the DuckDB installation guide for other platforms.
3

Verify the setup

Check that the PostgreSQL container is running:
docker ps
You should see the pgduckdb/pgduckdb:17-main container running on port 5432.

Using postgres_scanner: Query Postgres from DuckDB

The postgres_scanner extension (installed under the name postgres) allows DuckDB to query PostgreSQL databases directly.
1

Start DuckDB CLI

Launch the DuckDB command-line interface:
duckdb
Or if you downloaded the binary:
./duckdb
2

Install and load postgres_scanner

Install the PostgreSQL scanner extension:
INSTALL postgres;
LOAD postgres;
INSTALL only needs to be run once per DuckDB installation; the extension persists on disk. LOAD must be run in each new session, although recent DuckDB versions autoload the extension on first use.
3

Attach to PostgreSQL

Connect to your local PostgreSQL instance:
ATTACH 'dbname=postgres user=postgres password=very_secur3_pw host=127.0.0.1' 
  AS pg (TYPE POSTGRES);
This creates a database alias pg that references your PostgreSQL instance.
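If DuckDB should never write back to the source, you can attach the database read-only; ATTACH accepts a READ_ONLY flag so any write attempts from DuckDB are rejected:
```sql
ATTACH 'dbname=postgres user=postgres password=very_secur3_pw host=127.0.0.1'
  AS pg (TYPE POSTGRES, READ_ONLY);
```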
4

Verify the connection

List all tables across both DuckDB and PostgreSQL:
SHOW ALL TABLES;
You should see tables from both the local DuckDB instance and the attached PostgreSQL database.
5

Query PostgreSQL data

Query tables in PostgreSQL using the pg prefix:
SELECT * FROM pg.public.winelist;
This query runs against PostgreSQL but returns results in DuckDB.

Exercise: Replicate Data from Postgres to DuckDB

Use Create Table As Select (CTAS) to copy data from PostgreSQL into DuckDB:
-- Create a local DuckDB table from Postgres data
CREATE TABLE local_winelist AS
SELECT * FROM pg.public.winelist;

-- Verify the copy
SELECT COUNT(*) FROM local_winelist;
CTAS is efficient for one-time data replication. For incremental updates, consider using INSERT INTO ... SELECT with appropriate filters.
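The incremental pattern mentioned above could look like the following (a sketch; it assumes wine_id is a monotonically increasing key, which may not hold for your schema):
```sql
-- Append only rows newer than what local_winelist already holds
INSERT INTO local_winelist
SELECT * FROM pg.public.winelist
WHERE wine_id > (SELECT COALESCE(MAX(wine_id), 0) FROM local_winelist);
```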

Example Queries with postgres_scanner

-- Join Postgres data with local DuckDB data
SELECT 
  l.vintage,
  l.wine_name,
  p.price_history
FROM local_winelist l
JOIN pg.public.price_data p
  ON l.wine_id = p.wine_id
WHERE l.vintage >= 2000;

Using pg_duckdb: Query MotherDuck from Postgres

The pg_duckdb extension allows PostgreSQL to query MotherDuck databases.
1

Connect to the container

Get your container name:
docker ps
Connect to the PostgreSQL container:
docker exec -it <container_name> /bin/bash
The container name is shown in the NAMES column of docker ps output.
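Steps 1 and 2 can also be combined into a single command (using the container name from docker ps):
```shell
# open psql inside the container as the postgres user in one step
docker exec -it <container_name> psql -U postgres
```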
2

Connect to PostgreSQL

From inside the container, connect to the database:
psql
This connects you to the PostgreSQL instance as the postgres user.
3

Query local PostgreSQL data

First, verify you can query local Postgres tables:
SELECT * FROM public.winelist;
4

Query MotherDuck data

Because you set MOTHERDUCK_TOKEN when starting the container, you can now query MotherDuck directly:
-- Query a MotherDuck table
SELECT * 
FROM ddb$sample_data$nyc.service_requests
LIMIT 10;
The ddb$ schema prefix indicates that the query runs against MotherDuck: pg_duckdb exposes each MotherDuck database and schema as a Postgres schema named ddb$&lt;database&gt;$&lt;schema&gt;, with tables in a database's main schema appearing under just ddb$&lt;database&gt;.

Exercise: Replicate Data from MotherDuck to Postgres

Use CTAS to copy data from MotherDuck into PostgreSQL:
-- Create a Postgres table from MotherDuck data
CREATE TABLE nyc_requests AS
SELECT * 
FROM ddb$sample_data$nyc.service_requests
LIMIT 1000;

-- Verify the data
SELECT COUNT(*) FROM nyc_requests;

Exercise: Hybrid Query

Write a query that combines data from both PostgreSQL and MotherDuck:
-- Join Postgres data with MotherDuck data
SELECT 
  p.wine_name,
  p.vintage,
  p.price_per_bottle,
  md.market_data
FROM public.winelist p
JOIN ddb$wine_analysis.market_trends md
  ON p.wine_id = md.wine_id
WHERE p.vintage >= 2015
ORDER BY p.price_per_bottle DESC
LIMIT 20;
Hybrid queries allow you to leverage PostgreSQL’s transactional capabilities with MotherDuck’s analytical performance.

Architecture Patterns

Pattern 1: Postgres as Source, MotherDuck as Analytics

-- In DuckDB with postgres_scanner
-- Extract transactional data from Postgres
CREATE TABLE analytics.daily_orders AS
SELECT 
  DATE_TRUNC('day', order_date) as date,
  customer_id,
  SUM(total_amount) as daily_total,
  COUNT(*) as order_count
FROM pg.public.orders
WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY date, customer_id;

-- Upload to MotherDuck for analytics (after ATTACH 'md:analytics' AS md_analytics)
CREATE TABLE md_analytics.daily_orders AS
SELECT * FROM analytics.daily_orders;

Pattern 2: MotherDuck as Source, Postgres as Serving Layer

-- In Postgres with pg_duckdb
-- Materialize MotherDuck analytics in Postgres
CREATE MATERIALIZED VIEW customer_metrics AS
SELECT 
  customer_id,
  total_orders,
  lifetime_value,
  last_order_date
FROM ddb$analytics.customer_summary;

-- Refresh periodically
REFRESH MATERIALIZED VIEW customer_metrics;
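To keep the serving layer fresh, the refresh can be scheduled. One common option is pg_cron, shown here as a hypothetical sketch (the extension is not necessarily available in the pgduckdb image):
```sql
-- Refresh hourly via pg_cron (assumes the extension is installed and enabled)
SELECT cron.schedule('refresh-customer-metrics', '0 * * * *',
  'REFRESH MATERIALIZED VIEW customer_metrics');
```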

Pattern 3: Bidirectional Sync

-- Sync operational data to analytics warehouse
-- From Postgres to MotherDuck (in a DuckDB CLI started with: duckdb md:)
CREATE OR REPLACE TABLE warehouse.orders AS
SELECT * FROM pg.public.orders;

-- Sync enriched data back to Postgres
-- From pg_duckdb in Postgres
CREATE TABLE enriched_orders AS
SELECT * FROM ddb$analytics.enriched_orders;

Performance Considerations

Both postgres_scanner and pg_duckdb support query pushdown, meaning filters and aggregations are executed on the source database when possible.
-- This filter is pushed down to Postgres
SELECT * FROM pg.public.orders
WHERE order_date >= '2024-01-01';
Minimize data transfer by:
  • Filtering early in the query
  • Selecting only needed columns
  • Aggregating on the source database
-- Good: Filters and aggregates before transfer
SELECT category, COUNT(*), AVG(price)
FROM pg.public.products
WHERE active = true
GROUP BY category;

-- Bad: Transfers all data then filters
SELECT * FROM (
  SELECT * FROM pg.public.products
) WHERE active = true;
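To confirm that a filter is actually pushed down, inspect the plan from the DuckDB side; when pushdown happens, the predicate appears inside the Postgres scan operator rather than as a separate DuckDB filter node (exact output varies by DuckDB version):
```sql
EXPLAIN
SELECT * FROM pg.public.orders
WHERE order_date >= '2024-01-01';
```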
For production workloads, consider:
  • Using connection pooling (pgBouncer)
  • Limiting concurrent cross-database queries
  • Monitoring network latency

Common Use Cases

1

Real-time analytics on operational data

Query live Postgres data with DuckDB’s analytical capabilities:
SELECT 
  DATE_TRUNC('hour', created_at) as hour,
  COUNT(*) as events_per_hour,
  AVG(processing_time) as avg_time
FROM pg.public.events
WHERE created_at >= NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;
2

Data warehouse offloading

Move historical data to MotherDuck while keeping recent data in Postgres:
-- Archive old orders to MotherDuck (in a DuckDB CLI started with: duckdb md:)
CREATE TABLE archive.historical_orders AS
SELECT * FROM pg.public.orders
WHERE order_date < CURRENT_DATE - INTERVAL '1 year';
3

ETL/ELT pipelines

Build data pipelines that combine both systems:
-- Extract from Postgres, transform in DuckDB, load to MotherDuck
-- (in a DuckDB CLI started with: duckdb md:)
CREATE TABLE clean.orders AS
SELECT 
  order_id,
  customer_id,
  CAST(order_date AS DATE) as order_date,
  CAST(total_amount AS DECIMAL(10,2)) as total_amount
FROM pg.public.raw_orders
WHERE order_date IS NOT NULL
  AND total_amount > 0;
4

Application serving layer

Use Postgres for transactional queries and MotherDuck for analytics:
-- Transactional: Get customer details (fast, from Postgres)
SELECT * FROM customers WHERE customer_id = 12345;

-- Analytical: Get customer behavior (from MotherDuck)
SELECT * FROM ddb$analytics.customer_behavior
WHERE customer_id = 12345;

Troubleshooting

If DuckDB cannot connect to PostgreSQL:
  1. Verify the container is running: docker ps
  2. Check port mapping: docker port <container_name>
  3. Verify password in the ATTACH statement
  4. Ensure no firewall is blocking port 5432
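A quick end-to-end connectivity check from the host (assumes psql is installed locally and uses the password from the docker run example):
```shell
# prints a single row containing 1 when the server is reachable
PGPASSWORD=very_secur3_pw psql -h 127.0.0.1 -U postgres -c 'SELECT 1;'
```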
If pg_duckdb cannot connect to MotherDuck:
  1. Verify the token was set when starting the container:
    docker inspect <container_name> | grep MOTHERDUCK_TOKEN
    
  2. Restart the container with the correct token
  3. Verify the token is valid by testing in the DuckDB CLI, with MOTHERDUCK_TOKEN set in your environment:
    ATTACH 'md:';
    
If queries are slow:
  1. Check if filters are being pushed down (use EXPLAIN)
  2. Reduce data transfer with column selection and filtering
  3. Consider materializing frequently accessed data
  4. Monitor network latency between systems
If you get permission errors:
  1. Verify the Postgres user has necessary permissions
  2. Check schema access: SELECT * FROM information_schema.tables
  3. Ensure the database exists and is accessible

Best Practices

  • Use CTAS for bulk transfers: More efficient than row-by-row inserts
  • Filter early: Apply WHERE clauses before cross-database joins
  • Monitor data transfer: Large queries can impact network and performance
  • Secure credentials: Use environment variables, never hardcode tokens
  • Test queries locally first: Validate logic before running on production data
  • Document hybrid queries: Clearly indicate which system owns which data
Cross-database queries involve network latency. For production workloads, consider data replication strategies to minimize real-time cross-database queries.
