marimo provides built-in SQL support, allowing you to query databases and work with results as DataFrames.

Quick Start

import marimo as mo

# Execute SQL (uses DuckDB by default)
result = mo.sql("""
    SELECT * FROM 'data.csv'
    WHERE value > 100
    LIMIT 10
""")
The result is automatically displayed as an interactive table and returned as a DataFrame.

Default Engine: DuckDB

By default, mo.sql() uses DuckDB, a fast in-process SQL database:
import marimo as mo
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "SF", "LA"]
})

# Query the DataFrame directly
result = mo.sql("""
    SELECT name, age
    FROM df
    WHERE age >= 30
    ORDER BY age DESC
""")
DuckDB can directly query DataFrames in your Python namespace - just reference them by variable name!

Supported Databases

marimo supports multiple database engines:

DuckDB

Default engine
In-process OLAP database
import duckdb
conn = duckdb.connect()

SQLAlchemy

Universal SQL toolkit
PostgreSQL, MySQL, SQLite, etc.
from sqlalchemy import create_engine
engine = create_engine("postgresql://...")

Ibis

Python DataFrames
Unified API for 20+ backends
import ibis
conn = ibis.connect("postgres://...")

ClickHouse

OLAP database
High-performance analytics
import clickhouse_connect
client = clickhouse_connect.get_client()

Redshift

AWS data warehouse
Managed cloud analytics
import redshift_connector
conn = redshift_connector.connect()

DB-API 2.0

Standard interface
Any PEP 249 connection
import psycopg2
conn = psycopg2.connect()
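
Python's standard-library sqlite3 module is itself a PEP 249 connection, so it makes a dependency-free sketch of the interface marimo expects here (passing the connection as engine= follows the pattern shown above; the table and data are illustrative):

```python
import sqlite3

# Any PEP 249 connection exposes cursor(), execute(), and fetch methods.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob')")

cursor = conn.execute("SELECT name FROM users ORDER BY id")
rows = cursor.fetchall()
print(rows)  # [('Alice',), ('Bob',)]

# In a marimo notebook, pass the connection directly:
# result = mo.sql("SELECT * FROM users", engine=conn)
```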

Custom Database Connections

SQLAlchemy

import marimo as mo
from sqlalchemy import create_engine

# Connect to PostgreSQL
engine = create_engine("postgresql://user:password@localhost/dbname")

# Execute query
result = mo.sql(
    "SELECT * FROM users WHERE created_at > '2024-01-01'",
    engine=engine
)

Ibis

import marimo as mo
import ibis

# Connect to any Ibis-supported backend
conn = ibis.connect("duckdb://my_database.db")

result = mo.sql(
    "SELECT product, SUM(sales) as total FROM orders GROUP BY product",
    engine=conn
)

DuckDB with Persistence

import marimo as mo
import duckdb

# Persistent database file
conn = duckdb.connect("my_data.duckdb")

# Create table
mo.sql("""
    CREATE TABLE IF NOT EXISTS products (
        id INTEGER,
        name VARCHAR,
        price DECIMAL(10,2)
    )
""", engine=conn)

# Insert data
mo.sql("""
    INSERT INTO products VALUES
    (1, 'Widget', 19.99),
    (2, 'Gadget', 29.99)
""", engine=conn, output=False)  # Don't display output

ClickHouse

import marimo as mo
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="localhost",
    port=8123,
    username="default",
    password=""
)

result = mo.sql(
    "SELECT * FROM system.tables LIMIT 10",
    engine=client
)

Query Features

Referencing DataFrames

DuckDB can query any DataFrame in scope:
import marimo as mo
import pandas as pd

sales = pd.read_csv("sales.csv")
products = pd.read_csv("products.csv")

# Join DataFrames with SQL
result = mo.sql("""
    SELECT 
        p.name,
        SUM(s.amount) as total_sales
    FROM sales s
    JOIN products p ON s.product_id = p.id
    GROUP BY p.name
    ORDER BY total_sales DESC
""")

Reading Files

DuckDB can read various file formats directly:
import marimo as mo

# CSV files
result = mo.sql("""
    SELECT * FROM 'data.csv'
    WHERE column > 100
""")

# Parquet files  
result = mo.sql("""
    SELECT * FROM 'data.parquet'
""")

# JSON files
result = mo.sql("""
    SELECT * FROM 'data.json'
""")

# Multiple files (glob patterns)
result = mo.sql("""
    SELECT * FROM 'sales_*.csv'
""")

Remote Files

import marimo as mo

# Query S3 (requires httpfs extension)
result = mo.sql("""
    SELECT * FROM 's3://bucket/data.parquet'
""")

# Query HTTP(S) URLs
result = mo.sql("""
    SELECT * FROM 'https://example.com/data.csv'
""")

Output Control

Disable Output Display

# Execute without displaying results
mo.sql(
    "INSERT INTO my_table VALUES (1, 2, 3)",
    output=False
)

Output Format

Control the returned DataFrame type via config:
# Returns polars.DataFrame
result = mo.sql("SELECT * FROM data")
type(result)  # polars.DataFrame
The output format is configured in marimo's settings under the SQL output type option. Options: auto, polars, lazy-polars, pandas.
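
The same setting can live in marimo's configuration file. As a minimal sketch (the exact table and key names are assumptions; confirm them against your marimo version's settings UI):

```
# marimo.toml (key names assumed; verify in your version's settings UI)
[runtime]
sql_output = "polars"  # auto | polars | lazy-polars | pandas
```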

Result Limits

Limit large query results automatically:
import os

# Set default limit (optional)
os.environ["MARIMO_SQL_DEFAULT_LIMIT"] = "10000"

import marimo as mo

# Query returns max 10,000 rows
result = mo.sql("SELECT * FROM large_table")
If a query result exceeds the limit, marimo displays a preview and notes the data is truncated.

EXPLAIN Queries

Analyze query execution plans:
import marimo as mo

# EXPLAIN shows the query plan
mo.sql("""
    EXPLAIN SELECT * FROM large_table WHERE id > 1000
""")
For DuckDB, EXPLAIN output preserves box-drawing characters for readable query plans.

Advanced Features

Lazy Evaluation

For large datasets, use lazy evaluation (set the SQL output type to lazy-polars):
import marimo as mo
import polars as pl

# Returns polars.LazyFrame - no data loaded yet
result = mo.sql("SELECT * FROM 'huge_file.parquet'")

type(result)  # polars.LazyFrame

# Data is only loaded when needed
filtered = result.filter(pl.col("value") > 100).collect()

DuckDB Extensions

Load DuckDB extensions for additional functionality:
import duckdb
import marimo as mo

conn = duckdb.connect()

# Install and load extensions
conn.execute("INSTALL httpfs")
conn.execute("LOAD httpfs")

conn.execute("INSTALL spatial")
conn.execute("LOAD spatial")

# Now you can query S3 and use spatial functions
mo.sql("""
    SELECT ST_AsText(geometry) 
    FROM 's3://bucket/geo_data.parquet'
""", engine=conn)

Common Table Expressions (CTEs)

import marimo as mo

result = mo.sql("""
    WITH high_value_customers AS (
        SELECT customer_id, SUM(amount) as total
        FROM orders
        GROUP BY customer_id
        HAVING total > 1000
    )
    SELECT c.name, hvc.total
    FROM high_value_customers hvc
    JOIN customers c ON hvc.customer_id = c.id
    ORDER BY hvc.total DESC
""")

Window Functions

import marimo as mo

result = mo.sql("""
    SELECT 
        date,
        revenue,
        SUM(revenue) OVER (
            ORDER BY date
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) as rolling_7day
    FROM daily_sales
    ORDER BY date
""")

Database Catalog

marimo can discover database schemas, tables, and columns automatically for autocomplete and exploration.

Available Databases

View connected databases in the datasources panel:
  • Automatically detects database connections in scope
  • Shows schemas and tables
  • Provides column information

Autocomplete

When typing SQL queries, marimo provides intelligent autocomplete for:
  • Table names
  • Column names
  • SQL keywords
  • Functions

Best Practices

1. Use parameterization

Avoid SQL injection: never build queries by interpolating untrusted input with f-strings; use parameterized queries (for example, SQLAlchemy's bound parameters) instead.
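
As a minimal, dependency-free sketch of the idea using the standard-library sqlite3 driver (table and values are illustrative; with SQLAlchemy the equivalent is sqlalchemy.text() with named bound parameters like :cutoff):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, created_at TEXT)")
conn.execute(
    "INSERT INTO users VALUES ('Alice', '2024-03-01'), ('Bob', '2023-06-01')"
)

# UNSAFE: f"SELECT * FROM users WHERE created_at > '{user_input}'"
# SAFE: let the driver bind the value through a placeholder.
cutoff = "2024-01-01"
rows = conn.execute(
    "SELECT name FROM users WHERE created_at > ?", (cutoff,)
).fetchall()
print(rows)  # [('Alice',)]
```

The driver escapes the bound value, so a malicious string cannot change the query's structure.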
2. Limit large results

Set MARIMO_SQL_DEFAULT_LIMIT or use LIMIT clauses to prevent loading huge datasets.
3. Index your data

For repeated queries on the same data, use DuckDB persistent databases with indexes.
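
A minimal sketch of the idea, shown with the standard-library sqlite3 module so it runs anywhere (DuckDB's CREATE INDEX syntax is analogous; the table and file name are illustrative):

```python
import os
import sqlite3
import tempfile

# Persistent, file-backed database: the table and index survive restarts.
path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", [(1, 9.5), (2, 3.0), (1, 4.25)]
)

# An index speeds up repeated lookups filtered on the same column.
conn.execute("CREATE INDEX IF NOT EXISTS idx_events_user ON events (user_id)")
total = conn.execute(
    "SELECT SUM(amount) FROM events WHERE user_id = ?", (1,)
).fetchone()[0]
print(total)  # 13.75
```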
4. Choose the right engine

  • DuckDB: Fast analytics on local/cloud files
  • SQLAlchemy: OLTP databases (PostgreSQL, MySQL)
  • Ibis: Unified API across multiple backends

Example: Complete Workflow

import marimo as mo
import pandas as pd

# 1. Load data
sales = pd.read_csv("sales.csv")
customers = pd.read_csv("customers.csv")

# 2. Transform with SQL
summary = mo.sql("""
    SELECT 
        c.segment,
        COUNT(DISTINCT s.customer_id) as customer_count,
        SUM(s.amount) as total_revenue,
        AVG(s.amount) as avg_order_value
    FROM sales s
    JOIN customers c ON s.customer_id = c.id
    WHERE s.date >= '2024-01-01'
    GROUP BY c.segment
    ORDER BY total_revenue DESC
""")

# 3. Visualize results
import altair as alt

chart = (
    alt.Chart(summary)
    .mark_bar()
    .encode(
        x="segment",
        y="total_revenue",
        color="segment"
    )
)

mo.ui.altair_chart(chart)
This workflow demonstrates loading data, transforming it with SQL, and visualizing the results - all with reactive updates.
