
Overview

Scribe Backend uses PostgreSQL as its primary database, typically hosted on Supabase. The application connects via Supabase’s transaction pooler (port 6543) for optimal performance and connection management.
Scribe uses SQLAlchemy 2.0 with Alembic for database migrations. All schema changes are version-controlled and reproducible.

Database Architecture

Connection Strategy

Scribe Backend → Supavisor (Transaction Pooler) → PostgreSQL
     ↓               ↓ Port 6543                      ↓
  NullPool    Server-side pooling              Database
Key Design Decisions:
  1. Transaction Pooler (Port 6543): Supabase’s Supavisor handles connection pooling server-side
  2. NullPool: No client-side pooling — fresh connections per request, immediately closed
  3. SSL Required: All connections enforce sslmode=require
Do not use QueuePool or any other client-side connection pooling with the transaction pooler; it causes stale connection issues.

Step 1: Set Up Supabase Project

1. Create Supabase project

  1. Go to supabase.com
  2. Click New Project
  3. Choose a region close to your users
  4. Set a strong database password
2. Get connection details

Navigate to Settings → Database in your Supabase dashboard. Copy the Transaction Pooler connection string:
Host: aws-0-us-west-1.pooler.supabase.com
Port: 6543
Database: postgres
User: postgres.<project-ref>
3. Configure environment variables

Add these to your .env file:
DB_USER=postgres.<project-ref>
DB_PASSWORD=your-database-password
DB_HOST=aws-0-us-west-1.pooler.supabase.com
DB_PORT=6543
DB_NAME=postgres
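These variables are assembled into a single SQLAlchemy URL at startup. A minimal sketch of that assembly (the `build_database_url` helper is hypothetical; Scribe's real settings object may build the URL differently):

```python
from urllib.parse import quote_plus

def build_database_url(env: dict) -> str:
    """Assemble a SQLAlchemy URL from the DB_* variables above.

    Hypothetical helper -- illustrative only.
    """
    password = quote_plus(env["DB_PASSWORD"])  # escape @, /, etc.
    return (
        f"postgresql+psycopg2://{env['DB_USER']}:{password}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
        "?sslmode=require"
    )

url = build_database_url({
    "DB_USER": "postgres.myproject",
    "DB_PASSWORD": "p@ss/word",  # note the special characters
    "DB_HOST": "aws-0-us-west-1.pooler.supabase.com",
    "DB_PORT": "6543",
    "DB_NAME": "postgres",
})
```

Percent-encoding the password matters: database passwords often contain characters like @ or / that would otherwise break URL parsing.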

Step 2: Verify Database Connection

Test the connection before running migrations:
# Activate virtual environment
source venv/bin/activate

# Test connection using Python
python -c "from database import check_db_connection; print('Connected!' if check_db_connection() else 'Failed')"

Alternative: Test with psql

psql "postgresql://postgres.<project-ref>:[YOUR-PASSWORD]@aws-0-us-west-1.pooler.supabase.com:6543/postgres?sslmode=require"
If connection fails, verify your IP is allowed in Supabase’s database settings or disable IP restrictions for development.
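The check_db_connection helper used above can be as simple as opening one connection and running SELECT 1. A sketch of what such a helper might look like (Scribe's actual implementation presumably reads the URL from settings and may differ):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

def check_db_connection(database_url: str) -> bool:
    """Open one fresh connection, run SELECT 1, and close it.

    Sketch only; Scribe's real helper lives in the database package.
    """
    try:
        engine = create_engine(database_url, poolclass=NullPool)
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        return True
    except Exception:
        return False
```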

Step 3: Run Database Migrations

Alembic manages database schema changes through versioned migration files.

Apply Migrations

# Run all pending migrations
alembic upgrade head
Expected output:
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> abc123, create users table
INFO  [alembic.runtime.migration] Running upgrade abc123 -> def456, create emails table
INFO  [alembic.runtime.migration] Running upgrade def456 -> ghi789, create queue_items table

Check Current Migration Version

alembic current
Output:
ghi789 (head)

View Migration History

alembic history --verbose

Database Schema

Users Table

CREATE TABLE users (
    id UUID PRIMARY KEY,                 -- From Supabase auth.users
    email VARCHAR(255) UNIQUE NOT NULL,
    display_name VARCHAR(255),
    generation_count INTEGER DEFAULT 0,
    template_count INTEGER DEFAULT 0,
    onboarded BOOLEAN DEFAULT FALSE,
    email_template TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX ix_users_email ON users(email);
Purpose: Stores user profiles synced with Supabase Auth.

Emails Table

CREATE TABLE emails (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    recipient_name VARCHAR(255) NOT NULL,
    recipient_interest VARCHAR(500) NOT NULL,
    email_message TEXT NOT NULL,
    template_type VARCHAR(50),           -- RESEARCH | BOOK | GENERAL
    metadata JSONB,                      -- Pipeline metadata
    is_confident BOOLEAN DEFAULT TRUE,
    displayed BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX ix_emails_user_id ON emails(user_id);
CREATE INDEX ix_emails_created_at ON emails(created_at);
CREATE INDEX ix_emails_user_created ON emails(user_id, created_at DESC);
JSONB Metadata Structure:
{
  "search_terms": ["Dr. Jane Smith machine learning"],
  "scraped_urls": ["https://example.com/profile"],
  "arxiv_papers": [
    {
      "title": "Deep Learning for Computer Vision",
      "arxiv_url": "https://arxiv.org/abs/...",
      "year": 2023
    }
  ],
  "step_timings": {
    "template_parser": 1.2,
    "web_scraper": 5.3,
    "arxiv_helper": 0.8,
    "email_composer": 3.1
  },
  "model": "anthropic:claude-sonnet-4-5",
  "temperature": 0.7
}
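Because step_timings is plain JSON, pipeline performance can be summarized directly from a loaded metadata dict. For example (summarize_timings is an illustrative helper, not part of Scribe's API):

```python
def summarize_timings(metadata: dict) -> dict:
    """Aggregate the per-step durations stored in the metadata JSONB.

    Illustrative helper, not part of Scribe's API.
    """
    timings = metadata.get("step_timings", {})
    return {
        "total_seconds": round(sum(timings.values()), 2),
        "slowest_step": max(timings, key=timings.get) if timings else None,
    }

summary = summarize_timings({
    "step_timings": {
        "template_parser": 1.2,
        "web_scraper": 5.3,
        "arxiv_helper": 0.8,
        "email_composer": 3.1,
    }
})
# → {"total_seconds": 10.4, "slowest_step": "web_scraper"}
```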

Queue Items Table

CREATE TABLE queue_items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    recipient_name VARCHAR(255) NOT NULL,
    recipient_interest VARCHAR(500) NOT NULL,
    email_template_text TEXT NOT NULL,
    status VARCHAR(50) NOT NULL,         -- PENDING | PROCESSING | COMPLETED | FAILED
    celery_task_id VARCHAR(255),
    current_step VARCHAR(100),
    email_id UUID REFERENCES emails(id) ON DELETE SET NULL,
    error_message TEXT,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX ix_queue_items_user_id ON queue_items(user_id);
CREATE INDEX ix_queue_items_status ON queue_items(status);
CREATE INDEX ix_queue_items_created_at ON queue_items(created_at);
CREATE INDEX ix_queue_items_user_status ON queue_items(user_id, status);
Purpose: Tracks batch email generation requests with FIFO processing.

Connection Pooling Explained

Why NullPool?

Scribe uses SQLAlchemy’s NullPool strategy:
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine(
    settings.database_url,
    poolclass=NullPool,  # No client-side pooling
    connect_args={
        "connect_timeout": settings.db_connect_timeout,
        "options": f"-c statement_timeout={settings.db_statement_timeout}"
    }
)
Benefits:
  1. No Stale Connections: Fresh connection per request, immediately discarded
  2. Optimal for Transaction Pooler: Supavisor handles pooling server-side
  3. Memory Efficient: No idle connections consuming resources
  4. Works on Raspberry Pi: Minimal memory footprint
Transaction pooler (port 6543) is designed for stateless, short-lived connections. NullPool is the perfect match.

Connection Timeouts

Scribe configures two types of timeouts:

Connect Timeout (30 seconds)

DB_CONNECT_TIMEOUT=30  # Handles cold starts and network latency

Statement Timeout (30 seconds)

DB_STATEMENT_TIMEOUT=30000  # Query timeout in milliseconds
Long-running queries beyond 30 seconds will be terminated. Optimize slow queries or increase the timeout.
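Note the unit mismatch: DB_CONNECT_TIMEOUT is in seconds (psycopg2 convention) while DB_STATEMENT_TIMEOUT is in milliseconds (PostgreSQL convention). A sketch of how the two feed into connect_args (build_connect_args is a hypothetical helper):

```python
def build_connect_args(connect_timeout_s: int = 30,
                       statement_timeout_ms: int = 30000) -> dict:
    """Build the connect_args dict passed to create_engine.

    Hypothetical helper. connect_timeout is in SECONDS;
    statement_timeout is in MILLISECONDS.
    """
    return {
        "connect_timeout": connect_timeout_s,
        "options": f"-c statement_timeout={statement_timeout_ms}",
    }
```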

Creating Migrations

When you modify database models, create a new migration:
# Auto-generate migration from model changes
alembic revision --autogenerate -m "Add new column to users table"
Alembic compares your models (models/*.py) with the current database schema and generates a migration file in alembic/versions/.

Review Generated Migration

# Open the latest migration file
ls -lt alembic/versions/ | head -2
cat alembic/versions/abc123_add_new_column_to_users_table.py
Example migration:
def upgrade() -> None:
    op.add_column('users', sa.Column('new_field', sa.String(255), nullable=True))

def downgrade() -> None:
    op.drop_column('users', 'new_field')

Apply Migration

alembic upgrade head

Rollback Migration

# Rollback one migration
alembic downgrade -1

# Rollback to specific version
alembic downgrade abc123

Database Utilities

Health Check

The /health endpoint checks database connectivity:
curl http://localhost:8000/health
Response:
{
  "status": "healthy",
  "service": "scribe-api",
  "version": "1.0.0",
  "database": "connected",
  "environment": "development"
}

Get Database Info

from database import get_db_info

db_info = get_db_info()
print(db_info)
# Output: {"status": "connected", "url": "postgresql+psycopg2://postgres.<project-ref>@aws-0-us-west-1.pooler.supabase.com:6543/postgres"}
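The reported URL contains no password. A sketch of the kind of redaction involved (redact_db_url is hypothetical; Scribe may mask credentials differently):

```python
import re

def redact_db_url(url: str) -> str:
    """Strip the password from a connection URL before logging it.

    Hypothetical helper illustrating the password-free output above.
    """
    return re.sub(r"(://[^:@/]+):[^@]*@", r"\1@", url)

redact_db_url("postgresql+psycopg2://user:s3cret@host:6543/postgres")
# → "postgresql+psycopg2://user@host:6543/postgres"
```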

Production Considerations

SSL/TLS Enforcement

All Supabase connections require SSL:
database_url = f"postgresql+psycopg2://...?sslmode=require"
Never set sslmode=disable in production. This exposes sensitive data over unencrypted connections.

Connection Pooler Choice

Pooler Type    Port   Use Case
Transaction    6543   Stateless apps, serverless, auto-scaling (Scribe)
Session        5432   Long-running transactions, advisory locks
Scribe uses transaction pooler (6543) for optimal performance with its stateless pipeline design.

Common Database Operations

Backup Database

Supabase provides automatic backups, but you can also manually export:
pg_dump "postgresql://postgres.<project-ref>:[YOUR-PASSWORD]@aws-0-us-west-1.pooler.supabase.com:6543/postgres?sslmode=require" > backup.sql

Restore Database

psql "postgresql://..." < backup.sql

Reset Database (Development Only)

Destructive operation: This deletes all data.
# Rollback all migrations
alembic downgrade base

# Reapply migrations
alembic upgrade head

Troubleshooting

Error: OperationalError: could not connect to server

Solutions:
  1. Check database host and port in .env
  2. Verify IP whitelist in Supabase dashboard
  3. Increase DB_CONNECT_TIMEOUT for slow networks:
    DB_CONNECT_TIMEOUT=60
    
Error: SSL connection has been closed unexpectedly

Solution: Ensure sslmode=require is in the connection string. Scribe handles this automatically via settings.database_url.
Error: Target database is not up to date

Solution:
# Check current version
alembic current

# View pending migrations
alembic history

# Apply missing migrations
alembic upgrade head
Issue: Using the wrong port for the connection type

Solution:
  • Transaction pooler: Use port 6543 (recommended)
  • Session pooler: Use port 5432
  • Direct connection: Use port 5432 (not recommended)
Scribe defaults to transaction pooler (6543).
Error: server closed the connection unexpectedly

Cause: Using QueuePool with the transaction pooler.

Solution: Verify poolclass=NullPool in database/base.py. This is the default configuration.
