Supabase Database Setup

Supabase provides a managed PostgreSQL database for storing crawl site metadata and scheduling information.

Create Supabase Project

Step 1: Navigate to Supabase Dashboard

Go to app.supabase.com and sign in.

Step 2: Create New Project

  1. Click “New Project”
  2. Select your organization (or create one)
  3. Configure project settings:
    • Name: llmstxt-generator
    • Database Password: Generate strong password (save securely)
    • Region: Choose closest to your AWS region
    • Pricing Plan: Free tier (sufficient for most use cases)
  4. Click “Create new project”

Step 3: Wait for Provisioning

Project creation takes 1-2 minutes. Wait for status to show “Active”.

Retrieve Supabase Credentials

Once your project is active:

Step 1: Navigate to Project Settings

Click the gear icon (⚙️) in the sidebar → API

Step 2: Copy Project URL

Under “Project URL”:
https://abcdefghijklmnop.supabase.co
This is your SUPABASE_URL - save it for Terraform configuration.

Step 3: Copy Anon/Public Key

Under “Project API keys” → anon public:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
This is your SUPABASE_KEY - save it for Terraform configuration.
The anon key is safe to use in client applications because its access is restricted by row-level security. Never use the service_role key in client-side code.

Run Database Migrations

Create the crawl_sites table for tracking crawled websites.

Step 1: Open SQL Editor

In Supabase dashboard, navigate to SQL Editor (left sidebar).

Step 2: Create New Query

Click “New query” button.

Step 3: Paste Migration SQL

Copy and paste the following SQL:
-- Create crawl_sites table for automated recrawling
CREATE TABLE IF NOT EXISTS crawl_sites (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  base_url TEXT UNIQUE NOT NULL,
  recrawl_interval_minutes INTEGER NOT NULL,
  max_pages INTEGER NOT NULL DEFAULT 50,
  desc_length INTEGER NOT NULL DEFAULT 500,
  last_crawled_at TIMESTAMPTZ,
  latest_llms_hash TEXT,
  latest_llms_url TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create index for efficient querying of sites due for recrawl
CREATE INDEX IF NOT EXISTS idx_crawl_sites_due
  ON crawl_sites(last_crawled_at, recrawl_interval_minutes);

-- Add comments for documentation
COMMENT ON TABLE crawl_sites IS 'Stores metadata for sites enrolled in automated llms.txt updates';
COMMENT ON COLUMN crawl_sites.base_url IS 'The base URL of the crawled site (unique identifier)';
COMMENT ON COLUMN crawl_sites.recrawl_interval_minutes IS 'How often to recrawl this site (in minutes)';
COMMENT ON COLUMN crawl_sites.max_pages IS 'Maximum number of pages to crawl per scan';
COMMENT ON COLUMN crawl_sites.desc_length IS 'Maximum length of page descriptions/snippets';
COMMENT ON COLUMN crawl_sites.last_crawled_at IS 'Timestamp of the last successful crawl';
COMMENT ON COLUMN crawl_sites.latest_llms_hash IS 'SHA-256 hash of the latest generated llms.txt content';
COMMENT ON COLUMN crawl_sites.latest_llms_url IS 'URL where the latest llms.txt is hosted';
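
Note that updated_at only receives a default on insert; nothing in the migration above keeps it current on later writes. If you want it to track modifications, a trigger along these lines can be added (a sketch, not part of the required migration):

```sql
-- Optional: refresh updated_at on every modification
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER crawl_sites_set_updated_at
  BEFORE UPDATE ON crawl_sites
  FOR EACH ROW
  EXECUTE FUNCTION set_updated_at();
```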

Step 4: Run the Query

Click “Run” button or press Ctrl+Enter (Windows/Linux) / Cmd+Enter (macOS).

Step 5: Verify Table Creation

Navigate to Table Editor in the sidebar. You should see the crawl_sites table with 10 columns.
Successfully created crawl_sites table!
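
You can also confirm the schema from the SQL Editor; this query should report 10 columns:

```sql
SELECT COUNT(*)
FROM information_schema.columns
WHERE table_name = 'crawl_sites';
-- Expected: 10
```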

Database Schema Overview

The crawl_sites table structure:
Column                    Type         Description
------                    ----         -----------
id                        UUID         Primary key (auto-generated)
base_url                  TEXT         Website URL (unique)
recrawl_interval_minutes  INTEGER      Recrawl frequency in minutes
max_pages                 INTEGER      Maximum pages to crawl (default: 50)
desc_length               INTEGER      Max description length (default: 500)
last_crawled_at           TIMESTAMPTZ  Last crawl timestamp
latest_llms_hash          TEXT         SHA-256 hash of llms.txt content
latest_llms_url           TEXT         Public URL of generated llms.txt
created_at                TIMESTAMPTZ  Record creation time
updated_at                TIMESTAMPTZ  Last update time
The idx_crawl_sites_due index optimizes queries for finding sites that need recrawling based on last_crawled_at and recrawl_interval_minutes.
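
A scheduler query against this index might look like the following (a sketch; the actual backend query may differ):

```sql
-- Find sites that have never been crawled, or whose recrawl interval has elapsed
SELECT base_url
FROM crawl_sites
WHERE last_crawled_at IS NULL
   OR last_crawled_at + (recrawl_interval_minutes * INTERVAL '1 minute') <= NOW();
```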

Cloudflare R2 Storage Setup

Cloudflare R2 provides S3-compatible object storage for hosting generated llms.txt files.

Create R2 Bucket

Step 1: Navigate to R2 Dashboard

Log in to dash.cloudflare.com → Select R2 from sidebar.

Step 2: Purchase R2 (if needed)

If this is your first time using R2:
  1. Click “Purchase R2”
  2. Review pricing (free tier: 10GB storage, 1M Class A ops/month)
  3. Click “Get Started”

Step 3: Create Bucket

  1. Click “Create bucket”
  2. Bucket name: llmstxt (or your preferred name)
  3. Location: Automatic (recommended)
  4. Click “Create bucket”

Step 4: Enable Public Access

  1. Open your newly created bucket
  2. Go to Settings tab
  3. Under Public access, enable:
    • “Allow public access”
  4. Save changes
Public access is required so that generated llms.txt files can be accessed via HTTP URLs. Bucket contents are not listed publicly; a file is accessible only to someone who knows its URL.

Generate R2 API Token

Step 1: Navigate to API Tokens

In R2 dashboard, click “Manage R2 API Tokens” (top right).

Step 2: Create API Token

  1. Click “Create API token”
  2. Configure token:
    • Token name: llmstxt-backend
    • Permissions: Read & Write
    • Bucket: Select your bucket (llmstxt) or “Apply to all buckets”
    • TTL: Leave empty (no expiration)
  3. Click “Create API Token”

Step 3: Save Credentials

Copy and save all three values (you won’t see them again):
R2_ACCESS_KEY=abc123def456ghi789jkl
R2_SECRET_KEY=aBcDeFgHiJkLmNoPqRs
R2_ENDPOINT=https://xxxxx.r2.cloudflarestorage.com
The Secret Access Key is shown only once. Store it securely in a password manager.

Get Public R2 Domain

Step 1: Return to Bucket Settings

Navigate back to your bucket → Settings tab.

Step 2: Copy Public R2.dev Domain

Under Public access, copy the R2.dev subdomain:
https://pub-1234567890abcdef.r2.dev
This is your R2_PUBLIC_DOMAIN - files will be accessible at: https://pub-xxxxx.r2.dev/your-file.txt

Optional: Configure Custom Domain

Step 1: Add Custom Domain

In bucket settings → Public access → Connect Custom Domain:
  1. Enter your domain: files.yourdomain.com
  2. Cloudflare will provide DNS records to add

Step 2: Update DNS Records

If domain is on Cloudflare:
  • Records are added automatically
If domain is elsewhere:
  • Add CNAME record: files.yourdomain.com → pub-xxxxx.r2.dev

Step 3: Verify Configuration

Wait for DNS propagation (usually minutes, though it can take up to 24 hours). Then test:
curl https://files.yourdomain.com

Step 4: Update Terraform Variable

Use custom domain in terraform.tfvars:
r2_public_domain = "https://files.yourdomain.com"

Test Storage Configuration

Upload Test File to R2

Verify R2 credentials work correctly:
# Install AWS CLI (if not already installed)
aws --version

# Configure R2 credentials (the AWS CLI works with any S3-compatible endpoint)
aws configure set aws_access_key_id YOUR_R2_ACCESS_KEY
aws configure set aws_secret_access_key YOUR_R2_SECRET_KEY

# Test upload
echo "Test file" > test.txt
aws s3 cp test.txt s3://llmstxt/test.txt \
  --endpoint-url YOUR_R2_ENDPOINT

# Verify public access
curl https://pub-xxxxx.r2.dev/test.txt
# Expected: Test file
If you can access the file via the public URL, R2 is configured correctly!
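
The latest_llms_hash column in Supabase stores a SHA-256 of the generated content. To check whether an uploaded file matches, the same hash can be computed locally (a sketch reusing the test.txt from above; sha256sum is assumed to be available, as on most Linux systems):

```shell
# Compute the SHA-256 of a file, comparable to latest_llms_hash
echo "Test file" > test.txt
hash=$(sha256sum test.txt | awk '{print $1}')
echo "$hash"   # a 64-character hex digest
```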

Test Supabase Connection

Verify Supabase credentials with a simple query:
# Using curl
curl -X GET "https://YOUR_PROJECT.supabase.co/rest/v1/crawl_sites" \
  -H "apikey: YOUR_SUPABASE_KEY" \
  -H "Authorization: Bearer YOUR_SUPABASE_KEY"

# Expected: []
# (empty array since no sites are added yet)
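
Once the system is deployed, rows are added by the backend, but for manual testing you can enroll a site yourself from the SQL Editor (a hypothetical example row; the URL and interval are placeholders):

```sql
-- Enroll example.com for a daily recrawl (1440 minutes)
INSERT INTO crawl_sites (base_url, recrawl_interval_minutes, max_pages, desc_length)
VALUES ('https://example.com', 1440, 50, 500)
ON CONFLICT (base_url) DO NOTHING;
```

Re-running the curl above should then return the new row instead of an empty array.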

Credentials Summary

Before proceeding to Terraform, ensure you have all these values:
SUPABASE_URL=https://xxxxx.supabase.co
SUPABASE_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
R2_ENDPOINT=https://xxxxx.r2.cloudflarestorage.com
R2_ACCESS_KEY=abc123def456
R2_SECRET_KEY=aBcDeFgHiJkLmNoPqRs
R2_BUCKET=llmstxt
R2_PUBLIC_DOMAIN=https://pub-xxxxx.r2.dev
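
A quick shell check can confirm nothing is missing before you move on (a sketch; the example values are the placeholders from the summary above and must be replaced with your real credentials):

```shell
# Placeholder values; replace with your real credentials (e.g. source a .env file)
SUPABASE_URL="https://xxxxx.supabase.co"
SUPABASE_KEY="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
R2_ENDPOINT="https://xxxxx.r2.cloudflarestorage.com"
R2_ACCESS_KEY="abc123def456"
R2_SECRET_KEY="aBcDeFgHiJkLmNoPqRs"
R2_BUCKET="llmstxt"
R2_PUBLIC_DOMAIN="https://pub-xxxxx.r2.dev"

# Report any required variable that is empty or unset
missing=0
for var in SUPABASE_URL SUPABASE_KEY R2_ENDPOINT R2_ACCESS_KEY R2_SECRET_KEY R2_BUCKET R2_PUBLIC_DOMAIN; do
  eval "val=\${$var}"
  if [ -z "$val" ]; then
    echo "Missing: $var"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "All credentials present"
fi
```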

Data Privacy Considerations

Public Data: Files stored in R2 with public access enabled are accessible to anyone with the URL. Do not store sensitive data.
  • Generated llms.txt files are intentionally public (that’s their purpose)
  • Database records in Supabase are private (protected by row-level security)
  • URLs in R2 are not guessable (contain random hashes)
  • Consider implementing URL signing for additional security

Next Steps

Terraform Configuration

Configure Terraform variables and deploy AWS infrastructure
