The storage module handles uploading generated llms.txt files to Cloudflare R2 (S3-compatible object storage). It uses boto3 for S3 API compatibility and generates deterministic filenames based on URL hashing.

Overview

The storage module provides a single async function for saving llms.txt content to R2:
  • MD5-based filename generation for consistent paths
  • S3-compatible upload using boto3
  • Public URL generation with custom domain support
  • Graceful handling when storage is not configured

Configuration

Storage requires the following environment variables (from config.settings):
  • r2_endpoint - R2 endpoint URL
  • r2_access_key - R2 access key ID
  • r2_secret_key - R2 secret access key
  • r2_bucket - Target bucket name
  • r2_public_domain - Optional custom domain for public URLs
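Before attempting an upload, the module checks that all required settings are present. A minimal sketch of that check, using a hypothetical `Settings` dataclass as a stand-in for `config.settings` (the real object may be loaded from environment variables, e.g. via pydantic):

```python
from dataclasses import dataclass

# Hypothetical stand-in for config.settings; field names match the
# environment variables listed above.
@dataclass
class Settings:
    r2_endpoint: str = ""
    r2_access_key: str = ""
    r2_secret_key: str = ""
    r2_bucket: str = ""
    r2_public_domain: str = ""  # optional

def storage_configured(settings: Settings) -> bool:
    """True only when every required R2 setting is non-empty."""
    return all([
        settings.r2_endpoint,
        settings.r2_access_key,
        settings.r2_secret_key,
        settings.r2_bucket,
    ])
```

Note that `r2_public_domain` is deliberately excluded from the check: it only affects which public URL is returned, not whether the upload can proceed.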

Functions

save_llms_txt()

Saves llms.txt content to R2 storage and returns the public URL.
async def save_llms_txt(
    base_url: str,
    content: str,
    log: Callable
) -> str | None
Parameters:
  • base_url (str, required) - The base URL of the crawled site (used for filename generation)
  • content (str, required) - The formatted llms.txt content to upload
  • log (Callable, required) - Logging function (sync or async) that accepts string messages

Returns:
  • public_url (str | None) - Public URL of the uploaded file, or None if the upload failed or storage is not configured
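Because `log` may be either a plain function or a coroutine function, the module presumably awaits its result only when it is awaitable. A sketch of that dispatch (the helper name `_emit` is illustrative, not from the source):

```python
import asyncio
import inspect

async def _emit(log, msg: str) -> None:
    # Call the logger; await the result only if it is awaitable,
    # so plain functions and coroutine functions both work.
    result = log(msg)
    if inspect.isawaitable(result):
        await result

messages: list[str] = []

# Works with a plain sync callable...
asyncio.run(_emit(messages.append, "sync ok"))

# ...and with an async one.
async def alog(msg: str) -> None:
    messages.append(msg)

asyncio.run(_emit(alog, "async ok"))
```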
Behavior:
  1. Checks if all required R2 settings are configured
  2. If not configured, logs message and returns None
  3. Creates boto3 S3 client with R2 credentials
  4. Generates MD5 hash of base URL for deterministic filename
  5. Uploads content with text/markdown content type
  6. Constructs public URL using custom domain or endpoint
  7. Logs upload success with URL
  8. Returns public URL on success, None on error
Object Key Format:
llms/<md5_hash>.txt
Where <md5_hash> is the MD5 hash of the base URL.

Usage Examples

Basic Upload

from backend.storage import save_llms_txt

async def log(msg: str):
    print(msg)

base_url = "https://example.com"
content = "# Example Site\n\n> Description\n\n## Docs\n..."

public_url = await save_llms_txt(base_url, content, log)

if public_url:
    print(f"Uploaded to: {public_url}")
else:
    print("Upload failed or storage not configured")

Integration with Crawler and Formatter

from backend.crawler import LLMCrawler
from backend.formatter import format_llms_txt, get_md_url_map
from backend.storage import save_llms_txt

async def log(msg: str):
    print(msg)

# Crawl site
crawler = LLMCrawler(
    base_url="https://docs.example.com",
    max_pages=50,
    desc_length=150,
    log_callback=log
)

pages = await crawler.run()

# Format output
md_url_map = await get_md_url_map(pages)
llms_content = format_llms_txt(
    base_url="https://docs.example.com",
    pages=pages,
    md_url_map=md_url_map
)

# Upload to R2
public_url = await save_llms_txt(
    base_url="https://docs.example.com",
    content=llms_content,
    log=log
)

if public_url:
    await log(f"✓ llms.txt available at: {public_url}")

Without Storage Configured

# If R2 credentials are not set in environment
public_url = await save_llms_txt(base_url, content, log)

# Logs: "Storage not configured, skipping upload"
# Returns: None

Error Handling

public_url = await save_llms_txt(base_url, content, log)

if public_url is None:
    # Either storage is not configured or the upload failed;
    # ClientError details are written to the log
    await log("Failed to upload - check R2 credentials")
else:
    # Success
    await log(f"Upload successful: {public_url}")

Public URL Generation

The function generates public URLs in two ways:

With Custom Domain

If settings.r2_public_domain is set:
public_url = f"{settings.r2_public_domain}/{object_key}"
# Example: https://cdn.example.com/llms/5d41402abc4b2a76b9719d911017c592.txt

Without Custom Domain

Falls back to R2 endpoint:
public_url = f"{settings.r2_endpoint}/{settings.r2_bucket}/{object_key}"
# Example: https://r2.example.com/mybucket/llms/5d41402abc4b2a76b9719d911017c592.txt

Filename Determinism

The MD5 hash ensures consistent filenames for the same base URL:
import hashlib

base_url = "https://example.com"
url_hash = hashlib.md5(base_url.encode()).hexdigest()
object_key = f"llms/{url_hash}.txt"

# Same base_url always produces same object_key
# This allows overwriting previous versions automatically
Benefits:
  • No need to track or delete old files
  • Recrawls automatically update the same file
  • Predictable URLs for integration

Content Type

Files are uploaded with Content-Type: text/markdown:
s3_client.put_object(
    Bucket=settings.r2_bucket,
    Key=object_key,
    Body=content.encode(),
    ContentType='text/markdown'  # Enables proper rendering
)
This ensures browsers and LLM tools recognize the content as markdown.

Error Handling

The function handles errors gracefully:
try:
    # Upload logic
    return public_url
except ClientError as e:
    log(f"Storage error: {str(e)}")
    return None
Common Errors:
  • Invalid credentials → ClientError
  • Network timeout → ClientError
  • Bucket not found → ClientError
  • Permission denied → ClientError
All errors are logged and return None.

Dependencies

  • boto3 - AWS SDK providing the S3-compatible client
  • botocore - Supplies ClientError for error handling
  • config - Provides the R2 configuration settings listed above

Related modules:
  • formatter - Generates the llms.txt content that this module uploads
  • database - Stores the returned public URL in Supabase

Notes

  • Function is async for consistency with other backend modules
  • Boto3 operations are synchronous (no async boto3 used)
  • Graceful degradation when storage not configured
  • Public URLs are immediately accessible after upload
  • No explicit public ACL needed (bucket configured for public reads)
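Since the boto3 calls are synchronous, a slow upload blocks the event loop for its duration. One common mitigation (not necessarily what this module does) is to run the blocking call in a worker thread with asyncio.to_thread; the sketch below uses a stand-in function in place of the real put_object call:

```python
import asyncio

def blocking_upload(data: bytes) -> str:
    # Stand-in for the synchronous boto3 put_object call.
    return f"uploaded {len(data)} bytes"

async def upload_without_blocking(data: bytes) -> str:
    # Runs the blocking call in a worker thread so the event
    # loop can keep serving other tasks in the meantime.
    return await asyncio.to_thread(blocking_upload, data)

result = asyncio.run(upload_without_blocking(b"# Example"))
```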
