
Overview

The llms.txt Generator pairs an asynchronous Python backend (FastAPI) with a Next.js frontend, and runs on AWS alongside a small set of managed external services. This page lists each technology in the stack and the role it plays.

Backend Stack

Core Framework

FastAPI

Version: 0.122+
  • Asynchronous Python web framework
  • Built-in WebSocket support
  • Automatic API documentation (OpenAPI/Swagger)
  • High performance with async/await
  • Type hints with Pydantic validation

Python

Version: 3.11+
  • Modern async/await syntax
  • Excellent library ecosystem
  • Strong type hinting support
  • Performance improvements in 3.11+

Web Crawling & Scraping

Playwright

  • Headless browser automation
  • Chromium engine for JavaScript execution
  • Full page rendering support
  • Network request interception
  • Screenshot capabilities

BeautifulSoup4

  • HTML/XML parsing
  • DOM traversal and manipulation
  • Content extraction
  • Robust error handling

Brightdata

  • Proxy service for JS-heavy sites
  • Scraping Browser API
  • Bypass anti-bot protections
  • Global proxy network

httpx

  • Async HTTP client
  • HTTP/2 support
  • Connection pooling
  • Automatic redirects
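
The async client and connection pooling listed above matter most when many pages are fetched at once. The following is a minimal sketch of bounded concurrent fetching with asyncio, not the project's actual crawler. The names `fetch_stub`, `crawl_all`, and `max_concurrency` are hypothetical, and the fetch is a stub so the example runs without network access. A real crawler would replace the stub with an HTTP call.

```python
import asyncio


async def fetch_stub(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request; returns fake HTML."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html><title>{url}</title></html>"


async def crawl_all(urls: list[str], max_concurrency: int = 5) -> dict[str, str]:
    """Fetch every URL while keeping at most max_concurrency fetches in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> tuple[str, str]:
        # The semaphore blocks here once max_concurrency fetches are running
        async with semaphore:
            return url, await fetch_stub(url)

    pairs = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(pairs)
```

Calling `asyncio.run(crawl_all(urls))` returns a mapping from each URL to its fetched body. The semaphore caps in-flight requests so a large crawl does not overwhelm the target site.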

LLM Integration (Optional)

OpenRouter

  • LLM API aggregator
  • Access to multiple models (Grok, GPT, Claude, etc.)
  • Content enhancement and optimization
  • Fallback model support
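
Fallback model support can be pictured as trying an ordered list of models until one succeeds. The sketch below illustrates that idea only. `enhance_with_fallback` and the `call_model` callable are hypothetical names, and no real OpenRouter client call is shown.

```python
from typing import Callable, Optional, Sequence


def enhance_with_fallback(
    text: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, output) from the first success."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, call_model(model, text)
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
    # Every model failed (or the list was empty); surface the last failure as the cause
    raise RuntimeError("all models failed") from last_error
```

Passing the model call in as a parameter keeps the retry logic independent of any particular API client and makes it easy to test with a fake.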

Grok 4.1-Fast

  • Default enhancement model
  • Fast response times
  • Content summarization
  • Structured output generation

Data Validation & Processing

# Key libraries
import asyncio  # Async/await concurrency
import hashlib  # Content hashing for change detection
from dataclasses import dataclass

from pydantic import BaseModel, Field, HttpUrl


# Request payload, validated by Pydantic
class CrawlRequest(BaseModel):
    url: HttpUrl
    maxPages: int = 50
    descLength: int = 500
    enableAutoUpdate: bool = False
    recrawlIntervalMinutes: int = 360
    llmEnhance: bool = False
    useBrightdata: bool = False


# Lightweight record for each crawled page
@dataclass
class PageInfo:
    url: str
    title: str
    description: str
    snippet: str
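
The imports above mention content hashing for change detection. One plausible shape for that, assuming the hash covers a page's title, description, and snippet, is sketched here. The helper names `content_hash` and `has_changed` are hypothetical, and the choice of SHA-256 and of which fields to hash is an assumption rather than the project's confirmed behavior.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageInfo:
    url: str
    title: str
    description: str
    snippet: str


def content_hash(page: PageInfo) -> str:
    """Return a SHA-256 hex digest over the page's textual content (assumed fields)."""
    payload = "\n".join([page.title, page.description, page.snippet])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def has_changed(page: PageInfo, previous_hash: Optional[str]) -> bool:
    """A page counts as changed if it was never hashed or its digest differs."""
    return previous_hash != content_hash(page)
```

Storing only the digest from the previous crawl is enough to decide whether a recrawl needs to regenerate output, without keeping the full page content around.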

Frontend Stack

Core Framework

Next.js

Version: 15+
  • React framework for production
  • App Router architecture
  • Server and client components
  • Built-in optimization
  • API routes (used for proxying)

TypeScript

Version: 5+
  • Type safety across the application
  • Enhanced IDE support
  • Reduced runtime errors
  • Better refactoring experience

UI & Styling

Tailwind CSS

  • Utility-first CSS framework
  • Responsive design system
  • Custom theme configuration
  • Dark mode support
  • Optimized production builds

shadcn/ui

  • Re-usable component library
  • Accessible components
  • Customizable design tokens
  • Copy-paste component architecture

Real-time Communication

// WebSocket client implementation
const ws = new WebSocket(`${WS_URL}?api_key=${API_KEY}`);

ws.onopen = () => {
  ws.send(JSON.stringify(crawlRequest));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  // Handle: log, result, url, error types
};

The frontend uses the native WebSocket API for bidirectional, real-time communication with the backend.
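
The client above handles four message types: log, result, url, and error. The same routing step can be sketched in Python, for example for a script or test client. This assumes each frame is a JSON object carrying its kind in a `type` field. That field name and the `route_message` helper are assumptions, not confirmed parts of the protocol.

```python
import json

# Message kinds named in the client's onmessage handler
KNOWN_TYPES = {"log", "result", "url", "error"}


def route_message(raw: str) -> tuple[str, dict]:
    """Parse one JSON frame and return (message_type, message)."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    msg_type = message.get("type")  # assumed field name
    if msg_type not in KNOWN_TYPES:
        raise ValueError(f"unexpected message type: {msg_type!r}")
    return msg_type, message
```

Rejecting unknown types explicitly makes protocol drift between frontend and backend visible instead of silently dropping messages.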

Infrastructure Stack

AWS Services

ECS Fargate

  • Serverless container orchestration
  • Auto-scaling task management
  • No EC2 instance management
  • Pay-per-use pricing

ECR

  • Docker image registry
  • Vulnerability scanning
  • Image lifecycle policies
  • IAM-based access control

Application Load Balancer

  • HTTP/HTTPS traffic distribution
  • WebSocket support
  • Health checks
  • SSL/TLS termination

Lambda

  • Scheduled recrawl execution
  • Event-driven architecture
  • 10-minute timeout (600s)
  • 512MB memory allocation

EventBridge

  • Cron-based scheduling
  • Triggers every 6 hours
  • Event-driven invocation
  • CloudWatch integration
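
The 6-hour schedule matches the default `recrawlIntervalMinutes` of 360. A scheduled function could decide which sites to recrawl with a check like the one below. This is a sketch under the assumption that each site stores a last-crawl timestamp. The `is_due` helper and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_due(last_crawled_at: datetime, interval_minutes: int, now: datetime) -> bool:
    """Return True once at least interval_minutes have passed since the last crawl."""
    return now - last_crawled_at >= timedelta(minutes=interval_minutes)


# Example with fixed timestamps so the result does not depend on the clock
last = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
six_hours_later = datetime(2025, 1, 1, 6, 0, tzinfo=timezone.utc)
five_hours_later = datetime(2025, 1, 1, 5, 0, tzinfo=timezone.utc)
```

With these values, `is_due(last, 360, six_hours_later)` is true and `is_due(last, 360, five_hours_later)` is false. Passing `now` in explicitly keeps the check deterministic in tests.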

CloudWatch

  • Centralized logging
  • Metrics and alarms
  • 14-day log retention
  • Real-time monitoring

External Services

Supabase

PostgreSQL Database
  • Managed PostgreSQL
  • RESTful API
  • Real-time subscriptions
  • Row-level security
  • Auto-generated API

Cloudflare R2

Object Storage
  • S3-compatible API
  • Global CDN delivery
  • Zero egress fees
  • Public domain URLs
  • High availability

Vercel

Frontend Hosting
  • Next.js optimized platform
  • Edge network deployment
  • Automatic HTTPS
  • Preview deployments
  • Built-in analytics

Terraform

Infrastructure as Code
  • Declarative configuration
  • Version control for infra
  • Reproducible deployments
  • State management
  • AWS provider v5.0+

Development Tools

Testing

# Backend testing
pytest -v                    # Run the test suite with verbose output
pytest --cov=backend        # Coverage report (requires the pytest-cov plugin)

# Frontend testing
npm run test                # Jest + React Testing Library
npm run test:e2e           # Playwright E2E tests

Code Quality

Backend

  • Black: Code formatting
  • Ruff: Fast linting
  • mypy: Static type checking
  • pytest: Testing framework

Frontend

  • ESLint: Code linting
  • Prettier: Code formatting
  • TypeScript: Type checking
  • Jest: Unit testing

Containerization

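# Multi-stage Docker build for backend
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.11-slim
WORKDIR /app
# Copy installed packages first so the playwright module exists before it is invoked
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
# Console scripts such as uvicorn are installed under /usr/local/bin
COPY --from=builder /usr/local/bin /usr/local/bin
# Download Chromium together with the system libraries it needs
RUN python -m playwright install --with-deps chromium
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]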

Version Requirements

Component    Minimum Version    Recommended
Python       3.11               3.11+
Node.js      20.0               20.x LTS
Terraform    1.0                Latest 1.x
Docker       20.0               Latest
AWS CLI      2.0                Latest 2.x

Next Steps

Infrastructure Details

Deep dive into AWS infrastructure components and configuration

Data Flow

Understand how data flows through the system
