Overview
Scribe’s email generation system uses a stateless, in-memory pipeline that transforms a template into a personalized email through four sequential steps:
- Template Parser - Analyze template and extract search terms
- Web Scraper - Fetch and summarize relevant information
- ArXiv Enricher - Conditionally fetch academic papers
- Email Composer - Generate final email and persist to database
Execution Time: 10-25 seconds depending on template complexity and web content availability.
PipelineData Structure
All pipeline state lives in a single PipelineData dataclass passed through each step. No intermediate database writes occur; only the final email is persisted.
pipeline/models/core.py
Why Dataclass over Pydantic?
- Lighter weight: No validation overhead during execution
- Faster instantiation: Critical for high-throughput processing
- Validation only at boundaries: API requests/responses use Pydantic
Helper Methods
The PipelineData class includes utility methods for tracking execution:
pipeline/models/core.py
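The helpers themselves are defined in the file above; a minimal sketch of what such tracking methods might look like (method names are assumptions):

```python
from dataclasses import dataclass, field

# Illustrative sketch of execution-tracking helpers; method names are
# assumptions, not the actual API in pipeline/models/core.py.
@dataclass
class PipelineData:
    step_timings: dict[str, float] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

    def record_step(self, name: str, duration: float) -> None:
        """Store how long a step took, in seconds."""
        self.step_timings[name] = duration

    def add_error(self, message: str) -> None:
        """Collect a non-fatal error without stopping the pipeline."""
        self.errors.append(message)

    def total_duration(self) -> float:
        """Sum of all recorded step durations."""
        return sum(self.step_timings.values())
```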
StepResult
Each pipeline step returns a StepResult to indicate success or failure:
pipeline/models/core.py
Data Flow
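The four-step sequence can be sketched as a simple loop over step functions that mutate shared state. Everything below is illustrative: the step bodies are stand-ins for the real LLM, scraping, and database calls.

```python
# Hypothetical sketch of the four-step data flow: each step mutates the
# shared state dict and returns True on success.
def template_parser(data: dict) -> bool:
    data["search_terms"] = ["transformer architectures"]  # stand-in for LLM output
    return True

def web_scraper(data: dict) -> bool:
    data["summaries"] = ["summary of scraped page"]  # stand-in for Playwright scraping
    return True

def arxiv_enricher(data: dict) -> bool:
    data["papers"] = []  # empty if the ArXiv API times out
    return True

def email_composer(data: dict) -> bool:
    data["email"] = f"Email about {data['search_terms'][0]}"  # stand-in for LLM generation
    return True

# Template Parser and Email Composer failures are fatal; the other two
# steps degrade gracefully, per the Error Handling section below.
FATAL_STEPS = {"template_parser", "email_composer"}

def run_pipeline(data: dict) -> dict:
    for step in (template_parser, web_scraper, arxiv_enricher, email_composer):
        if not step(data) and step.__name__ in FATAL_STEPS:
            raise RuntimeError(f"{step.__name__} failed fatally")
    return data
```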
Performance Characteristics
Execution Time Breakdown
| Step | Avg Time | Variance | Bottleneck |
|---|---|---|---|
| Template Parser | 1.2s | Low | LLM API call |
| Web Scraper | 5.3s | High | Playwright rendering |
| ArXiv Enricher | 0.8s | Low | ArXiv API response |
| Email Composer | 3.1s | Medium | LLM generation + validation |
| Total | 10.4s | Medium | Network + LLM latency |
Variance Factors: Web Scraper time depends on website complexity and JavaScript load time. Email Composer validation retries can add 2-6s.
Memory Usage (512MB RAM Deployment)
- Sequential scraping (not parallel) to limit browser instances
- Smart chunking to avoid loading full content in memory
- No intermediate database writes
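The "smart chunking" idea can be illustrated with a generator that streams page text in bounded pieces instead of holding the full content in memory. This is a sketch under assumptions; the function name and chunk size are not from the codebase.

```python
from typing import Iterator

# Illustrative sketch of chunked streaming: yield bounded chunks,
# preferring to split on paragraph boundaries.
def chunk_text(text: str, max_chars: int = 4000) -> Iterator[str]:
    """Yield chunks no longer than max_chars without loading copies of
    the whole document into downstream buffers at once."""
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        # Prefer a paragraph boundary inside the window.
        cut = text.rfind("\n\n", start, end)
        if cut <= start or end == len(text):
            cut = end
        yield text[start:cut]
        start = cut
```

A generator like this keeps peak memory proportional to one chunk, which matters under a 512MB ceiling.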
Error Handling
Error Categories
Fatal Errors (stop pipeline)
- Template Parser fails (can’t proceed without search terms)
- Email Composer database write fails
- Invalid input data (Pydantic validation)
Non-Fatal Errors (continue with degraded service)
- Some URLs fail to scrape (continue with successful ones)
- ArXiv API timeout (continue without papers)
- Email validation warnings (still persist email)
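One way to model the two categories is a small exception hierarchy; the class names here are assumptions, not the actual exceptions in the codebase.

```python
# Sketch of the fatal vs. non-fatal error split; names are illustrative.
class PipelineError(Exception):
    """Base class for pipeline errors."""

class FatalPipelineError(PipelineError):
    """Stops the pipeline (e.g. Template Parser failure, DB write failure)."""

class NonFatalPipelineError(PipelineError):
    """Recorded and skipped; the pipeline continues with degraded output."""

def handle_step_error(exc: PipelineError, errors: list[str]) -> None:
    """Re-raise fatal errors; log non-fatal ones and continue."""
    if isinstance(exc, FatalPipelineError):
        raise exc
    errors.append(str(exc))
```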
Retry Strategy
Celery tasks automatically retry on transient failures:
- Max Retries: 3 attempts
- Backoff: 60s, 120s, fail permanently
- Retriable: External API failures, database connection errors, network timeouts
- Non-Retriable: Invalid input data, user not found, quota exceeded
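Celery applies this policy through its own task retry options; the pure-Python sketch below just illustrates the 3-attempt, 60s/120s schedule (function and exception names are assumptions).

```python
import time

class TransientError(Exception):
    """External API, network, or DB connection failure; safe to retry."""

class PermanentError(Exception):
    """Invalid input, user not found, quota exceeded; never retried."""

# Illustrative sketch of the retry policy above: up to 3 attempts with
# 60s then 120s backoff, then fail permanently. The sleep function is
# injectable so the schedule can be tested without waiting.
def run_with_retries(task, max_retries=3, backoffs=(60, 120), sleep=time.sleep):
    for attempt in range(max_retries):
        try:
            return task()
        except PermanentError:
            raise  # non-retriable: surface immediately
        except TransientError:
            if attempt >= len(backoffs):
                raise  # backoff exhausted: fail permanently
            sleep(backoffs[attempt])
```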
Related Concepts
Template Types
Learn about RESEARCH, BOOK, and GENERAL template classification
Queue System
Understand how batch processing and job status tracking work
Authentication
See how JWT validation protects pipeline execution
