Overview
The Lead Intelligence Engine is a stateless, pipeline-based system that processes business URLs through four distinct stages. Each component is independently testable and loosely coupled.
High-Level Architecture
Core Components
LeadEngine (Orchestrator)
The central orchestration layer that coordinates all pipeline stages. Location: core.py
Responsibilities:
- Initialize all components
- Execute pipeline stages in sequence
- Handle errors and logging
- Track latency metrics
- Return structured results
Extraction Phase
Calls Extractor.process(url) to fetch and clean website content.
- Tries local BeautifulSoup extraction first
- Falls back to Jina API for SPAs
- Returns plain text (max 10,000 chars)
RAG Retrieval Phase
Calls RAG.retrieve(content) to fetch relevant knowledge.
- Searches local markdown files in knowledge/
- Returns top 3 matching documents
- Continues even if RAG fails
Evaluation Phase
Calls Evaluator.evaluate(content, rag_context) for AI analysis.
- Sends to Groq LLM with system prompt
- Receives structured JSON response
- Validates service names against catalog
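The three phases above can be sketched as a single orchestration loop. The component interfaces used here (process, retrieve, evaluate, fetch_row_by_url, insert_row) are assumptions based on this page, not the actual core.py code:

```python
import logging
import time

logger = logging.getLogger("lead_engine")

class LeadEngine:
    """Sketch of the four-stage pipeline; component interfaces are illustrative."""

    def __init__(self, extractor, rag, evaluator, coda):
        self.extractor = extractor
        self.rag = rag
        self.evaluator = evaluator
        self.coda = coda

    def process(self, url: str) -> dict:
        start = time.monotonic()
        content = self.extractor.process(url)           # 1. extraction
        try:
            context = self.rag.retrieve(content)        # 2. RAG (pipeline continues on failure)
        except Exception:
            logger.warning("RAG failed; continuing without context")
            context = []
        result = self.evaluator.evaluate(content, context)  # 3. AI evaluation
        if self.coda.fetch_row_by_url(url) is None:     # 4. CRM insert, deduplicated by URL
            self.coda.insert_row({"url": url, **result})
        result["latency_s"] = round(time.monotonic() - start, 2)  # latency metric
        return result
```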
Extractor
Location: extractor.py
Purpose: Fetches and cleans web content from URLs.
Strategy:
- Local extraction (BeautifulSoup + requests)
  - Fast, no external dependencies
  - Works for server-rendered HTML
  - Removes scripts, styles, nav, footer
- Jina fallback (r.jina.ai)
  - Triggered if local extraction returns < 200 chars
  - Handles SPAs and JavaScript-heavy sites
  - Returns pre-cleaned markdown
- Facebook routing (Graph API)
  - Detects facebook.com URLs
  - Routes to facebook_client.py
  - Extracts public page data
The extractor truncates content to 10,000 characters to stay within LLM token limits.
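The clean-then-fallback logic can be sketched as follows. To stay self-contained, this sketch uses the stdlib HTMLParser instead of BeautifulSoup, and the fetch and jina_fetch callables are injected placeholders; it is not the actual extractor.py code (Facebook routing is omitted):

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "nav", "footer"}  # stripped during cleaning
MAX_CHARS = 10_000        # truncation limit for LLM token budget
MIN_LOCAL_CHARS = 200     # below this, assume an SPA and fall back to Jina

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping everything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def clean_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)[:MAX_CHARS]

def extract(url: str, fetch, jina_fetch) -> str:
    """Local extraction first; Jina fallback for thin (JS-rendered) results."""
    text = clean_html(fetch(url))
    if len(text) < MIN_LOCAL_CHARS:
        text = jina_fetch("https://r.jina.ai/" + url)[:MAX_CHARS]
    return text
```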
RAG System
Location: rag.py
Purpose: Retrieves relevant domain knowledge to enrich AI evaluation.
Knowledge Base: Markdown files in the knowledge/ directory:
- Lead_q_criteria.md - Qualification rules
- strategy_efficiency.md - Conversation angle framework
Retrieval:
- Keyword-based retrieval with intersection scoring
- Boosts for long words (>4 chars)
- Fallback for high-value terms (yangon, medical, digital, etc.)
- Returns top 3 documents by score
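The scoring above might be sketched like this; the weighting constants and the HIGH_VALUE_TERMS set are illustrative, not the actual rag.py values:

```python
import re

# Illustrative subset of the high-value fallback terms mentioned above
HIGH_VALUE_TERMS = {"yangon", "medical", "digital"}

def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(content: str, docs: dict, top_k: int = 3) -> list:
    """Score each knowledge doc by keyword intersection with the page content."""
    query = _tokens(content)
    scored = []
    for name, body in docs.items():
        body_tokens = _tokens(body)
        overlap = query & body_tokens
        # Long words (>4 chars) get a boost; short generic words score less
        score = sum(2 if len(word) > 4 else 1 for word in overlap)
        # Extra fallback boost for high-value domain terms
        score += 3 * len(query & HIGH_VALUE_TERMS & body_tokens)
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]
```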
RAG System Deep Dive
Learn how knowledge retrieval works
Evaluator
Location: evaluator.py
Purpose: AI-powered business analysis and service matching.
Features:
- Groq LLM integration (llama-3.3-70b-versatile)
- System prompt from prompts/system_prompt.md
- Service catalog validation from services/services.json
- Token usage tracking (class-level accumulation)
- Retry logic for transient failures
Validation:
- Ensures primary_service exists in the catalog
- Validates secondary_service if present
- Rejects responses with invalid service names
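The validation rules can be sketched as follows, assuming the catalog is a JSON list of objects with a "name" field (the real schema in services/services.json may differ):

```python
import json

def validate_services(result: dict, catalog_path: str = "services/services.json") -> dict:
    """Reject LLM responses whose service names are not in the catalog."""
    with open(catalog_path) as f:
        valid = {service["name"] for service in json.load(f)}
    # primary_service is required and must exist in the catalog
    if result.get("primary_service") not in valid:
        raise ValueError(f"Unknown primary_service: {result.get('primary_service')!r}")
    # secondary_service is optional, but validated when present
    secondary = result.get("secondary_service")
    if secondary is not None and secondary not in valid:
        raise ValueError(f"Unknown secondary_service: {secondary!r}")
    return result
```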
Evaluation Pipeline
Learn about AI evaluation and service matching
CodaClient
Location: coda_client.py
Purpose: CRM integration with duplicate prevention.
Operations:
- fetch_row_by_url(url) - Duplicate detection
- insert_row(data) - Create new CRM records
- _get_columns() - Dynamic column mapping
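Combined, these operations give URL-based duplicate prevention. A sketch assuming the method names above (the helper name upsert_lead is illustrative):

```python
def upsert_lead(client, url: str, data: dict) -> bool:
    """Insert a lead only if the URL is not already in Coda.

    Returns True when a new row was created, False on a duplicate.
    """
    if client.fetch_row_by_url(url) is not None:
        return False  # duplicate detected -- skip the insert
    client.insert_row(data)
    return True
```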
Duplicate Detection
Learn how URL-based deduplication works
User Interfaces
CLI Interface
Location: main.py
Usage:
- Single URL processing
- Formatted JSON output
- Token usage display
- Exit codes for automation
Telegram Bot
Location: telegram_bot.py
Commands:
- /analyze <url> - Process URL
- /model - Show LLM model
- /status - System status report
Features:
- Rate limiting (3 req/min per user)
- In-memory request tracking
- Async message handling
- Error recovery
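A sliding-window limiter matching the "3 req/min per user" rule with in-memory tracking might look like this (a sketch; the actual telegram_bot.py implementation may differ):

```python
import time
from collections import defaultdict, deque

RATE_LIMIT = 3        # max requests...
WINDOW_SECONDS = 60   # ...per rolling minute, per user

_requests = defaultdict(deque)  # user_id -> timestamps of recent requests

def allow_request(user_id, now=None) -> bool:
    """Return True if the user is under the limit, recording the request."""
    now = time.monotonic() if now is None else now
    window = _requests[user_id]
    # Drop timestamps that have aged out of the window
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= RATE_LIMIT:
        return False
    window.append(now)
    return True
```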
Telegram Bot Guide
Deploy and use the interactive bot interface
Design Principles
Stateless Architecture
No caching, no sessions, no persistence beyond Coda. Every URL is analyzed fresh.
Benefits:
- Simple deployment (no database needed)
- Horizontal scaling possible
- No stale data issues
- Easy to reason about
Trade-offs:
- Repeated URLs consume tokens
- No performance optimization from caching
Fail-Safe Defaults
Each component handles missing dependencies gracefully.
Modular Testing
Every component can run independently.
Performance Characteristics
Latency Breakdown
Typical timing for a standard business website:

| Stage | Time | Notes |
|---|---|---|
| Extraction | 2-5s | Network + parsing |
| RAG Retrieval | <0.1s | Local file search |
| AI Evaluation | 4-8s | Groq LLM inference |
| Coda API | 1-2s | Duplicate check + insert |
| Total | 7-15s | Target: < 20s |
Token Economics
Average per-URL consumption:
- Prompt tokens: 2,000-4,000
  - Website content: 1,500-3,000
  - System prompt: ~500
  - RAG context: ~500
- Completion tokens: 200-400
  - JSON response: ~300
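The class-level token accumulation mentioned for the Evaluator can be sketched as follows (names are illustrative, not the actual evaluator.py attributes):

```python
class TokenTracker:
    """Accumulates token usage across all evaluations at the class level,
    so totals survive individual evaluator instances."""
    prompt_tokens = 0
    completion_tokens = 0

    @classmethod
    def record(cls, prompt: int, completion: int) -> None:
        cls.prompt_tokens += prompt
        cls.completion_tokens += completion

    @classmethod
    def total(cls) -> int:
        return cls.prompt_tokens + cls.completion_tokens
```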
Configuration Files
Environment Variables (.env)
Service Catalog (services/services.json)
Defines available services for matching:
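An illustrative shape for the catalog, assuming a JSON list of service objects with a "name" field (the actual schema and service names live in services/services.json):

```json
[
  { "name": "Web Design", "description": "..." },
  { "name": "SEO", "description": "..." }
]
```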
System Prompt (prompts/system_prompt.md)
Instructs the LLM on:
- Output format (JSON schema)
- Service matching rules
- Industry exclusions
- Fit scoring guidelines
Configuration Guide
Customize services, prompts, and knowledge base
Error Handling
The engine uses a cascading error strategy.
Deployment Patterns
Single Server
Docker Container
Dockerfile
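A minimal illustrative Dockerfile, assuming a requirements.txt and the main.py CLI entrypoint (not the project's actual Dockerfile):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Stateless design: no volumes needed; configuration comes from environment variables
CMD ["python", "main.py"]
```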
Multiple Workers
Stateless design allows parallel processing.
Next Steps
RAG System
Deep dive into knowledge retrieval
Evaluation Pipeline
Learn how AI evaluation works
Duplicate Detection
Understand URL-based deduplication
API Reference
Explore the programmatic API