
Overview

The Evaluator class handles AI-powered lead evaluation using Groq’s LLM API. It analyzes extracted content, matches services against a predefined catalog, generates fit scores, and provides personalized outreach recommendations. The class includes automatic validation, retry logic, and quota tracking.

Constructor

Evaluator(model="llama-3.3-70b-versatile")

model
string
default:"llama-3.3-70b-versatile"
The Groq model to use for evaluation. Supported models:
  • llama-3.3-70b-versatile (default, recommended)
  • llama-3.1-70b-versatile
  • mixtral-8x7b-32768
from evaluator import Evaluator

# Use default model
evaluator = Evaluator()

# Use specific model
evaluator = Evaluator(model="mixtral-8x7b-32768")
Environment Variables Required:
  • GROQ_API_KEY - Your Groq API key (required)
Configuration Files Required:
  • services/services.json - Service catalog for validation
  • prompts/system_prompt.md - System prompt template
Raises:
  • ValueError - If GROQ_API_KEY is not found in environment variables
The Evaluator requires a valid Groq API key. Sign up at console.groq.com to get your API key.
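As a minimal sketch of the credential check the constructor performs, the helper below reads GROQ_API_KEY from the environment and raises the documented ValueError when it is missing. The helper name require_groq_key is illustrative, not part of the Evaluator API.

```python
import os

def require_groq_key() -> str:
    # Illustrative sketch of the constructor's credential check;
    # require_groq_key is a hypothetical helper, not part of the API.
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise ValueError("GROQ_API_KEY not found in environment variables")
    return key
```

Catching this ValueError at startup lets you fail fast with a clear message instead of erroring on the first evaluation.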

Class Properties

The Evaluator maintains class-level tracking across all instances:
status
string
default:"System Online"
Current system status. Updates to “Rate Limited / Quota Reached” on API errors.
quota_ok
boolean
default:"true"
Whether the API quota is available. Set to false on rate limit errors.
last_run_time
string
Timestamp of the last evaluation run in format “YYYY-MM-DD HH:MM:SS”
total_usage
object
Cumulative token usage across all evaluations, with prompt_tokens, completion_tokens, and total_tokens keys.

Methods

evaluate()

Analyzes content and returns a structured evaluation with service matching and scoring.
content
string
required
The website content to analyze. Typically extracted text from a webpage.
rag_context
list | string
Additional context from the knowledge base to inform the evaluation. Can be a list of strings or a single string.
retry_count
number
default:"1"
Number of retry attempts on failure. Total attempts will be retry_count + 1.
Returns:
result
dict
Structured evaluation result with the following fields: business_name, business_type, primary_service, secondary_service (optional), fit_score (0-100), reasoning, outreach_angle, and _usage (token counts for the request).
Raises:
  • Exception - If services.json cannot be loaded (CRITICAL error)
  • Exception - If system_prompt.md cannot be loaded (CRITICAL error)
  • Exception - If LLM returns invalid JSON after all retries
  • ValueError - If selected primary_service is not in the approved services catalog
  • ValueError - If selected secondary_service is not in the approved services catalog
  • Exception - If Groq API errors persist after all retries
The method validates all service selections against services.json. Invalid services will raise a ValueError to ensure data integrity.
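The retry semantics can be sketched as a plain loop: retry_count retries on top of the initial attempt, so retry_count + 1 total attempts, re-raising the last error if all fail. The helper name evaluate_with_retries is hypothetical and only illustrates the behavior described above.

```python
def evaluate_with_retries(call, retry_count: int = 1):
    # Hypothetical sketch: retry_count + 1 total attempts,
    # re-raising the last error if every attempt fails.
    last_error = None
    for attempt in range(retry_count + 1):
        try:
            return call()
        except Exception as e:
            last_error = e
    raise last_error
```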

Usage Examples

Basic Evaluation

from evaluator import Evaluator
import json

evaluator = Evaluator()

content = """
Welcome to Austin Premier Plumbing. We are a local plumbing company 
serving Austin, Texas and surrounding areas. We offer emergency pipe repairs, 
water heater installation, drain cleaning, and 24/7 emergency services.
Call us today for a free quote!
"""

try:
    result = evaluator.evaluate(content)
    print(json.dumps(result, indent=2))
    
    print(f"\nBusiness: {result['business_name']}")
    print(f"Service: {result['primary_service']}")
    print(f"Score: {result['fit_score']}/100")
    print(f"Reasoning: {result['reasoning']}")
    print(f"Tokens Used: {result['_usage']['total_tokens']}")
    
except Exception as e:
    print(f"Evaluation failed: {e}")

Evaluation with RAG Context

from evaluator import Evaluator
from rag import RAG

evaluator = Evaluator()
rag = RAG()

content = "We help small businesses grow with digital marketing services."

# Retrieve relevant knowledge
rag_context = rag.retrieve(content, limit=3)

# Evaluate with additional context
result = evaluator.evaluate(content, rag_context=rag_context)

print(f"Evaluation with {len(rag_context)} knowledge items")
print(f"Score: {result['fit_score']}")
print(f"Outreach: {result['outreach_angle']}")

Custom Retry Logic

from evaluator import Evaluator
import time

evaluator = Evaluator()
content = "Medical clinic in downtown Yangon offering general practice services."

# Try up to 3 times (initial + 2 retries)
try:
    result = evaluator.evaluate(content, retry_count=2)
    print(f"Success after potential retries: {result['business_name']}")
except Exception as e:
    print(f"Failed after all retries: {e}")

Monitoring Token Usage

from evaluator import Evaluator

evaluator = Evaluator()

# Process multiple leads
leads = [
    "Local restaurant in Bangkok",
    "E-commerce store selling electronics",
    "Software consulting company"
]

for lead_content in leads:
    result = evaluator.evaluate(lead_content)
    print(f"Processed: {result['business_name']}")
    print(f"Request tokens: {result['_usage']['total_tokens']}")

# Check cumulative usage
print(f"\nTotal Usage Across All Evaluations:")
print(f"Prompt Tokens: {Evaluator.total_usage['prompt_tokens']}")
print(f"Completion Tokens: {Evaluator.total_usage['completion_tokens']}")
print(f"Total Tokens: {Evaluator.total_usage['total_tokens']}")

Checking System Status

from evaluator import Evaluator

evaluator = Evaluator()

print(f"System Status: {Evaluator.status}")
print(f"Quota OK: {Evaluator.quota_ok}")
print(f"Last Run: {Evaluator.last_run_time}")

try:
    result = evaluator.evaluate("Test content")
except Exception as e:
    # Status updates automatically on rate limit errors
    print(f"Error: {e}")
    print(f"Updated Status: {Evaluator.status}")
    print(f"Quota OK: {Evaluator.quota_ok}")

Service Validation

The Evaluator automatically validates all service selections against the services catalog:
  1. Load Services Catalog - Loads services from services/services.json at evaluation time
  2. Extract Valid Names - Recursively extracts all service names from the JSON structure
  3. Validate Primary Service - Ensures primary_service matches a name in the catalog
  4. Validate Secondary Service - If present, ensures secondary_service also matches the catalog
  5. Raise on Mismatch - Throws ValueError if any service is not in the approved list
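The validation steps can be sketched as follows, assuming the services.json shape documented in the Configuration section; the function names extract_service_names and validate_services are illustrative, and the Evaluator's actual traversal may differ.

```python
def extract_service_names(node) -> set:
    # Recursively collect every "name" entry found inside a
    # "services" list, at any nesting depth (illustrative sketch).
    names = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "services" and isinstance(value, list):
                names.update(s["name"] for s in value if "name" in s)
            else:
                names |= extract_service_names(value)
    elif isinstance(node, list):
        for item in node:
            names |= extract_service_names(item)
    return names

def validate_services(result: dict, catalog: dict) -> None:
    # Raise ValueError when a selected service is not in the catalog.
    valid = extract_service_names(catalog)
    if result["primary_service"] not in valid:
        raise ValueError(f"Invalid primary_service: {result['primary_service']}")
    secondary = result.get("secondary_service")
    if secondary and secondary not in valid:
        raise ValueError(f"Invalid secondary_service: {secondary}")
```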

JSON Response Parsing

The Evaluator uses advanced parsing to handle Groq’s JSON output:
  • Requests response_format={"type": "json_object"} for structured output
  • Handles cases where Groq adds markdown code blocks around JSON
  • Extracts JSON from text by finding first { to last }
  • Validates JSON parsing before returning
The parser is robust against common LLM output variations, including markdown code blocks and surrounding text.
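The fallback parsing described above can be sketched like this: strip a markdown fence if present, then slice from the first { to the last } and parse. The function name parse_llm_json is hypothetical; the Evaluator's internal parser may handle additional cases.

```python
import json

def parse_llm_json(raw: str) -> dict:
    # Illustrative sketch of the described fallbacks: drop a markdown
    # code fence, then extract the first '{' ... last '}' span.
    text = raw.strip()
    if text.startswith("```"):
        # Remove an opening fence like ```json and the trailing ```
        text = text.split("\n", 1)[1] if "\n" in text else text
        text = text.rsplit("```", 1)[0]
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("No JSON object found in LLM output")
    return json.loads(text[start : end + 1])
```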

Configuration

Required Files

services/services.json
{
  "categories": [
    {
      "name": "Home Services",
      "services": [
        {"name": "Plumbing"},
        {"name": "HVAC"},
        {"name": "Electrical"}
      ]
    }
  ]
}
prompts/system_prompt.md
You are an expert lead evaluator. Analyze the website content and provide a structured evaluation.

Available services:
[SERVICES_JSON]

Provide your response as JSON with these fields:
- business_name
- business_type
- primary_service
- secondary_service (optional)
- fit_score (0-100)
- reasoning
- outreach_angle
The [SERVICES_JSON] placeholder in the system prompt is automatically replaced with the contents of services.json.
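The substitution can be sketched as a simple string replacement over the two files listed above; the helper name build_system_prompt is hypothetical, and the Evaluator may perform this step differently internally.

```python
from pathlib import Path

def build_system_prompt(prompt_path="prompts/system_prompt.md",
                        services_path="services/services.json") -> str:
    # Illustrative sketch: splice services.json into the prompt
    # template at the [SERVICES_JSON] placeholder.
    template = Path(prompt_path).read_text()
    services = Path(services_path).read_text()
    return template.replace("[SERVICES_JSON]", services)
```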

Rate Limits and Quotas

Groq has rate limits that vary by plan:
  • Free tier: 30 requests/minute, 14,400 requests/day
  • Paid tier: Higher limits based on your plan
The Evaluator automatically tracks quota status and updates Evaluator.quota_ok on rate limit errors.
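If you batch many evaluations on the free tier, a client-side throttle can keep you under the 30 requests/minute limit. The MinuteRateLimiter class below is an illustrative sketch, not part of the Evaluator API.

```python
import time

class MinuteRateLimiter:
    # Hypothetical client-side throttle for the free-tier limit of
    # 30 requests/minute; not part of the Evaluator API.
    def __init__(self, max_per_minute: int = 30):
        self.max_per_minute = max_per_minute
        self.timestamps = []

    def wait(self) -> None:
        now = time.monotonic()
        # Keep only the calls made within the last 60 seconds
        self.timestamps = [t for t in self.timestamps if now - t < 60]
        if len(self.timestamps) >= self.max_per_minute:
            time.sleep(60 - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())
```

Call limiter.wait() before each evaluator.evaluate() to smooth out bursts instead of relying on retries after a 429.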

Performance

Typical evaluation metrics:
  • Latency: 1-3 seconds per evaluation
  • Tokens per request: 1,500-3,000 tokens (varies by content length)
  • Temperature: 0.1 (optimized for consistent, deterministic outputs)

Related

  • LeadEngine - Orchestrates the full pipeline including Evaluator
  • RAG - Provides context to enhance evaluations
  • Extractor - Extracts content for evaluation
