
Overview

This guide covers common error scenarios, their causes, and solutions. All errors are logged with structured context for efficient debugging.

Error Categories

Errors are organized into these categories:
  1. Webhook Errors: Signature verification, deduplication issues
  2. Tenant Configuration Errors: Missing or invalid tenant config
  3. Instructions Errors: Missing or invalid prompts
  4. Capacity Errors: Call limits exceeded
  5. Tool Errors: Tool build or execution failures
  6. Database Errors: MongoDB connection or query failures

Common Error Scenarios

Webhook Verification Failed

Error Code: webhook_verification_failed
Cause: Invalid webhook signature from OpenAI
Location: openai_webhook.py:81
Log Example:
{
  "event": "webhook_verification_failed",
  "level": "ERROR",
  "msg": "Invalid signature"
}
Solutions:
Issue | Solution
Wrong secret configured | Verify OPENAI_WEBHOOK_SECRET matches OpenAI console
Clock skew | Ensure server time is synchronized (NTP)
Proxy modifying headers | Check reverse proxy doesn’t strip signature headers
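
The failure modes in the table above can be told apart with a small offline check. The sketch below uses a generic HMAC-SHA256 scheme with a timestamp tolerance. It is not the verification code in openai_webhook.py and may not match the signing scheme OpenAI actually uses. The function name diagnose_signature, the "<timestamp>.<body>" signing layout, and the 300-second tolerance are assumptions made for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional


def diagnose_signature(
    secret: bytes,
    body: bytes,
    timestamp: int,
    signature_hex: str,
    now: Optional[float] = None,
    tolerance_s: int = 300,
) -> str:
    """Classify a webhook signature check (hypothetical signing layout)."""
    current = time.time() if now is None else now
    # Clock skew: a timestamp too far from local time is rejected first.
    if abs(current - timestamp) > tolerance_s:
        return "timestamp_out_of_tolerance"
    # Assumed layout: HMAC-SHA256 over "<timestamp>.<raw body>".
    signed = str(timestamp).encode() + b"." + body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    # Constant-time comparison; fails on a wrong secret or an altered body.
    if not hmac.compare_digest(expected, signature_hex):
        return "signature_mismatch"
    return "ok"
```

A "timestamp_out_of_tolerance" result points at clock skew, while "signature_mismatch" points at the secret or at a proxy that rewrote the body or headers.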

Tenant Not Configured

Error Code: tenant_not_configured
Exception: TenantNotConfiguredError (exceptions.py:53)
Cause: Tenant has no configuration document or invalid state
Location: openai_webhook.py:346
Log Example:
{
  "event": "tenant_not_configured",
  "level": "ERROR",
  "tenant_id": "acme-corp",
  "error": {
    "error_type": "TenantNotConfiguredError",
    "reason": "state_doc_missing"
  }
}
Solutions:
Issue | Solution
Tenant not provisioned | Create tenant configuration in database
Invalid tenant ID in routing | Check phone number → tenant mapping
Configuration deleted | Restore tenant config from backup
State document malformed | Validate JSON schema of state document
Response to Caller:
{
  "ok": true,
  "rejected": "tenant_not_configured"
}

Instructions Missing

Error Code: instructions_missing
Exception: InstructionsMissingError (exceptions.py:73)
Cause: Tenant configured but instruction text cannot be resolved
Location: openai_webhook.py:368
Log Example:
{
  "event": "instructions_missing",
  "level": "ERROR",
  "call_id": "call_abc123",
  "tenant_id": "acme-corp",
  "error": {
    "error_type": "InstructionsMissingError",
    "reason": "greeting_text_empty",
    "context": {
      "greeting_id": "greeting_xyz",
      "instruction_id": "inst_789"
    }
  }
}
Solutions:
Issue | Solution
Greeting text empty | Add greeting content to tenant configuration
Instruction text empty | Add instruction/prompt content
Referenced ID not found | Check greeting_id/instruction_id exist in database
Prompt deleted but still referenced | Update state pointers or restore prompt

Instructions DB Error

Error Code: instructions_db_error
Exception: InstructionsDBError (exceptions.py:101)
Cause: Database connection failure or query timeout
Location: openai_webhook.py:390
Log Example:
{
  "event": "instructions_db_error",
  "level": "ERROR",
  "call_id": "call_abc123",
  "tenant_id": "acme-corp",
  "error": {
    "error_type": "InstructionsDBError",
    "reason": "db_timeout",
    "context": {
      "operation": "fetch_state",
      "cause_type": "ServerSelectionTimeoutError"
    }
  }
}
Behavior: System uses fallback instructions to keep service available during DB outage
Fallback Content (from data/prompts/downtime.py):
DOWNTIME_GREETING = "Thank you for calling. We're experiencing technical difficulties."
DOWNTIME_PROMPT = "Apologize and ask caller to try again later."
Solutions:
Issue | Solution
MongoDB connection timeout | Check MONGODB_URI and network connectivity
Database overloaded | Scale MongoDB resources or add read replicas
Query timeout | Optimize indexes or increase timeout settings
Authentication failure | Verify MongoDB credentials
Metric: fallback_instructions_used increments when fallback is used
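
The fallback behavior described above can be pictured as a try/except around the database read. This is a minimal sketch, not the code in openai_webhook.py. The resolve_instructions function, its fetch argument, and the in-memory metrics counter are hypothetical, and InstructionsDBError is redefined locally so the example runs on its own.

```python
from collections import Counter

DOWNTIME_GREETING = "Thank you for calling. We're experiencing technical difficulties."
DOWNTIME_PROMPT = "Apologize and ask caller to try again later."

# Stand-in for a real metrics client (assumption).
metrics = Counter()


class InstructionsDBError(RuntimeError):
    """Local stand-in for the exception defined in exceptions.py."""


def resolve_instructions(fetch):
    """Return (greeting, prompt), using downtime content if the DB read fails.

    `fetch` is a hypothetical zero-argument callable that loads the tenant's
    greeting and prompt from the database.
    """
    try:
        return fetch()
    except InstructionsDBError:
        # Keep the call alive and record that the fallback path was taken.
        metrics["fallback_instructions_used"] += 1
        return DOWNTIME_GREETING, DOWNTIME_PROMPT
```

Only the database error is caught here, so a missing prompt (instructions_missing) would still surface as its own failure.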

Capacity Limit Reached

Error Code: call_reject_failed or capacity rejection
Cause: Global or per-tenant concurrent call limit exceeded
Location: openai_webhook.py:298-337
Log Example:
{
  "event": "call_accepted",
  "level": "INFO",
  "rejected": "capacity",
  "tenant_id": "acme-corp"
}
Solutions:
Issue | Solution
Legitimate traffic spike | Increase MAX_CONCURRENT_CALLS
Single tenant monopolizing | Implement per-tenant limits
Calls not ending properly | Check for session cleanup bugs
Pending calls stuck | Review _release_pending_capacity_state calls
See: Capacity Management Guide
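
To make the "calls not ending properly" failure mode concrete, here is a minimal in-memory sketch of a limiter with a global and a per-tenant cap. The CapacityTracker class and its method names are hypothetical and are not the service's actual capacity code. The sketch only shows why a missed release permanently consumes a slot.

```python
from collections import defaultdict


class CapacityTracker:
    """Hypothetical tracker for global and per-tenant concurrent call limits."""

    def __init__(self, max_total: int, max_per_tenant: int) -> None:
        self.max_total = max_total
        self.max_per_tenant = max_per_tenant
        self.active = defaultdict(int)  # tenant_id -> active call count

    def try_acquire(self, tenant_id: str) -> bool:
        """Reserve a slot; return False when either limit is already reached."""
        if sum(self.active.values()) >= self.max_total:
            return False
        if self.active[tenant_id] >= self.max_per_tenant:
            return False
        self.active[tenant_id] += 1
        return True

    def release(self, tenant_id: str) -> None:
        """Free a slot; skipping this on any exit path leaks capacity."""
        if self.active[tenant_id] > 0:
            self.active[tenant_id] -= 1
```

In a design like this, release would typically sit in a finally block around the call session so that errors and hang-ups both return the slot.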

Tenant Config Parse Error

Error Code: tenant_config_parse_error
Exception: TenantConfigParseError (exceptions.py:136)
Cause: Tenant config exists but fails Pydantic validation
Location: openai_webhook.py:487
Log Example:
{
  "event": "tenant_config_parse_error",
  "level": "ERROR",
  "tenant_id": "acme-corp"
}
Solutions:
Issue | Solution
Invalid JSON syntax | Validate JSON with linter
Missing required fields | Check TenantConfig model requirements
Wrong data types | Ensure fields match expected types (str, int, etc.)
Schema version mismatch | Update config to current schema version

Tool Build Error

Error Code: tool_build_error
Exception: ToolBuildError (exceptions.py:240)
Cause: Tool configuration invalid or tool builder failure
Location: openai_webhook.py:537
Log Example:
{
  "event": "tool_build_error",
  "level": "ERROR",
  "tenant_id": "acme-corp"
}
Behavior: Call proceeds without tools (tools_build = None)
Solutions:
Issue | Solution
Invalid tool configuration | Validate tool config schema
Missing tool dependencies | Ensure required tool modules installed
Tool initialization failure | Check tool constructor parameters

Tool Execution Errors

Exception: ToolExecutionError (exceptions.py:164)
Variants:
  • ToolNotFoundError (exceptions.py:194): Tool invoked but not in tool list
  • ToolArgsParseError (exceptions.py:214): Cannot parse tool arguments JSON
Log Example:
{
  "level": "ERROR",
  "error_type": "ToolNotFoundError",
  "reason": "tool_not_found",
  "context": {
    "tool_name": "nonexistent_tool"
  }
}
Solutions:
Error | Solution
ToolNotFoundError | Ensure tool name matches tool list exactly
ToolArgsParseError | Validate arguments JSON format
ToolExecutionError | Check tool implementation for exceptions
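
The three errors correspond to three distinct points in a tool dispatch. The sketch below illustrates where each would be raised. The run_tool function is hypothetical, the exception classes are local stand-ins for those in exceptions.py, and treating the two variants as subclasses of ToolExecutionError is an assumption.

```python
import json


class ToolExecutionError(RuntimeError):
    """Local stand-in for the base tool error in exceptions.py."""


class ToolNotFoundError(ToolExecutionError):
    """Tool invoked but not in the tool list."""


class ToolArgsParseError(ToolExecutionError):
    """Tool arguments are not a valid JSON object."""


def run_tool(tools, name, raw_args):
    """Dispatch a tool call given a name -> callable mapping and a JSON string."""
    # 1. Lookup is an exact, case-sensitive match on the tool name.
    if name not in tools:
        raise ToolNotFoundError(name)
    # 2. Arguments must parse as a JSON object.
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        raise ToolArgsParseError(name) from exc
    if not isinstance(args, dict):
        raise ToolArgsParseError(name)
    # 3. Anything the tool itself raises is wrapped in the base error.
    try:
        return tools[name](**args)
    except Exception as exc:
        raise ToolExecutionError(name) from exc
```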

Debugging with Structured Logging

Log Event Function

All events are logged using log_event from src/core/logger.py:97:
from src.core.logger import log_event
import logging

log_event(
    logging.ERROR,
    "event_name",
    "human-readable message",
    key1="value1",
    key2="value2"
)

Log Output Format

Logs use JSON format (logger.py:24):
{
  "ts": "2026-03-02T14:30:45",
  "level": "ERROR",
  "logger": "app",
  "msg": "human-readable message",
  "tenant_id": "acme-corp",
  "call_id": "call_abc123",
  "event": "event_name",
  "key1": "value1",
  "key2": "value2"
}

Context Variables

Tenant and call IDs are automatically attached via context vars (logger.py:9):
import logging

from src.core.logger import call_id_var, log_event, tenant_id_var

# Set context for current async task
tenant_id_var.set("acme-corp")
call_id_var.set("call_abc123")

# All subsequent logs include these fields
log_event(logging.INFO, "call_started")  # Includes tenant_id and call_id
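
For readers who want to see how context variables can end up in every log line, here is a self-contained sketch of the pattern. It is not the implementation in src/core/logger.py. The JsonFormatter class, the "fields" record attribute, and this log_event signature are assumptions chosen to reproduce the output format shown earlier.

```python
import io
import json
import logging
from contextvars import ContextVar

tenant_id_var = ContextVar("tenant_id", default=None)
call_id_var = ContextVar("call_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, adding the context vars."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            # Read at format time, so values set earlier in the task apply.
            "tenant_id": tenant_id_var.get(),
            "call_id": call_id_var.get(),
        }
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def log_event(level, event, msg="", **fields):
    """Log a structured event; extra keyword arguments become JSON keys."""
    logging.getLogger("app").log(
        level, msg, extra={"fields": {"event": event, **fields}}
    )


# Wire the formatter to an in-memory stream so the output can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
app_logger = logging.getLogger("app")
app_logger.addHandler(handler)
app_logger.setLevel(logging.INFO)

tenant_id_var.set("acme-corp")
call_id_var.set("call_abc123")
log_event(logging.INFO, "call_started", "call started", key1="value1")
record = json.loads(stream.getvalue())
```

Because ContextVar values are scoped to the current async task, two calls handled concurrently would each log their own tenant_id and call_id.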

Filtering Logs

Filter logs by event, tenant, or call:
# The grep patterns allow an optional space after the colon, so they match
# both compact and default-spaced JSON output.

# All errors
grep -E '"level": ?"ERROR"' logs.json

# Specific event
grep -E '"event": ?"webhook_verification_failed"' logs.json

# Specific tenant
grep -E '"tenant_id": ?"acme-corp"' logs.json

# Specific call
grep -E '"call_id": ?"call_abc123"' logs.json

# Combined filters with jq
jq 'select(.tenant_id == "acme-corp" and .level == "ERROR")' logs.json
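
When jq is not available, the same filtering can be done in Python. This is a small sketch that assumes logs.json holds one JSON object per line. The filter_logs helper is hypothetical and not part of the codebase.

```python
import json


def filter_logs(lines, **criteria):
    """Yield parsed log records whose fields equal every given criterion."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stack traces
        if not isinstance(record, dict):
            continue
        if all(record.get(key) == value for key, value in criteria.items()):
            yield record


sample = [
    '{"level":"ERROR","tenant_id":"acme-corp","event":"tool_build_error"}',
    '{"level": "INFO", "tenant_id": "acme-corp", "event": "call_accepted"}',
    "Traceback (most recent call last):",
    '{"level": "ERROR", "tenant_id": "other", "event": "instructions_missing"}',
]
matches = list(filter_logs(sample, tenant_id="acme-corp", level="ERROR"))
```

With a real file, the first argument would be the open file object, for example filter_logs(open("logs.json"), level="ERROR"). Parsing each line makes the match independent of key order and whitespace.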

Log Levels

Configure via LOG_LEVEL environment variable (settings.py:22):
LOG_LEVEL=DEBUG  # DEBUG, INFO, WARNING, ERROR
Third-party loggers are suppressed to WARNING (logger.py:87):
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("websockets").setLevel(logging.WARNING)
logging.getLogger("uvicorn.access").setLevel(logging.WARNING)

Call Flow Debugging

Trace Complete Call Lifecycle

  1. Incoming webhook: realtime.call.incoming
  2. Webhook verified: No webhook_verification_failed error
  3. Deduplicated: Check duplicate_webhook_id
  4. Tenant resolved: Check tenant_resolution_failed
  5. Capacity checked: Check rejected: "capacity"
  6. Instructions fetched: Check instructions_missing or instructions_db_error
  7. Config loaded: Check tenant_config_parse_error
  8. Tools built: Check tool_build_error
  9. Call accepted: Look for call_accepted
  10. Session started: Look for call_session_start_failed
  11. Call ended: realtime.call.ended webhook
Search logs for all events for a specific call:
jq -r 'select(.call_id == "call_abc123") | [.ts, .event, .level, .msg] | @tsv' logs.json
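
The lifecycle checklist above can also be applied mechanically to the records of one call. The sketch below maps the event names listed in the steps to their stage and reports the ones that appear, in timestamp order. The problem_stages function and the STAGE_BY_EVENT mapping are hypothetical helpers, and the records are assumed to be already parsed into dicts with an ISO-formatted "ts" field.

```python
# Event name -> lifecycle stage, taken from the numbered steps above.
STAGE_BY_EVENT = {
    "webhook_verification_failed": "Webhook verified",
    "duplicate_webhook_id": "Deduplicated",
    "tenant_resolution_failed": "Tenant resolved",
    "instructions_missing": "Instructions fetched",
    "instructions_db_error": "Instructions fetched",
    "tenant_config_parse_error": "Config loaded",
    "tool_build_error": "Tools built",
    "call_session_start_failed": "Session started",
}


def problem_stages(records):
    """Return (stage, event) pairs for known problem events, oldest first."""
    found = []
    # ISO timestamps of equal precision sort correctly as plain strings.
    for record in sorted(records, key=lambda r: r.get("ts", "")):
        event = record.get("event")
        if record.get("rejected") == "capacity":
            found.append(("Capacity checked", event))
        elif event in STAGE_BY_EVENT:
            found.append((STAGE_BY_EVENT[event], event))
    return found
```

An empty result means none of the listed problem events were logged for the call. Note that instructions_db_error and tool_build_error do not stop a call, so they can appear alongside a later call_accepted.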

Health Checks

Application Health

curl http://localhost:8000/health

MongoDB Connectivity

Check for DB errors in logs:
grep 'InstructionsDBError' logs.json
grep 'ServerSelectionTimeoutError' logs.json

OpenAI API Connectivity

Check for call acceptance failures:
grep 'call_accept_failed' logs.json
grep 'call_reject_failed' logs.json

Exception Details

All custom exceptions include structured context via to_log_dict() (exceptions.py:44):
from typing import Any


class ApplicationError(RuntimeError):
    # Excerpt: tenant_id, reason, and context are set elsewhere in the class.
    def to_log_dict(self) -> dict[str, Any]:
        return {
            "error_type": self.__class__.__name__,
            "tenant_id": self.tenant_id,
            "reason": self.reason,
            "context": self.context,
        }
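
To show how this dictionary relates to the "error" objects in the log examples, here is a runnable sketch. The constructor signature is an assumption, since only to_log_dict() is shown in this guide, and InstructionsMissingError is redefined locally as a plain subclass.

```python
from typing import Any, Optional


class ApplicationError(RuntimeError):
    """Sketch of the base error; the __init__ signature is hypothetical."""

    def __init__(
        self,
        reason: str,
        *,
        tenant_id: Optional[str] = None,
        context: Optional[dict] = None,
    ) -> None:
        super().__init__(reason)
        self.reason = reason
        self.tenant_id = tenant_id
        self.context = context or {}

    def to_log_dict(self) -> dict[str, Any]:
        return {
            "error_type": self.__class__.__name__,
            "tenant_id": self.tenant_id,
            "reason": self.reason,
            "context": self.context,
        }


class InstructionsMissingError(ApplicationError):
    """No extra code needed; error_type is derived from the class name."""


err = InstructionsMissingError(
    "greeting_text_empty",
    tenant_id="acme-corp",
    context={"greeting_id": "greeting_xyz", "instruction_id": "inst_789"},
)
log_fields = err.to_log_dict()
```

Because error_type comes from the class name, every subclass gets a distinct value without overriding to_log_dict().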

Quick Reference: Error Codes

Event | Severity | Cause | Response
webhook_verification_failed | ERROR | Invalid signature | HTTP 401
duplicate_webhook_id | WARNING | Duplicate webhook | Ignored
tenant_resolution_failed | ERROR | Unknown phone number | Call rejected
tenant_not_configured | ERROR | No tenant config | Call rejected
instructions_missing | ERROR | Missing prompts | Call rejected
instructions_db_error | ERROR | DB failure | Fallback prompts used
tenant_config_parse_error | ERROR | Invalid config JSON | Call rejected
tool_build_error | ERROR | Tool config invalid | Call proceeds without tools
call_accept_failed | ERROR | OpenAI API error | Call not accepted
call_session_start_failed | ERROR | Session start exception | Call hung up

Getting Help

When reporting issues, include:
  1. Call ID: From logs or metrics
  2. Tenant ID: Affected tenant
  3. Event sequence: Filtered logs for the call
  4. Error context: Full to_log_dict() output
  5. Environment: LOG_LEVEL, capacity settings, MongoDB version
