Troubleshooting

Overview

This guide covers common error scenarios, their causes, and solutions. All errors are logged with structured context for efficient debugging.

Error Categories

Errors are organized into these categories:

Webhook Errors: Signature verification, deduplication issues
Tenant Configuration Errors: Missing or invalid tenant config
Instructions Errors: Missing or invalid prompts
Capacity Errors: Call limits exceeded
Tool Errors: Tool build or execution failures
Database Errors: MongoDB connection or query failures

Common Error Scenarios

Webhook Verification Failed

Error Code: webhook_verification_failed
Cause: Invalid webhook signature from OpenAI
Location: openai_webhook.py:81
Log Example:

{
  "event": "webhook_verification_failed",
  "level": "ERROR",
  "msg": "Invalid signature"
}

Solutions:

Issue	Solution
Wrong secret configured	Verify `OPENAI_WEBHOOK_SECRET` matches OpenAI console
Clock skew	Ensure server time is synchronized (NTP)
Proxy modifying headers	Check reverse proxy doesn’t strip signature headers
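
The three causes above are easier to reason about with the general shape of a signed-webhook check in mind. The sketch below is a generic HMAC-SHA256 scheme with a timestamp tolerance. It is not the code in openai_webhook.py and not necessarily the exact scheme OpenAI uses; the function names, the signed-payload layout, and the 300-second tolerance are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Return a base64 HMAC-SHA256 signature over "<timestamp>.<body>" (assumed layout)."""
    signed = str(timestamp).encode() + b"." + body
    return base64.b64encode(hmac.new(secret, signed, hashlib.sha256).digest()).decode()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str,
           now: Optional[float] = None, tolerance_s: int = 300) -> bool:
    """Reject stale timestamps first, then compare signatures in constant time."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # clock skew or a replayed request
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)


body = b'{"type": "realtime.call.incoming"}'
sig = sign(b"correct-secret", 1_700_000_000, body)

ok = verify(b"correct-secret", 1_700_000_000, body, sig, now=1_700_000_010)
wrong_secret = verify(b"other-secret", 1_700_000_000, body, sig, now=1_700_000_010)
skewed = verify(b"correct-secret", 1_700_000_000, body, sig, now=1_700_000_900)
modified = verify(b"correct-secret", 1_700_000_000, body + b" ", sig, now=1_700_000_010)
```

Only the first call succeeds. The other three fail for a mismatched secret, a server clock that is 15 minutes off, and a body altered in transit, which correspond loosely to the rows in the table above.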

Tenant Not Configured

Error Code: tenant_not_configured
Exception: TenantNotConfiguredError (exceptions.py:53)
Cause: Tenant has no configuration document or invalid state
Location: openai_webhook.py:346
Log Example:

{
  "event": "tenant_not_configured",
  "level": "ERROR",
  "tenant_id": "acme-corp",
  "error": {
    "error_type": "TenantNotConfiguredError",
    "reason": "state_doc_missing"
  }
}

Solutions:

Issue	Solution
Tenant not provisioned	Create tenant configuration in database
Invalid tenant ID in routing	Check phone number → tenant mapping
Configuration deleted	Restore tenant config from backup
State document malformed	Validate JSON schema of state document

Response to Caller:

{
  "ok": true,
  "rejected": "tenant_not_configured"
}

Instructions Missing

Error Code: instructions_missing
Exception: InstructionsMissingError (exceptions.py:73)
Cause: Tenant configured but instruction text cannot be resolved
Location: openai_webhook.py:368
Log Example:

{
  "event": "instructions_missing",
  "level": "ERROR",
  "call_id": "call_abc123",
  "tenant_id": "acme-corp",
  "error": {
    "error_type": "InstructionsMissingError",
    "reason": "greeting_text_empty",
    "context": {
      "greeting_id": "greeting_xyz",
      "instruction_id": "inst_789"
    }
  }
}

Solutions:

Issue	Solution
Greeting text empty	Add greeting content to tenant configuration
Instruction text empty	Add instruction/prompt content
Referenced ID not found	Check greeting_id/instruction_id exist in database
Prompt deleted but still referenced	Update state pointers or restore prompt

Instructions DB Error

Error Code: instructions_db_error
Exception: InstructionsDBError (exceptions.py:101)
Cause: Database connection failure or query timeout
Location: openai_webhook.py:390
Log Example:

{
  "event": "instructions_db_error",
  "level": "ERROR",
  "call_id": "call_abc123",
  "tenant_id": "acme-corp",
  "error": {
    "error_type": "InstructionsDBError",
    "reason": "db_timeout",
    "context": {
      "operation": "fetch_state",
      "cause_type": "ServerSelectionTimeoutError"
    }
  }
}

Behavior: System uses fallback instructions to keep service available during DB outage
Fallback Content (from data/prompts/downtime.py):

DOWNTIME_GREETING = "Thank you for calling. We're experiencing technical difficulties."
DOWNTIME_PROMPT = "Apologize and ask caller to try again later."

Solutions:

Issue	Solution
MongoDB connection timeout	Check `MONGODB_URI` and network connectivity
Database overloaded	Scale MongoDB resources or add read replicas
Query timeout	Optimize indexes or increase timeout settings
Authentication failure	Verify MongoDB credentials

Metric: fallback_instructions_used increments when fallback is used
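
The fallback behavior described above can be sketched as a try/except around the instruction fetch. This is an illustration of the pattern only: `resolve_instructions`, the `fetch` callable, and the in-memory `metrics` dict are hypothetical names, and the exception class is a local stand-in for the project's `InstructionsDBError`.

```python
from typing import Callable

DOWNTIME_GREETING = "Thank you for calling. We're experiencing technical difficulties."
DOWNTIME_PROMPT = "Apologize and ask caller to try again later."


class InstructionsDBError(RuntimeError):
    """Local stand-in for the project's exception of the same name."""


# Hypothetical in-memory counter standing in for the real metrics backend.
metrics = {"fallback_instructions_used": 0}


def resolve_instructions(fetch: Callable[[str], tuple[str, str]],
                         tenant_id: str) -> tuple[str, str]:
    """Return (greeting, prompt), falling back to downtime content on DB failure."""
    try:
        return fetch(tenant_id)
    except InstructionsDBError:
        metrics["fallback_instructions_used"] += 1
        return DOWNTIME_GREETING, DOWNTIME_PROMPT


def failing_fetch(tenant_id: str) -> tuple[str, str]:
    """Simulate a database timeout."""
    raise InstructionsDBError("db_timeout")


greeting, prompt = resolve_instructions(failing_fetch, "acme-corp")
```

A rising `fallback_instructions_used` count with no matching `instructions_missing` events points at the database rather than at tenant content.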

Capacity Limit Reached

Error Code: call_reject_failed or capacity rejection
Cause: Global or per-tenant concurrent call limit exceeded
Location: openai_webhook.py:298-337
Log Example:

{
  "event": "call_accepted",
  "level": "INFO",
  "rejected": "capacity",
  "tenant_id": "acme-corp"
}

Solutions:

Issue	Solution
Legitimate traffic spike	Increase `MAX_CONCURRENT_CALLS`
Single tenant monopolizing	Implement per-tenant limits
Calls not ending properly	Check for session cleanup bugs
Pending calls stuck	Review `_release_pending_capacity_state` calls

See: Capacity Management Guide
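
To make the interaction between a global limit and a per-tenant limit concrete, here is a minimal sketch of a concurrent-call tracker. `CapacityTracker` and its methods are hypothetical and are not the project's implementation; only the idea of a `MAX_CONCURRENT_CALLS`-style ceiling comes from this guide.

```python
from collections import Counter


class CapacityTracker:
    """Hypothetical tracker for global and per-tenant concurrent call limits."""

    def __init__(self, max_concurrent_calls: int, max_per_tenant: int) -> None:
        self.max_concurrent_calls = max_concurrent_calls
        self.max_per_tenant = max_per_tenant
        self.active = Counter()  # tenant_id -> number of active calls

    def try_acquire(self, tenant_id: str) -> bool:
        """Reserve a slot, or return False when either limit is reached."""
        if sum(self.active.values()) >= self.max_concurrent_calls:
            return False
        if self.active[tenant_id] >= self.max_per_tenant:
            return False
        self.active[tenant_id] += 1
        return True

    def release(self, tenant_id: str) -> None:
        """Free a slot; skipping this on any exit path leaks capacity."""
        if self.active[tenant_id] > 0:
            self.active[tenant_id] -= 1


tracker = CapacityTracker(max_concurrent_calls=3, max_per_tenant=2)
results = [tracker.try_acquire("acme-corp") for _ in range(3)]  # third hits the tenant limit
other = tracker.try_acquire("globex")     # takes the last global slot
full = tracker.try_acquire("initech")     # rejected: global limit reached
tracker.release("acme-corp")
after_release = tracker.try_acquire("initech")  # accepted once a slot is freed
```

The last two calls show why stuck or unreleased calls look like a traffic spike: until `release` runs, every new caller is rejected regardless of tenant.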

Tenant Config Parse Error

Error Code: tenant_config_parse_error
Exception: TenantConfigParseError (exceptions.py:136)
Cause: Tenant config exists but fails Pydantic validation
Location: openai_webhook.py:487
Log Example:

{
  "event": "tenant_config_parse_error",
  "level": "ERROR",
  "tenant_id": "acme-corp"
}

Solutions:

Issue	Solution
Invalid JSON syntax	Validate JSON with linter
Missing required fields	Check TenantConfig model requirements
Wrong data types	Ensure fields match expected types (str, int, etc.)
Schema version mismatch	Update config to current schema version
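
Before digging into the Pydantic error, a quick standard-library pre-check can separate the first three rows of the table. The sketch below deliberately uses plain `json` and `isinstance` rather than Pydantic, and the field names in `REQUIRED_FIELDS` are invented for illustration; the real requirements are defined by the TenantConfig model.

```python
import json

# Hypothetical required fields and types; the real ones live in the TenantConfig model.
REQUIRED_FIELDS = {"tenant_id": str, "max_calls": int}


def precheck_config(raw: str) -> list[str]:
    """Return human-readable problems; an empty list means the pre-check passed."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing required field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(doc[field], expected) or isinstance(doc[field], bool):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    return problems


good = precheck_config('{"tenant_id": "acme-corp", "max_calls": 5}')
bad_type = precheck_config('{"tenant_id": "acme-corp", "max_calls": "5"}')
missing = precheck_config('{"tenant_id": "acme-corp"}')
broken = precheck_config('{"tenant_id": ')
```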

Tool Build Error

Error Code: tool_build_error
Exception: ToolBuildError (exceptions.py:240)
Cause: Tool configuration invalid or tool builder failure
Location: openai_webhook.py:537
Log Example:

{
  "event": "tool_build_error",
  "level": "ERROR",
  "tenant_id": "acme-corp"
}

Behavior: Call proceeds without tools (tools_build = None)
Solutions:

Issue	Solution
Invalid tool configuration	Validate tool config schema
Missing tool dependencies	Ensure required tool modules installed
Tool initialization failure	Check tool constructor parameters

Tool Execution Errors

Exception: ToolExecutionError (exceptions.py:164)
Variants:

ToolNotFoundError (exceptions.py:194): Tool invoked but not in tool list
ToolArgsParseError (exceptions.py:214): Cannot parse tool arguments JSON

Log Example:

{
  "level": "ERROR",
  "error_type": "ToolNotFoundError",
  "reason": "tool_not_found",
  "context": {
    "tool_name": "nonexistent_tool"
  }
}

Solutions:

Error	Solution
ToolNotFoundError	Ensure tool name matches tool list exactly
ToolArgsParseError	Validate arguments JSON format
ToolExecutionError	Check tool implementation for exceptions
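
The three error types map onto three distinct points in a tool dispatch. The sketch below shows one way such a dispatcher might raise them. The exception classes are local stand-ins that reuse the names from this guide, and `run_tool` and `outcome` are hypothetical helpers, not the project's API.

```python
import json
from typing import Any, Callable


class ToolExecutionError(RuntimeError):
    """Local stand-in for the base tool exception."""


class ToolNotFoundError(ToolExecutionError):
    """Tool invoked but not in the tool list."""


class ToolArgsParseError(ToolExecutionError):
    """Tool arguments are not a valid JSON object."""


def run_tool(tools: dict[str, Callable[..., Any]], name: str, raw_args: str) -> Any:
    """Look up the tool, parse its JSON arguments, then call it."""
    if name not in tools:  # exact, case-sensitive match
        raise ToolNotFoundError(name)
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        raise ToolArgsParseError(name) from exc
    if not isinstance(args, dict):
        raise ToolArgsParseError(name)
    try:
        return tools[name](**args)
    except Exception as exc:  # any failure inside the tool itself
        raise ToolExecutionError(name) from exc


tools = {"add": lambda a, b: a + b}


def outcome(name: str, raw_args: str) -> str:
    """Summarize a call as "ok:<result>" or the exception class name."""
    try:
        return f"ok:{run_tool(tools, name, raw_args)}"
    except ToolExecutionError as exc:
        return type(exc).__name__


success = outcome("add", '{"a": 1, "b": 2}')
not_found = outcome("Add", '{"a": 1, "b": 2}')   # wrong case
bad_args = outcome("add", "{a: 1}")              # not valid JSON
failed = outcome("add", '{"a": 1}')              # missing argument inside the tool
```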

Debugging with Structured Logging

Log Event Function

All events are logged using log_event from src/core/logger.py:97:

from src.core.logger import log_event
import logging

log_event(
    logging.ERROR,
    "event_name",
    "human-readable message",
    key1="value1",
    key2="value2"
)

Log Output Format

Logs use JSON format (logger.py:24):

{
  "ts": "2026-03-02T14:30:45",
  "level": "ERROR",
  "logger": "app",
  "msg": "human-readable message",
  "tenant_id": "acme-corp",
  "call_id": "call_abc123",
  "event": "event_name",
  "key1": "value1",
  "key2": "value2"
}

Context Variables

Tenant and call IDs are automatically attached via context vars (logger.py:9):

import logging

from src.core.logger import call_id_var, log_event, tenant_id_var

# Set context for current async task
tenant_id_var.set("acme-corp")
call_id_var.set("call_abc123")

# All subsequent logs include these fields
log_event(logging.INFO, "call_started")  # Includes tenant_id and call_id
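
For readers unfamiliar with how context variables end up in every log line, here is a self-contained sketch of the mechanism using only the standard library. It is not the contents of src/core/logger.py: the formatter class, the `extra={"fields": ...}` convention, and the `app.sketch` logger name are assumptions chosen to reproduce the documented output shape.

```python
import contextvars
import io
import json
import logging

# Hypothetical stand-ins for the variables exported by src/core/logger.py.
tenant_id_var = contextvars.ContextVar("tenant_id", default=None)
call_id_var = contextvars.ContextVar("call_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, reading IDs from context vars."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            "tenant_id": tenant_id_var.get(),
            "call_id": call_id_var.get(),
        }
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


# Write to an in-memory stream so the example has no side effects.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app.sketch")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False


def log_event(level: int, event: str, msg: str = "", **fields) -> None:
    """Log a structured event; extra keyword arguments become JSON fields."""
    logger.log(level, msg, extra={"fields": {"event": event, **fields}})


tenant_id_var.set("acme-corp")
call_id_var.set("call_abc123")
log_event(logging.INFO, "call_started", "Call started", key1="value1")
record = json.loads(stream.getvalue())
```

Because `ContextVar` values are scoped to the current task, concurrent calls handled by separate asyncio tasks do not overwrite each other's `tenant_id` or `call_id`.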

Filtering Logs

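Filter logs by event, tenant, or call. The grep patterns match only compact JSON with no whitespace after the colon; if your log lines contain spaces there, use the jq filter instead: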

# All errors
grep '"level":"ERROR"' logs.json

# Specific event
grep '"event":"webhook_verification_failed"' logs.json

# Specific tenant
grep '"tenant_id":"acme-corp"' logs.json

# Specific call
grep '"call_id":"call_abc123"' logs.json

# Combined filters with jq
cat logs.json | jq 'select(.tenant_id == "acme-corp" and .level == "ERROR")'
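
When jq is not available, the same filtering can be done with a few lines of Python that parse each line instead of matching text, so spacing inside the JSON does not matter. `filter_logs` is a hypothetical helper and the sample lines are made up for the example.

```python
import json
from typing import Iterable, Iterator


def filter_logs(lines: Iterable[str], **criteria: str) -> Iterator[dict]:
    """Yield parsed records whose fields equal every given criterion."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stack traces
        if not isinstance(record, dict):
            continue
        if all(record.get(key) == value for key, value in criteria.items()):
            yield record


# Made-up sample lines; the second one has spaces that a plain grep would miss.
sample = [
    '{"level":"ERROR","tenant_id":"acme-corp","event":"instructions_missing"}',
    '{"level": "ERROR", "tenant_id": "acme-corp", "event": "tool_build_error"}',
    '{"level":"INFO","tenant_id":"acme-corp","event":"call_accepted"}',
    "not json",
]
matches = list(filter_logs(sample, tenant_id="acme-corp", level="ERROR"))
events = [m["event"] for m in matches]
```

To run it against a real file, pass an open file object in place of `sample`, for example `filter_logs(open("logs.json"), level="ERROR")`.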

Log Levels

Configure via LOG_LEVEL environment variable (settings.py:22):

LOG_LEVEL=DEBUG  # DEBUG, INFO, WARNING, ERROR

Third-party loggers are suppressed to WARNING (logger.py:87):

logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("websockets").setLevel(logging.WARNING)
logging.getLogger("uvicorn.access").setLevel(logging.WARNING)
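
A minimal sketch of how a `LOG_LEVEL` string could be turned into a `logging` level follows. The helper name `resolve_log_level` and the choice to fall back to INFO on unknown values are assumptions; settings.py may handle invalid values differently.

```python
import logging
import os
from typing import Mapping


def resolve_log_level(env: Mapping[str, str] = os.environ, default: str = "INFO") -> int:
    """Map LOG_LEVEL to a numeric logging level, falling back to INFO if unknown."""
    name = env.get("LOG_LEVEL", default).upper()
    level = logging.getLevelName(name)  # returns an int for known level names
    return level if isinstance(level, int) else logging.INFO


debug_level = resolve_log_level({"LOG_LEVEL": "debug"})
unknown_level = resolve_log_level({"LOG_LEVEL": "verbose"})
unset_level = resolve_log_level({})
```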

Call Flow Debugging

Trace Complete Call Lifecycle

Incoming webhook: realtime.call.incoming
Webhook verified: No webhook_verification_failed error
Deduplicated: Check duplicate_webhook_id
Tenant resolved: Check tenant_resolution_failed
Capacity checked: Check rejected: "capacity"
Instructions fetched: Check instructions_missing or instructions_db_error
Config loaded: Check tenant_config_parse_error
Tools built: Check tool_build_error
Call accepted: Look for call_accepted
Session started: Look for call_session_start_failed
Call ended: realtime.call.ended webhook

Search logs for all events for a specific call:

grep '"call_id":"call_abc123"' logs.json | jq -r '[.ts, .event, .level, .msg] | @tsv'
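
The lifecycle list above can also be applied mechanically: given the records for one call, find the earliest event that signals a problem and report which stage it belongs to. The mapping below is built from the event names in this guide. `first_failure` is a hypothetical helper and the sample records are invented. Note that some of these events (for example tool_build_error) do not end the call.

```python
from typing import Iterable, Optional

# Failure events from the lifecycle list, mapped to the stage they indicate.
FAILURE_STAGES = {
    "webhook_verification_failed": "Webhook verified",
    "duplicate_webhook_id": "Deduplicated",
    "tenant_resolution_failed": "Tenant resolved",
    "instructions_missing": "Instructions fetched",
    "instructions_db_error": "Instructions fetched",
    "tenant_config_parse_error": "Config loaded",
    "tool_build_error": "Tools built",
    "call_session_start_failed": "Session started",
}


def first_failure(records: Iterable[dict]) -> Optional[tuple[str, str]]:
    """Return (stage, event) for the earliest failure event, or None if there is none."""
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    for record in sorted(records, key=lambda r: r.get("ts", "")):
        event = record.get("event")
        if event in FAILURE_STAGES:
            return FAILURE_STAGES[event], event
    return None


# Invented records for one call, deliberately out of order.
records = [
    {"ts": "2026-03-02T14:30:47", "event": "call_session_start_failed"},
    {"ts": "2026-03-02T14:30:45", "event": "realtime.call.incoming"},
    {"ts": "2026-03-02T14:30:46", "event": "tool_build_error"},
]
result = first_failure(records)
clean = first_failure([{"ts": "2026-03-02T14:30:45", "event": "call_accepted"}])
```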

Health Checks

Application Health

curl http://localhost:8000/health

MongoDB Connectivity

Check for DB errors in logs:

grep 'InstructionsDBError' logs.json
grep 'ServerSelectionTimeoutError' logs.json

OpenAI API Connectivity

Check for call acceptance failures:

grep 'call_accept_failed' logs.json
grep 'call_reject_failed' logs.json

Exception Details

All custom exceptions include structured context via to_log_dict() (exceptions.py:44):

from typing import Any


class ApplicationError(RuntimeError):
    def to_log_dict(self) -> dict[str, Any]:
        return {
            "error_type": self.__class__.__name__,
            "tenant_id": self.tenant_id,
            "reason": self.reason,
            "context": self.context,
        }
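
The excerpt above shows only `to_log_dict()`. As a rough illustration of how the attributes it reads might be populated, here is a runnable sketch with an assumed constructor. The `__init__` signature is a guess and may not match exceptions.py; the subclass is a local stand-in.

```python
from typing import Any, Optional


class ApplicationError(RuntimeError):
    """Sketch of a base exception carrying structured context (constructor assumed)."""

    def __init__(self, reason: str, *, tenant_id: Optional[str] = None,
                 context: Optional[dict[str, Any]] = None) -> None:
        super().__init__(reason)
        self.reason = reason
        self.tenant_id = tenant_id
        self.context = context or {}

    def to_log_dict(self) -> dict[str, Any]:
        return {
            "error_type": self.__class__.__name__,
            "tenant_id": self.tenant_id,
            "reason": self.reason,
            "context": self.context,
        }


class InstructionsMissingError(ApplicationError):
    """Local stand-in for the project's exception of the same name."""


try:
    raise InstructionsMissingError(
        "greeting_text_empty",
        tenant_id="acme-corp",
        context={"greeting_id": "greeting_xyz"},
    )
except ApplicationError as exc:
    error = exc.to_log_dict()
```

Because `error_type` is derived from the class name, catching the base class still records which specific subclass was raised.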

Quick Reference: Error Codes

Event	Severity	Cause	Response
`webhook_verification_failed`	ERROR	Invalid signature	HTTP 401
`duplicate_webhook_id`	WARNING	Duplicate webhook	Ignored
`tenant_resolution_failed`	ERROR	Unknown phone number	Call rejected
`tenant_not_configured`	ERROR	No tenant config	Call rejected
`instructions_missing`	ERROR	Missing prompts	Call rejected
`instructions_db_error`	ERROR	DB failure	Fallback prompts used
`tenant_config_parse_error`	ERROR	Invalid config JSON	Call rejected
`tool_build_error`	ERROR	Tool config invalid	Call proceeds without tools
`call_accept_failed`	ERROR	OpenAI API error	Call not accepted
`call_session_start_failed`	ERROR	Session start exception	Call hung up

Getting Help

When reporting issues, include:

Call ID: From logs or metrics
Tenant ID: Affected tenant
Event sequence: Filtered logs for the call
Error context: Full to_log_dict() output
Environment: LOG_LEVEL, capacity settings, MongoDB version
