Overview
Sonore Phone Agent includes built-in logging and metrics systems for monitoring application health, call activity, and system performance. This guide covers the logging infrastructure and available metrics.
Logging System
Architecture
The logging system is implemented in src/core/logger.py and provides:
- Structured JSON logging for easy parsing and analysis
- Context variables for request tracing (tenant_id, call_id)
- Configurable log levels via environment variable
- Event-based logging with custom fields
Configuration
Configure logging via environment variables; for example, LOG_LEVEL sets the minimum log level.
Log Format
All logs are emitted as JSON objects with the following structure:
- ts - Timestamp in ISO 8601 format (UTC)
- level - Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- logger - Logger name (typically “app”)
- msg - Human-readable message
- tenant_id - Tenant identifier from context
- call_id - Call identifier from context
Custom fields are passed via the extra parameter and are automatically included in the JSON output (source/src/core/logger.py:36-64).
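A formatter along the following lines could produce the structure described above. This is a minimal sketch, not the code in src/core/logger.py: the JsonFormatter and build_logger names are hypothetical, and tenant_id and call_id appear here only if they are attached to the record (for example via extra).

```python
import json
import logging
import os
from datetime import datetime, timezone

# Attribute names present on every LogRecord; anything else arrived via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter emitting one JSON object per log record."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Copy custom fields that were passed through `extra=...`.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def build_logger(name: str = "app") -> logging.Logger:
    """Hypothetical setup: JSON output, level taken from LOG_LEVEL (default INFO)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())
    logger.propagate = False
    return logger
```

With this sketch, `build_logger().info("call accepted", extra={"call_id": "c1"})` would emit a single JSON line that includes the call_id field alongside ts, level, logger, and msg.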
Context Variables
The logging system uses context variables to automatically include tenant and call IDs in all log messages. It relies on Python's contextvars module for async-safe request tracking (source/src/core/logger.py:9-14).
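To illustrate why contextvars suits this job, the sketch below sets the two IDs inside concurrent asyncio tasks and shows that each task sees only its own values. The variable names, the ContextFilter class, and handle_call are assumptions made for illustration, not the project's actual API.

```python
import asyncio
import contextvars
import logging
from typing import Optional, Tuple

# Hypothetical context variables; the real names in src/core/logger.py may differ.
tenant_id_var: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "tenant_id", default=None
)
call_id_var: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "call_id", default=None
)


class ContextFilter(logging.Filter):
    """Copies the current context values onto every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.tenant_id = tenant_id_var.get()
        record.call_id = call_id_var.get()
        return True


async def handle_call(tenant_id: str, call_id: str) -> Tuple[Optional[str], Optional[str]]:
    # Each asyncio task runs in its own copy of the context, so these
    # assignments are invisible to other tasks.
    tenant_id_var.set(tenant_id)
    call_id_var.set(call_id)
    await asyncio.sleep(0)  # yield so the two tasks interleave
    return tenant_id_var.get(), call_id_var.get()


async def main():
    return await asyncio.gather(
        handle_call("tenant-a", "call-1"),
        handle_call("tenant-b", "call-2"),
    )
```

Attaching ContextFilter to a handler means call sites never have to pass the IDs explicitly; they only need to be set once when a request or call begins.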
Event Logging
Use the log_event helper for structured event logging. It adds an event field to the log output (source/src/core/logger.py:97-98).
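The helper's actual signature is not shown in this guide, so the following is only a sketch of what such a helper might look like: the parameters (logger, event, level, and keyword fields) are assumptions.

```python
import logging
from typing import Any


def log_event(
    logger: logging.Logger,
    event: str,
    level: int = logging.INFO,
    **fields: Any,
) -> None:
    """Hypothetical helper: log `event` and attach it plus custom fields to the record.

    Field names must not clash with built-in LogRecord attributes
    (for example "msg" or "name"); the logging module raises KeyError if they do.
    """
    logger.log(level, event, extra={"event": event, **fields})
```

A call such as `log_event(logger, "call_started", duration_ms=12)` would then yield a record carrying event and duration_ms, which a JSON formatter can serialize as top-level fields.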
Third-Party Log Filtering
The logger automatically reduces noise from third-party libraries (source/src/core/logger.py:86-89):
- httpx - Set to WARNING level
- websockets - Set to WARNING level
- uvicorn.access - Set to WARNING level
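Raising the level of these loggers takes only a few lines with the standard logging module. The function name below is hypothetical; the logger names are the ones listed above.

```python
import logging

NOISY_LOGGERS = ("httpx", "websockets", "uvicorn.access")


def quiet_third_party_loggers(level: int = logging.WARNING) -> None:
    """Raise the threshold of chatty library loggers so only warnings and above pass."""
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)
```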
Metrics System
LiveMetricsStore
The LiveMetricsStore class provides real-time metrics tracking for call activity (source/src/apps/calls/metrics/live_store.py:9-238).
Metrics tracked per tenant and globally:
- active_calls - Number of currently active calls
- accepted_calls - Total calls accepted by the system
- rejected_calls_capacity - Calls rejected due to capacity limits
- Calls rejected because the tenant is not configured
- Calls rejected due to missing instructions
- instructions_db_errors - Database errors when fetching instructions
- Number of times fallback instructions were used
- started_calls - Total calls that started successfully
- ended_calls - Total calls that have ended
- failed_calls - Calls that ended with an error
- referred_calls - Calls that were referred to another destination
- minutes_processed - Total minutes of call time processed
Metric Storage
Metrics are stored in-memory with:
- Per-tenant tracking - Isolated metrics for each tenant
- Global aggregation - System-wide metrics under the __global__ key (source/src/apps/calls/metrics/live_store.py:6)
- Thread-safe operations - All metric updates use async locks (source/src/apps/calls/metrics/live_store.py:19)
Call Gates
The system uses “call gates” to prevent duplicate metric recording (source/src/apps/calls/metrics/live_store.py:8-13). Gates are keyed by call_id to ensure metrics are only incremented once per state transition.
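The gate idea can be sketched as a small registry that remembers which transitions each call_id has already passed. Everything here is an assumption made for illustration: the CallGate fields, the GateRegistry class, and the age-based pruning policy are not taken from live_store.py.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

_TRANSITIONS = ("accepted", "started", "ended")


@dataclass
class CallGate:
    """Hypothetical per-call record of which transitions were already counted."""

    accepted: bool = False
    started: bool = False
    ended: bool = False
    updated_at: float = field(default_factory=time.monotonic)


class GateRegistry:
    def __init__(self) -> None:
        self._gates: Dict[str, CallGate] = {}

    def pass_once(self, call_id: str, transition: str) -> bool:
        """Return True the first time a call makes this transition, False afterwards."""
        if transition not in _TRANSITIONS:
            raise ValueError(f"unknown transition: {transition}")
        gate = self._gates.setdefault(call_id, CallGate())
        if getattr(gate, transition):
            return False  # already counted; the caller should skip the increment
        setattr(gate, transition, True)
        gate.updated_at = time.monotonic()
        return True

    def prune(self, max_age_seconds: float, now: Optional[float] = None) -> int:
        """Drop gates of ended calls older than max_age_seconds; return how many."""
        now = time.monotonic() if now is None else now
        stale = [
            call_id
            for call_id, gate in self._gates.items()
            if gate.ended and now - gate.updated_at > max_age_seconds
        ]
        for call_id in stale:
            del self._gates[call_id]
        return len(stale)
```

A metric update would be guarded by `if registry.pass_once(call_id, "started"): ...`, so a retried or duplicated event cannot inflate the counters. Pruning only ended calls keeps gates for calls still in progress.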
Metric Operations
Recording call acceptance
Increments accepted_calls for the tenant and globally (source/src/apps/calls/metrics/live_store.py:30-48).
Recording call start
Increments started_calls and active_calls (source/src/apps/calls/metrics/live_store.py:50-71).
Recording call end
Increments ended_calls, decrements active_calls, and updates reason-specific counters (source/src/apps/calls/metrics/live_store.py:165-196).
End reasons (source/src/models/metrics/store.py:15-19):
- COMPLETED - Call completed normally
- HANGUP - User hung up
- REFERRED - Call transferred (increments referred_calls)
- ERROR - Call failed (increments failed_calls)
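The acceptance, start, and end transitions described above can be pictured with a much smaller store than the real one. This is a sketch under stated assumptions: MiniMetricsStore, its method names, and the enum values are hypothetical, while the counter names, the __global__ key, and the use of an async lock follow this guide.

```python
import asyncio
from collections import defaultdict
from enum import Enum, auto
from typing import Dict

GLOBAL_KEY = "__global__"


class EndReason(Enum):
    # Member names follow the guide; the values are placeholders.
    COMPLETED = auto()
    HANGUP = auto()
    REFERRED = auto()
    ERROR = auto()


class MiniMetricsStore:
    """Hypothetical, simplified stand-in for LiveMetricsStore."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._counters = defaultdict(lambda: defaultdict(int))

    def _bump(self, tenant_id: str, name: str, delta: int = 1) -> None:
        # Every update is applied to the tenant and to the global aggregate.
        for key in (tenant_id, GLOBAL_KEY):
            self._counters[key][name] += delta

    async def record_accepted(self, tenant_id: str) -> None:
        async with self._lock:
            self._bump(tenant_id, "accepted_calls")

    async def record_started(self, tenant_id: str) -> None:
        async with self._lock:
            self._bump(tenant_id, "started_calls")
            self._bump(tenant_id, "active_calls")

    async def record_ended(self, tenant_id: str, reason: EndReason) -> None:
        async with self._lock:
            self._bump(tenant_id, "ended_calls")
            self._bump(tenant_id, "active_calls", -1)
            if reason is EndReason.REFERRED:
                self._bump(tenant_id, "referred_calls")
            elif reason is EndReason.ERROR:
                self._bump(tenant_id, "failed_calls")

    async def snapshot(self) -> Dict[str, Dict[str, int]]:
        # Return plain-dict copies so callers cannot mutate internal state.
        async with self._lock:
            return {key: dict(values) for key, values in self._counters.items()}
```

Creating the store inside a running event loop (for example at application startup) avoids binding the lock to the wrong loop on older Python versions. The sketch omits call gates, so it would double count if the same transition were recorded twice.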
Recording rejections
Recording usage
Retrieving Metrics
Get a snapshot of current metrics.
Metrics Cleanup
The system automatically prunes old call gates to prevent memory growth.
Integration Examples
Exporting to Prometheus
Create a metrics endpoint for Prometheus scraping.
Logging to CloudWatch
The JSON log format integrates with AWS CloudWatch Logs.
Sending to Datadog
Health Checks
The application provides a basic health check endpoint (source/src/apps/calls/main.py:113-115).
Enhanced Health Check
Create a more comprehensive health check that includes system status.
Alerting Strategies
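Most of the alert conditions in this section can be evaluated by comparing two consecutive metric snapshots. The sketch below assumes each snapshot is a plain dict of counter values; the evaluate_alerts function and the alert labels are hypothetical. The long-running-calls check is left out because it needs a longer time series than two samples.

```python
from typing import Dict, List


def evaluate_alerts(
    previous: Dict[str, float],
    current: Dict[str, float],
    max_error_rate: float = 0.05,
) -> List[str]:
    """Return labels of the alert conditions that hold for `current`."""
    alerts: List[str] = []

    # High error rate: failed_calls / ended_calls above the threshold.
    ended = current.get("ended_calls", 0)
    failed = current.get("failed_calls", 0)
    if ended > 0 and failed / ended > max_error_rate:
        alerts.append("high_error_rate")

    # Capacity issues: the rejection counter grew since the last snapshot.
    if current.get("rejected_calls_capacity", 0) > previous.get("rejected_calls_capacity", 0):
        alerts.append("capacity")

    # Database errors: any error at all is worth a look.
    if current.get("instructions_db_errors", 0) > 0:
        alerts.append("database_errors")

    return alerts
```

In practice these rules usually live in the monitoring system itself (for example as alerting rules over exported metrics), with a function like this serving as a local check or test fixture.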
High Error Rate
Alert when failed_calls / ended_calls > 5%.
Capacity Issues
Alert when rejected_calls_capacity increases.
Database Errors
Alert on instructions_db_errors > 0.
Long-Running Calls
Monitor active_calls that don’t decrease.
Best Practices
Log aggregation
- Use centralized logging (ELK, Splunk, CloudWatch)
- Parse JSON logs for filtering and analysis
- Set up log retention policies
- Index by tenant_id and call_id for tracing
Metrics retention
- Export metrics to time-series database (Prometheus, InfluxDB)
- Keep in-memory metrics for real-time dashboards
- Archive historical metrics for trend analysis
- Set up automated reporting
Performance monitoring
- Track active_calls for load balancing
- Monitor minutes_processed for billing
- Watch rejected_calls_capacity to scale resources
- Analyze failed_calls for reliability improvements
Debugging
- Use call_id to trace the entire call lifecycle
- Filter logs by tenant_id for tenant-specific issues
- Set LOG_LEVEL=DEBUG temporarily for troubleshooting
- Correlate metrics with log events
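Because every log line is a JSON object, tracing a single call can be as simple as filtering on call_id. A minimal sketch, assuming the logs are available as an iterable of text lines; the trace_call name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def trace_call(lines: Iterable[str], call_id: str) -> Iterator[dict]:
    """Yield parsed log entries belonging to one call, skipping non-JSON lines."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stack trace or a third-party plain-text line
        if isinstance(entry, dict) and entry.get("call_id") == call_id:
            yield entry
```

Passing an open file object as `lines` streams the log without loading it into memory; the same filter works for tenant_id by changing the key.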
Next Steps
Installation
Set up the application from scratch
API Reference
Explore available endpoints