Comprehensive testing ensures the agent behaves correctly across various scenarios. This guide covers testing strategies for tools, prompts, and integration flows.

Test Configuration

The project uses pytest with async support. Configuration from pyproject.toml:
[tool.pytest.ini_options]
minversion = "8.0"
testpaths = ["tests"]
addopts = "-ra"
asyncio_mode = "auto"
Key settings:
  • minversion: Requires pytest 8.0+
  • testpaths: Tests are in the tests/ directory
  • addopts: -ra shows summary of all test results
  • asyncio_mode: auto automatically detects and runs async tests
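Because of `asyncio_mode = "auto"`, a plain `async def test_*` function is collected and awaited with no decorator. A minimal sketch (the `fetch_status` helper is a hypothetical stand-in for real application code):

```python
import asyncio

async def fetch_status() -> str:
    # Hypothetical stand-in for real async application code
    await asyncio.sleep(0)
    return "ok"

# With asyncio_mode = "auto", pytest collects and awaits plain async
# test functions without an explicit @pytest.mark.asyncio decorator.
async def test_fetch_status():
    assert await fetch_status() == "ok"
```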

Test Dependencies

From pyproject.toml:28-39, the dev dependencies include:
[dependency-groups]
dev = [
  "pytest>=8.0.0",           # Test framework
  "pytest-asyncio>=0.23.0",  # Async test support
  "pytest-cov>=5.0.0",       # Coverage reporting
  "pytest-mock>=3.14.0",     # Mocking utilities
  "httpx>=0.28.1",           # HTTP client for testing
  "freezegun>=1.2.2",        # Time mocking
  "testcontainers>=4.0.0",   # Container-based integration tests
  "mongomock>=4.1.2",        # MongoDB mocking
]
Install with:
uv sync --dev

Testing Tools

Basic Tool Test Structure

Tools are async functions that return ToolResult. Test them with pytest-asyncio:
import pytest
import json
from src.apps.calls.tools.definitions.request_transfer import request_transfer_tool
from src.models.openai.openai import ToolInvocation, ToolResult
from src.models.instructions.tenant_config import TenantConfig, Features, ReferFeature

@pytest.mark.asyncio
async def test_request_transfer_success():
    """Test successful call transfer."""
    # Arrange
    tenant_config = TenantConfig(
        tenant_id="test-tenant",
        features=Features(
            refer=ReferFeature(
                enabled=True,
                destinations=[
                    {"destination_id": "commercial", "target_uri": "sip:[email protected]"},
                ]
            )
        )
    )
    
    invocation = ToolInvocation(
        call_id="test-call-123",
        function_call_id="func-call-456",
        response_id="resp-789",
        item_id="item-012",
        name="request_transfer",
        arguments_json='{"destination_id": "commercial", "reason": "sales inquiry"}',
        tenant_config=tenant_config,
    )
    
    args = {"destination_id": "commercial", "reason": "sales inquiry"}
    
    # Mock the call service (MockCallService is defined under "Testing Tool Dependencies" below)
    mock_call_service = MockCallService()
    deps = {"call_service": mock_call_service}
    
    # Act
    result = await request_transfer_tool(args, invocation, deps)
    
    # Assert
    assert result.ok is True
    assert result.error is None
    assert result.function_call_id == "func-call-456"
    
    output = json.loads(result.output)
    assert output["message"] == "call_transfer_requested"
    assert output["target_uri"] == "sip:[email protected]"
    
    # Verify the call service was invoked
    assert mock_call_service.refer_called
    assert mock_call_service.last_call_id == "test-call-123"
The test follows the Arrange-Act-Assert pattern:
  1. Arrange: create the test data (tenant config, invocation object, args, and dependencies).
  2. Act: call the tool function with the test data.
  3. Assert: verify the result structure, values, and side effects.

Testing Error Conditions

Test how tools handle invalid inputs:
@pytest.mark.asyncio
async def test_request_transfer_missing_destination():
    """Test transfer with missing destination_id."""
    args = {"reason": "test"}  # Missing required destination_id
    invocation = create_test_invocation()  # Helper to create invocation
    
    result = await request_transfer_tool(args, invocation, None)
    
    assert result.ok is False
    assert result.error == "invalid_arguments"
    
    output = json.loads(result.output)
    assert "destination_id" in output["detail"]

@pytest.mark.asyncio
async def test_request_transfer_invalid_destination():
    """Test transfer with invalid destination_id."""
    args = {"destination_id": "nonexistent"}
    invocation = create_test_invocation()
    deps = {"call_service": MockCallService()}
    
    result = await request_transfer_tool(args, invocation, deps)
    
    assert result.ok is False
    assert result.error == "invalid_destination_id"

Testing Tool Dependencies

Test dependency injection and service interactions:
import pytest

class MockCallService:
    """Mock for OpenAICallsService."""
    def __init__(self):
        self.refer_called = False
        self.last_call_id = None
        self.last_target_uri = None
        self.should_fail = False
    
    async def refer_call(self, call_id: str, target_uri: str, idempotency_key: str):
        self.refer_called = True
        self.last_call_id = call_id
        self.last_target_uri = target_uri
        
        if self.should_fail:
            raise Exception("Transfer failed")

@pytest.mark.asyncio
async def test_request_transfer_service_failure():
    """Test handling of service failures."""
    args = {"destination_id": "commercial"}
    invocation = create_test_invocation()
    
    # Configure mock to fail
    mock_service = MockCallService()
    mock_service.should_fail = True
    deps = {"call_service": mock_service}
    
    result = await request_transfer_tool(args, invocation, deps)
    
    # Tool should handle the exception and return error result
    assert result.ok is False
    assert result.error == "call_transfer_failed"
    assert mock_service.refer_called  # Service was invoked
Tools must never raise exceptions. Always test that exceptions are caught and converted to error ToolResult objects.

Testing ToolExecutor

The ToolExecutor orchestrates tool execution. Test the complete flow:
import pytest
import json
from src.apps.calls.tools.tool_executor import ToolExecutor
from src.models.openai.openai import ToolInvocation, ToolResult

@pytest.mark.asyncio
async def test_tool_executor_routes_to_tool():
    """Test that executor routes invocations to registered tools."""
    # Create executor and register a test tool
    executor = ToolExecutor()
    
    tool_called = False
    received_args = None
    
    async def test_tool(args, invocation, deps):
        nonlocal tool_called, received_args
        tool_called = True
        received_args = args
        return ToolResult(
            function_call_id=invocation.function_call_id,
            ok=True,
            output='{"status": "success"}'
        )
    
    executor.register("test_tool", test_tool)
    
    # Create invocation
    invocation = ToolInvocation(
        call_id="test-call",
        function_call_id="func-123",
        name="test_tool",
        arguments_json='{"param": "value"}',
        # ... other fields
    )
    
    # Execute
    result = await executor.execute(invocation)
    
    # Verify
    assert tool_called
    assert received_args == {"param": "value"}
    assert result.ok is True

@pytest.mark.asyncio
async def test_tool_executor_handles_unknown_tool():
    """Test executor behavior with unregistered tool."""
    executor = ToolExecutor()
    
    invocation = ToolInvocation(
        call_id="test-call",
        function_call_id="func-123",
        name="nonexistent_tool",
        arguments_json='{}',
        # ... other fields
    )
    
    result = await executor.execute(invocation)
    
    # Should return error result, not raise exception
    assert result.ok is False
    assert result.error == "tool_not_found"
    
    output = json.loads(result.output)
    assert output["tool"] == "nonexistent_tool"

@pytest.mark.asyncio
async def test_tool_executor_handles_invalid_json():
    """Test executor with malformed JSON arguments."""
    executor = ToolExecutor()
    executor.register("test_tool", lambda a, i, d: None)  # Never reached; parsing fails first
    
    invocation = ToolInvocation(
        call_id="test-call",
        function_call_id="func-123",
        name="test_tool",
        arguments_json='{invalid json}',  # Malformed
        # ... other fields
    )
    
    result = await executor.execute(invocation)
    
    assert result.ok is False
    assert result.error == "tool_args_parse_error"
These tests verify the patterns from tool_executor.py:45-246:
  • Argument parsing (lines 100-149)
  • Tool lookup (lines 64-98)
  • Error handling (lines 114-149, 185-246)
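A minimal executor consistent with the behaviors these tests check might look like the sketch below (error names are taken from the assertions above; the real `tool_executor.py` is richer, and the `ToolResult` class here is a stand-in for the project's model):

```python
import json

# Stand-in for the project's ToolResult model
class ToolResult:
    def __init__(self, function_call_id, ok, output, error=None):
        self.function_call_id = function_call_id
        self.ok = ok
        self.output = output
        self.error = error

class ToolExecutor:
    """Sketch of the routing logic the tests above exercise."""

    def __init__(self):
        self._tools = {}

    def register(self, name, tool):
        self._tools[name] = tool

    async def execute(self, invocation) -> ToolResult:
        # Tool lookup: unknown names become an error result, never an exception
        tool = self._tools.get(invocation.name)
        if tool is None:
            return ToolResult(
                invocation.function_call_id, ok=False,
                output=json.dumps({"tool": invocation.name}),
                error="tool_not_found",
            )
        # Argument parsing: malformed JSON also becomes an error result
        try:
            args = json.loads(invocation.arguments_json)
        except json.JSONDecodeError:
            return ToolResult(
                invocation.function_call_id, ok=False,
                output=json.dumps({"detail": "malformed arguments"}),
                error="tool_args_parse_error",
            )
        return await tool(args, invocation, None)
```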

Testing ToolBuilder

The ToolBuilder generates schemas based on tenant configuration:
import pytest
from src.apps.calls.tools.tool_builder import ToolBuilder
from src.models.instructions.tenant_config import TenantConfig, Features, ReferFeature

def test_tool_builder_creates_refer_tool():
    """Test that builder creates refer tool when feature is enabled."""
    builder = ToolBuilder()
    
    config = TenantConfig(
        tenant_id="test-tenant",
        features=Features(
            refer=ReferFeature(
                enabled=True,
                destinations=[
                    {
                        "destination_id": "sales",
                        "label": "Sales Team",
                        "target_uri": "sip:[email protected]",
                        "enabled": True,
                        "priority": 1,
                    },
                    {
                        "destination_id": "support",
                        "label": "Support",
                        "target_uri": "sip:[email protected]",
                        "enabled": True,
                        "priority": 2,
                    },
                ],
            )
        ),
    )
    
    result = builder.build_tools(config)
    
    # Should create one tool
    assert len(result.tools) == 1
    assert result.tool_choice == "auto"
    
    tool = result.tools[0]
    assert tool.name == "request_transfer"
    assert "destination_id" in tool.parameters["properties"]
    
    # Verify enum contains both destinations
    enum_values = tool.parameters["properties"]["destination_id"]["enum"]
    assert "sales" in enum_values
    assert "support" in enum_values

def test_tool_builder_skips_disabled_feature():
    """Test that builder does not create tool when feature is disabled."""
    builder = ToolBuilder()
    
    config = TenantConfig(
        tenant_id="test-tenant",
        features=Features(
            refer=ReferFeature(enabled=False, destinations=[])
        ),
    )
    
    result = builder.build_tools(config)
    
    # No tools should be created
    assert len(result.tools) == 0
    assert result.tool_choice == "none"

def test_tool_builder_filters_disabled_destinations():
    """Test that builder only includes enabled destinations."""
    builder = ToolBuilder()
    
    config = TenantConfig(
        tenant_id="test-tenant",
        features=Features(
            refer=ReferFeature(
                enabled=True,
                destinations=[
                    {"destination_id": "sales", "enabled": True, "priority": 1},
                    {"destination_id": "support", "enabled": False, "priority": 2},
                ],
            )
        ),
    )
    
    result = builder.build_tools(config)
    tool = result.tools[0]
    
    enum_values = tool.parameters["properties"]["destination_id"]["enum"]
    assert "sales" in enum_values
    assert "support" not in enum_values  # Disabled destination excluded
These tests verify the logic from tool_builder.py:84-186:
  • Feature enablement checks (lines 85-92)
  • Destination filtering (lines 94-102)
  • Schema generation (lines 146-186)
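The filtering rules above reduce to a small pure function, which can be worth extracting and testing directly with `pytest.mark.parametrize`. A sketch (the function name and shape are assumptions for illustration, not the project's API):

```python
import pytest

def enabled_destination_ids(destinations: list[dict]) -> list[str]:
    """Return destination_ids of enabled destinations, ordered by priority."""
    enabled = [d for d in destinations if d.get("enabled", False)]
    enabled.sort(key=lambda d: d.get("priority", 0))
    return [d["destination_id"] for d in enabled]

@pytest.mark.parametrize(
    "destinations, expected",
    [
        ([], []),
        (
            [
                {"destination_id": "sales", "enabled": True, "priority": 2},
                {"destination_id": "support", "enabled": False, "priority": 1},
            ],
            ["sales"],
        ),
    ],
)
def test_enabled_destination_ids(destinations, expected):
    assert enabled_destination_ids(destinations) == expected
```

Parametrizing over destination lists keeps each filtering rule visible as its own test case.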

Integration Testing

Testing the Complete Flow

Test the full call handling flow with all components:
import pytest
from httpx import ASGITransport, AsyncClient
from src.main import app  # Your FastAPI app

@pytest.mark.asyncio
async def test_call_with_transfer_request():
    """Test complete flow of call with transfer tool invocation."""
    # httpx 0.28+ removed the app= shortcut; wrap the app in ASGITransport
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        # Simulate OpenAI function call event
        payload = {
            "type": "function_call_output.item.created",
            "session_id": "session-123",
            "item": {
                "id": "item-456",
                "call_id": "call-789",
                "function_call_id": "fc-012",
                "name": "request_transfer",
                "arguments": '{"destination_id": "commercial"}',
            },
        }
        
        response = await client.post("/api/calls/events", json=payload)
        
        assert response.status_code == 200
        # Verify that transfer was initiated
        # Check logs, database state, etc.

Using Test Containers

For integration tests requiring real services:
import pytest
from testcontainers.mongodb import MongoDbContainer
from pymongo import MongoClient

@pytest.fixture(scope="module")
def mongodb_container():
    """Provide a MongoDB test container."""
    with MongoDbContainer("mongo:7.0") as container:
        yield container

@pytest.fixture
def mongodb_client(mongodb_container):
    """Provide a MongoDB client connected to test container."""
    connection_url = mongodb_container.get_connection_url()
    client = MongoClient(connection_url)
    yield client
    client.close()

@pytest.mark.asyncio
async def test_call_persistence(mongodb_client):
    """Test that call data is persisted correctly."""
    db = mongodb_client["test_db"]
    collection = db["calls"]
    
    # Test your persistence logic
    # ...
Use testcontainers for integration tests that need real databases or services. This ensures tests are reproducible and don’t depend on external state.

Mocking External Services

Mocking HTTP Calls

Use httpx mocking for testing external API interactions:
import pytest
import respx
from httpx import Response

@pytest.mark.asyncio
@respx.mock
async def test_openai_api_call():
    """Test interaction with OpenAI API."""
    # Mock the API endpoint
    respx.post("https://api.openai.com/v1/realtime/calls/call-123/refer").mock(
        return_value=Response(200, json={"status": "success"})
    )
    
    # Your code that calls the API
    result = await call_service.refer_call(
        call_id="call-123",
        target_uri="sip:[email protected]",
        idempotency_key="key-123",
    )
    
    assert result.status == "success"

Using pytest-mock

Mock dependencies with pytest-mock:
import pytest

@pytest.mark.asyncio
async def test_tool_with_mocked_dependency(mocker):
    """Test tool with mocked service dependency."""
    # Patch the service method; an AsyncMock is substituted automatically
    # because refer_call is an async function
    mock_refer = mocker.patch(
        "src.infra.openai.calls_http.OpenAICallsService.refer_call",
        return_value=None,
    )
    
    # Execute the tool (args, invocation, and deps are built as in earlier examples)
    result = await request_transfer_tool(args, invocation, deps)
    
    # Verify mock was called
    mock_refer.assert_called_once_with(
        call_id="test-call",
        target_uri="sip:[email protected]",
        idempotency_key=mocker.ANY,
    )

Testing Time-Dependent Logic

Use freezegun to control time in tests:
import pytest
from freezegun import freeze_time
from datetime import datetime

@freeze_time("2024-01-15 14:30:00")
@pytest.mark.asyncio
async def test_business_hours_check():
    """Test logic that depends on current time."""
    # Time is frozen at 2:30 PM on Jan 15, 2024
    result = is_business_hours()
    assert result is True

@freeze_time("2024-01-15 22:00:00")
@pytest.mark.asyncio
async def test_after_hours_behavior():
    """Test behavior outside business hours."""
    result = is_business_hours()
    assert result is False

Coverage Reporting

Generate coverage reports with pytest-cov:
# Run tests with coverage
pytest --cov=src --cov-report=html --cov-report=term

# View HTML report
open htmlcov/index.html
Add to pyproject.toml for coverage configuration:
[tool.coverage.run]
source = ["src"]
omit = [
    "*/tests/*",
    "*/migrations/*",
]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise AssertionError",
    "raise NotImplementedError",
    "if __name__ == .__main__.:",
]

Test Organization

Organize tests to mirror source structure:
tests/
├── apps/
│   └── calls/
│       ├── tools/
│       │   ├── definitions/
│       │   │   ├── test_request_transfer.py
│       │   │   └── test_your_tool.py
│       │   ├── test_tool_builder.py
│       │   └── test_tool_executor.py
│       └── test_call_handler.py
├── infra/
│   └── openai/
│       └── test_calls_http.py
└── models/
    └── test_tenant_config.py

Fixtures and Helpers

Create reusable fixtures in conftest.py:
# tests/conftest.py
import pytest
from src.models.instructions.tenant_config import TenantConfig, Features
from src.models.openai.openai import ToolInvocation

@pytest.fixture
def sample_tenant_config():
    """Provide a standard test tenant configuration."""
    return TenantConfig(
        tenant_id="test-tenant",
        features=Features(
            # ... configure features
        ),
    )

@pytest.fixture
def create_invocation(sample_tenant_config):
    """Factory fixture for creating ToolInvocation objects."""
    def _create(
        call_id="test-call",
        function_call_id="test-func",
        name="test_tool",
        arguments_json="{}",
    ):
        return ToolInvocation(
            call_id=call_id,
            function_call_id=function_call_id,
            response_id="resp-123",
            item_id="item-456",
            name=name,
            arguments_json=arguments_json,
            tenant_config=sample_tenant_config,
        )
    return _create

# Usage in tests
@pytest.mark.asyncio
async def test_something(create_invocation):
    invocation = create_invocation(name="my_tool")
    # ...
  1. Create fixtures for common objects: define fixtures for frequently used test data such as configs and invocations.
  2. Use factory fixtures for variations: factory fixtures return functions that generate test objects with custom parameters.
  3. Share fixtures via conftest.py: fixtures placed in conftest.py are available across all tests.

Running Tests

Run All Tests

pytest

Run Specific Test File

pytest tests/apps/calls/tools/test_tool_executor.py

Run Tests Matching Pattern

pytest -k "transfer"

Run with Verbose Output

pytest -v

Run with Coverage

pytest --cov=src --cov-report=term-missing

Run in Parallel (with pytest-xdist)

pytest -n auto

Best Practices

  1. Test public interfaces: focus on the public API of modules, not internal implementation details.
  2. Use descriptive test names: follow the pattern test_<component>_<scenario>_<expected_outcome>.
  3. Arrange-Act-Assert: structure tests with clear arrange, act, and assert phases.
  4. Test error paths: always cover both success and failure scenarios.
  5. Isolate tests: each test should be independent and not rely on other tests.
  6. Mock external dependencies: mock HTTP calls, databases, and external services to keep tests fast and reliable.
  7. Use fixtures for setup: extract common setup code into fixtures to keep tests DRY.
Because asyncio_mode = "auto" is configured, async tests are detected and awaited automatically; in strict mode they must be marked with @pytest.mark.asyncio. A “coroutine was never awaited” warning means an async test was collected but never run as a coroutine, so check this configuration first.

Next Steps

  • Review test configuration in pyproject.toml:51-55
  • See Custom Tools for tool implementation patterns to test
  • Check Prompt Engineering for testing prompt effectiveness
