Outbound Phone Calls - AI Voice Testing Platform

Overview

The outbound phone call feature lets you test business phone systems with real voice calls. The same AI agent that powers the web chat can place actual phone calls to any number, behaving as a customer trying to complete a specific task. This mode enables:

End-to-end testing of live phone systems
IVR flow validation with real voice input
Load testing customer service lines
Quality assurance before launching new phone flows
Realistic testing of voice recognition and call routing

How It Works

The phone call system uses a four-part integration:

Flask backend coordinates the call request with business context
ElevenLabs Conversational AI provides the voice agent with dynamic prompts
Twilio handles the actual phone call infrastructure
OpenAI (via ElevenLabs) powers the conversational intelligence

Architecture

ElevenLabs + Twilio Integration

This platform uses ElevenLabs’ hosted Conversational AI service with a Twilio backend for outbound calling.

Required Configuration

Three environment variables control phone call functionality:

ELEVENLABS_API_KEY=your_api_key
ELEVENLABS_AGENT_ID=your_agent_id
ELEVENLABS_AGENT_PHONE_NUMBER_ID=your_phone_number_id

The backend validates these before placing calls:

missing_env = [
    name
    for name in ("ELEVENLABS_API_KEY", "ELEVENLABS_AGENT_ID", "ELEVENLABS_AGENT_PHONE_NUMBER_ID")
    if not os.getenv(name)
]
if missing_env:
    return jsonify({"error": f"Missing required environment variables: {', '.join(missing_env)}"}), 500

All three variables must be set or the /api/call endpoint will return a 500 error with details about which variables are missing.
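
The same check can be exercised outside Flask, for example in a startup script or a unit test. The following is a minimal sketch, assuming a hypothetical helper named `find_missing_env` (not part of the platform's code) that accepts the environment as a mapping so it can be tested without touching the real process environment:

```python
import os
from typing import List, Mapping, Optional

REQUIRED_ENV = (
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_AGENT_ID",
    "ELEVENLABS_AGENT_PHONE_NUMBER_ID",
)


def find_missing_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty.

    Hypothetical helper: falls back to os.environ when no mapping is given.
    """
    source = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not source.get(name)]


# An empty string counts as missing, matching the `not os.getenv(name)` check.
example = find_missing_env({"ELEVENLABS_API_KEY": "key", "ELEVENLABS_AGENT_ID": ""})
print(example)
```

Treating empty strings as missing mirrors the backend check, so a `.env` line such as `ELEVENLABS_AGENT_ID=` is reported rather than silently accepted.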

ElevenLabs Setup

Create or Select a Conversational AI Agent

In the ElevenLabs dashboard:

Navigate to Conversational AI
Create a new agent or select an existing one
Copy the Agent ID
Set ELEVENLABS_AGENT_ID in your .env file

The agent’s base configuration is overridden at call time by the platform, so you can use a generic agent for all tests.

Enable Prompt Overrides

For the platform to inject business descriptions and scenarios dynamically:

Open your agent settings
Enable “Allow prompt overrides”
Save the configuration

This allows the platform to send custom prompts through conversation_config_override.

Connect a Twilio Phone Number

In ElevenLabs:

Go to Phone Numbers
Import or connect a Twilio-backed phone number
Copy the Phone Number ID (not the actual phone number)
Set ELEVENLABS_AGENT_PHONE_NUMBER_ID in your .env

The phone number ID is the internal identifier ElevenLabs uses to route calls through Twilio.

API Endpoint

The platform uses ElevenLabs’ outbound call endpoint:

ELEVENLABS_API_BASE = os.getenv("ELEVENLABS_API_BASE", "https://api.elevenlabs.io").rstrip("/")

def elevenlabs_post(path: str, payload: dict):
    api_key = os.getenv("ELEVENLABS_API_KEY")
    if not api_key:
        raise RuntimeError("ELEVENLABS_API_KEY not set in .env")

    request_body = json.dumps(payload).encode("utf-8")
    api_request = Request(
        f"{ELEVENLABS_API_BASE}{path}",
        data=request_body,
        headers={
            "Content-Type": "application/json",
            "xi-api-key": api_key,
        },
        method="POST",
    )

    try:
        with urlopen(api_request, timeout=30) as response:
            raw_body = response.read().decode("utf-8")
            return json.loads(raw_body) if raw_body else {}
    except HTTPError as exc:
        error_body = exc.read().decode("utf-8", errors="replace")
        try:
            parsed_body = json.loads(error_body)
        except json.JSONDecodeError:
            parsed_body = {"error": error_body or exc.reason}
        error_message = parsed_body.get("detail") or parsed_body.get("error") or exc.reason
        raise RuntimeError(f"ElevenLabs API error ({exc.code}): {error_message}") from exc
    except URLError as exc:
        raise RuntimeError(f"Could not reach ElevenLabs: {exc.reason}") from exc

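The function converts both failure modes into a RuntimeError with a descriptive message. HTTP errors include the status code plus the detail or error field from the response body, and network failures include the underlying reason.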

Placing Calls

To place an outbound phone call, you need a business description, an optional scenario, and a valid phone number.

Phone Number Format

Phone numbers must be in E.164 format:

Starts with +
Followed by country code
Then the subscriber number
No spaces, dashes, or parentheses

Valid examples:

+15551234567 (US)
+442071234567 (UK)
+61212345678 (Australia)

Invalid examples:

5551234567 (missing +)
+1 555 123 4567 (contains spaces)
(555) 123-4567 (wrong format)

The backend validates phone numbers with a regular expression that requires a leading +, a non-zero first digit, and 8 to 15 digits in total:

E164_PATTERN = re.compile(r"^\+[1-9]\d{7,14}$")

def validate_phone_number(phone_number: str):
    return bool(E164_PATTERN.match((phone_number or "").strip()))

If the phone number doesn’t match E.164 format, the API returns a 400 error: "Phone number must be in E.164 format, for example +15551234567."
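
To see how the pattern treats the valid and invalid examples listed earlier, the validator can be run on its own. This sketch copies the pattern and function from above; the `samples` list and `results` dictionary are illustrative names only:

```python
import re

E164_PATTERN = re.compile(r"^\+[1-9]\d{7,14}$")


def validate_phone_number(phone_number):
    # Surrounding whitespace is stripped before matching; None becomes "".
    return bool(E164_PATTERN.match((phone_number or "").strip()))


samples = [
    "+15551234567",     # valid (US)
    "+442071234567",    # valid (UK)
    "+61212345678",     # valid (Australia)
    "5551234567",       # missing +
    "+1 555 123 4567",  # contains spaces
    "(555) 123-4567",   # wrong format
    None,               # missing value
]

results = {sample: validate_phone_number(sample) for sample in samples}
for sample, ok in results.items():
    print(f"{sample!r:20} {'valid' if ok else 'invalid'}")
```

The pattern checks only the shape of the number. It does not confirm that the country code exists or that the number is reachable.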

Using the Web Interface

Enter Business Context

On the home page, fill in the business description and optional scenario fields (same as for web chat). Example:

Business description: “A dental clinic with online booking, insurance verification, and reminder calls”
Scenario: “check appointment availability for next Tuesday”

Enter Phone Number

In the “Phone test” section, enter the destination number in E.164 format:

<input type="tel" id="phone-number" placeholder="+15551234567" autocomplete="tel">

The UI shows a hint about the required format:

<div class="hint">Use E.164 format, for example <code>+15551234567</code>.</div>

Start the Call

Click Start call. The button disables and changes to “Calling…” while the request is processed:

callBtn.disabled = true;
callBtn.textContent = 'Calling...';

const res = await fetch('/api/call', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
        business_description: description,
        scenario,
        to_number: toNumber
    })
});

View Call Status

If successful, the UI displays call details:

const details = [];
if (data.call_sid) details.push('Call SID: ' + data.call_sid);
if (data.conversation_id) details.push('Conversation ID: ' + data.conversation_id);
callStatus.textContent = details.length
    ? data.message + ' ' + details.join(' | ')
    : data.message || 'Call initiated.';

Example output:

Call initiated. Call SID: CA1234567890abcdef | Conversation ID: conv_abc123

Call Scenarios

Scenarios define what the AI agent is trying to accomplish during the call. They shape the agent’s behavior and goals.

Default Scenario

If no scenario is provided, the system uses:

DEFAULT_PHONE_SCENARIO = (
    "check availability, complete a typical customer task, and avoid speaking to a human if possible"
)

def normalize_scenario(value: str):
    scenario = (value or "").strip()
    return scenario or DEFAULT_PHONE_SCENARIO
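
A short usage sketch shows which inputs fall back to the default. The function and constant are copied from above; the `inputs` and `normalized` names are illustrative:

```python
DEFAULT_PHONE_SCENARIO = (
    "check availability, complete a typical customer task, and avoid speaking to a human if possible"
)


def normalize_scenario(value):
    scenario = (value or "").strip()
    return scenario or DEFAULT_PHONE_SCENARIO


# None, empty, and whitespace-only values all resolve to the default scenario.
inputs = [None, "", "   ", "  track my order  "]
normalized = [normalize_scenario(value) for value in inputs]
for raw, result in zip(inputs, normalized):
    print(f"{raw!r} -> {result!r}")
```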

Custom Scenarios

Scenarios should be written from the caller’s perspective:

Industry	Scenario Example
Healthcare	“schedule a cleaning appointment for next week”
E-commerce	“track my order and get an estimated delivery date”
Banking	“check my account balance and recent transactions”
Restaurant	“make a reservation for 4 people on Saturday at 7pm”
Technical Support	“reset my password and verify my account is working”

Building the Caller Prompt

The backend constructs a detailed prompt that instructs the agent on how to behave:

def build_caller_prompt(business_description: str, scenario: str):
    return (
        "You are simulating a real customer contacting the business below. "
        "The other side is the company, an operator, or an IVR. "
        "Stay in character as the caller/customer, try to complete the task, "
        "and avoid escalating to a human unless the flow requires it.\n\n"
        f"Business description:\n{business_description}\n\n"
        f"Caller goal:\n{scenario}\n\n"
        "Speak naturally, ask one thing at a time, and keep each response concise."
    )
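
To inspect the text the agent actually receives, the prompt builder can be called directly. This sketch repeats the function from above and feeds it the dental clinic example used elsewhere on this page:

```python
def build_caller_prompt(business_description: str, scenario: str) -> str:
    return (
        "You are simulating a real customer contacting the business below. "
        "The other side is the company, an operator, or an IVR. "
        "Stay in character as the caller/customer, try to complete the task, "
        "and avoid escalating to a human unless the flow requires it.\n\n"
        f"Business description:\n{business_description}\n\n"
        f"Caller goal:\n{scenario}\n\n"
        "Speak naturally, ask one thing at a time, and keep each response concise."
    )


prompt = build_caller_prompt(
    "A dental clinic with online booking, insurance verification, and reminder calls",
    "check appointment availability for next Tuesday",
)
# The result has four blocks separated by blank lines:
# role instructions, business description, caller goal, style guidance.
print(prompt)
```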

First Message

The agent always starts the conversation with a natural greeting based on the scenario:

def build_first_message(scenario: str):
    return f"Hi, I'm calling because I'd like to {scenario.rstrip('.')}."

Examples:

Scenario: “check appointment availability”
First message: “Hi, I’m calling because I’d like to check appointment availability.”

Write scenarios as verb phrases (“check availability”, “place an order”) rather than complete sentences for the most natural first message.
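
The effect of phrasing is easiest to see by comparing two inputs. This sketch reuses `build_first_message` from above; the second scenario is a made-up example of a full sentence:

```python
def build_first_message(scenario: str) -> str:
    return f"Hi, I'm calling because I'd like to {scenario.rstrip('.')}."


# A verb phrase slots cleanly into the template.
verb_phrase = build_first_message("check appointment availability")
print(verb_phrase)
# Hi, I'm calling because I'd like to check appointment availability.

# A complete sentence produces an awkward greeting.
full_sentence = build_first_message("I need to reschedule my appointment.")
print(full_sentence)
# Hi, I'm calling because I'd like to I need to reschedule my appointment.
```

Note that `rstrip('.')` only removes trailing periods, so it prevents a doubled period but does not repair sentence structure.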

API Call Flow

When you trigger a phone call, the backend orchestrates the request to ElevenLabs.

Request Handler

The /api/call endpoint processes the call request:

@app.route("/api/call", methods=["POST"])
def start_call():
    """Start an outbound phone call through ElevenLabs + Twilio."""
    data = request.get_json() or {}
    to_number = (data.get("to_number") or "").strip()
    # Fall back to "" so .strip() is safe when neither source has a value.
    business_description = (
        data.get("business_description") or get_session_value("business_description") or ""
    ).strip()
    scenario = normalize_scenario(data.get("scenario") or get_session_value("scenario"))

    if not business_description:
        return jsonify({"error": "Provide a business description first."}), 400
    if not validate_phone_number(to_number):
        return jsonify({"error": "Phone number must be in E.164 format, for example +15551234567."}), 400

Building the Payload

The backend constructs the ElevenLabs API payload:

payload = {
    "agent_id": os.getenv("ELEVENLABS_AGENT_ID"),
    "agent_phone_number_id": os.getenv("ELEVENLABS_AGENT_PHONE_NUMBER_ID"),
    "to_number": to_number,
    "conversation_initiation_client_data": build_conversation_initiation_data(
        business_description,
        scenario,
    ),
}

The conversation_initiation_client_data contains the prompt override:

def build_conversation_initiation_data(business_description: str, scenario: str):
    return {
        "conversation_config_override": {
            "agent": {
                "prompt": {
                    "prompt": build_caller_prompt(business_description, scenario),
                },
                "first_message": build_first_message(scenario),
            }
        }
    }

The conversation_config_override structure tells ElevenLabs to use your custom prompt and first message instead of the agent’s default configuration.
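
For reference, the assembled request body can be printed as JSON. This is a sketch only: `build_outbound_payload` is a hypothetical helper that mirrors the payload shape described above, the IDs are the placeholder values from the configuration section, and the prompt text is abbreviated:

```python
import json


def build_first_message(scenario: str) -> str:
    return f"Hi, I'm calling because I'd like to {scenario.rstrip('.')}."


def build_outbound_payload(agent_id, phone_number_id, to_number, prompt, scenario):
    """Hypothetical helper: same nesting as the payload shown above."""
    return {
        "agent_id": agent_id,
        "agent_phone_number_id": phone_number_id,
        "to_number": to_number,
        "conversation_initiation_client_data": {
            "conversation_config_override": {
                "agent": {
                    "prompt": {"prompt": prompt},
                    "first_message": build_first_message(scenario),
                }
            }
        },
    }


payload = build_outbound_payload(
    agent_id="your_agent_id",                # placeholder value
    phone_number_id="your_phone_number_id",  # placeholder value
    to_number="+15551234567",
    prompt="(caller prompt text)",           # abbreviated for readability
    scenario="check appointment availability",
)
print(json.dumps(payload, indent=2))
```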

Sending the Request

The platform sends the payload to ElevenLabs:

try:
    call_response = elevenlabs_post("/v1/convai/twilio/outbound-call", payload)
except Exception as exc:
    return jsonify({"error": str(exc)}), 502

Response Format

The backend returns a normalized response:

return jsonify(
    {
        "ok": True,
        "status": call_response.get("status", "initiated"),
        "message": "Call initiated.",
        "to_number": to_number,
        "scenario": scenario,
        "call_sid": call_response.get("call_sid") or call_response.get("callSid"),
        "conversation_id": call_response.get("conversation_id") or call_response.get("conversationId"),
        "raw": call_response,
    }
)

Example response:

{
  "ok": true,
  "status": "initiated",
  "message": "Call initiated.",
  "to_number": "+15551234567",
  "scenario": "check appointment availability",
  "call_sid": "CA1234567890abcdef1234567890abcdef",
  "conversation_id": "conv_abc123xyz456",
  "raw": { /* full ElevenLabs response */ }
}

The call_sid is Twilio’s unique identifier for the call, while conversation_id is ElevenLabs’ internal ID for tracking the AI conversation.
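
A script that calls the endpoint will usually want to build the request and pull out the two identifiers. The sketch below uses only the standard library. `build_call_request` and `tracking_ids` are hypothetical helper names, and the base URL assumes the backend runs locally on port 5000. The request is constructed but not sent, so no call is placed:

```python
import json
from urllib.request import Request


def build_call_request(base_url, business_description, scenario, to_number):
    """Build (but do not send) a POST request for /api/call."""
    body = json.dumps(
        {
            "business_description": business_description,
            "scenario": scenario,
            "to_number": to_number,
        }
    ).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/api/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tracking_ids(response_body: dict) -> dict:
    """Extract the Twilio and ElevenLabs identifiers from a parsed response."""
    return {
        "call_sid": response_body.get("call_sid"),
        "conversation_id": response_body.get("conversation_id"),
    }


call_request = build_call_request(
    "http://localhost:5000",  # assumed local development address
    "A dental clinic with online booking",
    "check appointment availability",
    "+15551234567",
)
print(call_request.get_method(), call_request.full_url)

sample_response = {
    "ok": True,
    "status": "initiated",
    "call_sid": "CA1234567890abcdef1234567890abcdef",
    "conversation_id": "conv_abc123xyz456",
}
ids = tracking_ids(sample_response)
print(ids)
```

To actually place the call, pass the request to `urllib.request.urlopen` with a timeout and parse the JSON body of the response.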

Error Handling

The platform handles three classes of failure: request validation errors, ElevenLabs API errors, and network errors.

Validation Errors

Validation failures return HTTP 400 with a single error field:

{
  "error": "Provide a business description first."
}

API Errors

ElevenLabs API errors are caught and forwarded with details:

except HTTPError as exc:
    error_body = exc.read().decode("utf-8", errors="replace")
    try:
        parsed_body = json.loads(error_body)
    except json.JSONDecodeError:
        parsed_body = {"error": error_body or exc.reason}
    error_message = parsed_body.get("detail") or parsed_body.get("error") or exc.reason
    raise RuntimeError(f"ElevenLabs API error ({exc.code}): {error_message}") from exc

Common API errors:

HTTP Code	Error	Solution
401	Invalid API key	Check `ELEVENLABS_API_KEY` in `.env`
404	Agent not found	Verify `ELEVENLABS_AGENT_ID` is correct
404	Phone number not found	Verify `ELEVENLABS_AGENT_PHONE_NUMBER_ID` is correct
429	Rate limit exceeded	Wait and retry, or upgrade your ElevenLabs plan
500	ElevenLabs server error	Check ElevenLabs status page

Network Errors

except URLError as exc:
    raise RuntimeError(f"Could not reach ElevenLabs: {exc.reason}") from exc

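The API request has a 30-second timeout. With urllib, a timeout may surface as a URLError or as a bare TimeoutError, depending on the phase in which it occurs. Either way, the broad except Exception in the /api/call handler turns it into a 502 response.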
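
If timeouts should be reported separately from other connectivity problems, the exception can be inspected before the message is built. The following is a sketch, not the platform's code: `describe_network_error` is a hypothetical helper, and it assumes a timeout can arrive either wrapped in a `URLError` or as a bare timeout exception:

```python
import socket
from urllib.error import URLError


def describe_network_error(exc: Exception) -> str:
    """Return a message that separates timeouts from other network failures."""
    # URLError carries the underlying cause in .reason; a bare timeout is its own cause.
    reason = exc.reason if isinstance(exc, URLError) else exc
    # socket.timeout is checked too, for Python versions where it is a separate class.
    if isinstance(reason, (TimeoutError, socket.timeout)):
        return "ElevenLabs did not respond within the timeout"
    return f"Could not reach ElevenLabs: {reason}"


wrapped_timeout = describe_network_error(URLError(socket.timeout("timed out")))
bare_timeout = describe_network_error(TimeoutError("timed out"))
dns_failure = describe_network_error(URLError("Name or service not known"))
print(wrapped_timeout)
print(bare_timeout)
print(dns_failure)
```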

Testing Phone Flows

Use these strategies to effectively test your phone systems:

Happy Path Testing

Define Success Criteria

Before placing calls, document what a successful interaction looks like:

Does the agent reach the correct department?
Can it complete the task without human intervention?
Does the IVR respond correctly to voice input?

Start with Simple Scenarios

Test basic interactions first:

{
  "business_description": "A pizza restaurant that takes phone orders",
  "scenario": "order a large pepperoni pizza for delivery",
  "to_number": "+15551234567"
}

Increase Complexity

Add edge cases and complex requests:

Multiple items or modifications
Questions that require transfers
Requests for information not available in IVR

IVR Flow Validation

Test automated phone systems systematically:

Menu Navigation: Can the agent navigate multi-level IVR menus?
Voice Recognition: Does the IVR correctly interpret the agent’s speech?
Fallback Handling: What happens when the agent says unexpected things?
Hold and Transfer: Does the system handle transfers gracefully?

Load Testing

Making many simultaneous calls may incur significant costs from both ElevenLabs and Twilio. Always check pricing before load testing.

To test call volume capacity:

import requests
import threading

def place_test_call():
    response = requests.post('http://localhost:5000/api/call', json={
        'business_description': 'A customer service center',
        'scenario': 'check account status',
        'to_number': '+15551234567'
    })
    print(f"Call status: {response.json().get('status')}")

# Place 10 concurrent calls
threads = [threading.Thread(target=place_test_call) for _ in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
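
For larger runs it may help to cap how many calls are in flight and to tally the outcomes. The sketch below is one way to do that with a thread pool. `run_load_test` is a hypothetical helper, and it takes the call function as a parameter so the harness can be tried with a stand-in before any real calls are placed:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load_test(place_call: Callable[[], str], total_calls: int, max_concurrent: int) -> Counter:
    """Invoke place_call total_calls times, at most max_concurrent at once, and tally results."""

    def attempt(_index: int) -> str:
        try:
            return place_call()
        except Exception as exc:  # record the failure instead of aborting the run
            return f"error: {type(exc).__name__}"

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return Counter(pool.map(attempt, range(total_calls)))


def fake_place_call() -> str:
    """Stand-in that places no call; swap in a function that POSTs to /api/call."""
    return "initiated"


summary = run_load_test(fake_place_call, total_calls=10, max_concurrent=3)
print(dict(summary))
```

Keeping `max_concurrent` small at first limits both cost and the risk of hitting rate limits.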

Monitoring and Analytics

After placing calls, use the returned IDs to track results.

Call Tracking

The response includes two tracking identifiers:

"call_sid": call_response.get("call_sid") or call_response.get("callSid"),
"conversation_id": call_response.get("conversation_id") or call_response.get("conversationId"),

call_sid: Use in Twilio’s dashboard to view call logs, duration, cost, and recordings
conversation_id: Use in ElevenLabs’ dashboard to view conversation transcripts and agent behavior

Twilio Dashboard

Access call details at:

https://console.twilio.com/us1/monitor/logs/calls/{call_sid}

View:

Call duration
Cost
Status (completed, busy, no-answer, failed)
Recordings (if enabled)

ElevenLabs Dashboard

Access conversation details in the ElevenLabs Conversational AI section using the conversation_id to view:

Full conversation transcript
Agent responses and timing
Any errors or fallbacks
Audio recordings

Best Practices

Test in Web Chat First

Before placing phone calls, validate your business description and scenarios in the web chat interface. This helps refine prompts without incurring call costs.

Use Test Numbers

Start with test phone numbers you control to verify the agent’s behavior before testing production systems.

Document Scenarios

Keep a library of tested scenarios with success criteria:

{
  "name": "Appointment Booking",
  "description": "A dental clinic with online booking",
  "scenario": "schedule a teeth cleaning for next Tuesday afternoon",
  "expected_outcome": "Agent successfully books appointment or gets wait time"
}
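
A library entry in this shape maps directly onto the /api/call request body. A minimal sketch, assuming a hypothetical `to_call_request` helper and an in-memory list named `SCENARIO_LIBRARY`:

```python
import json

SCENARIO_LIBRARY = [
    {
        "name": "Appointment Booking",
        "description": "A dental clinic with online booking",
        "scenario": "schedule a teeth cleaning for next Tuesday afternoon",
        "expected_outcome": "Agent successfully books appointment or gets wait time",
    },
]


def to_call_request(entry: dict, to_number: str) -> dict:
    """Map a library entry to the JSON body expected by /api/call."""
    return {
        "business_description": entry["description"],
        "scenario": entry["scenario"],
        "to_number": to_number,
    }


request_body = to_call_request(SCENARIO_LIBRARY[0], "+15551234567")
print(json.dumps(request_body, indent=2))
```

The `name` and `expected_outcome` fields stay in the library for reporting. They are not sent to the endpoint.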

Monitor Costs

Each call incurs costs from:

ElevenLabs (per minute of AI conversation)
Twilio (per minute of phone call)

Set up billing alerts in both platforms.

Handle Failures Gracefully

Not all calls will succeed. Common issues:

Busy signals
Voicemail
Call screening
Network failures

Build retry logic with exponential backoff for automated testing.
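
One possible shape for that retry logic is sketched below. `retry_with_backoff` is a hypothetical helper, the delay values are illustrative defaults, and the error raised by `flaky_call` is simulated. It retries only on `RuntimeError`, the exception type raised by `elevenlabs_post`, and accepts a `sleep` function so the behavior can be checked without waiting:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 50% random jitter so parallel tests do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")


attempts = []


def flaky_call():
    """Simulated operation that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("ElevenLabs API error (429): rate limited")
    return {"status": "initiated"}


delays = []
result = retry_with_backoff(flaky_call, sleep=delays.append)
print(result, f"after {len(attempts)} attempts")
```

Retry only failures where no call was started. Repeating a request that did reach the destination would place a duplicate call and incur a second charge.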

API Reference

POST `/api/call`

Initiates an outbound phone call with the AI agent. Request body:

{
  "business_description": "A dental clinic with online booking and insurance verification",
  "scenario": "check appointment availability for next Tuesday",
  "to_number": "+15551234567"
}

Success response (200):

{
  "ok": true,
  "status": "initiated",
  "message": "Call initiated.",
  "to_number": "+15551234567",
  "scenario": "check appointment availability for next Tuesday",
  "call_sid": "CA1234567890abcdef1234567890abcdef",
  "conversation_id": "conv_abc123xyz456",
  "raw": {
    "status": "initiated",
    "call_sid": "CA1234567890abcdef1234567890abcdef",
    "conversation_id": "conv_abc123xyz456"
  }
}

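Error responses share a single shape. Validation failures return 400, missing environment variables return 500, and ElevenLabs or network failures return 502: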

{
  "error": "Provide a business description first."
}

Comparing Chat vs Phone

Feature	Web Chat	Phone Calls
Speed	Instant responses	Real-time with latency
Cost	OpenAI + ElevenLabs TTS	OpenAI + ElevenLabs + Twilio
Testing	Interactive, immediate feedback	Async, requires monitoring
Realism	Text-based simulation	Actual voice interaction
IVR Testing	Cannot test	Full IVR validation
Voice Quality	Playback only	Live voice recognition
Best For	Rapid prototyping, conversation flow	End-to-end validation, production testing

Next Steps

Review web chat testing for pre-call scenario validation
Set up monitoring dashboards in Twilio and ElevenLabs
Build automated test suites for regression testing
Document conversation flows and success criteria
Integrate call testing into your CI/CD pipeline
