
Overview

The web chat interface lets you simulate customer interactions in real time through a browser-based conversation. You play the business side (an operator or IVR system), while the AI agent plays a customer trying to complete a specific task. This mode is well suited to:
  • Rapid prototyping of conversation flows
  • Testing different business scenarios without placing real calls
  • Debugging customer interactions before phone deployment
  • Training on how customers might approach your business

How It Works

The web chat system uses a three-part architecture:
  1. OpenAI GPT-4o-mini generates conversational responses based on your business description and scenario
  2. ElevenLabs TTS converts agent text responses into natural-sounding speech
  3. Flask session management maintains conversation context across messages
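Concretely, the context the agent sees is a standard Chat Completions message list. The shape, with illustrative contents, looks like this:

```python
# Illustrative shape of the conversation context sent to the model.
# The system prompt frames the agent as the customer; "user" turns are
# what you type as the business; "assistant" turns are the agent's replies.
messages = [
    {"role": "system", "content": "You are simulating a real customer contacting the business below. ..."},
    {"role": "user", "content": "Thank you for calling ABC Dental. How can I help you today?"},
    {"role": "assistant", "content": "Hi, do you have any appointments open next Tuesday afternoon?"},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant']
```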

Technical Flow

Starting a Conversation

1. Provide Business Context

Navigate to the home page at http://localhost:5000 and describe the business you want to test. Example business descriptions:
  • “A dental clinic with online booking, insurance verification, and reminder calls”
  • “A pizza delivery service that takes orders, provides ETA, and handles complaints”
  • “A bank’s customer service line for balance inquiries, fraud reports, and card activation”
The more detailed your description, the more realistic the agent’s behavior will be.
2. Define the Scenario (Optional)

Specify what the caller wants to accomplish. If left blank, the system uses the default scenario:
DEFAULT_PHONE_SCENARIO = (
    "check availability, complete a typical customer task, and avoid speaking to a human if possible"
)
Custom scenario examples:
  • “check appointment availability for next Tuesday”
  • “order a large pepperoni pizza for delivery”
  • “report a fraudulent charge on my credit card”
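The normalize_scenario helper used by the backend isn't shown in this excerpt; judging by how it is called, it presumably trims the input and falls back to the default when the field is blank. A minimal sketch under that assumption:

```python
DEFAULT_PHONE_SCENARIO = (
    "check availability, complete a typical customer task, and avoid speaking to a human if possible"
)

def normalize_scenario(raw):
    """Assumed helper: trim the scenario text and fall back to the default when blank."""
    scenario = (raw or "").strip()
    return scenario or DEFAULT_PHONE_SCENARIO

print(normalize_scenario("  order a pizza  "))  # order a pizza
print(normalize_scenario(None) == DEFAULT_PHONE_SCENARIO)  # True
```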
3. Initialize the Session

Click Start conversation to initialize the chat session. This triggers the /api/context endpoint:
const res = await fetch('/api/context', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
        description: descriptionEl.value.trim(),
        scenario: scenarioEl.value.trim()
    })
});
The backend creates a session and builds the agent’s system prompt:
@app.route("/api/context", methods=["POST"])
def set_context():
    data = request.get_json() or {}
    description = (data.get("description") or "").strip()
    scenario = normalize_scenario(data.get("scenario"))
    if not description:
        return jsonify({"error": "No description provided"}), 400
    session["business_description"] = description
    session["scenario"] = scenario
    session["messages"] = []
    return jsonify({"ok": True, "scenario": scenario})
The conversation doesn’t begin until you send your first message. The agent waits for you to initiate, simulating how a real phone system would answer a call.

Message Flow

Once the conversation is active, you interact with the agent through a turn-based chat interface.

Sending Messages

Type your response as if you are the business answering the phone. Common examples:
  • “Thank you for calling ABC Dental. How can I help you today?”
  • “What size pizza would you like to order?”
  • “I can help with that. Can you provide your account number?”

Backend Processing

When you send a message, the /api/chat endpoint processes it:
@app.route("/api/chat", methods=["POST"])
def chat():
    business_description = get_session_value("business_description")
    scenario = normalize_scenario(get_session_value("scenario"))
    if not business_description:
        return jsonify({"error": "Set a business description first (use /api/context)"}), 400

    data = request.get_json() or {}
    user_message = (data.get("message") or "").strip()
    if not user_message:
        return jsonify({"error": "No message provided"}), 400

    messages = session.get("messages", [])
    system_prompt = build_caller_prompt(business_description, scenario)
    if not messages:
        messages = [{"role": "system", "content": system_prompt}]
    messages.append({"role": "user", "content": user_message})

System Prompt Construction

The agent receives a carefully crafted prompt that defines its behavior:
def build_caller_prompt(business_description: str, scenario: str):
    return (
        "You are simulating a real customer contacting the business below. "
        "The other side is the company, an operator, or an IVR. "
        "Stay in character as the caller/customer, try to complete the task, "
        "and avoid escalating to a human unless the flow requires it.\n\n"
        f"Business description:\n{business_description}\n\n"
        f"Caller goal:\n{scenario}\n\n"
        "Speak naturally, ask one thing at a time, and keep each response concise."
    )
The system prompt instructs the agent to “avoid escalating to a human unless the flow requires it” — this helps test self-service capabilities and IVR effectiveness.
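Running the function in isolation shows how the fixed instructions interleave with your inputs (function copied from above to make the demo self-contained):

```python
def build_caller_prompt(business_description: str, scenario: str) -> str:
    # Copied from the backend for a runnable, stand-alone demo.
    return (
        "You are simulating a real customer contacting the business below. "
        "The other side is the company, an operator, or an IVR. "
        "Stay in character as the caller/customer, try to complete the task, "
        "and avoid escalating to a human unless the flow requires it.\n\n"
        f"Business description:\n{business_description}\n\n"
        f"Caller goal:\n{scenario}\n\n"
        "Speak naturally, ask one thing at a time, and keep each response concise."
    )

prompt = build_caller_prompt(
    "A dental clinic with online booking",
    "check appointment availability for next Tuesday",
)
print("Business description:" in prompt)  # True
print("Caller goal:" in prompt)  # True
```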

OpenAI API Integration

The backend calls OpenAI’s Chat Completions API:
from openai import OpenAI
client = OpenAI(api_key=openai_key)
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
agent_text = (completion.choices[0].message.content or "").strip()
The agent’s response is appended to the conversation history:
messages.append({"role": "assistant", "content": agent_text})
session["messages"] = messages

Audio Playback

Every agent response is automatically converted to speech using ElevenLabs TTS.

Text-to-Speech Conversion

The backend generates audio using the text_to_speech_audio function:
def text_to_speech_audio(text: str):
    """Return MP3 bytes from ElevenLabs TTS."""
    client = get_elevenlabs_client()
    if not client:
        return None
    audio_stream = client.text_to_speech.convert(
        text=text[:1500],
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_turbo_v2_5",
    )
    return collect_audio_bytes(audio_stream)
Text is truncated to 1500 characters to stay within ElevenLabs limits and ensure fast response times.
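The collect_audio_bytes helper isn't shown here; ElevenLabs' convert call yields the audio as a stream of chunks, so the helper presumably just concatenates them into a single MP3 payload. A sketch under that assumption:

```python
def collect_audio_bytes(audio_stream):
    """Assumed helper: join an iterable of byte chunks into one MP3 payload."""
    return b"".join(chunk for chunk in audio_stream if chunk)

# Works with any iterable of bytes, e.g. a generator of streamed chunks:
def fake_stream():
    yield b"ID3"       # MP3 files often begin with an ID3 tag
    yield b""          # empty chunks are skipped
    yield b"\x00\x01"

print(collect_audio_bytes(fake_stream()))  # b'ID3\x00\x01'
```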

Response Format

The /api/chat endpoint returns both text and audio:
audio_base64 = None
try:
    audio_bytes = text_to_speech_audio(agent_text)
    if audio_bytes:
        audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")
except Exception:
    pass

return jsonify({"text": agent_text, "audio_base64": audio_base64})

Frontend Audio Rendering

The web interface receives the response and plays the audio automatically:
const data = await res.json();
appendMessage('agent', data.text, data.audio_base64);
if (data.audio_base64) playBase64Audio(data.audio_base64);

function playBase64Audio(base64) {
    if (!base64) return;
    const audio = new Audio('data:audio/mpeg;base64,' + base64);
    audio.play();
}
Each message bubble also includes an embedded audio player for replay:
if (audioBase64 && role === 'agent') {
    const aud = document.createElement('audio');
    aud.controls = true;
    aud.src = 'data:audio/mpeg;base64,' + audioBase64;
    div.appendChild(aud);
}
Audio plays automatically when received, but you can replay any message using the embedded controls in the message bubble.

Conversation UI Elements

The chat interface displays messages in a turn-based format:

Message Display

<div class="messages" id="messages"></div>
Messages are styled differently based on the speaker:
.msg.user { background: #e3f2fd; margin-left: 0; margin-right: auto; }
.msg.agent { background: #e8f5e9; margin-left: auto; margin-right: 0; }
  • User (You as the business): Blue background, left-aligned
  • Agent (AI customer): Green background, right-aligned

Message Structure

Each message shows the role and content:
function appendMessage(role, text, audioBase64) {
    const div = document.createElement('div');
    div.className = 'msg ' + role;
    div.innerHTML = '<div class="role">' + 
        (role === 'user' ? 'You (company)' : 'Agent') + 
        '</div><div>' + escapeHtml(text) + '</div>';
    messagesEl.appendChild(div);
    messagesEl.scrollTop = messagesEl.scrollHeight;
}

Error Handling

If OPENAI_API_KEY is not set in your .env file, the agent will return a placeholder message instead of generating intelligent responses.
The backend includes graceful fallbacks:
openai_key = os.getenv("OPENAI_API_KEY")
if not openai_key:
    agent_text = "I'd like to know more about your business. (Set OPENAI_API_KEY in .env for full conversation.)"
else:
    try:
        # Call the OpenAI API (the full call is shown under "OpenAI API Integration")
        ...
    except Exception as e:
        agent_text = f"I had trouble responding: {e}"

Common Errors

  • “Set a business description first (use /api/context)”: you tried to chat before initializing the context. Fix: click “Start conversation” first.
  • “No message provided”: an empty message was sent. Fix: type a message before sending.
  • “Request failed”: a network or server error occurred. Fix: check the browser console and server logs.

API Reference

POST /api/context

Initializes a new conversation session. Request body:
{
  "description": "A dental clinic with online booking",
  "scenario": "check appointment availability"
}
Response:
{
  "ok": true,
  "scenario": "check appointment availability"
}

POST /api/chat

Sends a message and receives the agent’s response. Request body:
{
  "message": "Thank you for calling. How can I help you today?"
}
Response:
{
  "text": "Hi, I'd like to check if you have any appointments available next Tuesday afternoon?",
  "audio_base64": "//uQxAA...base64-encoded-mp3..."
}
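A client consuming this endpoint can decode audio_base64 back into playable MP3 bytes with the standard library (the payload below is illustrative; a real audio_base64 value is far longer):

```python
import base64

# Illustrative /api/chat response payload.
response = {
    "text": "Hi, I'd like to check if you have any appointments available next Tuesday afternoon?",
    "audio_base64": base64.b64encode(b"fake-mp3-bytes").decode("utf-8"),
}

# audio_base64 is null when TTS is unavailable, so guard before decoding.
audio_bytes = base64.b64decode(response["audio_base64"]) if response["audio_base64"] else None
# audio_bytes can now be written to an .mp3 file or fed to an audio player.
print(audio_bytes)  # b'fake-mp3-bytes'
```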

Best Practices

1. Start with Clear Context

Provide detailed business descriptions including:
  • What the business does
  • Available services or products
  • Common customer requests
  • Any special policies or procedures
2. Use Realistic Scenarios

Test scenarios that match real customer behaviors:
  • Simple information requests
  • Transaction completions
  • Problem resolution
  • Edge cases and difficult customers
3. Play the Role Authentically

Respond as your actual business would:
  • Use your real greeting scripts
  • Follow your standard procedures
  • Apply your actual policies
  • Use industry-appropriate language
4. Test Multiple Paths

Click “Start conversation” again to reset and test different conversation flows without reloading the page.

Session Management

Conversation state is stored in Flask sessions:
session["business_description"] = description
session["scenario"] = scenario
session["messages"] = []  # Conversation history
The session persists across requests using a session cookie:
app.secret_key = os.getenv("FLASK_SECRET_KEY", "dev-secret-change-in-production")
Note that Flask's default sessions are client-side: the data is serialized into a cryptographically signed cookie, so the entire conversation history travels with every request and can exceed the roughly 4 KB browser cookie limit in long conversations. The dev-secret-change-in-production fallback key is insecure; set FLASK_SECRET_KEY in production, and be aware that changing the key invalidates existing session cookies.

Next Steps

After testing conversation flows in the web chat:
  • Move to real phone call testing to validate the same scenarios over actual phone lines
  • Use the same business description and scenario for consistent testing across both modes
  • Compare agent behavior between text chat and voice calls
