
Overview

The web chat interface lets you simulate customer interactions in real time through a browser-based conversation. You play the business side (an operator or IVR system), while the AI agent plays a customer trying to complete a specific task. This mode is well suited to:
  • Rapid prototyping of conversation flows
  • Testing different business scenarios without placing real calls
  • Debugging customer interactions before phone deployment
  • Training on how customers might approach your business

How It Works

The web chat system uses a three-part architecture:
  1. OpenAI GPT-4o-mini generates conversational responses based on your business description and scenario
  2. ElevenLabs TTS converts agent text responses into natural-sounding speech
  3. Flask session management maintains conversation context across messages
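Concretely, the context the agent sees is a standard Chat Completions message list. The shape, with illustrative contents, looks like this:

```python
# Illustrative shape of the conversation context sent to the model.
# The system prompt frames the agent as the customer; "user" turns are
# what you type as the business; "assistant" turns are the agent's replies.
messages = [
    {"role": "system", "content": "You are simulating a real customer contacting the business below. ..."},
    {"role": "user", "content": "Thank you for calling ABC Dental. How can I help you today?"},
    {"role": "assistant", "content": "Hi, do you have any appointments open next Tuesday afternoon?"},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant']
```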

Technical Flow

Starting a Conversation

1. Provide Business Context

Navigate to the home page at http://localhost:5000 and describe the business you want to test. Example business descriptions:
  • “A dental clinic with online booking, insurance verification, and reminder calls”
  • “A pizza delivery service that takes orders, provides ETA, and handles complaints”
  • “A bank’s customer service line for balance inquiries, fraud reports, and card activation”
The more detailed your description, the more realistic the agent’s behavior will be.
2. Define the Scenario (Optional)

Specify what the caller wants to accomplish. If left blank, the system uses the default scenario:
DEFAULT_PHONE_SCENARIO = (
    "check availability, complete a typical customer task, and avoid speaking to a human if possible"
)
Custom scenario examples:
  • “check appointment availability for next Tuesday”
  • “order a large pepperoni pizza for delivery”
  • “report a fraudulent charge on my credit card”
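The normalize_scenario helper used by the backend isn't shown in this excerpt; judging by how it is called, it presumably trims the input and falls back to the default when the field is blank. A minimal sketch under that assumption:

```python
DEFAULT_PHONE_SCENARIO = (
    "check availability, complete a typical customer task, and avoid speaking to a human if possible"
)

def normalize_scenario(raw):
    """Assumed helper: trim the scenario text and fall back to the default when blank."""
    scenario = (raw or "").strip()
    return scenario or DEFAULT_PHONE_SCENARIO

print(normalize_scenario("  order a pizza  "))  # order a pizza
print(normalize_scenario(None) == DEFAULT_PHONE_SCENARIO)  # True
```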
3. Initialize the Session

Click Start conversation to initialize the chat session. This triggers the /api/context endpoint:
const res = await fetch('/api/context', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
        description: descriptionEl.value.trim(),
        scenario: scenarioEl.value.trim()
    })
});
The backend creates a session and builds the agent’s system prompt:
@app.route("/api/context", methods=["POST"])
def set_context():
    data = request.get_json() or {}
    description = (data.get("description") or "").strip()
    scenario = normalize_scenario(data.get("scenario"))
    if not description:
        return jsonify({"error": "No description provided"}), 400
    session["business_description"] = description
    session["scenario"] = scenario
    session["messages"] = []
    return jsonify({"ok": True, "scenario": scenario})
The conversation doesn’t begin until you send your first message. The agent waits for you to initiate, simulating how a real phone system would answer a call.

Message Flow

Once the conversation is active, you interact with the agent through a turn-based chat interface.

Sending Messages

Type your response as if you are the business answering the phone. Common examples:
  • “Thank you for calling ABC Dental. How can I help you today?”
  • “What size pizza would you like to order?”
  • “I can help with that. Can you provide your account number?”

Backend Processing

When you send a message, the /api/chat endpoint processes it:
@app.route("/api/chat", methods=["POST"])
def chat():
    business_description = get_session_value("business_description")
    scenario = normalize_scenario(get_session_value("scenario"))
    if not business_description:
        return jsonify({"error": "Set a business description first (use /api/context)"}), 400

    data = request.get_json() or {}
    user_message = (data.get("message") or "").strip()
    if not user_message:
        return jsonify({"error": "No message provided"}), 400

    messages = session.get("messages", [])
    system_prompt = build_caller_prompt(business_description, scenario)
    if not messages:
        messages = [{"role": "system", "content": system_prompt}]
    messages.append({"role": "user", "content": user_message})

System Prompt Construction

The agent receives a carefully crafted prompt that defines its behavior:
def build_caller_prompt(business_description: str, scenario: str):
    return (
        "You are simulating a real customer contacting the business below. "
        "The other side is the company, an operator, or an IVR. "
        "Stay in character as the caller/customer, try to complete the task, "
        "and avoid escalating to a human unless the flow requires it.\n\n"
        f"Business description:\n{business_description}\n\n"
        f"Caller goal:\n{scenario}\n\n"
        "Speak naturally, ask one thing at a time, and keep each response concise."
    )
The system prompt instructs the agent to “avoid escalating to a human unless the flow requires it” — this helps test self-service capabilities and IVR effectiveness.
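Running the function in isolation shows how the fixed instructions interleave with your inputs (function copied from above to make the demo self-contained):

```python
def build_caller_prompt(business_description: str, scenario: str) -> str:
    # Copied from the backend for a runnable, stand-alone demo.
    return (
        "You are simulating a real customer contacting the business below. "
        "The other side is the company, an operator, or an IVR. "
        "Stay in character as the caller/customer, try to complete the task, "
        "and avoid escalating to a human unless the flow requires it.\n\n"
        f"Business description:\n{business_description}\n\n"
        f"Caller goal:\n{scenario}\n\n"
        "Speak naturally, ask one thing at a time, and keep each response concise."
    )

prompt = build_caller_prompt(
    "A dental clinic with online booking",
    "check appointment availability for next Tuesday",
)
print("Business description:" in prompt)  # True
print("Caller goal:" in prompt)  # True
```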

OpenAI API Integration

The backend calls OpenAI’s Chat Completions API:
from openai import OpenAI
client = OpenAI(api_key=openai_key)
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
agent_text = (completion.choices[0].message.content or "").strip()
The agent’s response is appended to the conversation history:
messages.append({"role": "assistant", "content": agent_text})
session["messages"] = messages

Audio Playback

Every agent response is automatically converted to speech using ElevenLabs TTS.

Text-to-Speech Conversion

The backend generates audio using the text_to_speech_audio function:
def text_to_speech_audio(text: str):
    """Return MP3 bytes from ElevenLabs TTS."""
    client = get_elevenlabs_client()
    if not client:
        return None
    audio_stream = client.text_to_speech.convert(
        text=text[:1500],
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_turbo_v2_5",
    )
    return collect_audio_bytes(audio_stream)
Text is truncated to 1500 characters to stay within ElevenLabs limits and ensure fast response times.
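The collect_audio_bytes helper isn't shown here; ElevenLabs' convert call yields the audio as a stream of chunks, so the helper presumably just concatenates them into a single MP3 payload. A sketch under that assumption:

```python
def collect_audio_bytes(audio_stream):
    """Assumed helper: join an iterable of byte chunks into one MP3 payload."""
    return b"".join(chunk for chunk in audio_stream if chunk)

# Works with any iterable of bytes, e.g. a generator of streamed chunks:
def fake_stream():
    yield b"ID3"       # MP3 files often begin with an ID3 tag
    yield b""          # empty chunks are skipped
    yield b"\x00\x01"

print(collect_audio_bytes(fake_stream()))  # b'ID3\x00\x01'
```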

Response Format

The /api/chat endpoint returns both text and audio:
audio_base64 = None
try:
    audio_bytes = text_to_speech_audio(agent_text)
    if audio_bytes:
        audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")
except Exception:
    pass

return jsonify({"text": agent_text, "audio_base64": audio_base64})

Frontend Audio Rendering

The web interface receives the response and plays the audio automatically:
const data = await res.json();
appendMessage('agent', data.text, data.audio_base64);
if (data.audio_base64) playBase64Audio(data.audio_base64);

function playBase64Audio(base64) {
    if (!base64) return;
    const audio = new Audio('data:audio/mpeg;base64,' + base64);
    audio.play();
}
Each message bubble also includes an embedded audio player for replay:
if (audioBase64 && role === 'agent') {
    const aud = document.createElement('audio');
    aud.controls = true;
    aud.src = 'data:audio/mpeg;base64,' + audioBase64;
    div.appendChild(aud);
}
Audio plays automatically when received, but you can replay any message using the embedded controls in the message bubble.

Conversation UI Elements

The chat interface displays messages in a turn-based format:

Message Display

<div class="messages" id="messages"></div>
Messages are styled differently based on the speaker:
.msg.user { background: #e3f2fd; margin-left: 0; margin-right: auto; }
.msg.agent { background: #e8f5e9; margin-left: auto; margin-right: 0; }
  • User (You as the business): Blue background, left-aligned
  • Agent (AI customer): Green background, right-aligned

Message Structure

Each message shows the role and content:
function appendMessage(role, text, audioBase64) {
    const div = document.createElement('div');
    div.className = 'msg ' + role;
    div.innerHTML = '<div class="role">' + 
        (role === 'user' ? 'You (company)' : 'Agent') + 
        '</div><div>' + escapeHtml(text) + '</div>';
    messagesEl.appendChild(div);
    messagesEl.scrollTop = messagesEl.scrollHeight;
}

Error Handling

If OPENAI_API_KEY is not set in your .env file, the agent will return a placeholder message instead of generating intelligent responses.
The backend includes graceful fallbacks:
openai_key = os.getenv("OPENAI_API_KEY")
if not openai_key:
    agent_text = "I'd like to know more about your business. (Set OPENAI_API_KEY in .env for full conversation.)"
else:
    try:
        # Call the OpenAI API (the full call is shown under "OpenAI API Integration")
        ...
    except Exception as e:
        agent_text = f"I had trouble responding: {e}"

Common Errors

  • “Set a business description first (use /api/context)”: you tried to chat before initializing the context. Fix: click “Start conversation” first.
  • “No message provided”: an empty message was sent. Fix: type a message before sending.
  • “Request failed”: a network or server error occurred. Fix: check the browser console and server logs.

API Reference

POST /api/context

Initializes a new conversation session. Request body:
{
  "description": "A dental clinic with online booking",
  "scenario": "check appointment availability"
}
Response:
{
  "ok": true,
  "scenario": "check appointment availability"
}

POST /api/chat

Sends a message and receives the agent’s response. Request body:
{
  "message": "Thank you for calling. How can I help you today?"
}
Response:
{
  "text": "Hi, I'd like to check if you have any appointments available next Tuesday afternoon?",
  "audio_base64": "//uQxAA...base64-encoded-mp3..."
}
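A client consuming this endpoint can decode audio_base64 back into playable MP3 bytes with the standard library (the payload below is illustrative; a real audio_base64 value is far longer):

```python
import base64

# Illustrative /api/chat response payload.
response = {
    "text": "Hi, I'd like to check if you have any appointments available next Tuesday afternoon?",
    "audio_base64": base64.b64encode(b"fake-mp3-bytes").decode("utf-8"),
}

# audio_base64 is null when TTS is unavailable, so guard before decoding.
audio_bytes = base64.b64decode(response["audio_base64"]) if response["audio_base64"] else None
# audio_bytes can now be written to an .mp3 file or fed to an audio player.
print(audio_bytes)  # b'fake-mp3-bytes'
```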

Best Practices

1. Start with Clear Context

Provide detailed business descriptions including:
  • What the business does
  • Available services or products
  • Common customer requests
  • Any special policies or procedures
2. Use Realistic Scenarios

Test scenarios that match real customer behaviors:
  • Simple information requests
  • Transaction completions
  • Problem resolution
  • Edge cases and difficult customers
3. Play the Role Authentically

Respond as your actual business would:
  • Use your real greeting scripts
  • Follow your standard procedures
  • Apply your actual policies
  • Use industry-appropriate language
4. Test Multiple Paths

Click “Start conversation” again to reset and test different conversation flows without reloading the page.

Session Management

Conversation state is stored in Flask sessions:
session["business_description"] = description
session["scenario"] = scenario
session["messages"] = []  # Conversation history
The session persists across requests using a session cookie:
app.secret_key = os.getenv("FLASK_SECRET_KEY", "dev-secret-change-in-production")
Note that Flask's default sessions are client-side: the data is serialized into a cryptographically signed cookie, so the entire conversation history travels with every request and can exceed the roughly 4 KB browser cookie limit in long conversations. The dev-secret-change-in-production fallback key is insecure; set FLASK_SECRET_KEY in production, and be aware that changing the key invalidates existing session cookies.

Next Steps

After testing conversation flows in the web chat:
  • Move to real phone call testing to validate the same scenarios over actual phone lines
  • Use the same business description and scenario for consistent testing across both modes
  • Compare agent behavior between text chat and voice calls
