POST /api/chat

Chat
curl --request POST \
  --url https://api.example.com/api/chat \
  --header 'Content-Type: application/json' \
  --data '
{
  "message": "<string>"
}
'
{
  "text": "<string>",
  "audio_base64": "<string> | null"
}

Overview

This endpoint simulates a conversation with an AI customer who is calling your business. You send a message as if you're the business/operator/IVR, and the AI responds as a customer trying to complete their goal. The AI uses OpenAI's GPT-4o-mini model for conversation and ElevenLabs for text-to-speech audio generation.

Prerequisites: You must call /api/context first to set the business description and scenario.

Request

message
string
required
The message from your business/IVR/operator to the AI caller. This field is required and cannot be empty or whitespace-only.

Example: "Thank you for calling Pizza Palace. How can I help you today?"

Response

text
string
The AI caller’s text response to your message.
audio_base64
string | null
Base64-encoded MP3 audio of the AI caller’s response. Returns null if:
  • ElevenLabs API key is not configured
  • Audio generation fails
Decode this string and play it as audio/mpeg to hear the AI’s response.

Error Responses

400 Bad Request

No context set:
{
  "error": "Set a business description first (use /api/context)"
}
No message provided:
{
  "error": "No message provided"
}

Examples

# First, set context
curl -X POST http://localhost:5000/api/context \
  -H "Content-Type: application/json" \
  -d '{
    "description": "We are a pizza restaurant.",
    "scenario": "order a large pepperoni pizza"
  }'

# Then send a message
curl -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Thank you for calling Pizza Palace. How can I help you?"
  }'

Success Response

{
  "text": "Hi, I'd like to order a large pepperoni pizza for delivery, please.",
  "audio_base64": "SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4Ljc2LjEwMAAAAAAAAAAAAAAA//tQAAAAAAAA..."
}

Implementation Details

Conversation Flow

  1. The endpoint retrieves the business description and scenario from the Flask session
  2. If this is the first message, it creates a system prompt using build_caller_prompt()
  3. Your message is added to the conversation history as a user message
  4. OpenAI GPT-4o-mini generates the AI caller’s response
  5. The response is added to conversation history as an assistant message
  6. ElevenLabs converts the text response to speech audio
  7. Audio is base64-encoded and returned with the text

System Prompt

The AI is instructed with this prompt:
“You are simulating a real customer contacting the business below. The other side is the company, an operator, or an IVR. Stay in character as the caller/customer, try to complete the task, and avoid escalating to a human unless the flow requires it.”
This is followed by the business description and the caller's goal.

Audio Generation

  • Uses ElevenLabs TTS with voice ID: JBFqnCBsd6RMkjVDRZzb
  • Model: eleven_turbo_v2_5
  • Text is truncated to 1500 characters for TTS
  • Returns MP3 format audio
  • Audio generation failures are silent: the endpoint returns null for audio_base64 rather than failing
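A sketch of this behavior against the ElevenLabs HTTP API (the environment-variable name ELEVENLABS_API_KEY and the helper name are assumptions; the voice ID, model, and 1500-character truncation come from the bullets above):

```python
import json
import os
import urllib.error
import urllib.request

VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"

def generate_audio(text):
    """Return MP3 bytes for `text`, or None if no key is configured or the call fails."""
    api_key = os.environ.get("ELEVENLABS_API_KEY")  # assumed variable name
    if not api_key:
        return None
    req = urllib.request.Request(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        data=json.dumps({"text": text[:1500],  # truncate to 1500 chars for TTS
                         "model_id": "eleven_turbo_v2_5"}).encode(),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()  # MP3 bytes
    except (urllib.error.URLError, OSError):
        return None  # silent failure: caller sends audio_base64 = null
```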

Fallback Behavior

No OpenAI API Key: If OPENAI_API_KEY is not set in environment variables, the AI returns:
"I'd like to know more about your business. (Set OPENAI_API_KEY in .env for full conversation.)"
OpenAI Error: If the OpenAI API call fails, the error is returned as the AI's response:
"I had trouble responding: [error message]"
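Both fallbacks can be sketched in one helper (the function name and client argument are illustrative; the fallback strings come from this section):

```python
import os

def get_ai_reply(openai_client, messages):
    """Return the AI caller's reply, degrading gracefully when OpenAI is unavailable."""
    if not os.environ.get("OPENAI_API_KEY"):
        return ("I'd like to know more about your business. "
                "(Set OPENAI_API_KEY in .env for full conversation.)")
    try:
        completion = openai_client.chat.completions.create(
            model="gpt-4o-mini", messages=messages)
        return completion.choices[0].message.content
    except Exception as exc:
        # Surface the failure as the AI's reply instead of a 5xx error
        return f"I had trouble responding: {exc}"
```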

Session Management

  • Conversation history is stored in session["messages"] as an array of message objects
  • Each message has role (“system”, “user”, or “assistant”) and content fields
  • Messages persist across multiple /api/chat calls until /api/context is called again
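The session layout above can be illustrated with plain dicts (Flask's session behaves like a dict; the helper names here are illustrative):

```python
def reset_context(session, description, scenario):
    """What /api/context effectively does: store the context and clear history."""
    session["description"] = description
    session["scenario"] = scenario
    session["messages"] = []

def append_turn(session, user_message, assistant_reply):
    """Record one user/assistant exchange in session['messages']."""
    messages = session.setdefault("messages", [])
    messages.append({"role": "user", "content": user_message})
    messages.append({"role": "assistant", "content": assistant_reply})
    return messages
```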
