
Chat Interface

The Chat Interface is how you interact with Support Bot’s AI Copilot. It provides a natural, conversational way to search incidents, analyze patterns, and get resolution recommendations with streaming responses and context awareness.

Key Features

Streaming Responses

See answers as they're generated in real time via Server-Sent Events (SSE)

Conversation Memory

The AI remembers your entire conversation using PostgreSQL checkpointing

Smart Caching

Frequently asked questions are cached for instant responses

Markdown Support

Rich formatting for code blocks, tables, lists, and more

How It Works

1. Create or Resume Chat

Each conversation is a separate chat session with its own ID:
// New chat (no chat_id)
POST /api/chat/prompt/stream
{
  "message": "Show me recent payment failures",
  "chat_id": null,  // creates new chat
  "generate_title": true
}

// Continue existing chat
POST /api/chat/prompt/stream
{
  "message": "What was the root cause?",
  "chat_id": "550e8400-e29b-41d4-a716-446655440000"
}

2. Query Processing

Your message goes through validation and preprocessing:
# 1. Prompt guardrail validation
is_valid, reject_msg = guard.validate_or_reject(prompt=message)

# 2. Clarification check (for context-dependent queries)
should_clarify, clarification_msg = should_ask_clarification(
    message, 
    has_conversation_history
)

# 3. Cache check (for common questions)
cached_response = check_cache_for_query(message)
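Chained together, these checks form a single preprocessing gate. A minimal sketch with the check functions injected as callables (the names and return shapes mirror the snippets above; the wiring is illustrative, not the actual implementation):

```python
from typing import Callable, Optional, Tuple

def preprocess(
    message: str,
    has_history: bool,
    validate: Callable[[str], Tuple[bool, str]],
    clarify: Callable[[str, bool], Tuple[bool, str]],
    cache_lookup: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Run the three gates in order; return (action, payload)."""
    # 1. Guardrail: reject disallowed prompts outright
    is_valid, reject_msg = validate(message)
    if not is_valid:
        return ("reject", reject_msg)
    # 2. Clarification: ambiguous queries short-circuit with a question
    needs_clarify, question = clarify(message, has_history)
    if needs_clarify:
        return ("clarify", question)
    # 3. Cache: known answers skip the LLM entirely
    cached = cache_lookup(message)
    if cached is not None:
        return ("cached", cached)
    return ("process", message)
```

Only messages that pass all three gates reach the agent graph.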

3. Streaming Response

The response is streamed back in real time using SSE:
event: status
data: {"message": "Analyzing your request..."}

event: final_answer
data: {"chunk": "Based on incident INC-2025-001..."}

event: title
data: {"title": "Payment Failures"}

event: complete
data: {"answer": "...", "chat_id": "..."}
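On the client side, each frame above is delimited by a blank line, with an `event:` name followed by a `data:` JSON payload. A minimal stdlib-only parser for a buffered stream (illustrative; a production client would parse incrementally as bytes arrive):

```python
import json

def parse_sse(raw: str):
    """Yield (event_name, payload_dict) for each frame in an SSE stream."""
    for block in raw.strip().split("\n\n"):
        event, data = "message", None  # "message" is the SSE default event name
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = line[len("data:"):].strip()
        if data is not None:
            yield event, json.loads(data)
```

A client would append `final_answer` chunks as they arrive and treat `complete` as the end of the turn.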

4. Save to Database

The conversation is persisted for future reference:
message = Message(
    chat_id=chat_id,
    human=user_message,
    bot=ai_response
)
session.add(message)
await session.commit()

Streaming Architecture

The chat interface uses Server-Sent Events for real-time streaming:
async def stream_generator():
    # Each SSE event must be terminated by a blank line ("\n\n")
    # so clients can frame it correctly
    yield f"event: status\ndata: {json.dumps({'message': 'Starting...'})}\n\n"
    yield f"event: final_answer\ndata: {json.dumps({'chunk': token})}\n\n"
    yield f"event: title\ndata: {json.dumps({'title': generated_title})}\n\n"
    yield f"event: complete\ndata: {json.dumps({'answer': full_answer})}\n\n"

return StreamingResponse(
    stream_generator(),
    media_type="text/event-stream",
    headers={
        "Cache-Control": "no-cache",
        "Connection": "keep-alive"
    }
)

Event Types

status

Shows what the AI is currently doing:
{"message": "Analyzing your request... please hold on."}
{"message": "Almost done, wrapping up the details"}

final_answer

The actual AI response, streamed token by token:
{"chunk": "Based "}
{"chunk": "on incident "}
{"chunk": "INC-2025-001..."}

title

Auto-generated title for the conversation:
{"title": "Payment Gateway Issues"}

complete

Signals the end of streaming:
{
  "answer": "<full response text>",
  "chat_id": "550e8400-e29b-41d4-a716-446655440000"
}

Smart Caching

Frequently asked questions are cached for instant responses:
# Check cache before processing
cached_response = check_cache_for_query(human_message)

if cached_response:
    # Stream from cache (instant)
    for i in range(0, len(cached_response), chunk_size):
        chunk_text = cached_response[i:i + chunk_size]
        yield f"event: final_answer\ndata: {json.dumps({'chunk': chunk_text})}\n\n"
    return

# Process normally and cache result
store_chat_response(human_message, answer)
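The document doesn't show how `check_cache_for_query` and `store_chat_response` work internally; one plausible minimal shape is an exact-match lookup over normalized queries (the real implementation may use semantic similarity instead):

```python
import re
from typing import Optional

_cache: dict = {}

def _normalize(query: str) -> str:
    # Lowercase and collapse whitespace so trivial variations still hit the cache
    return re.sub(r"\s+", " ", query.strip().lower())

def store_chat_response(query: str, answer: str) -> None:
    _cache[_normalize(query)] = answer

def check_cache_for_query(query: str) -> Optional[str]:
    return _cache.get(_normalize(query))
```

Normalizing the key is what lets "Show me payment failures" and "show me  payment failures" resolve to the same cached answer.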

Cache Benefits

Instant Responses

Cached queries return in milliseconds instead of seconds

Reduced LLM Costs

Skip LLM API calls for common questions

Consistent Answers

Same question always gets the same verified answer

Lower Latency

No vector search or LLM inference needed

Note: The cache only works for self-contained queries without conversation context. Follow-up questions always go through the full pipeline.

Conversation Memory

The chat interface maintains context using LangGraph’s PostgreSQL checkpointing:
# Each chat has a unique thread ID
thread_config = {"configurable": {"thread_id": str(chat_id)}}

# The agent graph maintains state across invocations
result = get_support_bot_graph().stream(
    config=thread_config,
    input={"messages": [("user", human_message)]}
)
This allows natural follow-up questions:
User: "Show me incidents related to the PaymentAPI"
Bot: "Found 12 incidents... [lists details]"
User: "Which one took longest to resolve?"
Bot: [answers using the incidents from the previous turn]

Clarification System

The chat interface detects ambiguous queries and asks for clarification:
should_clarify, clarification_message = should_ask_clarification(
    human_message,
    has_conversation_history
)

if should_clarify:
    return {
        "success": False,
        "message": clarification_message,
        "needs_clarification": True
    }

Clarification Triggers

The system asks for clarification when:
  • You use pronouns without context (“tell me about it”)
  • You reference “this” or “that” without prior context
  • You ask follow-up questions in a new chat
New Chat → "What was the root cause?"
Response: "I need more context. Which incident are you asking about?"
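These triggers can be approximated with a simple heuristic: flag context-dependent words when the chat has no history. A rough sketch (the actual `should_ask_clarification` may be more sophisticated, e.g. LLM-based):

```python
import re
from typing import Tuple

# Words that only make sense with prior conversational context
_CONTEXT_WORDS = {"it", "this", "that", "those", "these", "them"}

def should_ask_clarification(message: str, has_history: bool) -> Tuple[bool, str]:
    if has_history:
        # Follow-ups are resolved from the checkpointed thread, not rejected
        return False, ""
    words = set(re.findall(r"[a-z']+", message.lower()))
    if words & _CONTEXT_WORDS:
        return True, "I need more context. Which incident are you asking about?"
    return False, ""
```

The key design point is the `has_history` short-circuit: the same pronoun that is ambiguous in a fresh chat is perfectly resolvable mid-conversation.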

Title Generation

Conversations are automatically titled for easy reference:
# Generated in parallel with the main response
title_task = asyncio.create_task(
    asyncio.to_thread(
        generate_title_from_query,
        human_message,
        session_id,
        user_id,
        langfuse_enabled
    )
)

# Prompt for title generation
prompt = SystemMessage(
    "Generate a concise, 2-4 word title for this query. "
    "The title should clearly represent the main theme or subject. "
    f"Query: {query}\n\n"
    "The output must be only the title."
)
Examples:
  • “Show me payment failures” → “Payment Failures”
  • “Database timeout in loan processing” → “Loan Processing Timeout”
  • “Swift transfer issues EU region” → “EU Swift Transfers”

Managing Chats

Listing Chats

Get all your conversations:
GET /api/chat/?limit=20&offset=0
{
  "error": false,
  "chats": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "title": "Payment Gateway Issues",
      "updated_at": "2025-03-01T14:30:00Z"
    }
  ],
  "total": 45,
  "has_more": true
}
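With `limit`/`offset` and `has_more`, walking every page is a simple loop. A generic helper with the HTTP call injected as a function, so it can be reused or tested without a live server (`iter_pages` is a hypothetical name, not part of the API):

```python
from typing import Callable, Iterator

def iter_pages(fetch: Callable[[int, int], dict], limit: int = 20) -> Iterator[dict]:
    """Yield every chat across pages; fetch(limit, offset) returns one page dict."""
    offset = 0
    while True:
        page = fetch(limit, offset)
        yield from page["chats"]
        if not page.get("has_more"):
            break
        offset += limit
```

In practice `fetch` would wrap a `GET /api/chat/?limit={limit}&offset={offset}` call with your HTTP client of choice.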

Viewing Messages

Retrieve a full conversation:
GET /api/chat/messages/{chat_id}?limit=50&offset=0
{
  "error": false,
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Payment Gateway Issues",
  "messages": [
    {
      "id": "msg-001",
      "human": "Show me payment gateway errors",
      "bot": "Found 5 relevant incidents...",
      "created_at": "2025-03-01T14:25:00Z"
    }
  ],
  "total": 8,
  "has_more": false
}

Renaming Chats

Change the conversation title:
PUT /api/chat/rename/{chat_id}
{
  "title": "Payment API Debugging Session"
}

Archiving Chats

Remove chats from your active list:
DELETE /api/chat/archive/{chat_id}
Archived chats are soft-deleted (not permanently removed) and can be restored if needed.

Non-Streaming Mode

For simple integrations, use the non-streaming endpoint:
POST /api/chat/prompt
{
  "message": "Show me recent database issues",
  "chat_id": null,
  "generate_title": true
}
Response:
{
  "answer": "Based on recent incidents...",
  "chat_id": "550e8400-e29b-41d4-a716-446655440000"
}
This waits for the full response before returning (no streaming).

LLM Provider Override

Override the default LLM provider per message:
{
  "message": "Analyze this complex issue",
  "chat_id": "550e8400-e29b-41d4-a716-446655440000",
  "provider_id": "anthropic-1",
  "model_id": "claude-3-5-sonnet-20241022"
}
This lets you use different models for different tasks (e.g., GPT-4 for analysis, Claude for writing).

Prompt Guardrails

The chat interface validates messages before processing:
from src.copilot.guardrails.prompt_guardrails import PromptGuardrail

guard = PromptGuardrail(deny_words="confidential,secret,password")
is_valid, reject_msg = guard.validate_or_reject(prompt=user_message)

if not is_valid:
    return {"success": False, "message": reject_msg}
This prevents:
  • Injection attacks
  • Inappropriate content
  • Sensitive data leakage
Admins can configure deny words in the system settings.
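Behind `validate_or_reject`, the deny-word layer could be as simple as a word-boundary match over the configured list. A minimal sketch of that one layer only (the real `PromptGuardrail` presumably adds injection-pattern and content checks on top):

```python
import re
from typing import Tuple

class PromptGuardrail:
    def __init__(self, deny_words: str):
        # deny_words is a comma-separated string, as configured by admins
        self._deny = [w.strip().lower() for w in deny_words.split(",") if w.strip()]

    def validate_or_reject(self, prompt: str) -> Tuple[bool, str]:
        lowered = prompt.lower()
        for word in self._deny:
            # Word-boundary match avoids false positives on substrings
            if re.search(rf"\b{re.escape(word)}\b", lowered):
                return False, "Your message contains a blocked term and was rejected."
        return True, ""
```

The word-boundary regex means "password" is blocked but an incident title like "passwordless-auth rollout" would need its own deny entry.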

Best Practices

Writing Good Queries

Provide clear context in your questions:
  ✅ "Show me HTTP 500 errors in the PaymentAPI from last week"
  ❌ "Show me errors"
Reference specific incidents when available:
  ✅ "What was the mitigation for INC-2025-001?"
  ❌ "What did we do for that payment issue?"
Take advantage of conversation memory:
  1. “Find database timeout incidents”
  2. “Which application was affected most?”
  3. “Show me the mitigation steps for the LoanAPI ones”
Create a new chat for unrelated topics:
  Chat 1: Payment gateway issues
  Chat 2: Database performance problems
  Chat 3: Deployment failures

Performance Tips

Use Streaming

Streaming provides faster perceived performance as users see responses immediately

Keep Chats Focused

Shorter conversations with fewer messages perform better

Archive Old Chats

Regularly archive completed investigations to keep your list clean

Cache Common Questions

Frequently asked questions are automatically cached for instant responses

Troubleshooting

Streaming Connection Drops

If the SSE connection is interrupted:
  1. Check timeout settings: Ensure your reverse proxy doesn’t timeout SSE connections
  2. Verify network: Test with a stable connection
  3. Use reconnection logic: Implement automatic reconnection in your client
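For step 3, reconnection is typically paired with exponential backoff capped at a maximum delay. The HTTP/SSE specifics depend on your client library, so only the timing logic is sketched here (`consume_sse_stream` is a hypothetical placeholder):

```python
from typing import Iterator

def backoff_delays(base: float = 1.0, cap: float = 30.0, factor: float = 2.0) -> Iterator[float]:
    """Yield reconnect delays: 1s, 2s, 4s, ... capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

# Usage sketch:
# for delay in backoff_delays():
#     try:
#         consume_sse_stream(url)  # hypothetical client call
#         break                    # clean close: stop retrying
#     except ConnectionError:
#         time.sleep(delay)
```

Capping the delay keeps long outages from pushing retries arbitrarily far apart.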

Context Not Maintained

If follow-up questions don’t work:
  1. Verify chat_id: Ensure you’re passing the same chat_id for follow-ups
  2. Check PostgreSQL: The checkpointer requires a working database
  3. Review logs: Look for “thread_id” in the backend logs

Slow Response Times

If responses are slow:
  1. Check cache hit rate: Common questions should be cached
  2. Monitor LLM latency: Some providers are faster than others
  3. Review incident count: Large knowledge bases may need optimization

Next Steps

AI Copilot

Learn how the underlying AI agent works

API Reference

Complete API documentation for chat endpoints

LLM Providers

Configure different AI models for your chats

Guardrails

Set up prompt validation and safety controls
