
Chat Interface

The Chat Interface is how you interact with Support Bot’s AI Copilot. It provides a natural, conversational way to search incidents, analyze patterns, and get resolution recommendations with streaming responses and context awareness.

Key Features

Streaming Responses

See answers as they're generated in real time via Server-Sent Events (SSE)

Conversation Memory

The AI remembers your entire conversation using PostgreSQL checkpointing

Smart Caching

Frequently asked questions are cached for instant responses

Markdown Support

Rich formatting for code blocks, tables, lists, and more

How It Works

1. Create or Resume Chat

Each conversation is a separate chat session with its own ID:
// New chat (no chat_id)
POST /api/chat/prompt/stream
{
  "message": "Show me recent payment failures",
  "chat_id": null,  // creates new chat
  "generate_title": true
}

// Continue existing chat
POST /api/chat/prompt/stream
{
  "message": "What was the root cause?",
  "chat_id": "550e8400-e29b-41d4-a716-446655440000"
}

2. Query Processing

Your message goes through validation and preprocessing:
# 1. Prompt guardrail validation
is_valid, reject_msg = guard.validate_or_reject(prompt=message)

# 2. Clarification check (for context-dependent queries)
should_clarify, clarification_msg = should_ask_clarification(
    message, 
    has_conversation_history
)

# 3. Cache check (for common questions)
cached_response = check_cache_for_query(message)
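Chained together, these checks form a single preprocessing gate. A minimal sketch with the check functions injected as callables (the names and return shapes mirror the snippets above; the wiring is illustrative, not the actual implementation):

```python
from typing import Callable, Optional, Tuple

def preprocess(
    message: str,
    has_history: bool,
    validate: Callable[[str], Tuple[bool, str]],
    clarify: Callable[[str, bool], Tuple[bool, str]],
    cache_lookup: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Run the three gates in order; return (action, payload)."""
    # 1. Guardrail: reject disallowed prompts outright
    is_valid, reject_msg = validate(message)
    if not is_valid:
        return ("reject", reject_msg)
    # 2. Clarification: ambiguous queries short-circuit with a question
    needs_clarify, question = clarify(message, has_history)
    if needs_clarify:
        return ("clarify", question)
    # 3. Cache: known answers skip the LLM entirely
    cached = cache_lookup(message)
    if cached is not None:
        return ("cached", cached)
    return ("process", message)
```

Only messages that pass all three gates reach the agent graph.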

3. Streaming Response

The response is streamed back in real time using SSE:
event: status
data: {"message": "Analyzing your request..."}

event: final_answer
data: {"chunk": "Based on incident INC-2025-001..."}

event: title
data: {"title": "Payment Failures"}

event: complete
data: {"answer": "...", "chat_id": "..."}
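On the client side, each frame above is delimited by a blank line, with an `event:` name followed by a `data:` JSON payload. A minimal stdlib-only parser for a buffered stream (illustrative; a production client would parse incrementally as bytes arrive):

```python
import json

def parse_sse(raw: str):
    """Yield (event_name, payload_dict) for each frame in an SSE stream."""
    for block in raw.strip().split("\n\n"):
        event, data = "message", None  # "message" is the SSE default event name
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = line[len("data:"):].strip()
        if data is not None:
            yield event, json.loads(data)
```

A client would append `final_answer` chunks as they arrive and treat `complete` as the end of the turn.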

4. Save to Database

The conversation is persisted for future reference:
message = Message(
    chat_id=chat_id,
    human=user_message,
    bot=ai_response
)
session.add(message)
await session.commit()

Streaming Architecture

The chat interface uses Server-Sent Events for real-time streaming:
async def stream_generator():
    # Each SSE event must be terminated by a blank line ("\n\n")
    # so clients can frame it correctly
    yield f"event: status\ndata: {json.dumps({'message': 'Starting...'})}\n\n"
    yield f"event: final_answer\ndata: {json.dumps({'chunk': token})}\n\n"
    yield f"event: title\ndata: {json.dumps({'title': generated_title})}\n\n"
    yield f"event: complete\ndata: {json.dumps({'answer': full_answer})}\n\n"

return StreamingResponse(
    stream_generator(),
    media_type="text/event-stream",
    headers={
        "Cache-Control": "no-cache",
        "Connection": "keep-alive"
    }
)

Event Types

status

Shows what the AI is currently doing:
{"message": "Analyzing your request... please hold on."}
{"message": "Almost done, wrapping up the details"}

final_answer

The actual AI response, streamed token by token:
{"chunk": "Based "}
{"chunk": "on incident "}
{"chunk": "INC-2025-001..."}

title

Auto-generated title for the conversation:
{"title": "Payment Gateway Issues"}

complete

Signals the end of streaming:
{
  "answer": "<full response text>",
  "chat_id": "550e8400-e29b-41d4-a716-446655440000"
}

Smart Caching

Frequently asked questions are cached for instant responses:
# Check cache before processing
cached_response = check_cache_for_query(human_message)

if cached_response:
    # Stream from cache (instant)
    for i in range(0, len(cached_response), chunk_size):
        chunk_text = cached_response[i:i + chunk_size]
        yield f"event: final_answer\ndata: {json.dumps({'chunk': chunk_text})}\n\n"
    return

# Process normally and cache result
store_chat_response(human_message, answer)
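The document doesn't show how `check_cache_for_query` and `store_chat_response` work internally; one plausible minimal shape is an exact-match lookup over normalized queries (the real implementation may use semantic similarity instead):

```python
import re
from typing import Optional

_cache: dict = {}

def _normalize(query: str) -> str:
    # Lowercase and collapse whitespace so trivial variations still hit the cache
    return re.sub(r"\s+", " ", query.strip().lower())

def store_chat_response(query: str, answer: str) -> None:
    _cache[_normalize(query)] = answer

def check_cache_for_query(query: str) -> Optional[str]:
    return _cache.get(_normalize(query))
```

Normalizing the key is what lets "Show me payment failures" and "show me  payment failures" resolve to the same cached answer.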

Cache Benefits

Instant Responses

Cached queries return in milliseconds instead of seconds

Reduced LLM Costs

Skip LLM API calls for common questions

Consistent Answers

Same question always gets the same verified answer

Lower Latency

No vector search or LLM inference needed

Note: The cache only works for self-contained queries without conversation context. Follow-up questions always go through the full pipeline.

Conversation Memory

The chat interface maintains context using LangGraph’s PostgreSQL checkpointing:
# Each chat has a unique thread ID
thread_config = {"configurable": {"thread_id": str(chat_id)}}

# The agent graph maintains state across invocations
result = get_support_bot_graph().stream(
    config=thread_config,
    input={"messages": [("user", human_message)]}
)
This allows natural follow-up questions:
User: "Show me incidents related to the PaymentAPI"
Bot: "Found 12 incidents... [lists details]"
User: "Which one took longest to resolve?"
Bot: [answers using the incidents from the previous turn]

Clarification System

The chat interface detects ambiguous queries and asks for clarification:
should_clarify, clarification_message = should_ask_clarification(
    human_message,
    has_conversation_history
)

if should_clarify:
    return {
        "success": False,
        "message": clarification_message,
        "needs_clarification": True
    }

Clarification Triggers

The system asks for clarification when:
  • You use pronouns without context (“tell me about it”)
  • You reference “this” or “that” without prior context
  • You ask follow-up questions in a new chat
New Chat → "What was the root cause?"
Response: "I need more context. Which incident are you asking about?"
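These triggers can be approximated with a simple heuristic: flag context-dependent words when the chat has no history. A rough sketch (the actual `should_ask_clarification` may be more sophisticated, e.g. LLM-based):

```python
import re
from typing import Tuple

# Words that only make sense with prior conversational context
_CONTEXT_WORDS = {"it", "this", "that", "those", "these", "them"}

def should_ask_clarification(message: str, has_history: bool) -> Tuple[bool, str]:
    if has_history:
        # Follow-ups are resolved from the checkpointed thread, not rejected
        return False, ""
    words = set(re.findall(r"[a-z']+", message.lower()))
    if words & _CONTEXT_WORDS:
        return True, "I need more context. Which incident are you asking about?"
    return False, ""
```

The key design point is the `has_history` short-circuit: the same pronoun that is ambiguous in a fresh chat is perfectly resolvable mid-conversation.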

Title Generation

Conversations are automatically titled for easy reference:
# Generated in parallel with the main response
title_task = asyncio.create_task(
    asyncio.to_thread(
        generate_title_from_query,
        human_message,
        session_id,
        user_id,
        langfuse_enabled
    )
)

# Prompt for title generation
prompt = SystemMessage(
    "Generate a concise, 2-4 word title for this query. "
    "The title should clearly represent the main theme or subject. "
    f"Query: {query}\n\n"
    "The output must be only the title."
)
Examples:
  • “Show me payment failures” → “Payment Failures”
  • “Database timeout in loan processing” → “Loan Processing Timeout”
  • “Swift transfer issues EU region” → “EU Swift Transfers”

Managing Chats

Listing Chats

Get all your conversations:
GET /api/chat/?limit=20&offset=0
{
  "error": false,
  "chats": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "title": "Payment Gateway Issues",
      "updated_at": "2025-03-01T14:30:00Z"
    }
  ],
  "total": 45,
  "has_more": true
}
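With `limit`/`offset` and `has_more`, walking every page is a simple loop. A generic helper with the HTTP call injected as a function, so it can be reused or tested without a live server (`iter_pages` is a hypothetical name, not part of the API):

```python
from typing import Callable, Iterator

def iter_pages(fetch: Callable[[int, int], dict], limit: int = 20) -> Iterator[dict]:
    """Yield every chat across pages; fetch(limit, offset) returns one page dict."""
    offset = 0
    while True:
        page = fetch(limit, offset)
        yield from page["chats"]
        if not page.get("has_more"):
            break
        offset += limit
```

In practice `fetch` would wrap a `GET /api/chat/?limit={limit}&offset={offset}` call with your HTTP client of choice.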

Viewing Messages

Retrieve a full conversation:
GET /api/chat/messages/{chat_id}?limit=50&offset=0
{
  "error": false,
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Payment Gateway Issues",
  "messages": [
    {
      "id": "msg-001",
      "human": "Show me payment gateway errors",
      "bot": "Found 5 relevant incidents...",
      "created_at": "2025-03-01T14:25:00Z"
    }
  ],
  "total": 8,
  "has_more": false
}

Renaming Chats

Change the conversation title:
PUT /api/chat/rename/{chat_id}
{
  "title": "Payment API Debugging Session"
}

Archiving Chats

Remove chats from your active list:
DELETE /api/chat/archive/{chat_id}
Archived chats are soft-deleted (not permanently removed) and can be restored if needed.

Non-Streaming Mode

For simple integrations, use the non-streaming endpoint:
POST /api/chat/prompt
{
  "message": "Show me recent database issues",
  "chat_id": null,
  "generate_title": true
}
Response:
{
  "answer": "Based on recent incidents...",
  "chat_id": "550e8400-e29b-41d4-a716-446655440000"
}
This waits for the full response before returning (no streaming).

LLM Provider Override

Override the default LLM provider per message:
{
  "message": "Analyze this complex issue",
  "chat_id": "550e8400-e29b-41d4-a716-446655440000",
  "provider_id": "anthropic-1",
  "model_id": "claude-3-5-sonnet-20241022"
}
This lets you use different models for different tasks (e.g., GPT-4 for analysis, Claude for writing).

Prompt Guardrails

The chat interface validates messages before processing:
from src.copilot.guardrails.prompt_guardrails import PromptGuardrail

guard = PromptGuardrail(deny_words="confidential,secret,password")
is_valid, reject_msg = guard.validate_or_reject(prompt=user_message)

if not is_valid:
    return {"success": False, "message": reject_msg}
This prevents:
  • Injection attacks
  • Inappropriate content
  • Sensitive data leakage
Admins can configure deny words in the system settings.
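Behind `validate_or_reject`, the deny-word layer could be as simple as a word-boundary match over the configured list. A minimal sketch of that one layer only (the real `PromptGuardrail` presumably adds injection-pattern and content checks on top):

```python
import re
from typing import Tuple

class PromptGuardrail:
    def __init__(self, deny_words: str):
        # deny_words is a comma-separated string, as configured by admins
        self._deny = [w.strip().lower() for w in deny_words.split(",") if w.strip()]

    def validate_or_reject(self, prompt: str) -> Tuple[bool, str]:
        lowered = prompt.lower()
        for word in self._deny:
            # Word-boundary match avoids false positives on substrings
            if re.search(rf"\b{re.escape(word)}\b", lowered):
                return False, "Your message contains a blocked term and was rejected."
        return True, ""
```

The word-boundary regex means "password" is blocked but an incident title like "passwordless-auth rollout" would need its own deny entry.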

Best Practices

Writing Good Queries

Provide clear context in your questions:
  ✅ "Show me HTTP 500 errors in the PaymentAPI from last week"
  ❌ "Show me errors"
Reference specific incidents when available:
  ✅ "What was the mitigation for INC-2025-001?"
  ❌ "What did we do for that payment issue?"
Take advantage of conversation memory:
  1. “Find database timeout incidents”
  2. “Which application was affected most?”
  3. “Show me the mitigation steps for the LoanAPI ones”
Create a new chat for unrelated topics:
  Chat 1: Payment gateway issues
  Chat 2: Database performance problems
  Chat 3: Deployment failures

Performance Tips

Use Streaming

Streaming provides faster perceived performance as users see responses immediately

Keep Chats Focused

Shorter conversations with fewer messages perform better

Archive Old Chats

Regularly archive completed investigations to keep your list clean

Cache Common Questions

Frequently asked questions are automatically cached for instant responses

Troubleshooting

Streaming Connection Drops

If the SSE connection is interrupted:
  1. Check timeout settings: Ensure your reverse proxy doesn’t timeout SSE connections
  2. Verify network: Test with a stable connection
  3. Use reconnection logic: Implement automatic reconnection in your client
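For step 3, reconnection is typically paired with exponential backoff capped at a maximum delay. The HTTP/SSE specifics depend on your client library, so only the timing logic is sketched here (`consume_sse_stream` is a hypothetical placeholder):

```python
from typing import Iterator

def backoff_delays(base: float = 1.0, cap: float = 30.0, factor: float = 2.0) -> Iterator[float]:
    """Yield reconnect delays: 1s, 2s, 4s, ... capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

# Usage sketch:
# for delay in backoff_delays():
#     try:
#         consume_sse_stream(url)  # hypothetical client call
#         break                    # clean close: stop retrying
#     except ConnectionError:
#         time.sleep(delay)
```

Capping the delay keeps long outages from pushing retries arbitrarily far apart.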

Context Not Maintained

If follow-up questions don’t work:
  1. Verify chat_id: Ensure you’re passing the same chat_id for follow-ups
  2. Check PostgreSQL: The checkpointer requires a working database
  3. Review logs: Look for “thread_id” in the backend logs

Slow Response Times

If responses are slow:
  1. Check cache hit rate: Common questions should be cached
  2. Monitor LLM latency: Some providers are faster than others
  3. Review incident count: Large knowledge bases may need optimization

Next Steps

AI Copilot

Learn how the underlying AI agent works

API Reference

Complete API documentation for chat endpoints

LLM Providers

Configure different AI models for your chats

Guardrails

Set up prompt validation and safety controls
