Overview
This guide walks through the complete lifecycle of a query as it flows through the agent, from initial user input to final response generation.
Complete Workflow Diagram
Phase-by-Phase Breakdown
Phase 1: Query Initiation
A user submits a query through the API:
```python
request = {
    "message": "How do I fix HTTP 403 errors in PayU?",
    "session_id": "session-abc123",
    "user_id": "user-456"
}
```
Graph Invocation
The API layer invokes the compiled graph:
```python
app = create_agent_graph()
result = app.invoke(
    {
        "messages": [("user", request["message"])],
        "session_id": request["session_id"],
        "user_id": request["user_id"],
        "langfuse_enabled": True,
        "generate_title": True
    },
    config={
        "configurable": {"thread_id": request["session_id"]}
    }
)
```
The graph automatically loads any previous conversation history from the PostgreSQL checkpoint using the thread_id.
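Conceptually, the checkpointer behaves like a store keyed by `thread_id`. A minimal in-memory sketch of that behavior (the real system persists to PostgreSQL through LangGraph's checkpointer; `InMemoryCheckpointer` here is a hypothetical illustration, not the actual class):

```python
class InMemoryCheckpointer:
    """Toy stand-in for the PostgreSQL checkpointer: keeps graph state per thread."""

    def __init__(self):
        self._store = {}  # thread_id -> state dict

    def load(self, thread_id):
        # Return prior state (conversation history) or a fresh state
        return self._store.get(thread_id, {"messages": []})

    def save(self, thread_id, state):
        self._store[thread_id] = state


checkpointer = InMemoryCheckpointer()
state = checkpointer.load("session-abc123")  # fresh thread -> empty history
state["messages"].append(("user", "How do I fix HTTP 403 errors in PayU?"))
checkpointer.save("session-abc123", state)

# A later invocation with the same thread_id sees the earlier message
resumed = checkpointer.load("session-abc123")
```

Because the state is keyed by `thread_id`, reusing the same session ID on the next request is all it takes to resume the conversation.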
Phase 2: Support Bot Processing
Node: support_bot (call_model function)
Step 1: Extract Latest Query
```python
user_messages = [m for m in state["messages"] if m.type == "human"]
latest_query = user_messages[-1].content
# Result: "How do I fix HTTP 403 errors in PayU?"
```
Step 2: Search Golden Examples
The agent searches for similar past conversations:
```python
golden_examples = search_golden_examples_sync(
    query=latest_query,
    top_k=2,
    score_threshold=0.6
)
```
Purpose: Golden examples provide verified response patterns that improve answer quality.
Example golden example:
```
Query: "How to resolve PayU payment failures?"
Response: "Based on incident INC-2025-01-15-003, PayU payment failures..."
```
Step 3: Enhance System Prompt
If golden examples are found, inject them into the system message:
```python
enhanced_content = build_prompt_with_golden_examples(
    base_prompt=SYSTEM_MESSAGE_PROMPT.content,
    golden_examples=golden_examples
)
enhanced_system_prompt = SystemMessage(enhanced_content)
```
Enhanced Prompt Structure:
```
[Base System Prompt]

## Verified Knowledge
Direct Answer Available: Yes

[Golden Example 1]
Query: ...
Response: ...

[Golden Example 2]
...
```
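The exact implementation of `build_prompt_with_golden_examples` is not shown above; a minimal sketch that produces the structure described, assuming each golden example is a dict with `query` and `response` keys:

```python
def build_prompt_with_golden_examples(base_prompt, golden_examples):
    """Append verified Q/A pairs to the base system prompt (illustrative sketch)."""
    if not golden_examples:
        return base_prompt  # nothing found: leave the prompt untouched
    sections = [base_prompt, "## Verified Knowledge", "Direct Answer Available: Yes"]
    for ex in golden_examples:
        sections.append(f"Query: {ex['query']}\nResponse: {ex['response']}")
    return "\n\n".join(sections)


prompt = build_prompt_with_golden_examples(
    "You are an expert incident resolution assistant.",
    [{"query": "How to resolve PayU payment failures?",
      "response": "Based on incident INC-2025-01-15-003, ..."}],
)
```

Returning the base prompt unchanged when no examples match keeps the no-golden-example path identical to the default behavior.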
Step 4: Bind Tools
```python
model_with_tools = llm.bind_tools(available_tools)
# Tools: lookup_incident_by_id, search_similar_incidents,
#        get_incidents_by_application, get_recent_incidents
```
Step 5: Invoke LLM
```python
messages = [enhanced_system_prompt] + list(state["messages"])
response = model_with_tools.invoke(
    messages,
    config={"callbacks": callbacks, "run_name": "Support Bot LLM"}
)
```
LLM Reasoning (internal):
User is asking about HTTP 403 errors in PayU
This is a problem description without a specific incident ID
Rewrite the query: "HTTP 403 forbidden PayU"
Select tool: search_similar_incidents
Generate tool call
Response (src/copilot/graph.py:260):
```python
AIMessage(
    content="",
    tool_calls=[
        {
            "name": "search_similar_incidents",
            "args": {"query": "HTTP 403 forbidden PayU", "limit": 5},
            "id": "call_abc123"
        }
    ]
)
```
Phase 3: Conditional Routing
Function: wants_qdrant_tool (src/copilot/graph.py:286)
```python
last_message = state["messages"][-1]
if last_message.tool_calls:
    writer({"status": "Analyzing your request... please hold on."})
    return "continue"  # → incident_tools node
```
Decision: LLM requested a tool call → Route to incident_tools
Phase 4: Tool Execution
Node: incident_tools (tool_wrapper function)
Step 1: Extract Tool Call
The ToolNode extracts the tool call from the message:
```python
tool_name = "search_similar_incidents"
tool_args = {"query": "HTTP 403 forbidden PayU", "limit": 5}
```
Step 2: Execute Tool Function
Function: search_similar_incidents (src/copilot/tools/incident_tools.py:141)
```python
@tool
def search_similar_incidents(query: str, limit: int = 5) -> str:
    writer = get_safe_stream_writer()
    writer({"status": "Searching for Similar Incidents..."})
    retriever = _get_retriever()
    # Try SelfQueryRetriever first
    docs = retriever.invoke(input=query)
    # Fall back to plain vector search if needed
    if not docs:
        docs = vector_store.similarity_search(query=query, k=limit * 2)
    return format_incidents_response(docs)
```
Step 3: Qdrant Search
The tool queries the Qdrant vector database:
```python
# Semantic similarity search
results = vector_store.similarity_search(
    query="HTTP 403 forbidden PayU",
    k=10
)
```
Qdrant Process:
Embed query using embedding model
Search for similar vectors in past_issues_v2 collection
Return top-k results with metadata
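At its core, the search is a nearest-neighbor lookup over embeddings. A toy sketch with cosine similarity over hand-made three-dimensional vectors (real embeddings come from the embedding model and live in Qdrant; the vectors and incident IDs below are purely illustrative):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


# Pretend these are embeddings of stored incident descriptions
corpus = {
    "INC-2025-01-20-005": [0.9, 0.1, 0.0],  # PayU HTTP 403 auth failure
    "INC-2025-01-18-012": [0.8, 0.2, 0.1],  # PayU 403 after key rotation
    "INC-2025-02-02-001": [0.0, 0.1, 0.9],  # unrelated settlement delay
}
query_vec = [0.85, 0.15, 0.05]  # stand-in embedding of "HTTP 403 forbidden PayU"

# Top-k = rank by similarity, keep the best k
top_k = sorted(corpus, key=lambda i: cosine(query_vec, corpus[i]), reverse=True)[:2]
```

The two PayU 403 incidents score far above the unrelated one, which is exactly why the tool can surface relevant history without exact keyword matches.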
Step 4: Format Results
```python
def format_incidents_response(docs: List[Document]) -> str:
    # Deduplicate by incident_id
    # Format as structured text with metadata
    # Include: ID, title, description, root cause, resolution, etc.
    ...
```
Example Output:
```
## Incident INC-2025-01-20-005
**Title**: PayU HTTP 403 Authorization Failure
**Impacted Application**: PayU Core
**Root Cause**: API key rotation caused authentication failures
**Action Taken**: Updated API keys in all environments
**Status**: Resolved

## Incident INC-2025-01-18-012
...
```
The tool node returns a ToolMessage:
```python
ToolMessage(
    content="[Formatted incident results]",
    tool_call_id="call_abc123"
)
```
Graph Action: A direct edge routes back to support_bot (src/copilot/graph.py:449)
Phase 5: Response Generation
Node: support_bot (second invocation)
The LLM now has tool results in the conversation history:
```python
messages = [
    SystemMessage("You are an expert incident resolution assistant..."),
    HumanMessage("How do I fix HTTP 403 errors in PayU?"),
    AIMessage(tool_calls=[...]),
    ToolMessage("## Incident INC-2025-01-20-005\n...")
]
```
LLM Reasoning:
Tool returned 2 relevant incidents
Both relate to PayU HTTP 403 errors
Most recent: INC-2025-01-20-005 (API key rotation)
Generate response citing the incident
Response:
```python
AIMessage(
    content="""Based on incident INC-2025-01-20-005, HTTP 403 errors in PayU are
typically caused by API key authentication failures. The resolution involves:
1. Verify API keys are correctly configured in all environments
2. Check if API keys have been recently rotated
3. Update configuration files with new keys
4. Restart affected services
The incident was resolved by updating API keys across all environments after
a scheduled key rotation.""",
    tool_calls=None
)
```
Phase 6: Title Generation
Conditional Edge Decision:
```python
last_message = state["messages"][-1]
if not last_message.tool_calls:
    if not state.get("title") and state.get("generate_title", True):
        return "title_generation"  # → title_generation node
```
Node: title_generation (title_generation_node function)
Step 1: Create Conversation Transcript
```python
chat_text = "\n".join(
    f"{m.type.upper()}: {getattr(m, 'content', '')}"
    for m in state["messages"]
)
```
Result:
```
HUMAN: How do I fix HTTP 403 errors in PayU?
AI: Based on incident INC-2025-01-20-005, HTTP 403 errors...
```
Step 2: Generate Title
```python
prompt = SystemMessage(
    "Generate a concise, 2-4 word title..."
)
# Pass the transcript so the model has the conversation to summarize
response = llm.invoke([prompt, HumanMessage(chat_text)])
title_text = response.content.strip()  # "PayU HTTP 403 Fix"
```
Step 3: Stream Title
```python
writer = get_stream_writer()
writer({"title": title_text})
writer({"status": "Almost done, wrapping up the details"})
return {"title": title_text}
```
Phase 7: Completion
The graph reaches the END state and returns the final state:
```python
{
    "messages": [
        HumanMessage("How do I fix HTTP 403 errors in PayU?"),
        AIMessage(tool_calls=[...]),
        ToolMessage("[incident results]"),
        AIMessage("Based on incident INC-2025-01-20-005...")
    ],
    "title": "PayU HTTP 403 Fix",
    "session_id": "session-abc123",
    "user_id": "user-456"
}
```
The state is automatically persisted to PostgreSQL via the checkpointer.
Alternative Workflows
Direct Incident ID Lookup
User Query: "Show me incident INC-2025-08-24-001"
Workflow Changes :
Support Bot : LLM recognizes specific incident ID
Tool Selection : Calls lookup_incident_by_id instead of similarity search
Tool Execution : Direct Qdrant filter by metadata.incident_id
Response : Returns complete incident details
Tool Call:
```python
{
    "name": "lookup_incident_by_id",
    "args": {"incident_id": "INC-2025-08-24-001"}
}
```
Qdrant Query:
```python
qdrant_filter = Filter(
    must=[
        FieldCondition(
            key="metadata.incident_id",
            match=MatchValue(value="INC-2025-08-24-001")
        )
    ]
)
```
Application-Specific Search
User Query: "What incidents affected the Settlement & Reporting system?"
Tool Call:
```python
{
    "name": "get_incidents_by_application",
    "args": {"app_name": "Settlement & Reporting", "limit": 5}
}
```
Process (src/copilot/tools/incident_tools.py:214):
Filter by metadata.impacted_application
Deduplicate by incident_id
Fallback to semantic search if no exact matches
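The filter, dedup, and fallback steps above can be sketched in pure Python (the real tool filters inside Qdrant; the incident records and the substring-match fallback standing in for semantic search are illustrative):

```python
def get_incidents_by_application(incidents, app_name, limit=5):
    """Sketch: exact metadata filter, dedup by incident_id, then fallback."""
    matches, seen = [], set()
    for inc in incidents:
        if inc["impacted_application"] == app_name and inc["incident_id"] not in seen:
            seen.add(inc["incident_id"])
            matches.append(inc)
    if not matches:
        # Fallback: substring match stands in for semantic search here
        matches = [i for i in incidents if app_name.lower() in i["title"].lower()]
    return matches[:limit]


incidents = [
    {"incident_id": "INC-2025-03-01-001",
     "impacted_application": "Settlement & Reporting",
     "title": "Settlement batch delayed"},
    {"incident_id": "INC-2025-03-01-001",
     "impacted_application": "Settlement & Reporting",
     "title": "Settlement batch delayed"},  # duplicate chunk, same incident
    {"incident_id": "INC-2025-03-02-004",
     "impacted_application": "PayU Core",
     "title": "HTTP 403 on auth"},
]
results = get_incidents_by_application(incidents, "Settlement & Reporting")
```

Deduplication matters because one incident is usually stored as several vector chunks, so a raw filter can return the same incident more than once.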
Time-Based Search
User Query: "Show me incidents from the last 7 days"
Tool Call:
```python
{
    "name": "get_recent_incidents",
    "args": {"days": 7, "limit": 10}
}
```
Process (src/copilot/tools/incident_tools.py:287):
Calculate cutoff date: datetime.now() - timedelta(days=7)
Scroll through all incidents
Parse dates from incident IDs (format: INC-YYYY-MM-DD-NNN)
Filter and sort by date
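Since incident IDs encode their date (INC-YYYY-MM-DD-NNN), the cutoff filter can work on the IDs alone. A minimal sketch of that logic (`parse_incident_date` and `filter_recent` are illustrative helper names, not the tool's actual internals):

```python
from datetime import datetime, timedelta


def parse_incident_date(incident_id):
    """Extract the date from an ID of the form INC-YYYY-MM-DD-NNN."""
    parts = incident_id.split("-")
    return datetime(int(parts[1]), int(parts[2]), int(parts[3]))


def filter_recent(incident_ids, days=7, now=None):
    """Keep incidents on or after the cutoff, newest first."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    recent = [i for i in incident_ids if parse_incident_date(i) >= cutoff]
    return sorted(recent, key=parse_incident_date, reverse=True)


ids = ["INC-2025-01-20-005", "INC-2025-01-18-012", "INC-2025-01-05-001"]
recent = filter_recent(ids, days=7, now=datetime(2025, 1, 21))
```

Passing `now` explicitly keeps the cutoff deterministic for testing; the real tool would use the current time.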
Multi-Turn Conversations
Turn 1:
```
User: "What caused incident INC-2025-08-24-001?"
Agent: [Looks up incident] "The root cause was..."
```
Turn 2 (same thread_id):
```
User: "What was the resolution?"
Agent: [Uses memory, no tool call] "The resolution involved..."
```
Key Difference: The system prompt instructs the LLM to use memory for follow-up questions about already-retrieved incidents.
Streaming Behavior
The agent supports streaming for real-time updates:
```python
for mode, chunk in app.stream(
    input={"messages": [("user", query)]},
    config={"configurable": {"thread_id": "thread-1"}},
    stream_mode=["messages", "custom"]
):
    if mode == "custom":
        # Status updates: {"status": "...", "title": "..."}
        print(chunk)
    elif mode == "messages":
        # Token-by-token LLM output
        if chunk.content:
            print(chunk.content, end="", flush=True)
```
Stream Updates
Custom Stream Events:
{"status": "Analyzing your request... please hold on."} — Tool call detected
{"status": "Searching for Similar Incidents..."} — Tool executing
{"status": "Found 2 relevant incidents..."} — Tool completed
{"status": "Generating title for the incident report..."} — Title generation
{"title": "PayU HTTP 403 Fix"} — Title generated
{"status": "Almost done, wrapping up the details"} — Completion
Error Handling
All tools have try-except blocks:
```python
try:
    docs = retriever.invoke(input=query)
    return format_incidents_response(docs)
except Exception as e:
    logger.error(f"Error in search_similar_incidents: {e}")
    return (
        "An error occurred while searching for incidents. "
        "Please try rephrasing your query..."
    )
```
Retriever Fallbacks
If the primary retriever fails, tools fall back to simpler methods:
```python
try:
    docs = retriever.invoke(input=query)  # SelfQueryRetriever
except Exception:
    docs = vector_store.similarity_search(query=query)  # Fallback
```
Empty Results
```python
if not docs:
    writer({"status": "No similar incidents found"})
    return "No incidents found matching your query."
```
The LLM receives this message and informs the user appropriately.
LLM Caching
Benefit: Avoids recreating LLM instances on every request
```python
if _cached_llm_config_hash == config_hash and _cached_llm is not None:
    return _cached_llm  # Reuse the cached instance
```
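The caching pattern can be sketched end-to-end: hash the configuration and rebuild the client only when the hash changes. `make_llm` and the module-level cache names below are illustrative stand-ins for the real construction code:

```python
import hashlib
import json

_cached_llm = None
_cached_llm_config_hash = None


def make_llm(config):
    """Stand-in for expensive LLM client construction."""
    return {"client_for": config["model"]}


def get_llm(config):
    global _cached_llm, _cached_llm_config_hash
    # Stable hash of the config: same settings -> same hash -> cache hit
    config_hash = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()
    if _cached_llm_config_hash == config_hash and _cached_llm is not None:
        return _cached_llm
    _cached_llm = make_llm(config)
    _cached_llm_config_hash = config_hash
    return _cached_llm


a = get_llm({"model": "gpt-4o", "temperature": 0})
b = get_llm({"temperature": 0, "model": "gpt-4o"})  # same config, different key order
c = get_llm({"model": "gpt-4o-mini", "temperature": 0})
```

Serializing with `sort_keys=True` makes the hash insensitive to dict key order, so logically identical configs always hit the cache.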
Parallel Title Generation
For API endpoints that need immediate responses, title generation can run in parallel:
```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as executor:
    # Start title generation in background
    title_future = executor.submit(
        generate_title_from_query,
        query=user_query,
        session_id=session_id
    )
    # Stream main response
    for chunk in app.stream(...):
        yield chunk
    # Get title when ready
    title = title_future.result()
```
Golden Example Caching
Golden examples are not cached: they are searched on every query. The lookup stays cheap in practice because it is a single small top-k vector search.
Next Steps
Tools Reference: Detailed documentation for each tool function
Chat API: Learn how to integrate the agent into your application