Aurora’s chat provides a natural-language interface to your cloud infrastructure, with real-time streaming responses, intelligent context management, and multi-session support.

Architecture Overview

WebSocket Connection

The chat uses a persistent WebSocket connection for bidirectional real-time communication:
  • Client: Next.js frontend with useWebSocket hook
  • Server: Python WebSocket server (main_chatbot.py) running on port 5006
  • Protocol: JSON messages with types: init, message, status, tool_call, usage_info, control
// client/src/hooks/useWebSocket.ts
import { useEffect, useRef, useState } from 'react';

const useWebSocket = ({ url, userId, onMessage }) => {
  const wsRef = useRef<WebSocket | null>(null);
  const [isConnected, setIsConnected] = useState(false);

  useEffect(() => {
    const ws = new WebSocket(url);

    ws.onopen = () => {
      setIsConnected(true);
      // Send init message to register user
      ws.send(JSON.stringify({
        type: 'init',
        user_id: userId
      }));
    };

    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      onMessage(data);
    };

    ws.onclose = () => setIsConnected(false);

    wsRef.current = ws;
    return () => ws.close();
  }, [url, userId]);

  return { wsRef, isConnected, send: (data) => wsRef.current?.send(data) };
};

Message Flow

  1. User sends message: the frontend sends JSON with query, session_id, user_id, mode, model, and provider_preference.
  2. Agent processes request: the LangGraph workflow executes with streaming enabled via the astream_events API.
  3. Tokens stream back: the server sends token events for each chunk of the AI response (real-time streaming).
  4. Tool calls execute: when the AI needs to run a tool, the server sends a tool_call event with status updates.
  5. Final status: the server sends an END status when the workflow completes.
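The flow above can be sketched as a minimal server-side dispatch loop. This is an illustrative sketch, not Aurora's actual handler in main_chatbot.py; `dispatch`, `handle_message`, and the handler-table shape are assumptions.

```python
import json

def dispatch(raw: str, send, handlers) -> None:
    """Route one incoming WebSocket frame by its 'type' field."""
    msg = json.loads(raw)
    handler = handlers.get(msg.get("type"))
    if handler is None:
        send(json.dumps({"type": "error",
                         "data": {"text": f"Unknown type: {msg.get('type')}"}}))
        return
    handler(msg, send)

def handle_message(msg, send):
    # Steps 2-5 of the flow: run the workflow (elided here), stream each
    # token as its own frame, then finish with an END status.
    for token in ["Hello", ", ", "world"]:  # stand-in for LLM streaming
        send(json.dumps({"type": "token", "data": {"text": token}}))
    send(json.dumps({"type": "status", "data": {"status": "END"},
                     "isComplete": True, "session_id": msg.get("session_id")}))
```

In practice `send` would be the WebSocket's send coroutine; here it is any callable, which keeps the routing logic testable without a live connection.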

Chat Modes

Aurora supports two chat modes controlled by the mode selector:

Ask Mode

Read-only mode for safe exploration:
  • Cannot execute write operations
  • No infrastructure changes
  • Safe for production queries
  • Tools blocked: cloud_tool (write), kubectl_tool (apply/delete), github_commit, iac_tool (apply)

Agent Mode

Full execution mode for automation:
  • Can deploy infrastructure
  • Execute kubectl commands
  • Commit to GitHub
  • Modify cloud resources
  • Requires confirmation for high-risk actions
# server/chat/backend/agent/access.py
class ModeAccessController:
    @staticmethod
    def is_tool_allowed(mode: str, tool_name: str) -> bool:
        """Check if a tool is allowed in the current mode."""
        if mode == "ask":
            # Block write operations in ask mode
            write_tools = [
                "cloud_tool", "kubectl_tool", "github_commit",
                "iac_tool", "github_apply_fix"
            ]
            return tool_name not in write_tools
        return True  # Agent mode allows all tools
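Taken on its own, the controller blocks write tools in ask mode and passes everything in agent mode. The snippet below is a self-contained copy for illustration; the read-tool name `log_search` is hypothetical.

```python
class ModeAccessController:
    @staticmethod
    def is_tool_allowed(mode: str, tool_name: str) -> bool:
        """Check if a tool is allowed in the current mode."""
        if mode == "ask":
            # Block write operations in ask mode
            write_tools = ["cloud_tool", "kubectl_tool", "github_commit",
                           "iac_tool", "github_apply_fix"]
            return tool_name not in write_tools
        return True  # Agent mode allows all tools
```

A read-only tool like the hypothetical `log_search` passes in both modes; `kubectl_tool` passes only in agent mode.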

Real-Time Streaming

Token-Level Streaming

The chat uses LangGraph’s astream_events API for token-level streaming:
# server/chat/backend/agent/workflow.py
async def stream(self, input_state: State):
    """Stream the workflow with token-level streaming."""
    async for event in self.app.astream_events(input_state, self.config, version="v2"):
        event_type = event.get("event")
        
        if event_type == "on_chat_model_stream":
            # Stream individual tokens from LLM
            chunk_data = event.get("data", {})
            chunk_obj = chunk_data.get("chunk")
            if chunk_obj and hasattr(chunk_obj, 'content') and chunk_obj.content:
                content = _extract_text_from_content(chunk_obj.content)
                if content:
                    yield ("token", content)
Gemini thinking models return content as a list of blocks with types thinking and text. Aurora extracts both for a complete thought stream.

Message Consolidation

During streaming, AIMessageChunk objects are accumulated and consolidated into complete AIMessage objects:
def _consolidate_message_chunks(self, messages):
    """Consolidate AIMessageChunk objects into proper AIMessage objects."""
    consolidated_messages = []
    chunk_builders = {}
    
    for msg in messages:
        if isinstance(msg, AIMessageChunk):
            builder = chunk_builders.get(msg.id)
            if builder is None:
                builder = {
                    "content": "",
                    "tool_calls": [],
                    "id": msg.id
                }
                chunk_builders[msg.id] = builder
            
            # Accumulate content
            builder["content"] += msg.content
            
            # Accumulate tool calls
            if hasattr(msg, 'tool_calls') and msg.tool_calls:
                builder["tool_calls"].extend(msg.tool_calls)
            
            # Finalize on finish_reason
            if msg.response_metadata.get("finish_reason") in ("tool_calls", "stop"):
                cleaned_tool_calls = self._clean_tool_calls(builder["tool_calls"])
                ai_msg = AIMessage(
                    content=builder["content"],
                    tool_calls=cleaned_tool_calls,
                    id=builder["id"]
                )
                consolidated_messages.append(ai_msg)
                del chunk_builders[msg.id]
        else:
            # Non-chunk messages (human, tool) pass through unchanged
            consolidated_messages.append(msg)
    
    return consolidated_messages

Session Management

Creating a New Session

Chat sessions are created lazily on first message:
// client/src/app/chat/components/useChatSendHandlers.ts
const handleSend = async (text: string, websocket: WebSocket) => {
  let effectiveSessionId = currentSessionId;
  
  if (!effectiveSessionId) {
    // Create new session
    const newSession = await createSession({
      title: text.substring(0, 50) + (text.length > 50 ? '...' : ''),
      userId: userId
    });
    effectiveSessionId = newSession.id;
    setCurrentSessionId(newSession.id);
    justCreatedSessionRef.current = newSession.id;
  }
  
  // Send message via WebSocket
  websocket.send(JSON.stringify({
    type: 'message',
    query: text,
    session_id: effectiveSessionId,
    user_id: userId,
    mode: selectedMode,
    model: selectedModel,
    provider_preference: selectedProviders
  }));
};

Context Loading

When resuming a session, Aurora loads the full context:
# server/chat/backend/agent/workflow.py
if input_state.session_id and input_state.user_id:
    existing_context = LLMContextManager.load_context_history(
        input_state.session_id, 
        input_state.user_id
    )
    
    if existing_context:
        # Combine existing context with new messages
        combined_messages = []
        combined_messages.extend(existing_context)
        combined_messages.extend(input_state.messages)
        input_state.messages = combined_messages
Two-Tier Context Storage:
  1. LLM Context (llm_context_history): Complete conversation for AI (includes tool messages)
  2. UI Context (chat_sessions.messages): Formatted messages for display (excludes tool details)
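The two tiers amount to two writes per turn. The sketch below uses a plain dict in place of Aurora's actual database; `save_turn` and the message shape are illustrative.

```python
def save_turn(store: dict, session_id: str, messages: list) -> None:
    """Persist one turn to both context tiers.

    llm_context_history keeps everything the model needs to resume
    (including tool messages); chat_sessions keeps only what the UI
    renders. `store` stands in for a real database.
    """
    store.setdefault("llm_context_history", {}) \
         .setdefault(session_id, []).extend(messages)
    ui_messages = [m for m in messages if m["role"] in ("user", "assistant")]
    store.setdefault("chat_sessions", {}) \
         .setdefault(session_id, []).extend(ui_messages)
```

The tool message lands in the LLM tier but never reaches the UI tier, which is what keeps tool output out of the chat transcript while preserving it for the model.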

Context Compression

For long conversations, Aurora uses LangChain’s context compression:
# server/chat/backend/agent/utils/chat_context_manager.py
compressed_messages, was_compressed = ChatContextManager.compress_context_if_needed(
    session_id, user_id, messages_full, model
)

if was_compressed:
    # Use compressed version for LLM context
    messages_for_context = compressed_messages
    logger.info(f"Context compression: {len(compressed_messages)} messages preserved")
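The trigger logic can be sketched as follows, assuming a token-estimate threshold and an injectable summarization step; the real ChatContextManager's thresholds and summarizer may differ.

```python
def compress_if_needed(messages, max_tokens=8000, keep_recent=10, summarize=None):
    """Replace older messages with a summary once the estimated token
    count exceeds max_tokens; keep the most recent messages verbatim.

    Tokens are roughly estimated as len(text) // 4. `summarize` would
    call an LLM in practice; here it is injectable for testing.
    """
    est_tokens = sum(len(m["content"]) for m in messages) // 4
    if est_tokens <= max_tokens or len(messages) <= keep_recent:
        return messages, False
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = (summarize(old) if summarize
               else f"[summary of {len(old)} earlier messages]")
    return [{"role": "system", "content": summary}] + recent, True
```

Keeping the newest messages verbatim preserves the immediate conversational thread, while the summary retains enough of the earlier exchange for the model to stay grounded.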

User Workflows

Starting a New Chat

  1. Navigate to /chat or click “New Chat” in sidebar
  2. The UI shows the empty state with prompt suggestions
  3. Type a question or click a suggested prompt
  4. Session is created on first message send
Empty chat state with dynamic prompt suggestions based on connected providers

Chat Input Features

  • Rich text editor with markdown support
  • Multi-line input with Shift+Enter
  • Enter to send
  • Character limit: 2000 characters per message

Message Types in UI

Messages rendered in the chat UI follow a simple shape:
{
  id: 1,
  role: 'user',
  content: 'List all pods in production namespace',
  timestamp: '2026-03-03T10:30:00Z'
}

Cancelling In-Progress Messages

Click the Stop button while a message is generating:
// client/src/hooks/useChatCancellation.ts
const cancelCurrentMessage = async () => {
  if (!userId || !sessionId || !webSocket.isConnected) return;
  
  // Send cancel control message
  webSocket.send(JSON.stringify({
    type: 'control',
    action: 'cancel',
    session_id: sessionId,
    user_id: userId
  }));
  
  // Backend cancels the Celery task and consolidates partial messages
};
The server:
  1. Cancels the running workflow task
  2. Waits for ongoing tool calls to complete
  3. Consolidates partial message chunks
  4. Saves the partial conversation state
  5. Adds a cancellation notice to the context
Cancelling does not undo already-executed commands. It only stops further processing.
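The server side of the cancel sequence can be sketched with asyncio. This is illustrative: the real backend routes cancellation through Celery, and `save_partial` stands in for chunk consolidation and state persistence.

```python
import asyncio

async def cancel_workflow(task: asyncio.Task, save_partial) -> str:
    """Cancel the running workflow task, then save what completed."""
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the workflow was interrupted mid-stream
    # Consolidate partial chunks, persist state, add a cancellation notice
    save_partial()
    return "CANCELLED"
```

Awaiting the cancelled task before saving is what gives in-flight tool calls a chance to unwind cleanly instead of being dropped mid-write.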

WebSocket Message Types

Client → Server

  • type (string, required): Message type: init, message, control, confirmation_response
  • query (string): User’s question (for the message type)
  • session_id (string): Chat session ID (UUID)
  • user_id (string, required): User ID
  • mode (string): Chat mode: ask or agent
  • model (string): LLM model name: claude-3-5-sonnet-20241022, gpt-4o, gemini-2.0-flash-thinking-exp-01-21, etc.
  • provider_preference (array): Cloud providers to scope commands: ["gcp", "aws", "azure"]
  • attachments (array): File attachments with filename, file_type, file_data (base64), or server_path for server-side files
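Putting the fields above together, a complete client-to-server frame might look like this (all values are illustrative):

```python
import json

# Example message frame combining the fields documented above.
frame = {
    "type": "message",
    "query": "List all pods in the production namespace",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "user_id": "user-123",
    "mode": "ask",
    "model": "claude-3-5-sonnet-20241022",
    "provider_preference": ["gcp"],
}
payload = json.dumps(frame)  # what actually goes over the WebSocket
```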

Server → Client

  • type (string): Message type: status, message, token, tool_call, usage_info
  • data (object): Message payload (varies by type)
{
  "type": "status",
  "data": {
    "status": "END"
  },
  "isComplete": true,
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}

Rate Limiting

WebSocket connections are rate-limited per client:
  • Limit: 5 messages per 60 seconds
  • Scope: Per WebSocket connection ID
  • Exceeded: Returns {"type": "error", "data": {"text": "Rate limit exceeded"}}
import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, rate, per):
        self.tokens = defaultdict(lambda: rate)
        self.last_checked = defaultdict(time.time)
        self.rate = rate
        self.per = per
    
    def is_allowed(self, client_id):
        now = time.time()
        elapsed = now - self.last_checked[client_id]
        self.last_checked[client_id] = now
        # Refill tokens based on elapsed time, capped at the burst size
        self.tokens[client_id] = min(
            self.rate, self.tokens[client_id] + elapsed * (self.rate / self.per)
        )
        
        if self.tokens[client_id] >= 1:
            self.tokens[client_id] -= 1
            return True
        return False

rate_limiter = RateLimiter(rate=5, per=60)

API Cost Tracking

Aurora tracks LLM API usage per user:
# After workflow completion
if user_id:
    asyncio.create_task(update_api_cost_cache_async(user_id))
    is_cached, total_cost = get_cached_api_cost(user_id)
    
    # Send usage info to client
    usage_info_response = {
        "type": "usage_info",
        "data": {
            "total_cost": round(total_cost, 2)
        }
    }
    await websocket.send(json.dumps(usage_info_response))
Costs are displayed in the chat UI footer in real-time.

Incident Investigation

Background RCA investigations use the same chat backend

Cloud Integrations

Execute commands across GCP, AWS, and Azure via chat
