
Flask REST API (aurora-server)

Overview

The Flask API is the primary backend service handling HTTP requests for cloud operations, user management, and integrations.

Container: aurora-server
Port: 5080 (configurable via FLASK_PORT)
Entry Point: server/main_compute.py:431
Process: Gunicorn (production) or Flask dev server

Key Responsibilities

  • Cloud Provider Integration: GCP, AWS, Azure, OVH, Scaleway, Tailscale
  • OAuth Flows: GitHub, Bitbucket, Slack, PagerDuty, Confluence
  • Incident Management: Create, list, update incidents
  • User Preferences: Store/retrieve user settings
  • Health Checks: /health endpoint for monitoring
  • Knowledge Base: Document upload and management

Blueprint Structure

The Flask app uses modular blueprints organized by domain:
# Core services
app.register_blueprint(llm_config_bp)      # LLM provider config
app.register_blueprint(auth_bp)            # Auth.js routes
app.register_blueprint(health_bp)          # Health checks

# Cloud providers
app.register_blueprint(gcp_auth_bp)        # GCP OAuth & operations
app.register_blueprint(aws_bp)             # AWS operations
app.register_blueprint(azure_bp)           # Azure operations

# Integrations
app.register_blueprint(github_bp)          # GitHub integration
app.register_blueprint(slack_bp)           # Slack integration
app.register_blueprint(grafana_bp)         # Grafana monitoring
app.register_blueprint(datadog_bp)         # Datadog integration
Reference: server/main_compute.py:195-396
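A new domain blueprint follows the same register-and-mount pattern. A minimal sketch using the `health_bp` name from the list above — the route body and module layout here are illustrative, not the actual implementation:

```python
from flask import Blueprint, Flask, jsonify

# Illustrative health blueprint; the real one lives in its own routes module.
health_bp = Blueprint("health", __name__)

@health_bp.route("/health")
def health():
    # A production check would also probe Postgres/Redis/Vault connectivity.
    return jsonify({"status": "ok"}), 200

app = Flask(__name__)
app.register_blueprint(health_bp)
```

Keeping each domain in its own blueprint keeps route registration in main_compute.py declarative, as shown in the snippet above.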

Environment Variables

# Core
FLASK_PORT=5080
FLASK_SECRET_KEY=<random-secret>
FRONTEND_URL=http://localhost:3000

# Database
POSTGRES_HOST=postgres
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=<password>

# Redis
REDIS_URL=redis://redis:6379/0

# Vault
VAULT_ADDR=http://vault:8200
VAULT_TOKEN=<root-token>

Dependencies

  • postgres (healthy) - Database connection
  • weaviate (healthy) - Vector search
  • redis (running) - Cache and queue
  • vault (healthy) - Secrets access
  • seaweedfs-filer (healthy) - Object storage
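The health states above map naturally onto Compose `depends_on` conditions. A hypothetical fragment for the aurora-server service — service names are taken from this page, but the exact compose file may differ:

```yaml
# docker-compose.yml (illustrative fragment)
services:
  aurora-server:
    depends_on:
      postgres:
        condition: service_healthy
      weaviate:
        condition: service_healthy
      redis:
        condition: service_started
      vault:
        condition: service_healthy
      seaweedfs-filer:
        condition: service_healthy
```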

WebSocket Chatbot (chatbot)

Overview

WebSocket server powering AI agent interactions with real-time streaming.

Container: chatbot
Port: 5006
Entry Point: server/main_chatbot.py:604
Process: Python WebSocket server with asyncio

Key Responsibilities

  • LangGraph Workflow Execution: Run AI agent with tool calls
  • Token Streaming: Stream LLM responses token-by-token
  • Tool Execution: Execute cloud operations via agent tools
  • Session Management: Track chat sessions per user
  • Context Loading: Load historical messages for continuity
  • WebSocket Confirmations: Interactive approval for destructive operations

Message Flow

Client → Server (WebSocket)
├─ init: Initialize connection with user_id
├─ query: User question with session_id, mode, model
├─ confirmation_response: Approve/reject infrastructure changes
└─ control: Cancel workflow execution

Server → Client (WebSocket)
├─ status: START/END workflow state
├─ message: Streaming text chunks from LLM
├─ tool_call: Tool execution with name, input, status
├─ tool_result: Tool completion with result
├─ usage_info: API cost tracking
└─ error: Error messages
Reference: server/main_chatbot.py:217-602
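A client `query` under this protocol could be framed like the following sketch. The field names come from the flow above; the exact envelope shape is an assumption, and the values are placeholders:

```python
import json

def make_query_message(session_id: str, text: str,
                       mode: str = "chat", model: str = "default") -> str:
    """Frame a `query` message for the chatbot WebSocket (illustrative shape)."""
    return json.dumps({
        "type": "query",
        "session_id": session_id,
        "mode": mode,
        "model": model,
        "text": text,
    })

msg = make_query_message("session123", "List my GCP projects")
```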

Key Features

Real-Time Streaming

# Token-by-token streaming from the LLM
async for event_type, event_data in wf.stream(state):
    if event_type == "token":
        await websocket.send(json.dumps({
            "type": "message",
            "data": {"text": event_data, "is_chunk": True}
        }))

Rate Limiting

rate_limiter = RateLimiter(rate=5, per=60)  # 5 requests per minute
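The RateLimiter class itself isn't shown on this page. A minimal sliding-window sketch with the same rate=5, per=60 semantics — an illustration, not the server's actual implementation:

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `rate` calls per `per` seconds (sliding window)."""

    def __init__(self, rate: int, per: float):
        self.rate = rate
        self.per = per
        self._hits: deque = deque()

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.per:
            self._hits.popleft()
        if len(self._hits) < self.rate:
            self._hits.append(now)
            return True
        return False
```

A sixth request inside the window is rejected; once the oldest hit ages past 60 seconds, capacity frees up again.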

Session Isolation

Each session gets dedicated Terraform directories:
terraform_dir = f"/app/terraform_workdir/{user_id}/{session_id}"

Celery Workers (celery_worker)

Overview

Background task processing for long-running operations.

Container: celery_worker
Process: celery -A celery_config worker --loglevel=info
Configuration: server/celery_config.py

Registered Tasks

include=[
    'connectors.gcp_connector.gcp_post_auth_tasks',  # GCP setup
    'routes.gcp.root_project_tasks',                  # GCP root project
    'routes.grafana.tasks',                           # Grafana sync
    'routes.datadog.tasks',                           # Datadog sync
    'routes.jenkins.tasks',                           # Jenkins integration
    'chat.background.task',                           # Background chats
    'chat.background.summarization',                  # Chat summaries
    'services.discovery.tasks',                       # Service discovery
]
Reference: server/celery_config.py:46-64

Periodic Tasks (Celery Beat)

| Task | Schedule | Purpose |
|------|----------|---------|
| cleanup-idle-terminal-pods | 10 min | Remove inactive kubectl pods |
| cleanup-stale-background-chats | 5 min | Cancel timed-out background sessions |
| cleanup-stale-kb-documents | 3 min | Remove failed knowledge base uploads |
| run-full-discovery | 1 hour | Scan cloud resources for service graph |
| mark-stale-services | 24 hours | Mark inactive services as stale |
Reference: server/celery_config.py:66-87
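The schedule above could be expressed as a Celery `beat_schedule` dict along these lines. Intervals mirror the table; the dotted task paths (other than the discovery include shown earlier) are placeholders, not the real module names:

```python
# Illustrative beat_schedule mirroring the periodic-task table.
beat_schedule = {
    "cleanup-idle-terminal-pods": {
        "task": "routes.terminal.cleanup_idle_terminal_pods",  # hypothetical path
        "schedule": 10 * 60,            # seconds
    },
    "cleanup-stale-background-chats": {
        "task": "chat.background.cleanup_stale_chats",         # hypothetical path
        "schedule": 5 * 60,
    },
    "cleanup-stale-kb-documents": {
        "task": "routes.kb.cleanup_stale_documents",           # hypothetical path
        "schedule": 3 * 60,
    },
    "run-full-discovery": {
        "task": "services.discovery.tasks.run_full_discovery",
        "schedule": 60 * 60,
    },
    "mark-stale-services": {
        "task": "services.discovery.tasks.mark_stale_services",  # hypothetical path
        "schedule": 24 * 60 * 60,
    },
}
```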

Configuration

celery_app.conf.update(
    task_serializer='json',
    result_serializer='json',
    task_track_started=True,
    task_time_limit=(60*60*3),              # 3 hour timeout
    worker_max_tasks_per_child=1,           # Restart after each task
    worker_prefetch_multiplier=1,           # One task at a time
)

PostgreSQL Database

Overview

Primary relational database for structured data.

Container: aurora-postgres
Port: 5432
Image: postgres:15-alpine
Database: aurora_db

Key Tables

  • users: User accounts and preferences
  • chat_sessions: Chat history and context
  • incidents: Incident tracking and timeline
  • incident_thoughts: RCA background analysis thoughts
  • credentials: Cloud provider OAuth tokens (references to Vault)
  • graph_services: Discovered services for graph visualization
  • llm_usage: LLM API cost tracking

Connection Management

Aurora uses a connection pool for efficient database access:
from utils.db.connection_pool import db_pool

# Admin operations
with db_pool.get_admin_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")

# User-scoped operations
with db_pool.get_user_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SET myapp.current_user_id = %s;", (user_id,))
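Setting `myapp.current_user_id` suggests Postgres row-level security keyed on that session variable. A hypothetical policy sketch — the table and column names here are assumptions, not the actual schema:

```sql
-- Illustrative RLS policy; actual table/column names may differ.
ALTER TABLE chat_sessions ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_isolation ON chat_sessions
    USING (user_id = current_setting('myapp.current_user_id'));
```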

Weaviate Vector Database

Overview

Vector database for semantic search over knowledge base documents.

Container: weaviate
Port: 8080 (HTTP), 50051 (gRPC)
Image: cr.weaviate.io/semitechnologies/weaviate:1.27.6
Module: text2vec-transformers with all-MiniLM-L6-v2

Usage

from chat.backend.agent.weaviate_client import WeaviateClient

weaviate = WeaviateClient(postgres_client)
results = weaviate.search(
    query="How do I configure Kubernetes?",
    collection_name="UserKnowledge",
    limit=5
)

Data Stored

  • Knowledge Base Documents: User-uploaded documentation
  • GitHub Files: Code and documentation from connected repos
  • Confluence Pages: Synced wiki content

Redis Cache & Queue

Overview

In-memory data store for caching and message brokering.

Container: redis
Port: 6379
Image: redis:7-alpine

Use Cases

  1. Celery Broker: Task queue for background jobs
  2. Celery Backend: Store task results
  3. API Cost Cache: Cache LLM usage for performance
  4. Cloud Setup Cache: Cache cloud provider resource lists
  5. Rate Limiting: Track API request counts

Cache Keys

# API cost tracking
f"api_cost:{user_id}"

# Cloud resource cache
f"gcp_projects:{user_id}"
f"aws_instances:{user_id}:{region}"
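A typical read-through pattern over these keys, sketched against the standard redis-py GET/SETEX calls. The key format comes from above; the TTL and helper name are illustrative:

```python
import json

def cached_projects(redis_client, user_id: str, fetch, ttl: int = 300):
    """Read-through cache for a user's GCP project list (TTL is illustrative)."""
    key = f"gcp_projects:{user_id}"
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)
    projects = fetch(user_id)                    # hit the cloud API on a miss
    redis_client.setex(key, ttl, json.dumps(projects))
    return projects
```

On the second call for the same user the cloud API is skipped entirely; the entry expires after the TTL so stale resource lists age out on their own.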

HashiCorp Vault

Overview

Secrets management for sensitive credentials.

Container: aurora-vault
Port: 8200
Image: hashicorp/vault:1.15
Storage: File-based with persistent volumes

Configuration

# config/vault/vault.hcl
storage "file" {
  path = "/vault/data"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_disable = true
}

Secret Storage Pattern

# Store credential in Vault
vault_path = f"aurora/users/{user_id}/gcp_token"
vault.secrets.kv.v2.create_or_update_secret(
    path=vault_path,
    secret=dict(access_token="...", refresh_token="...")
)

# Reference in database
db.execute(
    "UPDATE credentials SET token = %s WHERE user_id = %s",
    (f"vault:kv/data/aurora/users/{user_id}/gcp_token", user_id)
)
Reference: server/utils/vault/vault_client.py
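Reading a credential back means resolving the `vault:...` reference stored in the database into a KV v2 path. A sketch of the lookup side — the parsing helper is illustrative, while `read_secret_version` is the standard hvac call for KV v2:

```python
def vault_path_from_reference(reference: str) -> str:
    """Turn a DB reference like 'vault:kv/data/aurora/users/42/gcp_token'
    into the KV v2 path hvac expects ('aurora/users/42/gcp_token')."""
    prefix = "vault:kv/data/"
    if not reference.startswith(prefix):
        raise ValueError(f"not a vault reference: {reference!r}")
    return reference[len(prefix):]

# With an hvac client, the secret is then read back roughly as:
# secret = vault.secrets.kv.v2.read_secret_version(
#     path=vault_path_from_reference(ref))
# token = secret["data"]["data"]["access_token"]
```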

SeaweedFS Object Storage

Overview

S3-compatible object storage for files and artifacts.

Containers: seaweedfs-master, seaweedfs-volume, seaweedfs-filer
Ports: 8333 (S3 API), 8888 (Web UI), 9333 (Master)
Image: chrislusf/seaweedfs:4.07
License: Apache 2.0

S3 API Access

from utils.storage.storage import get_storage_manager

storage = get_storage_manager(user_id)

# Upload file
storage.upload_file(user_id, local_path, "session123/file.txt")

# Download file
content = storage.download_file(user_id, "session123/file.txt")

# List files
files = storage.list_user_files(user_id, prefix="session123/")

Stored Data

  • Terraform State: Session-isolated infrastructure state
  • Knowledge Base Files: Uploaded PDFs, docs
  • GitHub Archives: Cloned repository content
  • Log Files: Captured command outputs

Memgraph Graph Database

Overview

In-memory graph database for service discovery and relationships.

Container: aurora-memgraph
Port: 7687 (Bolt), 7444 (HTTP)
Image: memgraph/memgraph-mage:latest
UI: memgraph-lab on port 3001

Data Model

// Service nodes
CREATE (s:Service {
    id: "service-123",
    name: "frontend-api",
    type: "kubernetes_service",
    provider: "gcp"
})

// Relationships
CREATE (a:Service)-[:DEPENDS_ON]->(b:Service)
CREATE (a:Service)-[:DEPLOYED_IN]->(c:Cluster)

Discovery Tasks

Service discovery runs periodically via Celery Beat:
@celery_app.task
def run_full_discovery():
    """Discover services across all connected cloud providers."""
    # Scan GCP projects, AWS accounts, Azure subscriptions
    # Store discovered services in Memgraph
Reference: services/discovery/tasks.py
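Writing a discovered service into Memgraph could look like the following sketch, which builds a parameterized MERGE. The property names mirror the data model above; the builder function is illustrative, and the Bolt driver call is hedged as a comment:

```python
def merge_service_query(service: dict) -> tuple:
    """Build a parameterized Cypher MERGE for one discovered service."""
    query = (
        "MERGE (s:Service {id: $id}) "
        "SET s.name = $name, s.type = $type, s.provider = $provider"
    )
    params = {k: service[k] for k in ("id", "name", "type", "provider")}
    return query, params

query, params = merge_service_query({
    "id": "service-123",
    "name": "frontend-api",
    "type": "kubernetes_service",
    "provider": "gcp",
})

# With a Bolt driver (e.g. the neo4j package, which Memgraph supports),
# the statement would be executed roughly as:
# with driver.session() as session:
#     session.run(query, **params)
```

MERGE keyed on `id` makes the write idempotent, so repeated discovery runs update rather than duplicate nodes.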

Frontend (Next.js)

Overview

React-based frontend with server-side rendering.

Container: frontend
Port: 3000
Framework: Next.js 15 with App Router
Entry Point: client/src/app/page.tsx

Technology Stack

  • React 18: Functional components with hooks
  • TypeScript: Strict mode enabled
  • Tailwind CSS: Utility-first styling
  • shadcn/ui: Component library
  • Auth.js: Authentication provider
  • WebSocket: Real-time chat connection

Environment Variables

# Public (injected at runtime)
NEXT_PUBLIC_BACKEND_URL=http://localhost:5080
NEXT_PUBLIC_WEBSOCKET_URL=ws://localhost:5006

# Server-side only
BACKEND_URL=http://aurora-server:5080
AUTH_SECRET=<random-secret>

Build Modes

# Development (with hot reload)
FROM node:20-alpine AS dev
CMD ["npm", "run", "dev"]

# Production (optimized)
FROM node:20-alpine AS prod
CMD ["npm", "run", "start"]
