
Flask REST API (aurora-server)

Overview

The Flask API is the primary backend service handling HTTP requests for cloud operations, user management, and integrations.

Container: aurora-server
Port: 5080 (configurable via FLASK_PORT)
Entry Point: server/main_compute.py:431
Process: Gunicorn (production) or Flask dev server

Key Responsibilities

  • Cloud Provider Integration: GCP, AWS, Azure, OVH, Scaleway, Tailscale
  • OAuth Flows: GitHub, Bitbucket, Slack, PagerDuty, Confluence
  • Incident Management: Create, list, update incidents
  • User Preferences: Store/retrieve user settings
  • Health Checks: /health endpoint for monitoring
  • Knowledge Base: Document upload and management

Blueprint Structure

The Flask app uses modular blueprints organized by domain:
# Core services
app.register_blueprint(llm_config_bp)      # LLM provider config
app.register_blueprint(auth_bp)            # Auth.js routes
app.register_blueprint(health_bp)          # Health checks

# Cloud providers
app.register_blueprint(gcp_auth_bp)        # GCP OAuth & operations
app.register_blueprint(aws_bp)             # AWS operations
app.register_blueprint(azure_bp)           # Azure operations

# Integrations
app.register_blueprint(github_bp)          # GitHub integration
app.register_blueprint(slack_bp)           # Slack integration
app.register_blueprint(grafana_bp)         # Grafana monitoring
app.register_blueprint(datadog_bp)         # Datadog integration
Reference: server/main_compute.py:195-396
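A new domain blueprint follows the same register-and-mount pattern. A minimal sketch using the `health_bp` name from the list above — the route body and module layout here are illustrative, not the actual implementation:

```python
from flask import Blueprint, Flask, jsonify

# Illustrative health blueprint; the real one lives in its own routes module.
health_bp = Blueprint("health", __name__)

@health_bp.route("/health")
def health():
    # A production check would also probe Postgres/Redis/Vault connectivity.
    return jsonify({"status": "ok"}), 200

app = Flask(__name__)
app.register_blueprint(health_bp)
```

Keeping each domain in its own blueprint keeps route registration in main_compute.py declarative, as shown in the snippet above.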

Environment Variables

# Core
FLASK_PORT=5080
FLASK_SECRET_KEY=<random-secret>
FRONTEND_URL=http://localhost:3000

# Database
POSTGRES_HOST=postgres
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=<password>

# Redis
REDIS_URL=redis://redis:6379/0

# Vault
VAULT_ADDR=http://vault:8200
VAULT_TOKEN=<root-token>

Dependencies

  • postgres (healthy) - Database connection
  • weaviate (healthy) - Vector search
  • redis (running) - Cache and queue
  • vault (healthy) - Secrets access
  • seaweedfs-filer (healthy) - Object storage
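The health states above map naturally onto Compose `depends_on` conditions. A hypothetical fragment for the aurora-server service — service names are taken from this page, but the exact compose file may differ:

```yaml
# docker-compose.yml (illustrative fragment)
services:
  aurora-server:
    depends_on:
      postgres:
        condition: service_healthy
      weaviate:
        condition: service_healthy
      redis:
        condition: service_started
      vault:
        condition: service_healthy
      seaweedfs-filer:
        condition: service_healthy
```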

WebSocket Chatbot (chatbot)

Overview

WebSocket server powering AI agent interactions with real-time streaming.

Container: chatbot
Port: 5006
Entry Point: server/main_chatbot.py:604
Process: Python WebSocket server with asyncio

Key Responsibilities

  • LangGraph Workflow Execution: Run AI agent with tool calls
  • Token Streaming: Stream LLM responses token-by-token
  • Tool Execution: Execute cloud operations via agent tools
  • Session Management: Track chat sessions per user
  • Context Loading: Load historical messages for continuity
  • WebSocket Confirmations: Interactive approval for destructive operations

Message Flow

Client → Server (WebSocket)
├─ init: Initialize connection with user_id
├─ query: User question with session_id, mode, model
├─ confirmation_response: Approve/reject infrastructure changes
└─ control: Cancel workflow execution

Server → Client (WebSocket)
├─ status: START/END workflow state
├─ message: Streaming text chunks from LLM
├─ tool_call: Tool execution with name, input, status
├─ tool_result: Tool completion with result
├─ usage_info: API cost tracking
└─ error: Error messages
Reference: server/main_chatbot.py:217-602
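A client `query` under this protocol could be framed like the following sketch. The field names come from the flow above; the exact envelope shape is an assumption, and the values are placeholders:

```python
import json

def make_query_message(session_id: str, text: str,
                       mode: str = "chat", model: str = "default") -> str:
    """Frame a `query` message for the chatbot WebSocket (illustrative shape)."""
    return json.dumps({
        "type": "query",
        "session_id": session_id,
        "mode": mode,
        "model": model,
        "text": text,
    })

msg = make_query_message("session123", "List my GCP projects")
```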

Key Features

Real-Time Streaming

# Token-by-token streaming from the LLM
async for event_type, event_data in wf.stream(state):
    if event_type == "token":
        await websocket.send(json.dumps({
            "type": "message",
            "data": {"text": event_data, "is_chunk": True}
        }))

Rate Limiting

rate_limiter = RateLimiter(rate=5, per=60)  # 5 requests per minute
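The RateLimiter class itself isn't shown on this page. A minimal sliding-window sketch with the same rate=5, per=60 semantics — an illustration, not the server's actual implementation:

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `rate` calls per `per` seconds (sliding window)."""

    def __init__(self, rate: int, per: float):
        self.rate = rate
        self.per = per
        self._hits: deque = deque()

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.per:
            self._hits.popleft()
        if len(self._hits) < self.rate:
            self._hits.append(now)
            return True
        return False
```

A sixth request inside the window is rejected; once the oldest hit ages past 60 seconds, capacity frees up again.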

Session Isolation

Each session gets dedicated Terraform directories:
terraform_dir = f"/app/terraform_workdir/{user_id}/{session_id}"

Celery Workers (celery_worker)

Overview

Background task processing for long-running operations.

Container: celery_worker
Process: celery -A celery_config worker --loglevel=info
Configuration: server/celery_config.py

Registered Tasks

include=[
    'connectors.gcp_connector.gcp_post_auth_tasks',  # GCP setup
    'routes.gcp.root_project_tasks',                  # GCP root project
    'routes.grafana.tasks',                           # Grafana sync
    'routes.datadog.tasks',                           # Datadog sync
    'routes.jenkins.tasks',                           # Jenkins integration
    'chat.background.task',                           # Background chats
    'chat.background.summarization',                  # Chat summaries
    'services.discovery.tasks',                       # Service discovery
]
Reference: server/celery_config.py:46-64

Periodic Tasks (Celery Beat)

| Task | Schedule | Purpose |
|------|----------|---------|
| cleanup-idle-terminal-pods | 10 min | Remove inactive kubectl pods |
| cleanup-stale-background-chats | 5 min | Cancel timed-out background sessions |
| cleanup-stale-kb-documents | 3 min | Remove failed knowledge base uploads |
| run-full-discovery | 1 hour | Scan cloud resources for service graph |
| mark-stale-services | 24 hours | Mark inactive services as stale |
Reference: server/celery_config.py:66-87
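The schedule above could be expressed as a Celery `beat_schedule` dict along these lines. Intervals mirror the table; the dotted task paths (other than the discovery include shown earlier) are placeholders, not the real module names:

```python
# Illustrative beat_schedule mirroring the periodic-task table.
beat_schedule = {
    "cleanup-idle-terminal-pods": {
        "task": "routes.terminal.cleanup_idle_terminal_pods",  # hypothetical path
        "schedule": 10 * 60,            # seconds
    },
    "cleanup-stale-background-chats": {
        "task": "chat.background.cleanup_stale_chats",         # hypothetical path
        "schedule": 5 * 60,
    },
    "cleanup-stale-kb-documents": {
        "task": "routes.kb.cleanup_stale_documents",           # hypothetical path
        "schedule": 3 * 60,
    },
    "run-full-discovery": {
        "task": "services.discovery.tasks.run_full_discovery",
        "schedule": 60 * 60,
    },
    "mark-stale-services": {
        "task": "services.discovery.tasks.mark_stale_services",  # hypothetical path
        "schedule": 24 * 60 * 60,
    },
}
```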

Configuration

celery_app.conf.update(
    task_serializer='json',
    result_serializer='json',
    task_track_started=True,
    task_time_limit=(60*60*3),              # 3 hour timeout
    worker_max_tasks_per_child=1,           # Restart after each task
    worker_prefetch_multiplier=1,           # One task at a time
)

PostgreSQL Database

Overview

Primary relational database for structured data.

Container: aurora-postgres
Port: 5432
Image: postgres:15-alpine
Database: aurora_db

Key Tables

  • users: User accounts and preferences
  • chat_sessions: Chat history and context
  • incidents: Incident tracking and timeline
  • incident_thoughts: RCA background analysis thoughts
  • credentials: Cloud provider OAuth tokens (references to Vault)
  • graph_services: Discovered services for graph visualization
  • llm_usage: LLM API cost tracking

Connection Management

Aurora uses a connection pool for efficient database access:
from utils.db.connection_pool import db_pool

# Admin operations
with db_pool.get_admin_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")

# User-scoped operations
with db_pool.get_user_connection() as conn:
    cursor = conn.cursor()
    cursor.execute("SET myapp.current_user_id = %s;", (user_id,))
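Setting `myapp.current_user_id` suggests Postgres row-level security keyed on that session variable. A hypothetical policy sketch — the table and column names here are assumptions, not the actual schema:

```sql
-- Illustrative RLS policy; actual table/column names may differ.
ALTER TABLE chat_sessions ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_isolation ON chat_sessions
    USING (user_id = current_setting('myapp.current_user_id'));
```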

Weaviate Vector Database

Overview

Vector database for semantic search over knowledge base documents.

Container: weaviate
Port: 8080 (HTTP), 50051 (gRPC)
Image: cr.weaviate.io/semitechnologies/weaviate:1.27.6
Module: text2vec-transformers with all-MiniLM-L6-v2

Usage

from chat.backend.agent.weaviate_client import WeaviateClient

weaviate = WeaviateClient(postgres_client)
results = weaviate.search(
    query="How do I configure Kubernetes?",
    collection_name="UserKnowledge",
    limit=5
)

Data Stored

  • Knowledge Base Documents: User-uploaded documentation
  • GitHub Files: Code and documentation from connected repos
  • Confluence Pages: Synced wiki content

Redis Cache & Queue

Overview

In-memory data store for caching and message brokering.

Container: redis
Port: 6379
Image: redis:7-alpine

Use Cases

  1. Celery Broker: Task queue for background jobs
  2. Celery Backend: Store task results
  3. API Cost Cache: Cache LLM usage for performance
  4. Cloud Setup Cache: Cache cloud provider resource lists
  5. Rate Limiting: Track API request counts

Cache Keys

# API cost tracking
f"api_cost:{user_id}"

# Cloud resource cache
f"gcp_projects:{user_id}"
f"aws_instances:{user_id}:{region}"
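A typical read-through pattern over these keys, sketched against the standard redis-py GET/SETEX calls. The key format comes from above; the TTL and helper name are illustrative:

```python
import json

def cached_projects(redis_client, user_id: str, fetch, ttl: int = 300):
    """Read-through cache for a user's GCP project list (TTL is illustrative)."""
    key = f"gcp_projects:{user_id}"
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)
    projects = fetch(user_id)                    # hit the cloud API on a miss
    redis_client.setex(key, ttl, json.dumps(projects))
    return projects
```

On the second call for the same user the cloud API is skipped entirely; the entry expires after the TTL so stale resource lists age out on their own.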

HashiCorp Vault

Overview

Secrets management for sensitive credentials.

Container: aurora-vault
Port: 8200
Image: hashicorp/vault:1.15
Storage: File-based with persistent volumes

Configuration

# config/vault/vault.hcl
storage "file" {
  path = "/vault/data"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_disable = true
}

Secret Storage Pattern

# Store credential in Vault
vault_path = f"aurora/users/{user_id}/gcp_token"
vault.secrets.kv.v2.create_or_update_secret(
    path=vault_path,
    secret=dict(access_token="...", refresh_token="...")
)

# Reference in database
db.execute(
    "UPDATE credentials SET token = %s WHERE user_id = %s",
    (f"vault:kv/data/aurora/users/{user_id}/gcp_token", user_id)
)
Reference: server/utils/vault/vault_client.py
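Reading a credential back means resolving the `vault:...` reference stored in the database into a KV v2 path. A sketch of the lookup side — the parsing helper is illustrative, while `read_secret_version` is the standard hvac call for KV v2:

```python
def vault_path_from_reference(reference: str) -> str:
    """Turn a DB reference like 'vault:kv/data/aurora/users/42/gcp_token'
    into the KV v2 path hvac expects ('aurora/users/42/gcp_token')."""
    prefix = "vault:kv/data/"
    if not reference.startswith(prefix):
        raise ValueError(f"not a vault reference: {reference!r}")
    return reference[len(prefix):]

# With an hvac client, the secret is then read back roughly as:
# secret = vault.secrets.kv.v2.read_secret_version(
#     path=vault_path_from_reference(ref))
# token = secret["data"]["data"]["access_token"]
```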

SeaweedFS Object Storage

Overview

S3-compatible object storage for files and artifacts.

Containers: seaweedfs-master, seaweedfs-volume, seaweedfs-filer
Ports: 8333 (S3 API), 8888 (Web UI), 9333 (Master)
Image: chrislusf/seaweedfs:4.07
License: Apache 2.0

S3 API Access

from utils.storage.storage import get_storage_manager

storage = get_storage_manager(user_id)

# Upload file
storage.upload_file(user_id, local_path, "session123/file.txt")

# Download file
content = storage.download_file(user_id, "session123/file.txt")

# List files
files = storage.list_user_files(user_id, prefix="session123/")

Stored Data

  • Terraform State: Session-isolated infrastructure state
  • Knowledge Base Files: Uploaded PDFs, docs
  • GitHub Archives: Cloned repository content
  • Log Files: Captured command outputs

Memgraph Graph Database

Overview

In-memory graph database for service discovery and relationships.

Container: aurora-memgraph
Port: 7687 (Bolt), 7444 (HTTP)
Image: memgraph/memgraph-mage:latest
UI: memgraph-lab on port 3001

Data Model

// Service nodes
CREATE (s:Service {
    id: "service-123",
    name: "frontend-api",
    type: "kubernetes_service",
    provider: "gcp"
})

// Relationships
CREATE (a:Service)-[:DEPENDS_ON]->(b:Service)
CREATE (a:Service)-[:DEPLOYED_IN]->(c:Cluster)

Discovery Tasks

Service discovery runs periodically via Celery Beat:
@celery_app.task
def run_full_discovery():
    """Discover services across all connected cloud providers."""
    # Scan GCP projects, AWS accounts, Azure subscriptions
    # Store discovered services in Memgraph
Reference: services/discovery/tasks.py
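Writing a discovered service into Memgraph could look like the following sketch, which builds a parameterized MERGE. The property names mirror the data model above; the builder function is illustrative, and the Bolt driver call is hedged as a comment:

```python
def merge_service_query(service: dict) -> tuple:
    """Build a parameterized Cypher MERGE for one discovered service."""
    query = (
        "MERGE (s:Service {id: $id}) "
        "SET s.name = $name, s.type = $type, s.provider = $provider"
    )
    params = {k: service[k] for k in ("id", "name", "type", "provider")}
    return query, params

query, params = merge_service_query({
    "id": "service-123",
    "name": "frontend-api",
    "type": "kubernetes_service",
    "provider": "gcp",
})

# With a Bolt driver (e.g. the neo4j package, which Memgraph supports),
# the statement would be executed roughly as:
# with driver.session() as session:
#     session.run(query, **params)
```

MERGE keyed on `id` makes the write idempotent, so repeated discovery runs update rather than duplicate nodes.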

Frontend (Next.js)

Overview

React-based frontend with server-side rendering.

Container: frontend
Port: 3000
Framework: Next.js 15 with App Router
Entry Point: client/src/app/page.tsx

Technology Stack

  • React 18: Functional components with hooks
  • TypeScript: Strict mode enabled
  • Tailwind CSS: Utility-first styling
  • shadcn/ui: Component library
  • Auth.js: Authentication provider
  • WebSocket: Real-time chat connection

Environment Variables

# Public (injected at runtime)
NEXT_PUBLIC_BACKEND_URL=http://localhost:5080
NEXT_PUBLIC_WEBSOCKET_URL=ws://localhost:5006

# Server-side only
BACKEND_URL=http://aurora-server:5080
AUTH_SECRET=<random-secret>

Build Modes

# Development (with hot reload)
FROM node:20-alpine AS dev
CMD ["npm", "run", "dev"]

# Production (optimized)
FROM node:20-alpine AS prod
CMD ["npm", "run", "start"]
