Introduction

Aurora is a cloud operations platform built on a microservices architecture. Its containerized components are orchestrated with Docker Compose for local and development use and with Kubernetes in production. The system provides AI-powered infrastructure management, incident response, and root cause analysis across multiple cloud providers.

Core Components

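Aurora runs as a set of cooperating services. The diagram shows the main request path (frontend, API, chat, workers, and their data stores); the Service Responsibilities table lists all 11 services.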
┌─────────────────────────────────────────────────────────────────┐
│                         Frontend (Next.js)                      │
│                        Port 3000 (HTTP)                         │
└────────────────┬────────────────────────────────────────────────┘
                 │
                 ├──────────────────┬─────────────────┬────────────
                 │                  │                 │
                 ▼                  ▼                 ▼
     ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
     │  Flask REST API │  │ WebSocket Chat  │  │ Celery Workers  │
     │  Port 5080      │  │  Port 5006      │  │  (Background)   │
     └────────┬────────┘  └────────┬────────┘  └────────┬────────┘
              │                    │                     │
              └────────────────────┴─────────────────────┘
                                   │
              ┌────────────────────┼────────────────────┐
              │                    │                    │
              ▼                    ▼                    ▼
     ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
     │  PostgreSQL  │    │   Weaviate   │    │    Redis     │
     │  Port 5432   │    │  Port 8080   │    │  Port 6379   │
     └──────────────┘    └──────────────┘    └──────────────┘

Service Responsibilities

Service         Port   Purpose
Frontend        3000   Next.js 15 UI with Auth.js authentication
aurora-server   5080   Flask REST API for cloud provider operations
chatbot         5006   WebSocket server for AI agent interactions
celery_worker   -      Background task processing (post-auth, discovery)
celery_beat     -      Periodic task scheduler (cleanup, discovery)
postgres        5432   Primary data store (users, sessions, incidents)
weaviate        8080   Vector database for semantic search
redis           6379   Message broker and cache
vault           8200   HashiCorp Vault for secrets management
seaweedfs       8333   S3-compatible object storage
memgraph        7687   Graph database for service discovery
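
To confirm that the services listed above are reachable on a local deployment, a small TCP probe can help. The sketch below assumes every port is published on localhost, which depends on your Compose file. The names `SERVICE_PORTS`, `is_port_open`, and `check_services` are illustrative and not part of Aurora.

```python
import socket

# Ports copied from the service table; celery_worker and celery_beat expose
# no port and are omitted. Assumption: all ports are published on localhost.
SERVICE_PORTS = {
    "frontend": 3000,
    "aurora-server": 5080,
    "chatbot": 5006,
    "postgres": 5432,
    "weaviate": 8080,
    "redis": 6379,
    "vault": 8200,
    "seaweedfs": 8333,
    "memgraph": 7687,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Probe every known service port and report reachability by name."""
    return {name: is_port_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Calling `check_services()` returns a mapping such as `{"frontend": True, ...}`. An open port only shows that something is listening, not that the service is healthy.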

Technology Stack

Backend

  • Python 3.11+ with Flask and asyncio
  • LangGraph for AI agent orchestration
  • LangChain for LLM integration
  • psycopg2 for PostgreSQL connections
  • Celery for distributed task processing
  • Terraform for infrastructure provisioning

Frontend

  • Next.js 15 with App Router
  • TypeScript with strict mode
  • Tailwind CSS + shadcn/ui components
  • Auth.js for authentication
  • WebSocket for real-time chat

Infrastructure

  • Docker Compose for local/dev deployment
  • Kubernetes for production deployment
  • HashiCorp Vault for secrets
  • SeaweedFS for object storage (Apache 2.0)
  • Memgraph for graph-based service discovery

Communication Patterns

REST API Communication

Frontend → Flask API (HTTP)
  ├─ Cloud provider operations (GCP, AWS, Azure)
  ├─ User authentication & preferences
  ├─ Incident management
  └─ Integration management (GitHub, Slack, etc.)

WebSocket Communication

Frontend ↔ Chatbot (WebSocket)
  ├─ Real-time AI agent responses
  ├─ Tool execution status
  ├─ Token streaming (LLM responses)
  └─ Infrastructure deployment confirmations
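
The message categories above could be carried as typed JSON frames. The following is a minimal sketch of that idea. Aurora's actual wire format is not documented here, so the event kinds, the `type`/`payload` envelope, and the `StreamAssembler` helper are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field

# Hypothetical event kinds, one per bullet above; the real names may differ.
EVENT_KINDS = {"agent_response", "tool_status", "token", "deploy_confirmation"}


def encode_event(kind: str, payload: dict) -> str:
    """Serialize one event as a JSON text frame."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind}")
    return json.dumps({"type": kind, "payload": payload})


@dataclass
class StreamAssembler:
    """Receiver-side helper that rebuilds a streamed reply from token frames."""

    tokens: list[str] = field(default_factory=list)
    tool_statuses: list[dict] = field(default_factory=list)

    def feed(self, frame: str) -> None:
        """Consume one JSON frame and update the accumulated state."""
        event = json.loads(frame)
        kind, payload = event["type"], event["payload"]
        if kind == "token":
            self.tokens.append(payload["text"])
        elif kind == "tool_status":
            self.tool_statuses.append(payload)
        # Other kinds are ignored in this sketch.

    @property
    def text(self) -> str:
        """The reply text received so far."""
        return "".join(self.tokens)
```

Tagging each frame with a kind lets one connection interleave token streaming with tool status updates without a separate channel per concern.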

Background Processing

Flask API → Redis → Celery Workers
  ├─ Post-OAuth setup tasks
  ├─ Service discovery scans
  ├─ Background chat analysis
  └─ Periodic cleanup tasks
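
The enqueue-then-consume pattern behind this flow can be illustrated with the standard library alone. In the sketch below, `queue.Queue` stands in for the Redis broker and a thread stands in for a Celery worker; it is not Celery code, and the task names are hypothetical.

```python
import queue
import threading
from typing import Callable

# Hypothetical task registry; Aurora's real task names are not shown here.
TASKS: dict[str, Callable[[dict], str]] = {
    "post_oauth_setup": lambda args: f"setup done for {args['user_id']}",
    "discovery_scan": lambda args: f"scanned {args['provider']}",
}


def enqueue(broker: queue.Queue, task_name: str, **kwargs) -> None:
    """API side: publish a task message and return immediately."""
    broker.put((task_name, kwargs))


def worker(broker: queue.Queue, results: list) -> None:
    """Worker side: consume messages until a None sentinel arrives."""
    while True:
        item = broker.get()
        if item is None:
            break
        name, args = item
        results.append(TASKS[name](args))


def run_demo() -> list:
    """Start one worker, enqueue two tasks, and collect their results."""
    broker: queue.Queue = queue.Queue()
    results: list = []
    thread = threading.Thread(target=worker, args=(broker, results))
    thread.start()
    enqueue(broker, "post_oauth_setup", user_id="u1")
    enqueue(broker, "discovery_scan", provider="gcp")
    broker.put(None)  # sentinel: tell the worker to stop
    thread.join()
    return results
```

The point of the pattern is that `enqueue` returns as soon as the message is queued, so the caller never waits on the task itself.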

Data Flow

  1. User Authentication: Frontend → Auth.js → Flask API → PostgreSQL
  2. Cloud Operations: Frontend → Flask API → Cloud Provider APIs
  3. AI Chat: Frontend → WebSocket → LangGraph Agent → LLM Providers
  4. Knowledge Search: Agent → Weaviate (semantic) + PostgreSQL (structured)
  5. Secrets Access: Backend → Vault → Cloud Provider APIs
  6. File Storage: Backend → SeaweedFS (S3 API) for uploads/artifacts

Deployment Architecture

Development Mode

make dev  # Start all containers with hot-reload

Production Mode

make prod-prebuilt  # Pull from GHCR
make prod-local     # Build from source

Key Design Patterns

Stateless Authentication

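The backend identifies the caller from an X-User-ID header on each request rather than from server-side session state. Because no request depends on state held by a particular instance, any API instance can serve any request, which enables horizontal scaling.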
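
A minimal sketch of reading the user identity from that header follows. The helper `resolve_user_id` and the `MissingUserError` exception are hypothetical names. The sketch also assumes the header is set by a trusted component after the user has been verified, which this page does not spell out; a header accepted directly from untrusted clients would not be authentication on its own.

```python
from typing import Mapping


class MissingUserError(Exception):
    """Raised when a request carries no usable X-User-ID header."""


def resolve_user_id(headers: Mapping[str, str]) -> str:
    """Extract the caller's user ID from request headers.

    HTTP header names are case-insensitive, so keys are compared lowercased.
    """
    for name, value in headers.items():
        if name.lower() == "x-user-id" and value.strip():
            return value.strip()
    raise MissingUserError("X-User-ID header is required")
```

Since the function depends only on the incoming headers, it behaves identically on every instance.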

Event-Driven Background Processing

Celery workers handle long-running tasks asynchronously, keeping the API responsive.

Agent Workflow Isolation

Each chat session gets isolated Terraform directories and WebSocket connections, preventing cross-user contamination.
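
One way to picture the directory isolation is a helper that derives a distinct working directory per user and session. This is a sketch under assumptions: the `base/user_id/session_id` layout, the `session_workdir` name, and the identifier rules are illustrative, not Aurora's actual scheme.

```python
import re
from pathlib import Path

# Assumed identifier format: short, with no path separators or dots.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def session_workdir(base: Path, user_id: str, session_id: str) -> Path:
    """Return (and create) a Terraform working directory unique to one session."""
    for value in (user_id, session_id):
        # Reject anything that could escape the base directory, e.g. "../".
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"unsafe identifier: {value!r}")
    workdir = base / user_id / session_id
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir
```

Validating identifiers before building the path matters as much as the layout itself, since a crafted session ID could otherwise point one user's Terraform state at another user's directory.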

Pluggable Storage

A storage abstraction over the S3 API supports SeaweedFS (default), AWS S3, Cloudflare R2, and other S3-compatible backends.
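
The idea of a pluggable backend can be sketched as a small interface that callers depend on instead of a concrete client. The `ObjectStore` protocol, the `InMemoryStore` test double, and the `artifacts/{session_id}/{name}` key layout below are assumptions for illustration; Aurora's actual abstraction may look different.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Minimal interface a storage backend satisfies in this sketch."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Test double; a real backend would wrap an S3-compatible client."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def save_artifact(store: ObjectStore, session_id: str, name: str, data: bytes) -> str:
    """Store an artifact under a session-scoped key and return that key."""
    key = f"artifacts/{session_id}/{name}"  # hypothetical key layout
    store.put(key, data)
    return key
```

Because `save_artifact` only relies on `put` and `get`, swapping SeaweedFS for another S3-compatible service would be a configuration change rather than a code change.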

Security Architecture

  • Secrets Management: All credentials stored in Vault, referenced as vault:kv/data/aurora/users/{secret_name}
  • Rate Limiting: Flask-Limiter protects API endpoints
  • CORS: Strict origin validation for frontend requests
  • Authentication: Auth.js with OAuth 2.0 for cloud providers
  • Network Isolation: Services communicate with each other inside a Docker network
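
The secret reference format shown above follows the KV version 2 convention of `<mount>/data/<path>`. A sketch of parsing such a reference is below; `parse_vault_ref` and `VaultRef` are hypothetical helpers, and `github_token` in the usage note is an invented secret name.

```python
from dataclasses import dataclass

PREFIX = "vault:"


@dataclass(frozen=True)
class VaultRef:
    mount: str  # KV mount point, e.g. "kv"
    path: str   # secret path under the mount, e.g. "aurora/users/<secret_name>"


def parse_vault_ref(ref: str) -> VaultRef:
    """Split 'vault:<mount>/data/<path>' into its mount and secret path."""
    if not ref.startswith(PREFIX):
        raise ValueError("not a Vault reference")
    mount, sep, path = ref[len(PREFIX):].partition("/data/")
    if not sep or not mount or not path:
        raise ValueError(f"malformed Vault reference: {ref!r}")
    return VaultRef(mount=mount, path=path)
```

For example, `parse_vault_ref("vault:kv/data/aurora/users/github_token")` yields mount `kv` and path `aurora/users/github_token`. Storing references instead of values keeps credentials out of PostgreSQL and configuration files.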

Monitoring & Observability

  • Logging: Structured logs to stdout (container-native)
  • Health Checks: Docker healthcheck for all services
  • Metrics: Service discovery tracks resource health
  • Incident Tracking: PostgreSQL stores incident timeline
