
Overview

The vLLora gateway is a lightweight, high-performance AI proxy built with Rust that sits between your applications and AI providers. It provides unified access to multiple LLM providers through an OpenAI-compatible API while adding real-time tracing, routing capabilities, and cost optimization.

Architecture Components

Core Services

The gateway is organized into several key components:

  • HTTP Server: Actix-based web server handling REST API requests on port 9090
  • UI Server: Web interface for configuration and monitoring on port 9091
  • OTEL gRPC Collector: OpenTelemetry collector for traces and metrics ingestion
  • MCP Server: Model Context Protocol server for advanced integrations

Request Flow

1. Request Reception: The HTTP server receives an OpenAI-compatible request at /v1/chat/completions.
2. Middleware Processing: The request flows through middleware layers:
  • ProjectMiddleware: Resolves the project context
  • TracingContext: Initializes OpenTelemetry spans
  • TraceLogger: Logs request details
  • RunId/ThreadId: Assigns unique identifiers
3. Routing Decision: The router evaluates the request based on the configured strategy:
  • Fallback routing
  • Percentage-based A/B testing
  • Metric-optimized selection
  • Conditional routing
4. Provider Execution: The selected provider client executes the request with proper credential management.
5. Response Streaming: Responses are streamed back to the client and simultaneously captured for tracing.
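The middleware stage of the flow above can be sketched as a chain of functions that each enrich a request context before routing. This is an illustrative model only: the real layers are Actix middleware, and the `Context` type, fixed identifier values, and function names here are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the gateway's request context: each middleware
// layer is modeled as a function that adds keys to a map.
type Context = HashMap<&'static str, String>;

fn project_middleware(mut ctx: Context) -> Context {
    // Resolves the project context (hypothetical default project).
    ctx.insert("project_id", "default".to_string());
    ctx
}

fn tracing_context(mut ctx: Context) -> Context {
    // Would initialize an OpenTelemetry span; here we just record a span id.
    ctx.insert("span_id", "span-1".to_string());
    ctx
}

fn assign_ids(mut ctx: Context) -> Context {
    // Assigns run/thread identifiers (fixed values, for illustration only).
    ctx.insert("run_id", "run-1".to_string());
    ctx.insert("thread_id", "thread-1".to_string());
    ctx
}

fn handle_request() -> Context {
    // Layers run in order before the request reaches the router.
    assign_ids(tracing_context(project_middleware(Context::new())))
}

fn main() {
    let ctx = handle_request();
    println!("{:?}", ctx.get("project_id"));
}
```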

Key Features

Unified Provider Interface

The gateway abstracts provider-specific implementations:
  • OpenAI: Standard OpenAI API and Azure OpenAI endpoints
  • Anthropic: Claude models via the Messages API
  • Google Gemini: Gemini models with Vertex AI support
  • AWS Bedrock: Multi-model access through Bedrock Runtime
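One way such an abstraction can look in Rust is a trait that each provider implements. This is a minimal sketch, not the gateway's actual interface: the `Provider` trait, its methods, and the endpoint strings are assumptions for illustration.

```rust
// Hypothetical unified provider interface: callers work against the trait,
// and each provider supplies its own endpoint details.
trait Provider {
    fn name(&self) -> &'static str;
    fn endpoint(&self) -> &'static str;
}

struct OpenAi;
struct Anthropic;

impl Provider for OpenAi {
    fn name(&self) -> &'static str { "openai" }
    fn endpoint(&self) -> &'static str { "/v1/chat/completions" }
}

impl Provider for Anthropic {
    fn name(&self) -> &'static str { "anthropic" }
    fn endpoint(&self) -> &'static str { "/v1/messages" }
}

// Caller code is provider-agnostic: it only sees the trait.
fn describe(p: &dyn Provider) -> String {
    format!("{} -> {}", p.name(), p.endpoint())
}

fn main() {
    println!("{}", describe(&OpenAi));
    println!("{}", describe(&Anthropic));
}
```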

Routing Strategies

The gateway supports multiple routing strategies defined in core/src/routing/mod.rs:
RoutingStrategy::Fallback
// Routes to the first available target, falling back to next on failure
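The fallback behavior described in that comment can be sketched as "try targets in order, return the first success". The function and types below are illustrative, not the actual code in core/src/routing/mod.rs.

```rust
// Illustrative fallback routing: attempt each target in order and return
// the first successful response; None means every target failed.
fn route_with_fallback<T>(
    targets: &[T],
    call: impl Fn(&T) -> Result<String, ()>,
) -> Option<String> {
    for t in targets {
        if let Ok(resp) = call(t) {
            return Some(resp);
        }
    }
    None
}

fn main() {
    let targets = ["primary", "secondary"];
    // Simulate the primary failing and the secondary succeeding.
    let result = route_with_fallback(&targets, |t| {
        if *t == "primary" { Err(()) } else { Ok(format!("ok from {}", t)) }
    });
    println!("{:?}", result);
}
```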

Credential Management

The gateway includes a secure credential storage system:
  • KeyStorage: Stores encrypted API keys in SQLite
  • ProviderKeyResolver: Resolves credentials per project and provider
  • Multiple Auth Methods: API keys, AWS IAM, service accounts
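Per-project resolution can be pictured as a keyed lookup from (project, provider) to a credential. The sketch below is an assumption-heavy model: the real ProviderKeyResolver reads encrypted keys from SQLite, and the fallback to a "default" project shown here is hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for the encrypted SQLite key storage.
struct KeyStore {
    // (project, provider) -> API key
    keys: HashMap<(String, String), String>,
}

impl KeyStore {
    fn new() -> Self {
        Self { keys: HashMap::new() }
    }

    fn insert(&mut self, project: &str, provider: &str, key: &str) {
        self.keys.insert((project.into(), provider.into()), key.into());
    }

    // Resolve a key for the project; fall back to a "default" project
    // (this fallback is an assumption made for illustration).
    fn resolve(&self, project: &str, provider: &str) -> Option<&String> {
        self.keys
            .get(&(project.to_string(), provider.to_string()))
            .or_else(|| self.keys.get(&("default".to_string(), provider.to_string())))
    }
}

fn main() {
    let mut store = KeyStore::new();
    store.insert("default", "openai", "sk-default");
    store.insert("proj-a", "openai", "sk-a");
    println!("{:?}", store.resolve("proj-b", "openai"));
}
```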

Database Layer

The gateway uses SQLite for metadata storage with Diesel ORM:
  • projects: Project configurations and settings
  • providers: Provider definitions and endpoints
  • provider_credentials: Encrypted API keys
  • models: Available model definitions
  • traces: OpenTelemetry trace data
  • runs: Execution run metadata
  • threads: Conversation thread tracking
Database initialization happens in core/src/metadata/utils.rs with automatic migrations.
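Automatic migration running boils down to "apply each pending migration once, in order". The sketch below models that behavior with plain Rust; the real gateway delegates this to Diesel, and the migration names shown are invented.

```rust
use std::collections::HashSet;

// Idempotent migration runner sketch: migrations already in `applied` are
// skipped, pending ones are recorded as run. Returns the migrations applied
// on this invocation.
fn run_migrations(applied: &mut HashSet<String>, all: &[&str]) -> Vec<String> {
    let mut ran = Vec::new();
    for m in all {
        if applied.insert((*m).to_string()) {
            ran.push((*m).to_string()); // pending: apply it
        }
    }
    ran
}

fn main() {
    let mut applied = HashSet::new();
    let all = ["001_create_projects", "002_create_traces"]; // invented names
    let first = run_migrations(&mut applied, &all);
    let second = run_migrations(&mut applied, &all);
    println!("first run: {}, second run: {}", first.len(), second.len());
}
```

A second startup against the same database applies nothing, which is what makes automatic migrations safe to run on every boot.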

Telemetry Integration

The gateway initializes OpenTelemetry tracing in gateway/src/tracing.rs:
pub fn init_tracing(
    project_trace_senders: Arc<ProjectTraceMap>,
    run_span_buffer: Arc<RunSpanBuffer>,
    db_pool: Option<DbPool>,
)
This sets up:
  • Span Exporters: Export traces to OTLP endpoints and SQLite
  • Baggage Processor: Propagates context (run_id, thread_id, project_id)
  • Meter Provider: Collects and exports metrics
  • UUID ID Generator: Generates trace IDs as UUIDs for easier querying
The gateway automatically captures all LLM interactions as distributed traces without requiring code changes in your application.
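The baggage processor's job can be sketched as: every span created during a request carries the same context keys (run_id, thread_id, project_id), so traces can later be grouped per run or project. The `Span` type and `start_span` function below are hypothetical simplifications of the OpenTelemetry API.

```rust
use std::collections::HashMap;

// Hypothetical simplified span: in real OpenTelemetry the baggage lives in
// a propagated Context, not on the span struct itself.
#[derive(Debug, Clone)]
struct Span {
    name: String,
    baggage: HashMap<String, String>,
}

fn start_span(name: &str, baggage: &HashMap<String, String>) -> Span {
    // Every span created under this context inherits the same baggage.
    Span { name: name.to_string(), baggage: baggage.clone() }
}

fn main() {
    let mut baggage = HashMap::new();
    baggage.insert("run_id".to_string(), "run-1".to_string());
    baggage.insert("project_id".to_string(), "proj-a".to_string());

    // Parent and child spans carry identical baggage, so both can be
    // queried by run_id later.
    let parent = start_span("chat_completion", &baggage);
    let child = start_span("provider_call", &baggage);
    println!("{} / {}", parent.name, child.name);
}
```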

Configuration

The gateway accepts configuration via:
  1. Environment Variables: RUST_LOG, provider API keys
  2. CLI Arguments: Port numbers, database path
  3. Database Settings: Per-project configuration
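A common way to combine these sources is a precedence chain; the sketch below assumes environment variables override CLI arguments, which override a stored default. Both the precedence order and the `VLLORA_PORT` variable name are assumptions, not documented gateway behavior.

```rust
use std::env;

// Hypothetical config resolution: env var > CLI argument > stored default.
fn resolve_port(cli_arg: Option<u16>, stored_default: u16) -> u16 {
    env::var("VLLORA_PORT")       // assumed variable name
        .ok()
        .and_then(|v| v.parse().ok())
        .or(cli_arg)
        .unwrap_or(stored_default)
}

fn main() {
    // With no env var set, the CLI argument wins over the stored default.
    println!("{}", resolve_port(Some(8080), 9090));
}
```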

Performance Characteristics

  • Startup Time: Fast startup, with model metadata embedded as JSON
  • Concurrency: Async/await throughout, handles thousands of concurrent requests
  • Memory: Efficient buffering with configurable flush intervals
  • Latency Overhead: Minimal (< 10ms) for request routing
The gateway runs entirely locally with no data sent to external services except your configured AI providers.
