Overview
The vLLora gateway is a lightweight, high-performance AI proxy built with Rust that sits between your applications and AI providers. It provides unified access to multiple LLM providers through an OpenAI-compatible API while adding real-time tracing, routing capabilities, and cost optimization.
Architecture Components
Core Services
The gateway is organized into several key components:
HTTP Server
Actix-based web server handling REST API requests on port 9090
UI Server
Web interface for configuration and monitoring on port 9091
OTEL gRPC Collector
OpenTelemetry collector for traces and metrics ingestion
MCP Server
Model Context Protocol server for advanced integrations
Request Flow
Middleware Processing
Each request flows through the following middleware layers:
- ProjectMiddleware: Resolves the project context
- TracingContext: Initializes OpenTelemetry spans
- TraceLogger: Logs request details
- RunId/ThreadId: Assigns unique identifiers
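The layered processing above can be sketched as a chain of objects that each mutate the request context. This is a simplified, hypothetical illustration; the names `Request`, `Middleware`, `ProjectMiddleware`, and `RunIdMiddleware` are stand-ins, and the real gateway implements these as Actix middleware:

```rust
// Hypothetical sketch of the middleware chain; not the gateway's actual types.
#[derive(Debug)]
struct Request {
    project_id: Option<String>,
    run_id: Option<String>,
}

trait Middleware {
    fn process(&self, req: &mut Request);
}

// Stand-in for ProjectMiddleware: resolves the project context.
struct ProjectMiddleware;
impl Middleware for ProjectMiddleware {
    fn process(&self, req: &mut Request) {
        req.project_id.get_or_insert_with(|| "default-project".to_string());
    }
}

// Stand-in for the RunId/ThreadId layer: assigns a unique identifier.
struct RunIdMiddleware;
impl Middleware for RunIdMiddleware {
    fn process(&self, req: &mut Request) {
        // The real gateway generates UUIDs; a fixed value keeps this sketch std-only.
        req.run_id = Some("run-0001".to_string());
    }
}

fn run_chain() -> Request {
    let chain: Vec<Box<dyn Middleware>> = vec![
        Box::new(ProjectMiddleware),
        Box::new(RunIdMiddleware),
    ];
    let mut req = Request { project_id: None, run_id: None };
    for mw in &chain {
        mw.process(&mut req);
    }
    req
}

fn main() {
    println!("{:?}", run_chain());
}
```

Ordering matters in such a chain: the project must be resolved before later layers (tracing, logging) can attach it to their output.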
Routing Decision
The router evaluates the request based on the configured strategy:
- Fallback routing
- Percentage-based A/B testing
- Metric-optimized selection
- Conditional routing
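Fallback routing, the first strategy above, can be sketched as a priority-ordered walk over candidate providers. This is a minimal illustration under the assumption that each candidate carries a health flag; the provider names and the `select_fallback` function are examples, not the gateway's API:

```rust
// Minimal fallback-routing sketch: pick the first healthy provider in
// priority order. The (name, healthy) tuple shape is an assumption.
fn select_fallback<'a>(candidates: &[(&'a str, bool)]) -> Option<&'a str> {
    candidates
        .iter()
        .find(|(_, healthy)| *healthy)
        .map(|(name, _)| *name)
}

fn main() {
    // First candidate is down, so the router falls through to the next one.
    let order = [("openai", false), ("anthropic", true), ("bedrock", true)];
    println!("{:?}", select_fallback(&order));
}
```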
Provider Execution
The selected provider client executes the request with proper credential management
Key Features
Unified Provider Interface
The gateway abstracts provider-specific implementations:
- OpenAI: Standard OpenAI API and Azure OpenAI endpoints
- Anthropic: Claude models via the Messages API
- Google Gemini: Gemini models with Vertex AI support
- AWS Bedrock: Multi-model access through Bedrock Runtime
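A unified interface like this is typically expressed as a trait that each provider client implements. The trait name `ProviderClient` and its method signatures below are assumptions for illustration, not the gateway's actual API:

```rust
// Hedged sketch of a unified provider interface; names are illustrative.
trait ProviderClient {
    fn name(&self) -> &'static str;
    fn complete(&self, prompt: &str) -> String;
}

struct OpenAiClient;
impl ProviderClient for OpenAiClient {
    fn name(&self) -> &'static str { "openai" }
    fn complete(&self, prompt: &str) -> String {
        // A real client would issue an HTTP request to the provider here.
        format!("[openai] {prompt}")
    }
}

struct AnthropicClient;
impl ProviderClient for AnthropicClient {
    fn name(&self) -> &'static str { "anthropic" }
    fn complete(&self, prompt: &str) -> String {
        format!("[anthropic] {prompt}")
    }
}

fn main() {
    // Callers work against the trait, so the router can swap providers freely.
    let clients: Vec<Box<dyn ProviderClient>> = vec![
        Box::new(OpenAiClient),
        Box::new(AnthropicClient),
    ];
    for c in &clients {
        println!("{}: {}", c.name(), c.complete("hello"));
    }
}
```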
Routing Strategies
The gateway supports multiple routing strategies, defined in core/src/routing/mod.rs.
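The four strategies listed under Routing Decision can be modeled as an enum, with percentage-based A/B testing reducing to a cumulative-share lookup. This is an illustrative sketch; the actual definitions in core/src/routing/mod.rs will differ in naming and detail:

```rust
// Illustrative enum of the four strategies; not the actual definitions.
#[allow(dead_code)]
enum RoutingStrategy {
    Fallback,                      // try providers in priority order
    Percentage(Vec<(String, u8)>), // (provider, share of traffic in %)
    MetricOptimized,               // pick by latency/cost metrics
    Conditional,                   // route on request attributes
}

// Percentage-based A/B selection: `roll` is a uniform value in 0..100.
fn pick_percentage<'a>(splits: &'a [(String, u8)], roll: u8) -> Option<&'a str> {
    let mut cumulative = 0u8;
    for (name, pct) in splits {
        cumulative += pct;
        if roll < cumulative {
            return Some(name);
        }
    }
    None
}

fn main() {
    let splits = vec![("gpt-4o".to_string(), 70), ("claude".to_string(), 30)];
    // A roll of 80 lands past the first 70% bucket.
    println!("{:?}", pick_percentage(&splits, 80));
}
```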
Credential Management
The gateway includes a secure credential storage system:
- KeyStorage: Stores encrypted API keys in SQLite
- ProviderKeyResolver: Resolves credentials per project and provider
- Multiple Auth Methods: API keys, AWS IAM, service accounts
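Per-project credential resolution amounts to a lookup keyed by (project, provider). The in-memory `ProviderKeyResolver` below is a hypothetical stand-in for illustration; the real gateway reads encrypted keys from SQLite via KeyStorage:

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for the credential resolver.
struct ProviderKeyResolver {
    keys: HashMap<(String, String), String>, // (project, provider) -> API key
}

impl ProviderKeyResolver {
    fn new() -> Self {
        Self { keys: HashMap::new() }
    }

    fn store(&mut self, project: &str, provider: &str, key: &str) {
        // A real implementation would encrypt the key before persisting it.
        self.keys
            .insert((project.to_string(), provider.to_string()), key.to_string());
    }

    // Resolve the credential for a given project/provider pair.
    fn resolve(&self, project: &str, provider: &str) -> Option<&str> {
        self.keys
            .get(&(project.to_string(), provider.to_string()))
            .map(String::as_str)
    }
}

fn main() {
    let mut resolver = ProviderKeyResolver::new();
    resolver.store("demo", "openai", "sk-example");
    println!("{:?}", resolver.resolve("demo", "openai"));
}
```

Keying on the project as well as the provider is what lets two projects use different accounts with the same provider.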
Database Layer
The gateway uses SQLite for metadata storage with the Diesel ORM.
Database Schema
- projects: Project configurations and settings
- providers: Provider definitions and endpoints
- provider_credentials: Encrypted API keys
- models: Available model definitions
- traces: OpenTelemetry trace data
- runs: Execution run metadata
- threads: Conversation thread tracking
The schema is defined in core/src/metadata/utils.rs and applied via automatic migrations.
Telemetry Integration
The gateway initializes OpenTelemetry tracing in gateway/src/tracing.rs:
- Span Exporters: Export traces to OTLP endpoints and SQLite
- Baggage Processor: Propagates context (run_id, thread_id, project_id)
- Meter Provider: Collects and exports metrics
- UUID ID Generator: Generates trace IDs as UUIDs for easier querying
The gateway automatically captures all LLM interactions as distributed traces without requiring code changes in your application.
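The baggage processor's job is to attach the same context keys (run_id, thread_id, project_id) to every span. The toy function below illustrates the idea with a plain map standing in for the OpenTelemetry baggage API; the function name and key set mirror the list above but are otherwise assumptions:

```rust
use std::collections::HashMap;

// Toy illustration of baggage propagation; a plain map stands in for the
// OpenTelemetry SDK's baggage API.
fn build_baggage(run_id: &str, thread_id: &str, project_id: &str) -> HashMap<&'static str, String> {
    let mut baggage = HashMap::new();
    baggage.insert("run_id", run_id.to_string());
    baggage.insert("thread_id", thread_id.to_string());
    baggage.insert("project_id", project_id.to_string());
    baggage
}

fn main() {
    let b = build_baggage("run-1", "thread-1", "proj-1");
    // All three context keys travel with every span.
    println!("{}", b.len());
}
```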
Configuration
The gateway accepts configuration via:
- Environment Variables: RUST_LOG, provider API keys
- CLI Arguments: Port numbers, database path
- Database Settings: Per-project configuration
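With three configuration sources, a precedence order has to be chosen. The sketch below assumes a common convention (CLI flag over environment variable over database default); the actual precedence in the gateway is not stated in this document, so treat this as an assumption:

```rust
// Assumed precedence: CLI argument > environment variable > database default.
fn resolve_port(cli: Option<u16>, env_var: Option<u16>, db_default: u16) -> u16 {
    cli.or(env_var).unwrap_or(db_default)
}

fn main() {
    // No CLI flag given, an env var set to 8080, database default 9090.
    println!("{}", resolve_port(None, Some(8080), 9090));
}
```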
Performance Characteristics
- Startup Time: Fast startup thanks to model metadata embedded as JSON at build time
- Concurrency: Async/await throughout, handles thousands of concurrent requests
- Memory: Efficient buffering with configurable flush intervals
- Latency Overhead: Minimal (< 10ms) for request routing