
Overview

vLLora provides unified access to multiple AI providers through a single OpenAI-compatible API. Each provider is implemented as a Rust module with support for streaming, tool calling, and provider-specific capabilities such as vision and multi-modal input.

Supported Providers

OpenAI

GPT-3.5, GPT-4, GPT-4o series with Azure OpenAI support

Anthropic

Claude 3 Opus, Sonnet, and Haiku models

Google Gemini

Gemini Pro and Ultra with Vertex AI integration

AWS Bedrock

Multi-model access including Claude, Llama, and Titan

Provider Architecture

All providers implement a common ModelInstance trait defined in llm/src/types/instance.rs:
#[async_trait]
pub trait ModelInstance: Send + Sync {
    async fn execute(
        &self,
        messages: Vec<Message>,
        sender: Option<Sender<ModelEvent>>,
    ) -> LLMResult<ChatCompletionMessageWithFinishReason>;

    async fn execute_stream(
        &self,
        messages: Vec<Message>,
    ) -> LLMResult<Pin<Box<dyn Stream<Item = Result<ChatCompletionChunk, ModelError>> + Send>>>;
}
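To illustrate the shape of this contract without the async machinery, here is a synchronous, dependency-free simplification: every provider exposes one call that turns a message history into a completion. The `EchoModel` type and the `String`-based messages are illustrative stand-ins, not vLLora types; the real trait is async and also supports streaming.

```rust
// Simplified, synchronous analogue of the ModelInstance trait above.
// (Illustrative only: the real trait is async and uses richer types.)
trait ModelInstance {
    fn execute(&self, messages: Vec<String>) -> Result<String, String>;
}

// A toy provider that just returns the last message it was given.
struct EchoModel;

impl ModelInstance for EchoModel {
    fn execute(&self, messages: Vec<String>) -> Result<String, String> {
        messages
            .last()
            .cloned()
            .ok_or_else(|| "empty message history".to_string())
    }
}

fn main() {
    let model = EchoModel;
    assert_eq!(model.execute(vec!["hi".into()]).unwrap(), "hi");
    assert!(model.execute(vec![]).is_err());
}
```

Because every provider implements the same trait, the gateway can dispatch to any of them behind a single dynamic interface.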

OpenAI Provider

Implementation

Located in llm/src/provider/openai/mod.rs:
pub fn openai_client(
    credentials: Option<&ApiKeyCredentials>,
    endpoint: Option<&str>,
) -> Result<Client<OpenAIConfig>, ModelError> {
    let api_key = if let Some(credentials) = credentials {
        credentials.api_key.clone()
    } else {
        std::env::var("VLLORA_OPENAI_API_KEY")
            .map_err(|_| AuthorizationError::InvalidApiKey)?
    };

    let mut config = OpenAIConfig::new();
    config = config.with_api_key(api_key);

    if let Some(endpoint) = endpoint {
        config = config.with_api_base(endpoint);
    }

    Ok(Client::with_config(config))
}
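The key-resolution pattern above (explicit credentials first, environment variable fallback) can be exercised in isolation. This is a hypothetical simplification using plain `String` errors rather than vLLora's `ModelError`:

```rust
use std::env;

// Simplified sketch of the credential fallback shown above: prefer
// explicitly supplied credentials, otherwise read an environment
// variable (VLLORA_OPENAI_API_KEY in the real provider).
fn resolve_api_key(explicit: Option<&str>, env_var: &str) -> Result<String, String> {
    match explicit {
        Some(key) => Ok(key.to_string()),
        None => env::var(env_var).map_err(|_| format!("{} is not set", env_var)),
    }
}

fn main() {
    // Explicit credentials always win.
    assert_eq!(
        resolve_api_key(Some("sk-test"), "VLLORA_OPENAI_API_KEY").unwrap(),
        "sk-test"
    );
    // With no credentials and no environment variable, resolution fails.
    assert!(resolve_api_key(None, "VLLORA_SURELY_UNSET_VAR").is_err());
}
```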

Azure OpenAI Support

vLLora automatically detects and handles Azure endpoints:
pub fn is_azure_endpoint(endpoint: &str) -> bool {
    endpoint.contains("azure.com")
}

pub fn azure_openai_client(
    api_key: String,
    endpoint: &str,
    deployment_id: &str,
) -> Client<AzureConfig> {
    let azure_config = AzureConfig::new()
        .with_api_base(endpoint)
        .with_api_version("2024-10-21".to_string())
        .with_api_key(api_key)
        .with_deployment_id(deployment_id.to_string());

    Client::with_config(azure_config)
}
The OpenAI provider supports all standard features: streaming, function calling, vision, and JSON mode.
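The detection check can be exercised standalone. This sketch reproduces the substring test from `is_azure_endpoint` and shows which endpoints it classifies as Azure:

```rust
// Same substring heuristic as the is_azure_endpoint function above:
// any endpoint containing "azure.com" routes to the Azure client.
fn is_azure_endpoint(endpoint: &str) -> bool {
    endpoint.contains("azure.com")
}

fn main() {
    assert!(is_azure_endpoint("https://my-resource.openai.azure.com"));
    assert!(!is_azure_endpoint("https://api.openai.com/v1"));
}
```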

Anthropic Provider

Claude Models

Implemented in llm/src/provider/anthropic.rs using the clust SDK:
pub fn anthropic_client(
    credentials: Option<&ApiKeyCredentials>,
) -> Result<clust::Client, ModelError> {
    let api_key = if let Some(credentials) = credentials {
        credentials.api_key.clone()
    } else {
        std::env::var("VLLORA_ANTHROPIC_API_KEY")
            .map_err(|_| AuthorizationError::InvalidApiKey)?
    };
    let client = clust::Client::from_api_key(clust::ApiKey::new(api_key));
    Ok(client)
}

Message Conversion

Anthropic’s Messages API requires conversion from OpenAI format:
// System messages are extracted and sent separately
let system_prompt = messages
    .iter()
    .find(|m| matches!(m.message_type, MessageType::System))
    .map(|m| SystemPrompt::new(m.content_str()));
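The same split can be shown self-contained. The `Message` and `MessageType` definitions below are minimal stand-ins for vLLora's types, just to demonstrate extracting the system prompt while keeping the remaining conversational turns:

```rust
// Minimal stand-ins for the message types referenced above
// (illustrative only; vLLora's real types carry more fields).
#[derive(PartialEq)]
enum MessageType {
    System,
    User,
    Assistant,
}

struct Message {
    message_type: MessageType,
    content: String,
}

// Pull out the first system message (sent separately to Anthropic)
// and keep the rest of the conversation in order.
fn split_system(messages: &[Message]) -> (Option<&str>, Vec<&Message>) {
    let system = messages
        .iter()
        .find(|m| m.message_type == MessageType::System)
        .map(|m| m.content.as_str());
    let rest = messages
        .iter()
        .filter(|m| m.message_type != MessageType::System)
        .collect();
    (system, rest)
}

fn main() {
    let msgs = vec![
        Message { message_type: MessageType::System, content: "You are terse.".into() },
        Message { message_type: MessageType::User, content: "Hi".into() },
    ];
    let (system, rest) = split_system(&msgs);
    assert_eq!(system, Some("You are terse."));
    assert_eq!(rest.len(), 1);
}
```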

Tracing Integration

Every Anthropic call is traced:
use vllora_telemetry::create_model_span;

create_model_span!(
    operation_name: "anthropic_chat_completion",
    model: self.params.model_name.clone(),
    provider: "anthropic",
    // Additional attributes...
);

Google Gemini Provider

Vertex AI Integration

Implemented in llm/src/provider/gemini/:
pub struct GeminiModel {
    pub client: GeminiClient,
    pub params: GeminiModelParams,
    pub execution_options: ExecutionOptions,
    pub tools: HashMap<String, Arc<Box<dyn Tool>>>,
    pub credentials_ident: CredentialsIdent,
}

Multi-Modal Support

Gemini provider supports text and image inputs:
// Image content handling
ImageContentBlock {
    image: ImageContentSource::Base64 {
        media_type: "image/jpeg".to_string(),
        data: base64_data,
    }
}
Gemini models support both direct API access and Vertex AI endpoints for enterprise customers.

AWS Bedrock Provider

Multi-Model Support

Bedrock provides access to multiple model families:
// From llm/src/provider/bedrock/mod.rs
pub struct BedrockModel {
    pub client: Client,
    pub execution_options: ExecutionOptions,
    params: BedrockModelParams,
    pub tools: HashMap<String, Arc<Box<dyn VlloraTool>>>,
    pub model_name: String,
    pub credentials_ident: CredentialsIdent,
}

AWS Credentials

Bedrock supports multiple authentication methods:
BedrockCredentials::IAM(IAMCredentials {
    access_key_id: "...".to_string(),
    secret_access_key: "...".to_string(),
    region: Some("us-east-1".to_string()),
    session_token: None,
})

Converse API

Bedrock uses the unified Converse API:
use aws_sdk_bedrockruntime::types::{
    ContentBlock, ConversationRole, Message,
    InferenceConfiguration, ToolConfiguration
};

let response = client
    .converse()
    .model_id(model_name)
    .messages(message)
    .set_system(system_prompts)
    .set_tool_config(tool_config)
    .send()
    .await?;

Provider Selection

vLLora determines which provider to use based on:
  1. Model Name Pattern: gpt-4 → OpenAI, claude-3 → Anthropic, etc.
  2. Explicit Provider: Specified in routing configuration
  3. Endpoint URL: Azure endpoints automatically route to Azure OpenAI
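Rule 1 can be sketched as a prefix match on the model name. The prefixes and return strings below are illustrative; the real router also honors explicit provider configuration and endpoint URLs, which override this heuristic:

```rust
// Illustrative prefix-based routing (rule 1 above). Unrecognized
// models fall through to the proxy path for custom providers.
fn provider_for_model(model: &str) -> &'static str {
    if model.starts_with("gpt-") {
        "openai"
    } else if model.starts_with("claude-") {
        "anthropic"
    } else if model.starts_with("gemini-") {
        "gemini"
    } else {
        "proxy"
    }
}

fn main() {
    assert_eq!(provider_for_model("gpt-4o"), "openai");
    assert_eq!(provider_for_model("claude-3-opus"), "anthropic");
    assert_eq!(provider_for_model("gemini-pro"), "gemini");
    assert_eq!(provider_for_model("my-custom-model"), "proxy");
}
```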

Provider Enum

From llm/src/types/provider.rs:
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
pub enum InferenceModelProvider {
    OpenAI,
    Anthropic,
    Gemini,
    Bedrock,
    #[serde(alias = "vertex-ai")]
    VertexAI,
    Proxy(String),
}

Credential Management

Per-Project Credentials

Each project can have its own credentials for each provider:
// From core/src/metadata/services/provider_credential.rs
pub struct ProviderCredential {
    pub id: Uuid,
    pub project_id: Uuid,
    pub provider_id: Uuid,
    pub credentials: EncryptedCredentials,
    pub created_at: NaiveDateTime,
    pub updated_at: NaiveDateTime,
}

Credential Resolution

The ProviderKeyResolver retrieves the appropriate credentials:
pub trait ProviderKeyResolver {
    fn resolve_key(
        &self,
        project_id: Uuid,
        provider: InferenceModelProvider,
    ) -> Result<Option<Credentials>, CredentialError>;
}
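A resolver implementation could be as simple as a map keyed by project and provider. This is a hypothetical in-memory sketch: project IDs and providers are plain strings here instead of the `Uuid` and `InferenceModelProvider` types used in vLLora:

```rust
use std::collections::HashMap;

// Hypothetical in-memory resolver, simplified from the trait above.
// Keys are (project, provider) pairs; values are raw API keys.
struct InMemoryResolver {
    keys: HashMap<(String, String), String>,
}

impl InMemoryResolver {
    // Returns None when the project has no credentials for this provider,
    // mirroring the Option in the real resolve_key signature.
    fn resolve_key(&self, project_id: &str, provider: &str) -> Option<&String> {
        self.keys.get(&(project_id.to_string(), provider.to_string()))
    }
}

fn main() {
    let mut keys = HashMap::new();
    keys.insert(("proj-1".to_string(), "openai".to_string()), "sk-abc".to_string());
    let resolver = InMemoryResolver { keys };
    assert_eq!(resolver.resolve_key("proj-1", "openai"), Some(&"sk-abc".to_string()));
    assert!(resolver.resolve_key("proj-1", "anthropic").is_none());
}
```

A production resolver would instead query the encrypted per-project credential store and decrypt on demand.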

Model Pricing

vLLora includes pricing information for accurate cost tracking:
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct CompletionModelPrice {
    pub per_input_token: f64,
    pub per_output_token: f64,
    pub per_cached_input_token: Option<f64>,
    pub per_cached_input_write_token: Option<f64>,
    pub valid_from: Option<NaiveDate>,
}
Pricing data is embedded in gateway/models_data.json for fast startup and offline operation.
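Cost tracking with these fields is a weighted sum over token counts. This sketch uses a trimmed-down version of the struct and placeholder per-token rates, not real prices; cached input tokens fall back to the regular input rate when no cached price is set:

```rust
// Trimmed-down pricing struct (subset of the fields above) with a
// cost function over token usage. Rates below are placeholders.
struct CompletionModelPrice {
    per_input_token: f64,
    per_output_token: f64,
    per_cached_input_token: Option<f64>,
}

fn cost(price: &CompletionModelPrice, input: u64, cached_input: u64, output: u64) -> f64 {
    // Cached input tokens bill at the cached rate when one exists.
    let cached_rate = price.per_cached_input_token.unwrap_or(price.per_input_token);
    (input as f64) * price.per_input_token
        + (cached_input as f64) * cached_rate
        + (output as f64) * price.per_output_token
}

fn main() {
    let p = CompletionModelPrice {
        per_input_token: 0.000_002,
        per_output_token: 0.000_008,
        per_cached_input_token: Some(0.000_001),
    };
    // 1000 input + 500 cached + 200 output tokens.
    let c = cost(&p, 1_000, 500, 200);
    assert!((c - 0.0041).abs() < 1e-12);
}
```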

Adding Custom Providers

vLLora supports custom provider proxies:
InferenceModelProvider::Proxy("my-custom-provider".to_string())
Custom providers can implement OpenAI-compatible endpoints and will be routed through the proxy system.
