Overview

LLM Gateway Core’s extensible architecture makes it easy to add support for any LLM provider. Whether you’re integrating:
  • A commercial API (OpenAI, Anthropic, Cohere)
  • A self-hosted model server
  • An internal ML platform
  • A custom inference engine
this guide will walk you through the complete process.

Architecture Review

Every provider must:
  1. Inherit from LLMProvider abstract base class
  2. Implement the chat() async method
  3. Define the name property
  4. Return a standardized ChatResponse
  5. Register with the Router
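
For reference, the base class contract looks roughly like this (a minimal sketch inferred from the list above; check app/providers/base.py for the authoritative definition):
from abc import ABC, abstractmethod
from app.api.v1.schemas import ChatRequest, ChatResponse

class LLMProvider(ABC):
    """Contract that every provider implements."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Unique provider identifier used by the Router."""
        ...

    @abstractmethod
    async def chat(self, request: ChatRequest) -> ChatResponse:
        """Execute a chat completion and return a standardized response."""
        ...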

Step-by-Step Implementation

Step 1: Create Provider Class

Create a new file in app/providers/ for your provider:
app/providers/custom.py
from app.providers.base import LLMProvider
from app.api.v1.schemas import ChatRequest, ChatResponse, Usage
import uuid

class CustomProvider(LLMProvider):
    @property
    def name(self) -> str:
        return "custom"
    
    async def chat(self, request: ChatRequest) -> ChatResponse:
        # Implementation goes here
        pass

Step 2: Implement chat() Method

Add your provider’s logic to handle chat requests:
async def chat(self, request: ChatRequest) -> ChatResponse:
    # 1. Extract request parameters
    messages = request.messages
    model = request.model or "default-model"
    temperature = request.temperature
    max_tokens = request.max_tokens
    
    # 2. Call your provider's API
    # (implementation depends on your provider)
    
    # 3. Return standardized response
    return ChatResponse(
        id=str(uuid.uuid4()),
        provider=self.name,
        content="Generated response text",
        usage=Usage(
            prompt_tokens=10,
            completion_tokens=20,
            total_tokens=30
        )
    )

Step 3: Add Configuration

Add any required settings to app/core/config.py:
app/core/config.py
class Settings(BaseSettings):
    # Existing settings...
    
    # Your provider's configuration
    CUSTOM_API_KEY: str = ""
    CUSTOM_API_BASE_URL: str = "https://api.example.com"
    CUSTOM_DEFAULT_MODEL: str = "model-v1"
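
It's good practice to validate required settings when the provider is constructed, so misconfiguration fails fast with a clear message instead of a confusing HTTP error later. A minimal sketch, assuming the setting names above:
from app.core.config import settings
from app.providers.base import LLMProvider

class CustomProvider(LLMProvider):
    def __init__(self) -> None:
        # Fail fast if required configuration is missing
        if not settings.CUSTOM_API_KEY:
            raise RuntimeError(
                "CUSTOM_API_KEY is not set; add it to your .env "
                "before registering CustomProvider"
            )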

Step 4: Register Provider

Add your provider to the router in app/core/router.py:
app/core/router.py
from app.providers.custom import CustomProvider

class Router:
    def __init__(self):
        self.providers = {
            "gemini": GeminiProvider(),
            "ollama": OllamaProvider(),
            "custom": CustomProvider(),  # Add your provider
        }
    
    def route(self, request) -> List[LLMProvider]:
        target = request.model or request.model_hint
        
        # Add routing logic for your provider
        if target == "custom":
            return [self.providers["custom"]]
        
        # Existing routing logic...
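
Once registered, the provider can be targeted explicitly through the gateway. A quick smoke test, assuming the gateway is running locally on port 8000 (the /v1/chat endpoint and gateway key match the integration test later in this guide):
import asyncio
import httpx

async def main() -> None:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/v1/chat",
            headers={"Authorization": "Bearer sk-gateway-123"},
            json={
                "model": "custom",
                "messages": [{"role": "user", "content": "Hello"}],
            },
        )
        print(response.json())

asyncio.run(main())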

Complete Examples

Example 1: OpenAI Provider

Here’s a complete implementation for OpenAI’s API:
app/providers/openai.py
import httpx
from app.providers.base import LLMProvider
from app.api.v1.schemas import ChatRequest, ChatResponse, Usage
from app.core.config import settings
import uuid

class OpenAIProvider(LLMProvider):
    @property
    def name(self) -> str:
        return "openai"
    
    async def chat(self, request: ChatRequest) -> ChatResponse:
        """
        Execute chat completion via OpenAI API.
        """
        url = "https://api.openai.com/v1/chat/completions"
        
        # Convert messages to OpenAI format
        openai_messages = [
            {"role": msg.role, "content": msg.content}
            for msg in request.messages
        ]
        
        # Build the request payload; send optional parameters only when set,
        # since some APIs reject explicit nulls
        payload = {
            "model": request.model or "gpt-4",
            "messages": openai_messages,
            "stream": False
        }
        if request.temperature is not None:
            payload["temperature"] = request.temperature
        if request.max_tokens is not None:
            payload["max_tokens"] = request.max_tokens
        
        # Make API request
        headers = {
            "Authorization": f"Bearer {settings.OPENAI_API_KEY}",
            "Content-Type": "application/json"
        }
        
        async with httpx.AsyncClient(timeout=settings.PROVIDER_TIMEOUT_SECONDS) as client:
            try:
                response = await client.post(url, json=payload, headers=headers)
                response.raise_for_status()
                data = response.json()
                
                # Extract response
                choice = data["choices"][0]
                usage = data["usage"]
                
                return ChatResponse(
                    id=data.get("id", str(uuid.uuid4())),
                    provider=self.name,
                    content=choice["message"]["content"],
                    usage=Usage(
                        prompt_tokens=usage["prompt_tokens"],
                        completion_tokens=usage["completion_tokens"],
                        total_tokens=usage["total_tokens"]
                    )
                )
            except httpx.HTTPStatusError as e:
                print(f"[OpenAI HTTP Error] {e.response.status_code}: {e.response.text}")
                raise
            except Exception as e:
                print(f"[OpenAI Error] {e}")
                raise

Example 2: Anthropic Claude Provider

app/providers/anthropic.py
import httpx
from app.providers.base import LLMProvider
from app.api.v1.schemas import ChatRequest, ChatResponse, Usage, Message
from app.core.config import settings
import uuid

class AnthropicProvider(LLMProvider):
    @property
    def name(self) -> str:
        return "anthropic"
    
    async def chat(self, request: ChatRequest) -> ChatResponse:
        """
        Execute chat completion via Anthropic API.
        """
        url = "https://api.anthropic.com/v1/messages"
        
        # Anthropic requires system messages separately
        system_messages = [msg.content for msg in request.messages if msg.role == "system"]
        conversation_messages = [
            {"role": msg.role, "content": msg.content}
            for msg in request.messages
            if msg.role != "system"
        ]
        
        payload = {
            "model": request.model or "claude-3-5-sonnet-20241022",
            "max_tokens": request.max_tokens or 1024,  # max_tokens is required by the Anthropic API
            "messages": conversation_messages,
        }
        if request.temperature is not None:
            payload["temperature"] = request.temperature
        
        # Add system message if present
        if system_messages:
            payload["system"] = " ".join(system_messages)
        
        headers = {
            "x-api-key": settings.ANTHROPIC_API_KEY,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json"
        }
        
        async with httpx.AsyncClient(timeout=settings.PROVIDER_TIMEOUT_SECONDS) as client:
            try:
                response = await client.post(url, json=payload, headers=headers)
                response.raise_for_status()
                data = response.json()
                
                return ChatResponse(
                    id=data.get("id", str(uuid.uuid4())),
                    provider=self.name,
                    content=data["content"][0]["text"],
                    usage=Usage(
                        prompt_tokens=data["usage"]["input_tokens"],
                        completion_tokens=data["usage"]["output_tokens"],
                        total_tokens=data["usage"]["input_tokens"] + data["usage"]["output_tokens"]
                    )
                )
            except Exception as e:
                print(f"[Anthropic Error] {e}")
                raise

Example 3: Mock Provider for Testing

app/providers/mock.py
import asyncio
from app.providers.base import LLMProvider
from app.api.v1.schemas import ChatRequest, ChatResponse, Usage
import uuid

class MockProvider(LLMProvider):
    """
    Mock provider for testing without external API calls.
    """
    
    @property
    def name(self) -> str:
        return "mock"
    
    async def chat(self, request: ChatRequest) -> ChatResponse:
        # Simulate API latency
        await asyncio.sleep(0.1)
        
        # Generate a mock response
        last_message = request.messages[-1].content
        mock_content = f"Mock response to: {last_message[:50]}..."
        
        return ChatResponse(
            id=str(uuid.uuid4()),
            provider=self.name,
            content=mock_content,
            usage=Usage(
                prompt_tokens=len(last_message.split()),
                completion_tokens=len(mock_content.split()),
                total_tokens=len(last_message.split()) + len(mock_content.split())
            )
        )

Message Format Conversion

Different providers expect different message formats. Here are common patterns:
# Standard format used by OpenAI, Ollama, and many others
{
    "role": "user" | "assistant" | "system",
    "content": "message text"
}
Providers that accept this format need no conversion from ChatRequest.messages. Providers like Anthropic, which take system prompts via a separate top-level field, need the splitting shown in Example 2; completion-style APIs that expect a single prompt string need flattening.
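A minimal flattening sketch for such prompt-string APIs (messages_to_prompt is an illustrative helper, not part of the gateway):
from typing import List
from app.api.v1.schemas import Message

def messages_to_prompt(messages: List[Message]) -> str:
    """Flatten role/content messages into a single prompt string."""
    lines = [f"{msg.role}: {msg.content}" for msg in messages]
    lines.append("assistant:")  # cue the model to respond
    return "\n".join(lines)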

Error Handling Best Practices

Catch specific exceptions where possible, log enough context to diagnose the failure, and re-raise so the gateway can handle it upstream:
import logging
import httpx

logger = logging.getLogger(__name__)

async def chat(self, request: ChatRequest) -> ChatResponse:
    try:
        response = await self.call_api(request)
        return self.parse_response(response)
    except httpx.HTTPStatusError as e:
        # Log the status code and body so failures are diagnosable
        logger.error("[%s] HTTP %s: %s", self.name, e.response.status_code, e.response.text)
        raise
    except Exception:
        logger.exception("[%s] unexpected error", self.name)
        raise
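
For transient failures (timeouts, 429s, 5xx responses), a simple retry with exponential backoff often helps. A sketch, assuming the settings import used in the examples above (the helper name and retry counts are illustrative):
import asyncio
import httpx
from app.core.config import settings

async def post_with_retries(url: str, payload: dict, headers: dict,
                            attempts: int = 3) -> httpx.Response:
    """Retry timeouts and 429/5xx responses with exponential backoff."""
    last_response = None
    for attempt in range(attempts):
        if attempt:
            await asyncio.sleep(2 ** attempt)  # back off 2s, 4s, ...
        try:
            async with httpx.AsyncClient(timeout=settings.PROVIDER_TIMEOUT_SECONDS) as client:
                last_response = await client.post(url, json=payload, headers=headers)
        except httpx.TimeoutException:
            if attempt == attempts - 1:
                raise
            continue
        if last_response.status_code != 429 and last_response.status_code < 500:
            return last_response
    return last_response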

Testing Your Provider

Unit Tests

Create tests for your provider:
tests/test_providers/test_custom.py
import pytest
from app.providers.custom import CustomProvider
from app.api.v1.schemas import ChatRequest, Message

@pytest.mark.asyncio
async def test_custom_provider_chat():
    provider = CustomProvider()
    
    request = ChatRequest(
        messages=[
            Message(role="user", content="Hello, world!")
        ],
        model="default"
    )
    
    response = await provider.chat(request)
    
    assert response.provider == "custom"
    assert response.content is not None
    assert response.usage.total_tokens > 0

@pytest.mark.asyncio
async def test_custom_provider_error_handling():
    provider = CustomProvider()
    
    # Test with invalid request
    request = ChatRequest(messages=[])
    
    with pytest.raises(Exception):
        await provider.chat(request)
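
If your provider calls its API over HTTP with httpx, you can stub the network layer rather than hit the real service. One option is the respx library (an assumed dev dependency; the stubbed URL and response shape must match what your chat() actually requests and parses):
import httpx
import pytest
import respx
from app.providers.custom import CustomProvider
from app.api.v1.schemas import ChatRequest, Message

@pytest.mark.asyncio
@respx.mock
async def test_custom_provider_chat_stubbed_http():
    # Intercept the outbound request so no real network call is made
    respx.post("https://api.example.com/chat").mock(
        return_value=httpx.Response(200, json={"text": "stubbed reply"})
    )
    provider = CustomProvider()
    request = ChatRequest(messages=[Message(role="user", content="Hi")])
    response = await provider.chat(request)
    assert response.provider == "custom"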

Integration Tests

tests/integration/test_custom_provider.py
import pytest
from httpx import ASGITransport, AsyncClient
from app.main import app

@pytest.mark.asyncio
async def test_custom_provider_via_api():
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        response = await client.post(
            "/v1/chat",
            headers={"Authorization": "Bearer sk-gateway-123"},
            json={
                "model": "custom",
                "messages": [
                    {"role": "user", "content": "Test message"}
                ]
            }
        )
        
        assert response.status_code == 200
        data = response.json()
        assert data["provider"] == "custom"

Configuration Checklist

1. Environment Variables

Add all required settings to .env:
.env
CUSTOM_API_KEY=your-api-key
CUSTOM_API_BASE_URL=https://api.example.com
2. Settings Class

Update app/core/config.py with typed settings:
class Settings(BaseSettings):
    CUSTOM_API_KEY: str = ""
    CUSTOM_API_BASE_URL: str = "https://api.example.com"
3. Provider Registration

Add your provider to the Router in app/core/router.py.
4. Routing Logic

Define when your provider should be used.
5. Documentation

Document your provider's configuration and usage.

Always validate that required configuration is present before making API calls, and fail fast with a clear error message (see the validation sketch in Step 3 above).

Advanced Features

Streaming Support

If your provider supports streaming:
async def chat_stream(self, request: ChatRequest):
    """
    Stream chat responses token by token.
    """
    # Implementation depends on your provider's streaming API
    async for chunk in self.stream_api_call(request):
        yield chunk
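
For an httpx-based provider, streaming usually means parsing server-sent events. A rough sketch for an OpenAI-style SSE stream (the payload and field names assume that API's shape):
import json
import httpx
from app.core.config import settings

async def chat_stream(self, request: ChatRequest):
    payload = {
        "model": request.model or "gpt-4",
        "messages": [{"role": m.role, "content": m.content} for m in request.messages],
        "stream": True,
    }
    headers = {"Authorization": f"Bearer {settings.OPENAI_API_KEY}"}
    async with httpx.AsyncClient(timeout=settings.PROVIDER_TIMEOUT_SECONDS) as client:
        async with client.stream("POST", "https://api.openai.com/v1/chat/completions",
                                 json=payload, headers=headers) as response:
            response.raise_for_status()
            async for line in response.aiter_lines():
                # SSE lines look like "data: {...}"; the stream ends with "data: [DONE]"
                if not line.startswith("data: ") or line == "data: [DONE]":
                    continue
                chunk = json.loads(line[len("data: "):])
                delta = chunk["choices"][0]["delta"].get("content")
                if delta:
                    yield delta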

Model Introspection

Provide a method to list available models:
async def list_models(self) -> List[str]:
    """
    Return a list of available models.
    """
    return ["model-v1", "model-v2", "model-v3"]
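
If the provider exposes a model-listing endpoint, you can query it dynamically instead. A sketch against a hypothetical /models endpoint (adjust the path and response shape for your API):
from typing import List
import httpx
from app.core.config import settings

async def list_models(self) -> List[str]:
    """Fetch available model IDs from the provider's API."""
    async with httpx.AsyncClient(timeout=settings.PROVIDER_TIMEOUT_SECONDS) as client:
        response = await client.get(f"{settings.CUSTOM_API_BASE_URL}/models")
        response.raise_for_status()
        return [model["id"] for model in response.json()["data"]]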

Custom Parameters

Extend ChatRequest if you need custom parameters:
app/api/v1/schemas.py
class ChatRequest(BaseModel):
    # Standard fields...
    
    # Custom provider-specific parameters
    custom_param: Optional[str] = None
    top_k: Optional[int] = None
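
Then forward the extra fields inside your provider's chat() when building the payload. A small illustrative helper (not part of the gateway):
from app.api.v1.schemas import ChatRequest

def apply_custom_params(payload: dict, request: ChatRequest) -> dict:
    """Copy optional provider-specific parameters into the outgoing payload."""
    # Some APIs reject explicit nulls, so only include parameters that are set
    if request.top_k is not None:
        payload["top_k"] = request.top_k
    if request.custom_param is not None:
        payload["custom_param"] = request.custom_param
    return payload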

Troubleshooting

Provider not found or never selected:
  • Check that the provider is registered in Router.providers
  • Verify the routing logic includes your provider
  • Test with an explicit model name in the request

Import errors:
  • Ensure all dependencies are in requirements.txt
  • Run pip install -r requirements.txt
  • Check Python import paths

Authentication failures:
  • Verify environment variables are loaded
  • Check API key format and validity
  • Review the provider's authentication documentation

Response parsing errors:
  • Log the raw API response for debugging
  • Validate that the response structure matches expectations
  • Handle missing or null fields gracefully

Next Steps

  • Provider Overview: review the provider architecture
  • Router Configuration: configure intelligent routing
  • Testing Guide: write tests for your provider
  • Deployment: deploy your custom provider
