Routing

Overview

The Router component determines which LLM provider should handle each incoming request. Routing decisions are based on the model or model_hint field in the request, allowing clients to control which provider is used.

Router Implementation

The Router class in app/core/router.py manages provider selection:

from typing import List
from app.providers.base import LLMProvider
from app.providers.gemini import GeminiProvider
from app.providers.ollama import OllamaProvider

class Router:
    """
    Decides which providers should handle a request.
    """
    def __init__(self):
        self.providers = {
            "gemini": GeminiProvider(),
            "ollama": OllamaProvider(),
        }
    
    def route(self, request) -> List[LLMProvider]:
        """
        Routes the request to the appropriate provider.
        """
        # Prioritize explicit model selection
        target = request.model or request.model_hint
        
        if target == "online" or target == "gemini" or target == "fast":
            return [self.providers["gemini"]]
        elif target == "ollama" or target == "local" or target == "secure":
            return [self.providers["ollama"]]
            
        # Default fallback
        return [self.providers["ollama"]]

Model Hints

The router supports semantic hints that map to provider characteristics:

Gemini Provider Hints

These hints route requests to Google’s Gemini (cloud-based):

{
  "model_hint": "online",
  "messages": [...]
}

online - Routes to cloud-based provider
gemini - Explicitly selects Gemini
fast - Requests a low-latency response (assumes the cloud provider answers faster than a local model, which depends on your hardware and network)

Ollama Provider Hints

These hints route requests to Ollama (local/self-hosted):

{
  "model_hint": "local",
  "messages": [...]
}

local - Routes to self-hosted provider
ollama - Explicitly selects Ollama
secure - Prioritizes data privacy (local processing)
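
The two hint lists above amount to a lookup table. The following is a minimal sketch of that table, not the project's code: the Router itself uses an if/elif chain, and the names HINT_TO_PROVIDER, DEFAULT_PROVIDER, and resolve_provider_name are hypothetical, introduced only for illustration. Matching is exact and case-sensitive, as in the original comparison.

```python
from typing import Optional

# Hint -> provider key, mirroring the hint lists above.
HINT_TO_PROVIDER = {
    "online": "gemini",
    "gemini": "gemini",
    "fast": "gemini",
    "local": "ollama",
    "ollama": "ollama",
    "secure": "ollama",
}

DEFAULT_PROVIDER = "ollama"  # same default the router falls back to


def resolve_provider_name(target: Optional[str]) -> str:
    """Hypothetical helper: return the provider key for a hint."""
    if target is None:
        return DEFAULT_PROVIDER
    # Unknown hints fall back to the default, like the router's last return.
    return HINT_TO_PROVIDER.get(target, DEFAULT_PROVIDER)


print(resolve_provider_name("fast"))     # gemini
print(resolve_provider_name("secure"))   # ollama
print(resolve_provider_name("unknown"))  # ollama
```

A table like this keeps the documented hints and the routing behavior in one place, which may make it easier to keep them in sync.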

Routing Priority

The router follows this priority order:

Explicit model field - If present, takes precedence over model_hint
Model hint - Semantic hint for provider selection
Default fallback - Routes to Ollama if no value is provided or the value is not a recognized hint

target = request.model or request.model_hint

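Because the expression uses Python's or operator, model wins whenever it is set to a non-empty value; model_hint is consulted only when model is missing or empty. An unrecognized model value is not retried against model_hint: the request goes straight to the default provider.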
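
The precedence rule can be checked in isolation. This is a small sketch assuming a hypothetical FakeRequest dataclass as a stand-in for the real request model (whose actual fields and types are not shown in this page); pick_target is likewise an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for the real request model."""
    model: Optional[str] = None
    model_hint: Optional[str] = None


def pick_target(request: FakeRequest) -> Optional[str]:
    # Same expression the router uses: `or` returns the first truthy operand.
    return request.model or request.model_hint


print(pick_target(FakeRequest(model="gemini", model_hint="local")))  # gemini
print(pick_target(FakeRequest(model_hint="local")))                  # local
print(pick_target(FakeRequest(model="", model_hint="local")))        # local
print(pick_target(FakeRequest()))                                    # None
```

The third call shows that an empty string counts as "not set", so the hint is used instead.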

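Routing Decision Flow

Read model; if it is missing or empty, read model_hint
online, gemini, or fast - Route to Gemini
ollama, local, or secure - Route to Ollama
Any other value, or no value - Route to Ollama (default)
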
Provider List Return

The route() method returns a list of providers:

def route(self, request) -> List[LLMProvider]:

This design allows for future enhancements:

Fallback chains - Try multiple providers in sequence
Load balancing - Distribute requests across multiple instances
A/B testing - Route requests to different providers for comparison

Currently, the router returns a single-item list. Because the ChatService already iterates over every provider in the list with retry logic, adding fallback providers only requires the router to return a longer list.
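
As an illustration of the fallback-chain idea, here is a hedged sketch of a router variant that returns more than one provider. StubProvider and FallbackRouter are hypothetical names, and the choice of Ollama as the fallback for cloud traffic is an assumption, not something the project defines.

```python
from typing import Dict, List, Optional


class StubProvider:
    """Hypothetical placeholder for an LLMProvider implementation."""

    def __init__(self, name: str) -> None:
        self.name = name


class FallbackRouter:
    """Illustrative variant of Router that returns a primary plus a fallback."""

    def __init__(self) -> None:
        self.providers: Dict[str, StubProvider] = {
            "gemini": StubProvider("gemini"),
            "ollama": StubProvider("ollama"),
        }

    def route(self, target: Optional[str]) -> List[StubProvider]:
        if target in ("online", "gemini", "fast"):
            # Cloud first; the local provider is tried only if the cloud fails.
            return [self.providers["gemini"], self.providers["ollama"]]
        # Local-only: "secure"/"local" traffic never falls back to the cloud.
        return [self.providers["ollama"]]


router = FallbackRouter()
print([p.name for p in router.route("fast")])    # ['gemini', 'ollama']
print([p.name for p in router.route("secure")])  # ['ollama']
```

Keeping the secure path as a single-item list is a deliberate choice in this sketch: a privacy hint should not silently send data to a cloud provider.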

Integration with ChatService

The ChatService uses the router to get providers and iterates through them with retry logic:

providers = self.router.route(request)
last_exception = None

for provider in providers:
    for attempt in range(settings.PROVIDER_MAX_RETRIES):
        try:
            response = await self._call_provider(provider, request)
            self.cache.set(cache_key, response)
            return response
        except Exception as e:
            last_exception = e
            continue
raise last_exception if last_exception else Exception("No providers available")

Source: app/core/service.py:55-67
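
To see how the loop behaves when a provider keeps failing, the following self-contained sketch reproduces its control flow with stub providers. FlakyProvider, EchoProvider, call_with_fallback, and the MAX_RETRIES constant are all hypothetical; caching and the real _call_provider method are left out.

```python
import asyncio
from typing import List

MAX_RETRIES = 2  # stands in for settings.PROVIDER_MAX_RETRIES


class FlakyProvider:
    """Hypothetical provider that always fails."""

    def __init__(self) -> None:
        self.calls = 0

    async def chat(self, prompt: str) -> str:
        self.calls += 1
        raise ConnectionError("provider unavailable")


class EchoProvider:
    """Hypothetical provider that always succeeds."""

    async def chat(self, prompt: str) -> str:
        return f"echo: {prompt}"


async def call_with_fallback(providers: List, prompt: str) -> str:
    last_exception = None
    for provider in providers:
        for _attempt in range(MAX_RETRIES):
            try:
                return await provider.chat(prompt)
            except Exception as exc:  # broad catch, as in the excerpt above
                last_exception = exc
    # Every attempt on every provider failed (or the list was empty).
    if last_exception is not None:
        raise last_exception
    raise RuntimeError("No providers available")


flaky = FlakyProvider()
result = asyncio.run(call_with_fallback([flaky, EchoProvider()], "hi"))
print(result)       # echo: hi
print(flaky.calls)  # 2
```

The first provider is tried MAX_RETRIES times before the loop moves on, so a failing primary adds latency before the fallback answers.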

Usage Examples

Request with Gemini Provider

import httpx

# httpx.post() is synchronous; in async code, use httpx.AsyncClient and await client.post(...)
response = httpx.post(
    "http://localhost:8000/chat",
    headers={"X-API-Key": "your-api-key"},
    json={
        "model_hint": "fast",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "max_tokens": 100
    }
)

Request with Ollama Provider

import httpx

response = httpx.post(
    "http://localhost:8000/chat",
    headers={"X-API-Key": "your-api-key"},
    json={
        "model_hint": "secure",
        "messages": [
            {"role": "user", "content": "Analyze this sensitive document..."}
        ],
        "max_tokens": 500
    }
)

Default Routing (No Hint)

import httpx

response = httpx.post(
    "http://localhost:8000/chat",
    headers={"X-API-Key": "your-api-key"},
    json={
        "messages": [
            {"role": "user", "content": "Hello!"}
        ]
    }
)
# Routes to Ollama (default)

Provider Interface

All providers implement the LLMProvider base class interface, ensuring consistent behavior:

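from app.providers.base import LLMProvider
# ChatRequest and ChatResponse are the application's request/response models;
# import them from wherever your project defines them.

class CustomProvider(LLMProvider):
    async def chat(self, request: ChatRequest) -> ChatResponse:
        # Call your backend here and return a ChatResponse
        raise NotImplementedError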
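
For a runnable picture of this contract, here is a minimal sketch under stated assumptions: the real LLMProvider, ChatRequest, and ChatResponse live in the application and may look different. The simplified classes below, including EchoProvider, are hypothetical and assume the interface is a single async chat() method.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChatRequest:  # hypothetical, simplified request model
    messages: List[Dict[str, str]]


@dataclass
class ChatResponse:  # hypothetical, simplified response model
    content: str
    provider: str


class LLMProvider(ABC):
    """Assumed shape of the base class: one async chat() method."""

    @abstractmethod
    async def chat(self, request: ChatRequest) -> ChatResponse:
        ...


class EchoProvider(LLMProvider):
    """Toy provider that returns the last message unchanged."""

    async def chat(self, request: ChatRequest) -> ChatResponse:
        last = request.messages[-1]["content"]
        return ChatResponse(content=last, provider="echo")


request = ChatRequest(messages=[{"role": "user", "content": "Hello!"}])
response = asyncio.run(EchoProvider().chat(request))
print(response.content)  # Hello!
```

Declaring chat() as abstract means a subclass that forgets to implement it fails at instantiation rather than at request time.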

Adding New Providers

To add a new provider:

Create a new provider class implementing LLMProvider
Register it in the Router.__init__() providers dictionary
Add routing logic in Router.route() with appropriate hints

class Router:
    def __init__(self):
        self.providers = {
            "gemini": GeminiProvider(),
            "ollama": OllamaProvider(),
            "custom": CustomProvider(),  # New provider
        }
    
    def route(self, request) -> List[LLMProvider]:
        target = request.model or request.model_hint
        
        if target == "custom" or target == "specialized":
            return [self.providers["custom"]]
        # ... existing logic
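
If the list of providers grows, editing the if/elif chain for each one can become error-prone. One possible alternative, sketched here with hypothetical names (RegistryRouter, register, StubProvider) that are not part of the project, is to register each provider together with its hints:

```python
from typing import Dict, Iterable, List, Optional


class StubProvider:
    """Hypothetical placeholder for an LLMProvider implementation."""

    def __init__(self, name: str) -> None:
        self.name = name


class RegistryRouter:
    """Illustrative router where providers are registered with their hints."""

    def __init__(self, default: str) -> None:
        self.providers: Dict[str, StubProvider] = {}
        self.hints: Dict[str, str] = {}
        self.default = default  # must be registered before route() is called

    def register(self, name: str, provider: StubProvider,
                 hints: Iterable[str] = ()) -> None:
        self.providers[name] = provider
        # The provider's own name always works as a hint.
        for hint in (name, *hints):
            self.hints[hint] = name

    def route(self, target: Optional[str]) -> List[StubProvider]:
        name = self.hints.get(target, self.default)
        return [self.providers[name]]


router = RegistryRouter(default="ollama")
router.register("gemini", StubProvider("gemini"), hints=["online", "fast"])
router.register("ollama", StubProvider("ollama"), hints=["local", "secure"])
router.register("custom", StubProvider("custom"), hints=["specialized"])

print(router.route("specialized")[0].name)  # custom
print(router.route(None)[0].name)           # ollama
```

With this shape, adding a provider is a single register() call and route() does not change.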

Best Practices

Use semantic hints for flexibility

Use hints like fast, secure, local instead of explicit provider names. This allows you to change the underlying provider without updating client code.

Set appropriate defaults

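The default fallback means a request does not fail when its hint is missing or unrecognized; it is routed to the default provider instead. Choose a default that matches your primary use case, and keep in mind that a mistyped hint falls back silently.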
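
One way to make the default adjustable per deployment is to read it from configuration. This is only a sketch: the DEFAULT_PROVIDER environment variable, the KNOWN_PROVIDERS tuple, and the default_provider_name helper are assumptions for illustration, and the project's settings module may handle this differently.

```python
import os
from typing import Mapping, Optional

KNOWN_PROVIDERS = ("gemini", "ollama")


def default_provider_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: read the default provider key from the environment."""
    env = os.environ if env is None else env
    # DEFAULT_PROVIDER is an assumed variable name, not a documented setting.
    name = env.get("DEFAULT_PROVIDER", "ollama").strip().lower()
    if name not in KNOWN_PROVIDERS:
        # Fail at startup instead of routing to a provider that does not exist.
        raise ValueError(f"Unknown default provider: {name!r}")
    return name


print(default_provider_name({}))                              # ollama
print(default_provider_name({"DEFAULT_PROVIDER": "Gemini"}))  # gemini
```

Validating the value once at startup avoids a KeyError on the first request that reaches the fallback.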

Document provider characteristics

Make sure clients understand what each hint means (speed, privacy, cost) so they can make informed routing decisions.

Next Steps

Architecture - Understand the full system architecture
Caching - Learn how responses are cached
Rate Limiting - Explore rate limiting implementation
API Reference - Complete API documentation
