
Overview

RouteLLM Chat is an AI chat application that uses RouteLLM to automatically route each query to either a cost-effective or a high-performance model, balancing response quality against cost. Which model handled each query is shown transparently in the UI.

Features

Intelligent Routing

Automatically selects the most appropriate model for each query

Cost Optimization

Routes simple queries to cheaper models, saving costs

Performance Balance

Complex queries use high-performance models for quality

Transparent

See which model handled each query with color-coded badges

Modern UI

Beautiful Streamlit interface with gradient styling

Chat History

Maintains conversation context across messages

How RouteLLM Works

RouteLLM uses a matrix factorization (MF) router that intelligently distributes queries:
1. Query Analysis: analyzes the incoming query to determine its complexity and requirements
2. Model Selection: routes simple queries to the weak model (cost-effective) and complex queries to the strong model (high-performance)
3. Response Generation: the selected model processes the query and generates a response
4. Transparent Feedback: the response is returned with model information for transparency
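The four-step flow above can be sketched as a toy routing function. This is purely illustrative: the real MF router scores queries with a trained matrix-factorization preference model, not the invented word-count heuristic used here.

```python
# Toy illustration of the routing flow -- NOT RouteLLM's real MF router,
# which uses a trained matrix-factorization model to score queries.
WEAK_MODEL = "meta-llama/Meta-Llama-3.1-70B-Instruct"   # cost-effective
STRONG_MODEL = "gpt-4o-mini"                            # high-performance

def toy_route(query: str) -> str:
    # Step 1, query analysis: a crude complexity proxy (word count)
    complexity = len(query.split())
    # Step 2, model selection: short queries go to the cheap model
    return WEAK_MODEL if complexity <= 10 else STRONG_MODEL

# Steps 3-4 (response generation, transparent feedback) would then call the
# selected model and report its name back to the UI.
print(toy_route("What is the capital of France?"))  # routes to the weak model
```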

Tech Stack

RouteLLM

Intelligent model routing library

Streamlit

Modern web interface framework

GPT-4o-mini

OpenAI’s strong model for complex tasks

Llama 3.1 70B

Meta’s model via Nebius for cost-effective responses

Prerequisites

Python 3.11+

Python 3.11 or higher required

OpenAI API Key

Nebius API Key

Installation

1. Clone the Repository

git clone https://github.com/Arindam200/awesome-llm-apps.git
cd awesome-llm-apps/simple_ai_agents/llm_router

2. Install Dependencies

Using uv (recommended):
uv sync
Or using pip:
pip install -e .

3. Configure Environment

Create a .env file:
OPENAI_API_KEY=your_openai_api_key_here
NEBIUS_API_KEY=your_nebius_api_key_here

Implementation

RouteLLM Controller Setup

The core of the application uses RouteLLM’s Controller:
import os
from dotenv import load_dotenv
from routellm.controller import Controller

load_dotenv()

# Initialize RouteLLM client
def get_routellm_client():
    nebius_api_key = os.getenv("NEBIUS_API_KEY")
    openai_api_key = os.getenv("OPENAI_API_KEY")
    
    if not nebius_api_key or not openai_api_key:
        return None
    
    # Create controller with model routing
    client = Controller(
        routers=["mf"],  # matrix factorization (MF) router
        strong_model="gpt-4o-mini",
        weak_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    )
    return client

Streamlit Interface

The application uses Streamlit for the user interface:
import streamlit as st

st.set_page_config(
    page_title="RouteLLM Chat", 
    layout="wide", 
    page_icon="🤖"
)

# Initialize chat history on first run
if "messages" not in st.session_state:
    st.session_state.messages = []

# Chat input and response handling
if prompt := st.chat_input("Type your message here..."):
    # Add user message to history
    st.session_state.messages.append({
        "role": "user",
        "content": prompt
    })

    # Get RouteLLM client (returns None if API keys are missing)
    client = get_routellm_client()
    if client is None:
        st.error("Please configure your API keys")
        st.stop()

    # Generate response; send only role/content (the stored "model" key
    # is for UI badges, not the API)
    response = client.chat.completions.create(
        model="router-mf-0.11593",
        messages=[
            {"role": m["role"], "content": m["content"]}
            for m in st.session_state.messages
        ]
    )
    
    # Display response with model badge
    message_content = response.choices[0].message.content
    model_name = getattr(response, "model", "Unknown")
    
    st.session_state.messages.append({
        "role": "assistant",
        "content": message_content,
        "model": model_name
    })

Usage

1. Run the Application

streamlit run main.py

The app will open at http://localhost:8501

2. Configure API Keys

Enter your OpenAI and Nebius API keys in the sidebar, then click “Save API Keys”

3. Start Chatting

Type your message in the chat input at the bottom

4. View Responses

See responses with model badges indicating which model handled the query:
  • Blue Badge: GPT-4o-mini (strong model)
  • Purple Badge: Nebius Llama (weak model)

5. Clear Chat

Use the “Clear Chat” button to start a new conversation

Example Queries

Try these queries to see RouteLLM in action:
These simple queries are likely routed to the Nebius Llama (cost-effective):
  • What is the capital of France?
  • Explain photosynthesis in one sentence
  • List three benefits of exercise

Model Configuration

The application uses two models with different capabilities:

Strong Model: GPT-4o-mini

OpenAI GPT-4o-mini

Purpose: Complex reasoning and nuanced tasks
Characteristics:
  • High accuracy and quality
  • Better at complex reasoning
  • Higher cost per token
  • Used for complex queries
API: OpenAI Platform

Weak Model: Meta Llama 3.1 70B

Meta Llama 3.1 70B Instruct

Purpose: Cost-effective responses for simple queries
Characteristics:
  • Fast inference
  • Lower cost per token
  • Good for straightforward tasks
  • Accessed via Nebius Token Factory
API: Nebius Token Factory

Router Configuration

RouteLLM supports multiple routing strategies:
| Router | Description | Use Case |
| --- | --- | --- |
| mf | Matrix factorization | Automatically routes based on predicted query complexity |
| sw-ranking | Similarity-weighted ranking | Routes based on similarity to historical preference data |
| causal-llm | Causal LLM router | Uses a smaller LLM to predict the best model |
| random | Random router | For testing and comparison |
This implementation uses the mf (matrix factorization) router; in the model ID router-mf-0.11593, the trailing 0.11593 is the routing threshold.
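The number at the end of the model ID is a threshold: queries the router scores at or above it go to the strong model, so the threshold controls the strong/weak traffic split. Here is a toy sketch of how a threshold could be picked from router scores; the scores are hypothetical, and RouteLLM ships its own calibration tooling for the real thing.

```python
# Toy threshold calibration -- hypothetical router scores, not RouteLLM's
# actual calibration, which is derived from benchmark preference data.
def pick_threshold(scores, strong_pct):
    """Choose a threshold so roughly `strong_pct` of queries go strong."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(len(ranked) * strong_pct))
    return ranked[k - 1]

scores = [0.05, 0.09, 0.12, 0.30, 0.55, 0.80, 0.91, 0.97, 0.10, 0.22]
threshold = pick_threshold(scores, strong_pct=0.3)
strong_count = sum(s >= threshold for s in scores)
print(threshold, strong_count)  # 0.8 3 -> 3 of 10 queries go strong
```

The resulting model string would then be f"router-mf-{threshold}".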

Customization

Change Models

Modify the models in main.py:
client = Controller(
    routers=["mf"],
    strong_model="gpt-4o-mini",  # Change to any OpenAI model
    weak_model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # Change model
)

Adjust Router

Experiment with different routing strategies:
client = Controller(
    routers=["sw-ranking"],  # Try different router
    strong_model="gpt-4o-mini",
    weak_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
)

Custom Styling

Update model badge colors in the Streamlit code:
model_badge_color = (
    "#667eea" if "gpt" in model_name.lower() else "#764ba2"
)
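Building on the color choice above, a hypothetical helper (badge_html is an illustrative name, not part of the app) could wrap the model name in badge markup; in Streamlit you would render it with st.markdown(..., unsafe_allow_html=True).

```python
def badge_html(model_name: str) -> str:
    # Blue for OpenAI models, purple for the Nebius Llama (colors from above)
    color = "#667eea" if "gpt" in model_name.lower() else "#764ba2"
    return (
        f'<span style="background:{color};color:#fff;'
        f'padding:2px 8px;border-radius:12px;">{model_name}</span>'
    )

print(badge_html("gpt-4o-mini"))  # blue badge markup
```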

Architecture

1. User Input: the user enters a message in the Streamlit chat interface
2. Message Processing: the message is added to the chat history and sent to RouteLLM
3. Router Analysis: the RouteLLM Controller analyzes query complexity
4. Model Selection: the router selects the appropriate model (strong or weak)
5. API Call: the query is sent to the selected model’s API endpoint
6. Response Display: the response is displayed with a model information badge

Cost Optimization

RouteLLM helps optimize costs by intelligently routing queries:

Simple Queries

Routed to: Weak Model (Nebius Llama)
Cost: Lower per token
Examples: Facts, definitions, simple Q&A

Complex Queries

Routed to: Strong Model (GPT-4o-mini)
Cost: Higher per token
Examples: Analysis, reasoning, creative tasks
Typical cost savings of 30-50% compared to always using the strong model, while maintaining quality for complex queries.
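Savings in that range follow from simple blended-cost arithmetic. The sketch below uses hypothetical per-token prices and a hypothetical 60% weak-model share, not real OpenAI or Nebius rates:

```python
# Back-of-envelope blended cost. All numbers are HYPOTHETICAL placeholders;
# substitute current provider pricing and your observed routing split.
STRONG_COST = 0.60   # $ per 1M tokens (hypothetical)
WEAK_COST = 0.20     # $ per 1M tokens (hypothetical)
weak_fraction = 0.6  # share of traffic the router sends to the weak model

blended = weak_fraction * WEAK_COST + (1 - weak_fraction) * STRONG_COST
savings = 1 - blended / STRONG_COST
print(f"{savings:.0%}")  # 40% under these assumptions
```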

Troubleshooting

Error: “Please configure your API keys”
Solution:
  1. Enter API keys in the sidebar
  2. Click the “Save API Keys” button
  3. Verify keys are valid and have credits

Possible Causes:
  • Invalid API keys
  • Missing dependencies
  • Network issues
Solution:
  1. Verify both API keys are set
  2. Ensure all packages are installed: uv sync
  3. Check internet connection

Issue: Query routed to unexpected model
Note: This is normal behavior. RouteLLM makes dynamic decisions based on query analysis. Even simple queries may go to the strong model if RouteLLM determines it’s necessary.

Error: Module not found
Solution:
uv sync  # or pip install -e .
Ensure Python 3.11+ is installed.

Best Practices

Monitor Usage

Track API usage and costs across both providers

Test Queries

Test with various query types to understand routing behavior

Review Routing

Monitor which model handles different query types

Adjust Models

Experiment with different model combinations for your use case

Next Steps

RouteLLM Docs

Explore advanced RouteLLM features

Nebius Models

Browse available models on Nebius

OpenAI Platform

Explore OpenAI model options

Custom Routers

Build custom routing logic for your needs
