
Overview

RouteLLM Chat is an AI chat application that uses RouteLLM to automatically route each query to either a cost-effective or a high-performance model, balancing response quality against cost. Which model handled each query is shown transparently in the UI.

Features

Intelligent Routing

Automatically selects the most appropriate model for each query

Cost Optimization

Routes simple queries to cheaper models, saving costs

Performance Balance

Complex queries use high-performance models for quality

Transparent

See which model handled each query with color-coded badges

Modern UI

Beautiful Streamlit interface with gradient styling

Chat History

Maintains conversation context across messages

How RouteLLM Works

RouteLLM uses a matrix factorization (MF) router that intelligently distributes queries:
1. Query Analysis: analyzes the incoming query to determine its complexity and requirements
2. Model Selection: routes simple queries to the weak model (cost-effective) and complex queries to the strong model (high-performance)
3. Response Generation: the selected model processes the query and generates a response
4. Transparent Feedback: the response is returned with model information for transparency
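The four-step flow above can be sketched as a toy routing function. This is purely illustrative: the real MF router scores queries with a trained matrix-factorization preference model, not the invented word-count heuristic used here.

```python
# Toy illustration of the routing flow -- NOT RouteLLM's real MF router,
# which uses a trained matrix-factorization model to score queries.
WEAK_MODEL = "meta-llama/Meta-Llama-3.1-70B-Instruct"   # cost-effective
STRONG_MODEL = "gpt-4o-mini"                            # high-performance

def toy_route(query: str) -> str:
    # Step 1, query analysis: a crude complexity proxy (word count)
    complexity = len(query.split())
    # Step 2, model selection: short queries go to the cheap model
    return WEAK_MODEL if complexity <= 10 else STRONG_MODEL

# Steps 3-4 (response generation, transparent feedback) would then call the
# selected model and report its name back to the UI.
print(toy_route("What is the capital of France?"))  # routes to the weak model
```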

Tech Stack

RouteLLM

Intelligent model routing library

Streamlit

Modern web interface framework

GPT-4o-mini

OpenAI’s strong model for complex tasks

Llama 3.1 70B

Meta’s model via Nebius for cost-effective responses

Prerequisites

Python 3.11+

Python 3.11 or higher required

OpenAI API Key

Nebius API Key

Installation

1. Clone the Repository

git clone https://github.com/Arindam200/awesome-llm-apps.git
cd awesome-llm-apps/simple_ai_agents/llm_router

2. Install Dependencies

Using uv (recommended):
uv sync
Or using pip:
pip install -e .

3. Configure Environment

Create a .env file:
OPENAI_API_KEY=your_openai_api_key_here
NEBIUS_API_KEY=your_nebius_api_key_here

Implementation

RouteLLM Controller Setup

The core of the application uses RouteLLM’s Controller:
import os
from dotenv import load_dotenv
from routellm.controller import Controller

load_dotenv()

# Initialize RouteLLM client
def get_routellm_client():
    nebius_api_key = os.getenv("NEBIUS_API_KEY")
    openai_api_key = os.getenv("OPENAI_API_KEY")
    
    if not nebius_api_key or not openai_api_key:
        return None
    
    # Create controller with model routing
    client = Controller(
        routers=["mf"],  # matrix factorization (MF) router
        strong_model="gpt-4o-mini",
        weak_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    )
    return client

Streamlit Interface

The application uses Streamlit for the user interface:
import streamlit as st

st.set_page_config(
    page_title="RouteLLM Chat", 
    layout="wide", 
    page_icon="🤖"
)

# Initialize chat history on first run
if "messages" not in st.session_state:
    st.session_state.messages = []

# Chat input and response handling
if prompt := st.chat_input("Type your message here..."):
    # Add user message to history
    st.session_state.messages.append({
        "role": "user",
        "content": prompt
    })

    # Get RouteLLM client (returns None if API keys are missing)
    client = get_routellm_client()
    if client is None:
        st.error("Please configure your API keys")
        st.stop()

    # Generate response; send only role/content (the stored "model" key
    # is for UI badges, not the API)
    response = client.chat.completions.create(
        model="router-mf-0.11593",
        messages=[
            {"role": m["role"], "content": m["content"]}
            for m in st.session_state.messages
        ]
    )
    
    # Display response with model badge
    message_content = response.choices[0].message.content
    model_name = getattr(response, "model", "Unknown")
    
    st.session_state.messages.append({
        "role": "assistant",
        "content": message_content,
        "model": model_name
    })

Usage

1. Run the Application

streamlit run main.py

The app will open at http://localhost:8501

2. Configure API Keys

Enter your OpenAI and Nebius API keys in the sidebar, then click “Save API Keys”

3. Start Chatting

Type your message in the chat input at the bottom

4. View Responses

See responses with model badges indicating which model handled the query:
  • Blue Badge: GPT-4o-mini (strong model)
  • Purple Badge: Nebius Llama (weak model)

5. Clear Chat

Use the “Clear Chat” button to start a new conversation

Example Queries

Try these queries to see RouteLLM in action:
These simple queries are likely routed to the Nebius Llama (cost-effective):
  • What is the capital of France?
  • Explain photosynthesis in one sentence
  • List three benefits of exercise

Model Configuration

The application uses two models with different capabilities:

Strong Model: GPT-4o-mini

OpenAI GPT-4o-mini

Purpose: Complex reasoning and nuanced tasks
Characteristics:
  • High accuracy and quality
  • Better at complex reasoning
  • Higher cost per token
  • Used for complex queries
API: OpenAI Platform

Weak Model: Meta Llama 3.1 70B

Meta Llama 3.1 70B Instruct

Purpose: Cost-effective responses for simple queries
Characteristics:
  • Fast inference
  • Lower cost per token
  • Good for straightforward tasks
  • Accessed via Nebius Token Factory
API: Nebius Token Factory

Router Configuration

RouteLLM supports multiple routing strategies:
| Router | Description | Use Case |
| --- | --- | --- |
| mf | Matrix factorization | Automatically routes based on predicted query complexity |
| sw-ranking | Similarity-weighted ranking | Routes based on similarity to historical preference data |
| causal-llm | Causal LLM router | Uses a smaller LLM to predict the best model |
| random | Random router | For testing and comparison |
This implementation uses the mf (matrix factorization) router; in the model ID router-mf-0.11593, the trailing 0.11593 is the routing threshold.
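The number at the end of the model ID is a threshold: queries the router scores at or above it go to the strong model, so the threshold controls the strong/weak traffic split. Here is a toy sketch of how a threshold could be picked from router scores; the scores are hypothetical, and RouteLLM ships its own calibration tooling for the real thing.

```python
# Toy threshold calibration -- hypothetical router scores, not RouteLLM's
# actual calibration, which is derived from benchmark preference data.
def pick_threshold(scores, strong_pct):
    """Choose a threshold so roughly `strong_pct` of queries go strong."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(len(ranked) * strong_pct))
    return ranked[k - 1]

scores = [0.05, 0.09, 0.12, 0.30, 0.55, 0.80, 0.91, 0.97, 0.10, 0.22]
threshold = pick_threshold(scores, strong_pct=0.3)
strong_count = sum(s >= threshold for s in scores)
print(threshold, strong_count)  # 0.8 3 -> 3 of 10 queries go strong
```

The resulting model string would then be f"router-mf-{threshold}".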

Customization

Change Models

Modify the models in main.py:
client = Controller(
    routers=["mf"],
    strong_model="gpt-4o-mini",  # Change to any OpenAI model
    weak_model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # Change model
)

Adjust Router

Experiment with different routing strategies:
client = Controller(
    routers=["sw-ranking"],  # Try different router
    strong_model="gpt-4o-mini",
    weak_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
)

Custom Styling

Update model badge colors in the Streamlit code:
model_badge_color = (
    "#667eea" if "gpt" in model_name.lower() else "#764ba2"
)
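Building on the color choice above, a hypothetical helper (badge_html is an illustrative name, not part of the app) could wrap the model name in badge markup; in Streamlit you would render it with st.markdown(..., unsafe_allow_html=True).

```python
def badge_html(model_name: str) -> str:
    # Blue for OpenAI models, purple for the Nebius Llama (colors from above)
    color = "#667eea" if "gpt" in model_name.lower() else "#764ba2"
    return (
        f'<span style="background:{color};color:#fff;'
        f'padding:2px 8px;border-radius:12px;">{model_name}</span>'
    )

print(badge_html("gpt-4o-mini"))  # blue badge markup
```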

Architecture

1. User Input: the user enters a message in the Streamlit chat interface
2. Message Processing: the message is added to the chat history and sent to RouteLLM
3. Router Analysis: the RouteLLM Controller analyzes query complexity
4. Model Selection: the router selects the appropriate model (strong or weak)
5. API Call: the query is sent to the selected model’s API endpoint
6. Response Display: the response is displayed with a model information badge

Cost Optimization

RouteLLM helps optimize costs by intelligently routing queries:

Simple Queries

Routed to: Weak Model (Nebius Llama)
Cost: Lower per token
Examples: Facts, definitions, simple Q&A

Complex Queries

Routed to: Strong Model (GPT-4o-mini)
Cost: Higher per token
Examples: Analysis, reasoning, creative tasks
Typical cost savings of 30-50% compared to always using the strong model, while maintaining quality for complex queries.
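Savings in that range follow from simple blended-cost arithmetic. The sketch below uses hypothetical per-token prices and a hypothetical 60% weak-model share, not real OpenAI or Nebius rates:

```python
# Back-of-envelope blended cost. All numbers are HYPOTHETICAL placeholders;
# substitute current provider pricing and your observed routing split.
STRONG_COST = 0.60   # $ per 1M tokens (hypothetical)
WEAK_COST = 0.20     # $ per 1M tokens (hypothetical)
weak_fraction = 0.6  # share of traffic the router sends to the weak model

blended = weak_fraction * WEAK_COST + (1 - weak_fraction) * STRONG_COST
savings = 1 - blended / STRONG_COST
print(f"{savings:.0%}")  # 40% under these assumptions
```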

Troubleshooting

Error: “Please configure your API keys”
Solution:
  1. Enter API keys in the sidebar
  2. Click the “Save API Keys” button
  3. Verify keys are valid and have credits

Possible Causes:
  • Invalid API keys
  • Missing dependencies
  • Network issues
Solution:
  1. Verify both API keys are set
  2. Ensure all packages are installed: uv sync
  3. Check internet connection

Issue: Query routed to unexpected model
Note: This is normal behavior. RouteLLM makes dynamic decisions based on query analysis. Even simple queries may go to the strong model if RouteLLM determines it’s necessary.

Error: Module not found
Solution:
uv sync  # or pip install -e .
Ensure Python 3.11+ is installed.

Best Practices

Monitor Usage

Track API usage and costs across both providers

Test Queries

Test with various query types to understand routing behavior

Review Routing

Monitor which model handles different query types

Adjust Models

Experiment with different model combinations for your use case

Next Steps

RouteLLM Docs

Explore advanced RouteLLM features

Nebius Models

Browse available models on Nebius

OpenAI Platform

Explore OpenAI model options

Custom Routers

Build custom routing logic for your needs
