The llm-router crate provides dynamic model selection based on task complexity, multi-provider LLM integration, and usage tracking across 10+ providers.

llm::route

Route to optimal model based on message complexity and tool requirements.
Parameters:
  • messages (array): Array of message objects; the last message is analyzed for complexity
  • tools (array): Array of available tool definitions
  • model (string, optional): Preferred model name (e.g., “opus”, “sonnet”, “haiku”, “gpt-4o”, “gemini”)

Returns:
  • provider (string): Selected provider name (e.g., “anthropic”, “openai”, “google”)
  • model (string): Selected model identifier
  • complexity (number): Computed complexity score
use iii_sdk::iii::III;
use serde_json::json;

let iii = III::new("ws://localhost:49134");

let route = iii.trigger("llm::route", json!({
    "messages": [
        {"role": "user", "content": "Analyze this complex system design and compare alternatives"}
    ],
    "tools": [
        {"id": "file::read", "description": "Read file contents"},
        {"id": "memory::recall", "description": "Search memories"}
    ],
    "model": null  // Auto-select based on complexity
})).await?;

println!("Routed to {} / {} (complexity: {})",
    route["provider"].as_str().unwrap(),
    route["model"].as_str().unwrap(),
    route["complexity"].as_u64().unwrap()
);

Complexity Scoring

The router computes a complexity score from the following signals; a sketch of this scoring logic follows the list:
  • Content Length: +1 per 100 characters
  • Code Indicators: +20 if contains ```, function, or class
  • Analysis Keywords: +15 if contains analyze, compare, or design
  • Tool Count: +5 per tool
  • Conversation Length: +10 if > 10 messages
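
A minimal sketch of this heuristic, using only the weights listed above (illustrative; the crate's actual implementation may differ):

fn complexity_score(last_message: &str, tool_count: usize, message_count: usize) -> u32 {
    let mut score = 0u32;

    // +1 per 100 characters of content
    score += (last_message.len() / 100) as u32;

    // +20 if the message appears to contain code
    if ["```", "function", "class"].iter().any(|kw| last_message.contains(*kw)) {
        score += 20;
    }

    // +15 if the message asks for analysis, comparison, or design work
    let lower = last_message.to_lowercase();
    if ["analyze", "compare", "design"].iter().any(|kw| lower.contains(*kw)) {
        score += 15;
    }

    // +5 per available tool, +10 for long conversations
    score += tool_count as u32 * 5;
    if message_count > 10 {
        score += 10;
    }

    score
}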

Model Selection by Complexity

Complexity score → selected model:
  • 0-10 → claude-haiku-4-5-20251001
  • 11-40 → claude-sonnet-4-20250514
  • 41+ → claude-opus-4-20250514
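
Mapping the score to a default model following the thresholds above (a sketch, assuming auto-selection stays within the Anthropic family):

fn select_model(complexity: u32) -> &'static str {
    match complexity {
        0..=10 => "claude-haiku-4-5-20251001",  // simple prompts
        11..=40 => "claude-sonnet-4-20250514",  // moderate complexity
        _ => "claude-opus-4-20250514",          // 41+: heavy analysis or design
    }
}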

Preferred Model Overrides

Specify the model parameter to override auto-selection (see the example after this list):
  • "opus" or "claude-opus" → claude-opus-4-20250514
  • "sonnet" or "claude-sonnet" → claude-sonnet-4-20250514
  • "haiku" or "claude-haiku" → claude-haiku-4-5-20251001
  • "gpt-4o" → gpt-4o (OpenAI)
  • "gemini" → gemini-2.0-flash (Google)

llm::complete

Send completion request to routed provider with tool support.
Parameters:
  • provider (string, required): Provider name (from llm::route or manual selection)
  • model (string, required): Model identifier to use
  • messages (array, required): Conversation history with role/content pairs
  • tools (array): Available tools for function calling
  • max_tokens (number, default 4096): Maximum tokens to generate

Returns:
  • content (string): Generated text content
  • model (string): Model used for the completion
  • toolCalls (array): Array of tool calls requested by the model
  • usage (object): Token usage statistics
    • input (number): Input tokens consumed
    • output (number): Output tokens generated
    • total (number): Total tokens (input + output)
let completion = iii.trigger("llm::complete", json!({
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514",
    "messages": [
        {"role": "user", "content": "Write a haiku about Rust programming"}
    ],
    "tools": [],
    "max_tokens": 2048
})).await?;

println!("Response: {}", completion["content"].as_str().unwrap());
println!("Tokens: {}", completion["usage"]["total"].as_u64().unwrap());

Usage Tracking

The router automatically tracks token usage per provider:model combination; a sketch of this bookkeeping follows the list:
  • Increments input_tokens, output_tokens, and requests counters
  • Stored in-memory using DashMap for thread-safe access
  • Query via llm::usage function
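
A rough sketch of that tracking, assuming a DashMap keyed by "provider:model" (type and field names here are hypothetical, not the crate's actual definitions):

use dashmap::DashMap;

#[derive(Default)]
struct UsageCounters {
    input_tokens: u64,
    output_tokens: u64,
    requests: u64,
}

// Thread-safe, in-memory usage tracker keyed by "provider:model"
struct UsageTracker {
    stats: DashMap<String, UsageCounters>,
}

impl UsageTracker {
    fn record(&self, provider: &str, model: &str, input: u64, output: u64) {
        let mut entry = self.stats.entry(format!("{provider}:{model}")).or_default();
        entry.input_tokens += input;
        entry.output_tokens += output;
        entry.requests += 1;
    }
}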

Provider Drivers

The router supports multiple driver types, mapped roughly as shown in the sketch after this list:
  • Anthropic: Native Anthropic Messages API
  • OpenAiCompat: OpenAI-compatible endpoints (OpenAI, Groq, DeepSeek, Mistral, etc.)
  • Gemini: Google Generative Language API
  • Bedrock: AWS Bedrock (future support)
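
An illustrative mapping from provider name to driver type (a sketch only, not the crate's actual types):

enum Driver {
    Anthropic,     // native Messages API
    OpenAiCompat,  // OpenAI-style /chat/completions
    Gemini,        // Google Generative Language API
    Bedrock,       // AWS Bedrock (future support)
}

fn driver_for(provider: &str) -> Driver {
    match provider {
        "anthropic" => Driver::Anthropic,
        "google" => Driver::Gemini,
        // OpenAI, Groq, DeepSeek, Together, Mistral, Fireworks, OpenRouter, Ollama
        _ => Driver::OpenAiCompat,
    }
}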

llm::usage

Get aggregated usage statistics across all providers and models.
Returns:
  • stats (array): Array of usage stat objects
    • provider (string): Provider name
    • model (string): Model identifier
    • input_tokens (number): Total input tokens consumed
    • output_tokens (number): Total output tokens generated
    • requests (number): Total number of requests
Get Usage Stats
let usage = iii.trigger("llm::usage", json!({})).await?;

for stat in usage["stats"].as_array().unwrap() {
    println!("{}/{}: {} requests, {} total tokens",
        stat["provider"].as_str().unwrap(),
        stat["model"].as_str().unwrap(),
        stat["requests"].as_u64().unwrap(),
        stat["input_tokens"].as_u64().unwrap() + stat["output_tokens"].as_u64().unwrap()
    );
}

llm::providers

List all available providers with configuration status.
Returns:
  • providers (array): Array of provider configuration objects
    • name (string): Provider name
    • base_url (string): API base URL
    • env_key (string): Environment variable name for the API key
    • models (array): Array of supported model identifiers
    • configured (boolean): True if an API key is set or the provider requires no key (e.g., Ollama)
List Providers
let providers = iii.trigger("llm::providers", json!({})).await?;

for provider in providers["providers"].as_array().unwrap() {
    let name = provider["name"].as_str().unwrap();
    let configured = provider["configured"].as_bool().unwrap();
    let models = provider["models"].as_array().unwrap();
    
    println!("{} {} - {} models",
        if configured { "✓" } else { "✗" },
        name,
        models.len()
    );
}

Supported Providers

The router supports 10 providers out of the box:

Anthropic

Base URL: https://api.anthropic.com
Env Key: ANTHROPIC_API_KEY
Models: claude-opus-4, claude-sonnet-4, claude-haiku-4-5

OpenAI

Base URL: https://api.openai.com/v1
Env Key: OPENAI_API_KEY
Models: gpt-4o, gpt-4o-mini, o1, o3-mini

Google

Base URL: https://generativelanguage.googleapis.com/v1beta
Env Key: GOOGLE_API_KEY
Models: gemini-2.0-flash, gemini-2.0-pro

Groq

Base URL: https://api.groq.com/openai/v1
Env Key: GROQ_API_KEY
Models: llama-3.3-70b-versatile, mixtral-8x7b-32768

DeepSeek

Base URL: https://api.deepseek.com/v1
Env Key: DEEPSEEK_API_KEY
Models: deepseek-chat, deepseek-reasoner

Together

Base URL: https://api.together.xyz/v1
Env Key: TOGETHER_API_KEY
Models: Llama-3.3-70B, Mixtral-8x22B

Mistral

Base URL: https://api.mistral.ai/v1
Env Key: MISTRAL_API_KEY
Models: mistral-large-latest, mistral-small-latest

Fireworks

Base URL: https://api.fireworks.ai/inference/v1
Env Key: FIREWORKS_API_KEY
Models: llama-v3p3-70b-instruct

OpenRouter

Base URL: https://openrouter.ai/api/v1
Env Key: OPENROUTER_API_KEY
Models: anthropic/claude-opus-4, google/gemini-2.0-flash, etc.

Ollama

Base URL: http://localhost:11434/v1
Env Key: (none - local)
Models: llama3.3, qwen2.5, deepseek-r1

Provider Configuration

Environment Variables

Set API keys via environment variables:
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="AIza..."
export GROQ_API_KEY="gsk_..."
export DEEPSEEK_API_KEY="sk-..."
export TOGETHER_API_KEY="..."
export MISTRAL_API_KEY="..."
export FIREWORKS_API_KEY="..."
export OPENROUTER_API_KEY="sk-or-v1-..."

Local Models (Ollama)

Ollama requires no API key and runs locally:
# Install Ollama: https://ollama.com
ollama pull llama3.3
ollama pull deepseek-r1

# Router automatically uses localhost:11434

Driver Implementation

Anthropic Driver

Uses the native Anthropic Messages API (request sketch after this list):
  • Header: x-api-key: {api_key}
  • Header: anthropic-version: 2023-06-01
  • POST to /v1/messages
  • Supports tool use blocks in response
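
A minimal sketch of that request using reqwest (illustrative only; tool definitions and error handling omitted):

use reqwest::Client;
use serde_json::{json, Value};

async fn anthropic_complete(api_key: &str, model: &str, prompt: &str) -> Result<Value, reqwest::Error> {
    let body = json!({
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}]
    });

    Client::new()
        .post("https://api.anthropic.com/v1/messages")
        .header("x-api-key", api_key)               // API key header
        .header("anthropic-version", "2023-06-01")  // required version header
        .json(&body)
        .send()
        .await?
        .json()
        .await
}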

OpenAI-Compatible Driver

Works with OpenAI-compatible endpoints (request sketch after this list):
  • Header: authorization: Bearer {api_key} (if key provided)
  • POST to /chat/completions
  • Supports tool_calls in response
  • Used by: OpenAI, Groq, DeepSeek, Together, Mistral, Fireworks, OpenRouter, Ollama
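
A matching sketch for the OpenAI-compatible path, where the Authorization header is set only when a key is available (keyless local endpoints like Ollama skip it):

use reqwest::Client;
use serde_json::{json, Value};

async fn openai_compat_complete(
    base_url: &str,
    api_key: Option<&str>,
    model: &str,
    prompt: &str,
) -> Result<Value, reqwest::Error> {
    let body = json!({
        "model": model,
        "messages": [{"role": "user", "content": prompt}]
    });

    let mut request = Client::new()
        .post(format!("{base_url}/chat/completions"))
        .json(&body);

    // Bearer auth only when a key is configured (Ollama needs none)
    if let Some(key) = api_key {
        request = request.header("authorization", format!("Bearer {key}"));
    }

    request.send().await?.json().await
}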

Response Normalization

The router normalizes responses from different providers; an extraction sketch follows the list:
  • Content extraction:
    • Anthropic: content[0].text
    • OpenAI: choices[0].message.content
  • Tool calls extraction:
    • Anthropic: filters content blocks with type: "tool_use"
    • OpenAI: choices[0].message.tool_calls
  • Token usage:
    • Anthropic: usage.{input_tokens, output_tokens}
    • OpenAI: usage.{prompt_tokens, completion_tokens}
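
A sketch of that normalization over raw serde_json::Value responses (illustrative; error handling omitted):

use serde_json::Value;

// Extract the generated text from either response shape
fn extract_content(driver: &str, response: &Value) -> Option<String> {
    match driver {
        "anthropic" => response["content"][0]["text"].as_str().map(str::to_owned),
        _ => response["choices"][0]["message"]["content"].as_str().map(str::to_owned),
    }
}

// Normalize token usage to (input, output)
fn extract_usage(driver: &str, response: &Value) -> (u64, u64) {
    let usage = &response["usage"];
    match driver {
        "anthropic" => (
            usage["input_tokens"].as_u64().unwrap_or(0),
            usage["output_tokens"].as_u64().unwrap_or(0),
        ),
        _ => (
            usage["prompt_tokens"].as_u64().unwrap_or(0),
            usage["completion_tokens"].as_u64().unwrap_or(0),
        ),
    }
}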

Example: Complete Workflow

use iii_sdk::iii::III;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let iii = III::new("ws://localhost:49134");

    // Step 1: Route to optimal model
    let route = iii.trigger("llm::route", json!({
        "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
        "tools": []
    })).await?;

    // Step 2: Request completion
    let completion = iii.trigger("llm::complete", json!({
        "provider": route["provider"],
        "model": route["model"],
        "messages": [{"role": "user", "content": "Explain quantum entanglement"}],
        "tools": [],
        "max_tokens": 2048
    })).await?;

    println!("Response: {}", completion["content"].as_str().unwrap());
    println!("Tokens: {}", completion["usage"]["total"].as_u64().unwrap());

    // Step 3: Check usage stats
    let usage = iii.trigger("llm::usage", json!({})).await?;
    println!("\nTotal usage across all providers:");
    for stat in usage["stats"].as_array().unwrap() {
        println!("  {}/{}: {} requests",
            stat["provider"].as_str().unwrap(),
            stat["model"].as_str().unwrap(),
            stat["requests"].as_u64().unwrap()
        );
    }

    Ok(())
}
