MoFA supports multiple LLM providers through a unified LLMProvider trait. This guide shows you how to integrate and configure different providers.

Supported Providers

MoFA natively supports:
  • OpenAI (GPT-4, GPT-4 Turbo, GPT-3.5)
  • Anthropic (Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku)
  • Ollama (Local models: Llama 3, Mistral, etc.)
  • Google Gemini (Gemini Pro, Gemini Flash)
  • Any OpenAI-compatible API (vLLM, LocalAI, OpenRouter)

OpenAI Provider

The most common provider for production use.

Basic Setup

use mofa_sdk::llm::{OpenAIProvider, OpenAIConfig, LLMClient};
use std::sync::Arc;

// Method 1: From environment variables
let provider = OpenAIProvider::from_env();

// Method 2: Direct configuration
let provider = OpenAIProvider::new("sk-...");

// Method 3: Advanced configuration
let config = OpenAIConfig::new("sk-...")
    .with_model("gpt-4o")
    .with_temperature(0.7)
    .with_max_tokens(4096);

let provider = OpenAIProvider::with_config(config);
let client = LLMClient::new(Arc::new(provider));

Environment Variables

.env
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o           # optional, default: gpt-4o
OPENAI_BASE_URL=https://api.openai.com  # optional

Available Models

| Model | Context Window | Best For |
|-------|----------------|----------|
| gpt-4o | 128K tokens | Most capable, vision support |
| gpt-4o-mini | 128K tokens | Faster, cost-effective |
| gpt-4-turbo | 128K tokens | High quality with vision |
| gpt-3.5-turbo | 16K tokens | Fast and economical |

Usage Example

use mofa_sdk::llm::{OpenAIProvider, LLMClient};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = OpenAIProvider::from_env();
    let client = LLMClient::new(Arc::new(provider));

    // Simple Q&A
    let response = client
        .ask("What is Rust?")
        .await?;

    println!("Answer: {}", response);
    Ok(())
}

Anthropic Provider

Claude models excel at long-form reasoning and analysis.

Basic Setup

use mofa_sdk::llm::{AnthropicProvider, AnthropicConfig, LLMClient};
use std::sync::Arc;

// Method 1: From environment
let provider = AnthropicProvider::from_env();

// Method 2: Direct configuration
let config = AnthropicConfig::new("sk-ant-...")
    .with_model("claude-3.5-sonnet-20241022")
    .with_temperature(0.7)
    .with_max_tokens(4096);

let provider = AnthropicProvider::with_config(config);
let client = LLMClient::new(Arc::new(provider));

Environment Variables

.env
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-3.5-sonnet-20241022  # optional
ANTHROPIC_BASE_URL=https://api.anthropic.com # optional

Available Models

| Model | Context Window | Best For |
|-------|----------------|----------|
| claude-3.5-sonnet-20241022 | 200K tokens | Best overall performance |
| claude-3-opus-20240229 | 200K tokens | Complex reasoning tasks |
| claude-3-haiku-20240307 | 200K tokens | Fast, cost-effective |

Helper Function

use mofa_sdk::anthropic_from_env;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = anthropic_from_env()?;
    let client = LLMClient::new(Arc::new(provider));

    let response = client
        .ask_with_system(
            "You are a helpful assistant.",
            "Explain async/await in Rust"
        )
        .await?;

    println!("{}", response);
    Ok(())
}

Ollama Provider

Run models locally without API costs.

Prerequisites

  1. Install Ollama: https://ollama.ai
  2. Pull a model:
ollama pull llama3.2

Basic Setup

use mofa_sdk::llm::{OllamaProvider, OllamaConfig, LLMClient};
use std::sync::Arc;

// Method 1: Default (localhost:11434)
let provider = OllamaProvider::new();

// Method 2: From environment
let provider = OllamaProvider::from_env();

// Method 3: Custom configuration
let config = OllamaConfig::new()
    .with_base_url("http://localhost:11434/v1")
    .with_model("llama3.2")
    .with_temperature(0.7);

let provider = OllamaProvider::with_config(config);
let client = LLMClient::new(Arc::new(provider));

Environment Variables

.env
OLLAMA_BASE_URL=http://localhost:11434  # optional, default shown
OLLAMA_MODEL=llama3.2                   # optional, default: llama3

Available Models

| Model | Size | Best For |
|-------|------|----------|
| llama3.2 | 3B/1B | Fast local inference |
| mistral | 7B | General purpose |
| codellama | 7B-34B | Code generation |
| qwen2.5 | 0.5B-72B | Multilingual tasks |

Helper Function

use mofa_sdk::llm::ollama_from_env;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = ollama_from_env()?;
    let client = LLMClient::new(Arc::new(provider));

    let response = client.ask("Hello, how are you?").await?;
    println!("{}", response);
    Ok(())
}

Google Gemini Provider

Access Google’s Gemini models.

Basic Setup

use mofa_sdk::llm::{GeminiProvider, GeminiConfig, LLMClient};
use std::sync::Arc;

// Method 1: From environment
let provider = GeminiProvider::from_env();

// Method 2: Direct configuration
let config = GeminiConfig::new("your-api-key")
    .with_model("gemini-1.5-pro-latest")
    .with_temperature(0.7)
    .with_max_tokens(2048);

let provider = GeminiProvider::with_config(config);
let client = LLMClient::new(Arc::new(provider));

Environment Variables

.env
GEMINI_API_KEY=your-key-here
GEMINI_MODEL=gemini-1.5-pro-latest  # optional

Available Models

| Model | Context Window | Best For |
|-------|----------------|----------|
| gemini-1.5-pro-latest | 1M tokens | Long context tasks |
| gemini-1.5-flash-latest | 1M tokens | Fast, cost-effective |

Helper Function

use mofa_sdk::gemini_from_env;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = gemini_from_env()?;
    let client = LLMClient::new(Arc::new(provider));

    let response = client.ask("What is quantum computing?").await?;
    println!("{}", response);
    Ok(())
}

OpenAI-Compatible Providers

Use any OpenAI-compatible API (vLLM, LocalAI, OpenRouter).

vLLM Example

use mofa_sdk::llm::{OpenAIProvider, OpenAIConfig};

let config = OpenAIConfig::new("not-needed")
    .with_base_url("http://localhost:8000/v1")
    .with_model("meta-llama/Llama-3.1-8B-Instruct");

let provider = OpenAIProvider::with_config(config);

OpenRouter Example

.env
OPENAI_API_KEY=your-openrouter-key
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=google/gemini-2.0-flash-001

Advanced LLM Client Usage

Streaming Responses

use tokio_stream::StreamExt;

let mut stream = client.chat()
    .system("You are a helpful assistant.")
    .user("Tell me a story")
    .send_stream()
    .await?;

while let Some(chunk) = stream.next().await {
    if let Ok(text) = chunk {
        print!("{}", text);
    }
}

Multi-turn Conversation

let response1 = client.chat()
    .user("My favorite language is Rust.")
    .send()
    .await?;

let response2 = client.chat()
    .assistant(response1.content().unwrap())
    .user("What's my favorite language?")
    .send()
    .await?;

println!("{}", response2.content().unwrap());
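
Each `chat()` call above is stateless, so prior turns must be replayed explicitly. A plain-Rust sketch of a history holder you might build around this pattern (`Conversation` is illustrative, not a mofa_sdk type):

```rust
// Hypothetical helper for accumulating a conversation before
// replaying it, turn by turn, through `client.chat()`.
#[derive(Debug, Clone)]
struct Turn {
    role: &'static str, // "user" or "assistant"
    content: String,
}

#[derive(Default)]
struct Conversation {
    turns: Vec<Turn>,
}

impl Conversation {
    fn user(&mut self, content: impl Into<String>) {
        self.turns.push(Turn { role: "user", content: content.into() });
    }
    fn assistant(&mut self, content: impl Into<String>) {
        self.turns.push(Turn { role: "assistant", content: content.into() });
    }
}

fn main() {
    let mut convo = Conversation::default();
    convo.user("My favorite language is Rust.");
    convo.assistant("Noted! Rust is a great choice.");
    convo.user("What's my favorite language?");

    // In a real agent, each turn would be added to the chat builder in order.
    for t in &convo.turns {
        println!("{}: {}", t.role, t.content);
    }
}
```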

Tool Calling

use mofa_sdk::llm::function_tool;
use serde_json::json;

let weather_tool = function_tool(
    "get_weather",
    "Get current weather for a location",
    json!({
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name"
            }
        },
        "required": ["location"]
    })
);

let response = client.chat()
    .system("You are a helpful assistant.")
    .user("What's the weather in Paris?")
    .tool(weather_tool)
    .send()
    .await?;

if let Some(tool_calls) = response.tool_calls() {
    for call in tool_calls {
        println!("Tool: {}", call.function.name);
        println!("Args: {}", call.function.arguments);
    }
}

JSON Mode

let response = client.chat()
    .system("You are a helpful assistant. Always respond in JSON.")
    .user("List 3 programming languages")
    .json_mode()
    .send()
    .await?;

println!("{}", response.content().unwrap());
// Output: {"languages": ["Rust", "Python", "JavaScript"]}

Provider Comparison

| Feature | OpenAI | Anthropic | Ollama | Gemini |
|---------|--------|-----------|--------|--------|
| Streaming | ✓ | ✓ | ✓ | ✓ |
| Tools | ✓ | ✓ | ⚠️ Limited | ✓ |
| Vision | ✓ | ✓ | Model-dependent | ✓ |
| Cost | $$$ | $$$ | Free | $$ |
| Privacy | Cloud | Cloud | Local | Cloud |
| Max Context | 128K | 200K | Varies | 1M |
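
If you route requests across providers programmatically, the context limits above can drive model choice. Below is a std-only sketch; `pick_provider` is illustrative (not a mofa_sdk API), its limits mirror the comparison table, and Ollama is omitted because its limit varies by model:

```rust
// Illustrative router: pick the first provider whose context window
// fits the estimated token count of the request.
fn pick_provider(needed_tokens: u64) -> Option<&'static str> {
    // (provider, max context tokens) -- mirrors the comparison table above
    const LIMITS: [(&str, u64); 3] = [
        ("openai", 128_000),
        ("anthropic", 200_000),
        ("gemini", 1_000_000),
    ];
    LIMITS
        .iter()
        .find(|(_, max)| needed_tokens <= *max)
        .map(|(name, _)| *name)
}

fn main() {
    println!("{:?}", pick_provider(50_000));    // Some("openai")
    println!("{:?}", pick_provider(500_000));   // Some("gemini")
    println!("{:?}", pick_provider(2_000_000)); // None: split the input instead
}
```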

Best Practices

Secure API Keys

Never hardcode API keys in your source code. Use environment variables or a secure secret management system.

// ✅ Good
let provider = OpenAIProvider::from_env();

// ❌ Bad
let provider = OpenAIProvider::new("sk-hardcoded-key");
Handle Errors

LLM calls can fail for many reasons (network issues, rate limits, invalid requests). Always handle errors explicitly:

use mofa_sdk::llm::LLMError;
use std::time::Duration;

match client.ask("question").await {
    Ok(response) => println!("{}", response),
    Err(LLMError::RateLimited(_msg)) => {
        // Back off, then retry the request
        tokio::time::sleep(Duration::from_secs(60)).await;
    }
    Err(e) => eprintln!("Error: {}", e),
}
Choose the Right Model

  • GPT-4o: Best for complex reasoning, vision tasks
  • GPT-4o-mini: Fast, cost-effective for simple tasks
  • Claude 3.5 Sonnet: Excellent for long-form content, analysis
  • Ollama: Local inference, no API costs, privacy-focused
  • Gemini Flash: Very long context windows (1M tokens)
Monitor Token Usage

Track token consumption to optimize costs:
let response = client.chat()
    .user("question")
    .send()
    .await?;

if let Some(usage) = response.usage {
    println!("Tokens: {}", usage.total_tokens);
    println!("Prompt: {}", usage.prompt_tokens);
    println!("Completion: {}", usage.completion_tokens);
}
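
Token counts can be turned into a rough cost estimate. The per-million-token prices below are placeholders, not real pricing; check your provider's pricing page for current rates:

```rust
// Rough cost estimate from token counts.
// Prices are in USD per 1M tokens and are illustrative only.
fn estimate_cost_usd(
    prompt_tokens: u64,
    completion_tokens: u64,
    prompt_price_per_m: f64,
    completion_price_per_m: f64,
) -> f64 {
    (prompt_tokens as f64 / 1_000_000.0) * prompt_price_per_m
        + (completion_tokens as f64 / 1_000_000.0) * completion_price_per_m
}

fn main() {
    // e.g. 1,200 prompt tokens and 300 completion tokens
    // at hypothetical $2.50 / $10.00 per 1M tokens
    let cost = estimate_cost_usd(1_200, 300, 2.50, 10.00);
    println!("Estimated cost: ${:.6}", cost); // $0.006000
}
```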

Next Steps

Agent Lifecycle

Learn about agent state management and lifecycle hooks

Capabilities & State

Master agent capabilities and state patterns
