## Overview
Model providers supply the language models that power AI agents. This guide covers configuration for Nebius, OpenAI, and custom providers across different frameworks.
## Nebius (Primary Provider)

Nebius Token Factory provides access to multiple open-source models through a unified API.
### Agno Framework

```python
from agno.models.nebius import Nebius
import os

model = Nebius(
    id="meta-llama/Llama-3.3-70B-Instruct",
    api_key=os.getenv("NEBIUS_API_KEY")
)
```
**Parameters:**

- `id` — The model identifier from Nebius Token Factory. Examples: `"meta-llama/Llama-3.3-70B-Instruct"`, `"Qwen/Qwen3-30B-A3B"`, `"deepseek-ai/DeepSeek-V3-0324"`
- `api_key` — Your Nebius API key. Should be stored in environment variables. Example: `os.getenv("NEBIUS_API_KEY")`
### Available Nebius Models

#### Llama Models

```python
# Llama 3.3 70B - Balanced performance
Nebius(id="meta-llama/Llama-3.3-70B-Instruct", api_key=key)

# Llama 3.1 405B - Maximum capability
Nebius(id="meta-llama/Meta-Llama-3.1-405B-Instruct", api_key=key)

# Llama 3.1 70B - Efficient and capable
Nebius(id="meta-llama/Meta-Llama-3.1-70B-Instruct", api_key=key)

# Llama 3.1 8B - Fast and lightweight
Nebius(id="meta-llama/Meta-Llama-3.1-8B-Instruct", api_key=key)
```
#### Qwen Models

```python
# Qwen 3 235B - Very large context
Nebius(id="Qwen/Qwen3-235B-A22B", api_key=key)

# Qwen 3 32B - Balanced performance
Nebius(id="Qwen/Qwen3-32B", api_key=key)

# Qwen 3 30B - Efficient alternative
Nebius(id="Qwen/Qwen3-30B-A3B", api_key=key)
```
#### DeepSeek Models

```python
# DeepSeek V3 - Advanced reasoning
Nebius(id="deepseek-ai/DeepSeek-V3-0324", api_key=key)

# DeepSeek R1 - Latest reasoning model
Nebius(id="deepseek-ai/DeepSeek-R1-0528", api_key=key)
```
#### Other Models

```python
# GLM 4.5 Air - Fast and efficient
Nebius(id="zai-org/GLM-4.5-Air", api_key=key)

# GPT OSS 120B - OpenAI-compatible
Nebius(id="openai/gpt-oss-120b", api_key=key)
```
### LangChain with Nebius

For LangChain applications:

```python
from langchain_nebius import ChatNebius
import os

llm = ChatNebius(
    model="zai-org/GLM-4.5-Air",
    temperature=0.1,
    top_p=0.95,
    api_key=os.getenv("NEBIUS_API_KEY")
)
```
**Parameters:**

- `temperature` — Sampling temperature (0.0-2.0). Lower values make output more deterministic. Example: `0.1` for factual tasks, `0.7` for creative tasks
- `top_p` — Nucleus sampling parameter (0.0-1.0). Example: `0.95`
### TypeScript with Nebius

```typescript
import { createOpenAICompatible } from '@ai-sdk/openai-compatible';

const nebius = createOpenAICompatible({
  name: 'nebius',
  apiKey: process.env.NEBIUS_API_KEY,
  baseURL: 'https://api.tokenfactory.nebius.com/v1'
});

const model = nebius('meta-llama/Meta-Llama-3.1-405B-Instruct');
```
**Parameters:**

- `apiKey` — Your Nebius API key from environment variables.
- `baseURL` — Nebius API endpoint: `https://api.tokenfactory.nebius.com/v1`
## OpenAI Models

### Agno Framework

```python
from agno.models.openai import OpenAIChat
import os

model = OpenAIChat(
    id="gpt-4-turbo-preview",
    api_key=os.getenv("OPENAI_API_KEY")
)
```
### OpenAI-Like Providers

For OpenAI-compatible endpoints:

```python
from agno.models.openai.like import OpenAILike
import os

model = OpenAILike(
    id="custom-model-name",
    api_key=os.getenv("API_KEY"),
    base_url="https://api.custom-provider.com/v1"
)
```
**Parameters:**

- `api_key` — API key for the provider.
- `base_url` — Base URL for the API endpoint.
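When the model ID, key, and endpoint all come from the environment, a small helper keeps the configuration in one place and fails fast if the key is missing. A minimal sketch; the `MODEL_ID` and `BASE_URL` variable names here are assumptions, not part of any framework:

```python
import os

def openai_like_config(
    default_base_url: str = "https://api.custom-provider.com/v1",
) -> dict:
    """Collect OpenAI-compatible settings from the environment.

    Raises a clear error when the API key is absent, instead of
    failing later with an opaque 401 from the provider.
    """
    api_key = os.getenv("API_KEY")
    if not api_key:
        raise RuntimeError("API_KEY is not set")
    return {
        "id": os.getenv("MODEL_ID", "custom-model-name"),
        "api_key": api_key,
        "base_url": os.getenv("BASE_URL", default_base_url),
    }

# Usage: model = OpenAILike(**openai_like_config())
```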
## Custom Model Provider (OpenAI Agents SDK)

### Implementation

```python
from agents import ModelProvider, Model, OpenAIChatCompletionsModel
from openai import AsyncOpenAI
import os

# Initialize OpenAI client with the Nebius endpoint
client = AsyncOpenAI(
    base_url="https://api.tokenfactory.nebius.com/v1",
    api_key=os.getenv("NEBIUS_API_KEY")
)

DEFAULT_MODEL = "meta-llama/Llama-3.3-70B-Instruct"

class CustomModelProvider(ModelProvider):
    def get_model(self, model_name: str | None) -> Model:
        """
        Returns an OpenAI chat completions model instance.

        Args:
            model_name: The name of the model to use, or None for the default.

        Returns:
            An OpenAIChatCompletionsModel initialized with the model name and client.
        """
        return OpenAIChatCompletionsModel(
            model=model_name or DEFAULT_MODEL,  # fall back when None is passed
            openai_client=client
        )

CUSTOM_MODEL_PROVIDER = CustomModelProvider()
```
### Usage with Runner

```python
from agents import Agent, Runner, RunConfig

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    tools=[send_email]  # send_email is a tool defined elsewhere in your project
)

# Runner.run is a coroutine, so call it from within an async function
result = await Runner.run(
    agent,
    "Your prompt here",
    run_config=RunConfig(model_provider=CUSTOM_MODEL_PROVIDER)
)
print(result.final_output)
```
## Direct API Calls (TypeScript)

For direct API integration without agent frameworks:

```typescript
class NebiusAIService {
  private baseURL: string;
  private model: string;
  private getApiKey: () => string;

  constructor(apiKey?: string, baseURL?: string, model?: string) {
    const resolvedApiKey = apiKey || process.env.NEBIUS_API_KEY || '';
    this.baseURL = baseURL || process.env.NEBIUS_BASE_URL || 'https://api.tokenfactory.nebius.com/v1/';
    this.model = model || process.env.NEBIUS_MODEL || 'Qwen/Qwen3-235B-A22B';

    if (!resolvedApiKey) {
      throw new Error('Nebius API key not found');
    }
    this.getApiKey = () => resolvedApiKey;
  }

  async callAPI(
    prompt: string,
    systemContent: string,
    options: { maxTokens?: number; temperature?: number } = {}
  ): Promise<any> {
    const requestPayload = {
      model: this.model,
      messages: [
        {
          role: 'system',
          content: systemContent
        },
        {
          role: 'user',
          content: [{ type: 'text', text: prompt }]
        }
      ],
      // Use ?? so an explicit 0 is not silently replaced by the default
      max_tokens: options.maxTokens ?? 1000,
      temperature: options.temperature ?? 0.7
    };

    const response = await fetch(`${this.baseURL}chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.getApiKey()}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(requestPayload)
    });

    if (!response.ok) {
      const errorText = await response.text();
      throw new Error(`Nebius API error: ${response.status} - ${errorText}`);
    }
    return await response.json();
  }
}
```
### Usage Example

```typescript
const service = new NebiusAIService(
  process.env.NEBIUS_API_KEY,
  'https://api.tokenfactory.nebius.com/v1/',
  'Qwen/Qwen3-235B-A22B'
);

const result = await service.callAPI(
  'Analyze the financial data',
  'You are a professional financial analyst',
  { maxTokens: 1000, temperature: 0.7 }
);

const analysisText = result.choices?.[0]?.message?.content;
```
## Model Selection Guidelines

### By Task Type

**Simple Tasks (Q&A, summarization)**

```python
Nebius(id="Qwen/Qwen3-30B-A3B", api_key=key)
Nebius(id="meta-llama/Meta-Llama-3.1-8B-Instruct", api_key=key)
```

**Complex Reasoning (analysis, problem-solving)**

```python
Nebius(id="deepseek-ai/DeepSeek-V3-0324", api_key=key)
Nebius(id="meta-llama/Llama-3.3-70B-Instruct", api_key=key)
```

**Code Generation**

```python
Nebius(id="deepseek-ai/DeepSeek-V3-0324", api_key=key)
Nebius(id="Qwen/Qwen3-32B", api_key=key)
```

**Large Context (long documents)**

```python
Nebius(id="meta-llama/Meta-Llama-3.1-405B-Instruct", api_key=key)
Nebius(id="Qwen/Qwen3-235B-A22B", api_key=key)
```

**Fast Responses (real-time applications)**

```python
Nebius(id="zai-org/GLM-4.5-Air", api_key=key)
Nebius(id="meta-llama/Meta-Llama-3.1-8B-Instruct", api_key=key)
```
### By Cost and Capability

**Cost-Effective**

- `Qwen/Qwen3-30B-A3B`
- `meta-llama/Meta-Llama-3.1-8B-Instruct`
- `zai-org/GLM-4.5-Air`

**Balanced**

- `meta-llama/Llama-3.3-70B-Instruct`
- `Qwen/Qwen3-32B`
- `meta-llama/Meta-Llama-3.1-70B-Instruct`

**Maximum Capability**

- `meta-llama/Meta-Llama-3.1-405B-Instruct`
- `Qwen/Qwen3-235B-A22B`
- `deepseek-ai/DeepSeek-V3-0324`
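The tiers above can be encoded as a lookup so agent code selects a model by profile instead of hardcoding IDs throughout the codebase. This is a sketch that simply restates the recommendations in this section; the tier names and preferred ordering are assumptions to tune for your workload:

```python
# Model tiers mirroring the guidelines above; first entry is preferred.
MODEL_TIERS = {
    "cost_effective": [
        "Qwen/Qwen3-30B-A3B",
        "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "zai-org/GLM-4.5-Air",
    ],
    "balanced": [
        "meta-llama/Llama-3.3-70B-Instruct",
        "Qwen/Qwen3-32B",
        "meta-llama/Meta-Llama-3.1-70B-Instruct",
    ],
    "maximum": [
        "meta-llama/Meta-Llama-3.1-405B-Instruct",
        "Qwen/Qwen3-235B-A22B",
        "deepseek-ai/DeepSeek-V3-0324",
    ],
}

def pick_model(tier: str) -> str:
    """Return the preferred model ID for a tier."""
    try:
        return MODEL_TIERS[tier][0]
    except KeyError:
        raise ValueError(
            f"Unknown tier {tier!r}; choose from {sorted(MODEL_TIERS)}"
        ) from None
```

Usage: `Nebius(id=pick_model("balanced"), api_key=key)`.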
## Configuration Best Practices

### 1. Use Environment Variables

```python
import os
from dotenv import load_dotenv

load_dotenv()

model = Nebius(
    id="meta-llama/Llama-3.3-70B-Instruct",
    api_key=os.getenv("NEBIUS_API_KEY")  # Never hardcode
)
```
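A missing key is easier to debug when it fails at startup rather than at the first API call. A minimal fail-fast check, assuming the variable names used throughout this guide:

```python
import os

REQUIRED_VARS = ["NEBIUS_API_KEY"]

def check_env(required=REQUIRED_VARS) -> None:
    """Raise a single error naming every missing environment variable."""
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise EnvironmentError(
            f"Missing environment variables: {', '.join(missing)}"
        )
```

Call `check_env()` right after `load_dotenv()` so misconfiguration surfaces immediately.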
### 2. Tune Temperature for the Task

```python
# For factual tasks - low temperature
llm = ChatNebius(
    model="Qwen/Qwen3-32B",
    temperature=0.1,  # More deterministic
    api_key=os.getenv("NEBIUS_API_KEY")
)

# For creative tasks - higher temperature
llm = ChatNebius(
    model="meta-llama/Llama-3.3-70B-Instruct",
    temperature=0.7,  # More creative
    api_key=os.getenv("NEBIUS_API_KEY")
)
```
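Since the two configurations differ only in sampling parameters, they can be derived from a task label. The category names and values below just restate the examples in this guide; adjust them for your workload:

```python
def sampling_params(task: str) -> dict:
    """Map a task category to the sampling settings used in this guide."""
    if task == "factual":
        return {"temperature": 0.1, "top_p": 0.95}
    if task == "creative":
        return {"temperature": 0.7, "top_p": 0.95}
    raise ValueError(f"Unknown task: {task!r}")
```

Usage: `ChatNebius(model="Qwen/Qwen3-32B", api_key=key, **sampling_params("factual"))`.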
### 3. Handle API Errors

```python
try:
    model = Nebius(
        id="meta-llama/Llama-3.3-70B-Instruct",
        api_key=os.getenv("NEBIUS_API_KEY")
    )
    response = agent.run("Your query")
except Exception as e:
    print(f"Model error: {e}")
```
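Transient failures such as rate limits or timeouts usually warrant a retry with backoff rather than a bare print. A framework-agnostic sketch; the `sleep` parameter is injectable so tests (and cancellation logic) don't have to wait:

```python
import time

def run_with_retries(call, retries: int = 3, base_delay: float = 1.0,
                     sleep=time.sleep):
    """Invoke `call()` up to `retries` times, doubling the delay after
    each failure; re-raise the last exception if every attempt fails."""
    last_exc = None
    for attempt in range(retries):
        try:
            return call()
        except Exception as exc:
            last_exc = exc
            if attempt < retries - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_exc

# Usage: response = run_with_retries(lambda: agent.run("Your query"))
```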
### 4. Test with Multiple Models

```python
MODELS = [
    "Qwen/Qwen3-30B-A3B",
    "meta-llama/Llama-3.3-70B-Instruct",
    "deepseek-ai/DeepSeek-V3-0324"
]

for model_id in MODELS:
    model = Nebius(id=model_id, api_key=os.getenv("NEBIUS_API_KEY"))
    # Test with your use case
```
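The loop above can collect outputs in a single structure when the model call is passed in as a function, which also keeps the harness testable without network access. A sketch; `run_prompt` is any callable you supply that runs one prompt against one model ID:

```python
def compare_models(model_ids, run_prompt, prompt: str) -> dict:
    """Run `prompt` against each model via `run_prompt(model_id, prompt)`
    and collect outputs (or error strings) keyed by model ID."""
    results = {}
    for model_id in model_ids:
        try:
            results[model_id] = run_prompt(model_id, prompt)
        except Exception as exc:
            results[model_id] = f"error: {exc}"
    return results
```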
## Environment Setup

```bash
# .env file
NEBIUS_API_KEY=your_nebius_api_key
NEBIUS_BASE_URL=https://api.tokenfactory.nebius.com/v1/
NEBIUS_MODEL=meta-llama/Llama-3.3-70B-Instruct

# Optional
OPENAI_API_KEY=your_openai_api_key
```