
Overview

Model providers supply the language models that power AI agents. This guide covers configuration for Nebius, OpenAI, and custom providers across different frameworks.

Nebius (Primary Provider)

Nebius Token Factory provides access to multiple open-source models through a unified API.

Agno Framework

from agno.models.nebius import Nebius
import os

model = Nebius(
    id="meta-llama/Llama-3.3-70B-Instruct",
    api_key=os.getenv("NEBIUS_API_KEY")
)
id (string, required)
The model identifier from Nebius Token Factory. Examples:
  • "meta-llama/Llama-3.3-70B-Instruct"
  • "Qwen/Qwen3-30B-A3B"
  • "deepseek-ai/DeepSeek-V3-0324"

api_key (string, required)
Your Nebius API key. Store it in an environment variable. Example: os.getenv("NEBIUS_API_KEY")
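Under the hood, Nebius Token Factory exposes an OpenAI-compatible chat completions endpoint, which is what frameworks like Agno call for you. As a sketch of what gets sent, here is the request payload a framework would POST to the /chat/completions route (build_chat_request is a hypothetical helper for illustration; no network call is made):

```python
import json

def build_chat_request(model_id: str, prompt: str,
                       system: str = "You are a helpful assistant") -> dict:
    """Build the JSON payload for an OpenAI-compatible chat completions call."""
    return {
        "model": model_id,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    }

payload = build_chat_request("meta-llama/Llama-3.3-70B-Instruct", "Hello")
print(json.dumps(payload, indent=2))
```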

Available Nebius Models

Llama Models

# Llama 3.3 70B - Balanced performance
Nebius(id="meta-llama/Llama-3.3-70B-Instruct", api_key=key)

# Llama 3.1 405B - Maximum capability
Nebius(id="meta-llama/Meta-Llama-3.1-405B-Instruct", api_key=key)

# Llama 3.1 70B - Efficient and capable
Nebius(id="meta-llama/Meta-Llama-3.1-70B-Instruct", api_key=key)

# Llama 3.1 8B - Fast and lightweight
Nebius(id="meta-llama/Meta-Llama-3.1-8B-Instruct", api_key=key)

Qwen Models

# Qwen 3 235B - Very large context
Nebius(id="Qwen/Qwen3-235B-A22B", api_key=key)

# Qwen 3 32B - Balanced performance
Nebius(id="Qwen/Qwen3-32B", api_key=key)

# Qwen 3 30B - Efficient alternative
Nebius(id="Qwen/Qwen3-30B-A3B", api_key=key)

DeepSeek Models

# DeepSeek V3 - Advanced reasoning
Nebius(id="deepseek-ai/DeepSeek-V3-0324", api_key=key)

# DeepSeek R1 - Latest reasoning model
Nebius(id="deepseek-ai/DeepSeek-R1-0528", api_key=key)

Other Models

# GLM 4.5 Air - Fast and efficient
Nebius(id="zai-org/GLM-4.5-Air", api_key=key)

# GPT OSS 120B - OpenAI-compatible
Nebius(id="openai/gpt-oss-120b", api_key=key)

LangChain with Nebius

For LangChain applications:
from langchain_nebius import ChatNebius
import os

llm = ChatNebius(
    model="zai-org/GLM-4.5-Air",
    temperature=0.1,
    top_p=0.95,
    api_key=os.getenv("NEBIUS_API_KEY")
)
model (string, required)
Model identifier.

temperature (float, default 0.7)
Sampling temperature (0.0-2.0). Lower values make output more deterministic. Example: 0.1 for factual tasks, 0.7 for creative tasks.

top_p (float, default 1.0)
Nucleus sampling parameter (0.0-1.0). Example: 0.95

api_key (string, required)
Nebius API key.

TypeScript with Nebius

import { createOpenAICompatible } from '@ai-sdk/openai-compatible';

const nebius = createOpenAICompatible({
  name: 'nebius',
  apiKey: process.env.NEBIUS_API_KEY,
  baseURL: 'https://api.tokenfactory.nebius.com/v1'
});

const model = nebius('meta-llama/Meta-Llama-3.1-405B-Instruct');
name (string, required)
Provider identifier.

apiKey (string, required)
Your Nebius API key from environment variables.

baseURL (string, required)
Nebius API endpoint: https://api.tokenfactory.nebius.com/v1

OpenAI Models

Agno Framework

from agno.models.openai import OpenAIChat
import os

model = OpenAIChat(
    id="gpt-4-turbo-preview",
    api_key=os.getenv("OPENAI_API_KEY")
)

OpenAI-Like Providers

For OpenAI-compatible endpoints:
from agno.models.openai.like import OpenAILike
import os

model = OpenAILike(
    id="custom-model-name",
    api_key=os.getenv("API_KEY"),
    base_url="https://api.custom-provider.com/v1"
)
id (string, required)
Model identifier.

api_key (string, required)
API key for the provider.

base_url (string, required)
Base URL for the API endpoint.

Custom Model Provider (OpenAI Agents SDK)

Implementation

from agents import ModelProvider, Model, OpenAIChatCompletionsModel
from openai import AsyncOpenAI
import os

# Initialize OpenAI client with custom endpoint
client = AsyncOpenAI(
    base_url="https://api.tokenfactory.nebius.com/v1",
    api_key=os.getenv("NEBIUS_API_KEY")
)

class CustomModelProvider(ModelProvider):
    def get_model(self, model_name: str | None) -> Model:
        """
        Returns an OpenAI chat completions model instance.
        
        Args:
            model_name: The name of the model to use, or None for default.
        
        Returns:
            An OpenAIChatCompletionsModel initialized with the model name and client.
        """
        # Fall back to a balanced default when no model name is provided
        return OpenAIChatCompletionsModel(
            model=model_name or "meta-llama/Llama-3.3-70B-Instruct",
            openai_client=client
        )

CUSTOM_MODEL_PROVIDER = CustomModelProvider()

Usage with Runner

from agents import Agent, Runner, RunConfig

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    tools=[send_email]  # send_email: a function tool defined elsewhere
)

result = await Runner.run(
    agent,
    "Your prompt here",
    run_config=RunConfig(model_provider=CUSTOM_MODEL_PROVIDER)
)

print(result.final_output)

Direct API Calls (TypeScript)

For direct API integration without agent frameworks:
class NebiusAIService {
  private baseURL: string;
  private model: string;
  private getApiKey: () => string;

  constructor(apiKey?: string, baseURL?: string, model?: string) {
    const resolvedApiKey = apiKey || process.env.NEBIUS_API_KEY || '';
    this.baseURL = baseURL || process.env.NEBIUS_BASE_URL || 'https://api.tokenfactory.nebius.com/v1/';
    this.model = model || process.env.NEBIUS_MODEL || 'Qwen/Qwen3-235B-A22B';
    
    if (!resolvedApiKey) {
      throw new Error('Nebius API key not found');
    }
    
    this.getApiKey = () => resolvedApiKey;
  }

  async callAPI(
    prompt: string,
    systemContent: string,
    options: { maxTokens?: number; temperature?: number } = {}
  ): Promise<any> {
    const requestPayload = {
      model: this.model,
      messages: [
        {
          role: 'system',
          content: systemContent
        },
        {
          role: 'user',
          content: [{ type: 'text', text: prompt }]
        }
      ],
      max_tokens: options.maxTokens ?? 1000,
      temperature: options.temperature ?? 0.7  // ?? so an explicit 0 is respected
    };

    const response = await fetch(`${this.baseURL}chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.getApiKey()}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(requestPayload)
    });

    if (!response.ok) {
      const errorText = await response.text();
      throw new Error(`Nebius API error: ${response.status} - ${errorText}`);
    }

    return await response.json();
  }
}

Usage Example

const service = new NebiusAIService(
  process.env.NEBIUS_API_KEY,
  'https://api.tokenfactory.nebius.com/v1/',
  'Qwen/Qwen3-235B-A22B'
);

const result = await service.callAPI(
  'Analyze the financial data',
  'You are a professional financial analyst',
  { maxTokens: 1000, temperature: 0.7 }
);

const analysisText = result.choices?.[0]?.message?.content;

Model Selection Guidelines

By Task Type

Simple Tasks (Q&A, summarization)

Nebius(id="Qwen/Qwen3-30B-A3B", api_key=key)
Nebius(id="meta-llama/Meta-Llama-3.1-8B-Instruct", api_key=key)

Complex Reasoning (analysis, problem-solving)

Nebius(id="deepseek-ai/DeepSeek-V3-0324", api_key=key)
Nebius(id="meta-llama/Llama-3.3-70B-Instruct", api_key=key)

Code Generation

Nebius(id="deepseek-ai/DeepSeek-V3-0324", api_key=key)
Nebius(id="Qwen/Qwen3-32B", api_key=key)

Large Context (long documents)

Nebius(id="meta-llama/Meta-Llama-3.1-405B-Instruct", api_key=key)
Nebius(id="Qwen/Qwen3-235B-A22B", api_key=key)

Fast Responses (real-time applications)

Nebius(id="zai-org/GLM-4.5-Air", api_key=key)
Nebius(id="meta-llama/Meta-Llama-3.1-8B-Instruct", api_key=key)

By Cost-Performance

Cost-Effective

  • Qwen/Qwen3-30B-A3B
  • meta-llama/Meta-Llama-3.1-8B-Instruct
  • zai-org/GLM-4.5-Air

Balanced

  • meta-llama/Llama-3.3-70B-Instruct
  • Qwen/Qwen3-32B
  • meta-llama/Meta-Llama-3.1-70B-Instruct

Maximum Capability

  • meta-llama/Meta-Llama-3.1-405B-Instruct
  • Qwen/Qwen3-235B-A22B
  • deepseek-ai/DeepSeek-V3-0324

Configuration Best Practices

1. Use Environment Variables

import os
from dotenv import load_dotenv

load_dotenv()

model = Nebius(
    id="meta-llama/Llama-3.3-70B-Instruct",
    api_key=os.getenv("NEBIUS_API_KEY")  # Never hardcode
)

2. Configure Temperature Appropriately

# For factual tasks - low temperature
llm = ChatNebius(
    model="Qwen/Qwen3-32B",
    temperature=0.1,  # More deterministic
    api_key=os.getenv("NEBIUS_API_KEY")
)

# For creative tasks - higher temperature
llm = ChatNebius(
    model="meta-llama/Llama-3.3-70B-Instruct",
    temperature=0.7,  # More creative
    api_key=os.getenv("NEBIUS_API_KEY")
)

3. Handle API Errors

try:
    model = Nebius(
        id="meta-llama/Llama-3.3-70B-Instruct",
        api_key=os.getenv("NEBIUS_API_KEY")
    )
    response = agent.run("Your query")  # errors surface at request time, not at construction
except Exception as e:
    print(f"Model error: {e}")
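Transient failures (rate limits, timeouts) are often worth retrying rather than surfacing immediately. A minimal exponential-backoff sketch — with_retries is a hypothetical wrapper, not part of Agno or the Nebius SDK:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage: response = with_retries(lambda: agent.run("Your query"))
```

In production you would typically narrow the except clause to the provider's retryable error types instead of catching Exception.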

4. Test with Multiple Models

MODELS = [
    "Qwen/Qwen3-30B-A3B",
    "meta-llama/Llama-3.3-70B-Instruct",
    "deepseek-ai/DeepSeek-V3-0324"
]

for model_id in MODELS:
    model = Nebius(id=model_id, api_key=os.getenv("NEBIUS_API_KEY"))
    # Test with your use case

Environment Setup

# .env file
NEBIUS_API_KEY=your_nebius_api_key
NEBIUS_BASE_URL=https://api.tokenfactory.nebius.com/v1/
NEBIUS_MODEL=meta-llama/Llama-3.3-70B-Instruct

# Optional
OPENAI_API_KEY=your_openai_api_key
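These variables can be read once at startup and validated in one place. load_nebius_config is a hypothetical helper applying the defaults shown above:

```python
import os

def load_nebius_config() -> dict:
    """Read Nebius settings from the environment, applying documented defaults."""
    api_key = os.getenv("NEBIUS_API_KEY")
    if not api_key:
        raise RuntimeError("NEBIUS_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": os.getenv("NEBIUS_BASE_URL",
                              "https://api.tokenfactory.nebius.com/v1/"),
        "model": os.getenv("NEBIUS_MODEL",
                           "meta-llama/Llama-3.3-70B-Instruct"),
    }
```

Failing fast on a missing key at startup gives a clearer error than a 401 deep inside an agent run.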
