
Overview

Model providers supply the language models that power AI agents. This guide covers configuration for Nebius, OpenAI, and custom providers across different frameworks.

Nebius (Primary Provider)

Nebius Token Factory provides access to multiple open-source models through a unified API.

Agno Framework

from agno.models.nebius import Nebius
import os

model = Nebius(
    id="meta-llama/Llama-3.3-70B-Instruct",
    api_key=os.getenv("NEBIUS_API_KEY")
)
id (string, required)
The model identifier from Nebius Token Factory. Examples:
  • "meta-llama/Llama-3.3-70B-Instruct"
  • "Qwen/Qwen3-30B-A3B"
  • "deepseek-ai/DeepSeek-V3-0324"

api_key (string, required)
Your Nebius API key. Store it in an environment variable. Example: os.getenv("NEBIUS_API_KEY")
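Under the hood, Nebius Token Factory exposes an OpenAI-compatible chat completions endpoint, which is what frameworks like Agno call for you. As a sketch of what gets sent, here is the request payload a framework would POST to the /chat/completions route (build_chat_request is a hypothetical helper for illustration; no network call is made):

```python
import json

def build_chat_request(model_id: str, prompt: str,
                       system: str = "You are a helpful assistant") -> dict:
    """Build the JSON payload for an OpenAI-compatible chat completions call."""
    return {
        "model": model_id,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    }

payload = build_chat_request("meta-llama/Llama-3.3-70B-Instruct", "Hello")
print(json.dumps(payload, indent=2))
```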

Available Nebius Models

Llama Models

# Llama 3.3 70B - Balanced performance
Nebius(id="meta-llama/Llama-3.3-70B-Instruct", api_key=key)

# Llama 3.1 405B - Maximum capability
Nebius(id="meta-llama/Meta-Llama-3.1-405B-Instruct", api_key=key)

# Llama 3.1 70B - Efficient and capable
Nebius(id="meta-llama/Meta-Llama-3.1-70B-Instruct", api_key=key)

# Llama 3.1 8B - Fast and lightweight
Nebius(id="meta-llama/Meta-Llama-3.1-8B-Instruct", api_key=key)

Qwen Models

# Qwen 3 235B - Very large context
Nebius(id="Qwen/Qwen3-235B-A22B", api_key=key)

# Qwen 3 32B - Balanced performance
Nebius(id="Qwen/Qwen3-32B", api_key=key)

# Qwen 3 30B - Efficient alternative
Nebius(id="Qwen/Qwen3-30B-A3B", api_key=key)

DeepSeek Models

# DeepSeek V3 - Advanced reasoning
Nebius(id="deepseek-ai/DeepSeek-V3-0324", api_key=key)

# DeepSeek R1 - Latest reasoning model
Nebius(id="deepseek-ai/DeepSeek-R1-0528", api_key=key)

Other Models

# GLM 4.5 Air - Fast and efficient
Nebius(id="zai-org/GLM-4.5-Air", api_key=key)

# GPT OSS 120B - OpenAI-compatible
Nebius(id="openai/gpt-oss-120b", api_key=key)

LangChain with Nebius

For LangChain applications:
from langchain_nebius import ChatNebius
import os

llm = ChatNebius(
    model="zai-org/GLM-4.5-Air",
    temperature=0.1,
    top_p=0.95,
    api_key=os.getenv("NEBIUS_API_KEY")
)
model (string, required)
Model identifier.

temperature (float, default 0.7)
Sampling temperature (0.0-2.0). Lower values make output more deterministic. Example: 0.1 for factual tasks, 0.7 for creative tasks.

top_p (float, default 1.0)
Nucleus sampling parameter (0.0-1.0). Example: 0.95

api_key (string, required)
Nebius API key.

TypeScript with Nebius

import { createOpenAICompatible } from '@ai-sdk/openai-compatible';

const nebius = createOpenAICompatible({
  name: 'nebius',
  apiKey: process.env.NEBIUS_API_KEY,
  baseURL: 'https://api.tokenfactory.nebius.com/v1'
});

const model = nebius('meta-llama/Meta-Llama-3.1-405B-Instruct');
name (string, required)
Provider identifier.

apiKey (string, required)
Your Nebius API key from environment variables.

baseURL (string, required)
Nebius API endpoint: https://api.tokenfactory.nebius.com/v1

OpenAI Models

Agno Framework

from agno.models.openai import OpenAIChat
import os

model = OpenAIChat(
    id="gpt-4-turbo-preview",
    api_key=os.getenv("OPENAI_API_KEY")
)

OpenAI-Like Providers

For OpenAI-compatible endpoints:
from agno.models.openai.like import OpenAILike
import os

model = OpenAILike(
    id="custom-model-name",
    api_key=os.getenv("API_KEY"),
    base_url="https://api.custom-provider.com/v1"
)
id (string, required)
Model identifier.

api_key (string, required)
API key for the provider.

base_url (string, required)
Base URL for the API endpoint.

Custom Model Provider (OpenAI Agents SDK)

Implementation

from agents import ModelProvider, Model, OpenAIChatCompletionsModel
from openai import AsyncOpenAI
import os

# Initialize OpenAI client with custom endpoint
client = AsyncOpenAI(
    base_url="https://api.tokenfactory.nebius.com/v1",
    api_key=os.getenv("NEBIUS_API_KEY")
)

class CustomModelProvider(ModelProvider):
    def get_model(self, model_name: str | None) -> Model:
        """
        Returns an OpenAI chat completions model instance.
        
        Args:
            model_name: The name of the model to use, or None for default.
        
        Returns:
            An OpenAIChatCompletionsModel initialized with the model name and client.
        """
        # Fall back to a balanced default when no model name is provided
        return OpenAIChatCompletionsModel(
            model=model_name or "meta-llama/Llama-3.3-70B-Instruct",
            openai_client=client
        )

CUSTOM_MODEL_PROVIDER = CustomModelProvider()

Usage with Runner

from agents import Agent, Runner, RunConfig

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    tools=[send_email]  # send_email: a function tool defined elsewhere
)

result = await Runner.run(
    agent,
    "Your prompt here",
    run_config=RunConfig(model_provider=CUSTOM_MODEL_PROVIDER)
)

print(result.final_output)

Direct API Calls (TypeScript)

For direct API integration without agent frameworks:
class NebiusAIService {
  private baseURL: string;
  private model: string;
  private getApiKey: () => string;

  constructor(apiKey?: string, baseURL?: string, model?: string) {
    const resolvedApiKey = apiKey || process.env.NEBIUS_API_KEY || '';
    this.baseURL = baseURL || process.env.NEBIUS_BASE_URL || 'https://api.tokenfactory.nebius.com/v1/';
    this.model = model || process.env.NEBIUS_MODEL || 'Qwen/Qwen3-235B-A22B';
    
    if (!resolvedApiKey) {
      throw new Error('Nebius API key not found');
    }
    
    this.getApiKey = () => resolvedApiKey;
  }

  async callAPI(
    prompt: string,
    systemContent: string,
    options: { maxTokens?: number; temperature?: number } = {}
  ): Promise<any> {
    const requestPayload = {
      model: this.model,
      messages: [
        {
          role: 'system',
          content: systemContent
        },
        {
          role: 'user',
          content: [{ type: 'text', text: prompt }]
        }
      ],
      max_tokens: options.maxTokens ?? 1000,
      temperature: options.temperature ?? 0.7  // ?? so an explicit 0 is respected
    };

    const response = await fetch(`${this.baseURL}chat/completions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.getApiKey()}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(requestPayload)
    });

    if (!response.ok) {
      const errorText = await response.text();
      throw new Error(`Nebius API error: ${response.status} - ${errorText}`);
    }

    return await response.json();
  }
}

Usage Example

const service = new NebiusAIService(
  process.env.NEBIUS_API_KEY,
  'https://api.tokenfactory.nebius.com/v1/',
  'Qwen/Qwen3-235B-A22B'
);

const result = await service.callAPI(
  'Analyze the financial data',
  'You are a professional financial analyst',
  { maxTokens: 1000, temperature: 0.7 }
);

const analysisText = result.choices?.[0]?.message?.content;

Model Selection Guidelines

By Task Type

Simple Tasks (Q&A, summarization)

Nebius(id="Qwen/Qwen3-30B-A3B", api_key=key)
Nebius(id="meta-llama/Meta-Llama-3.1-8B-Instruct", api_key=key)

Complex Reasoning (analysis, problem-solving)

Nebius(id="deepseek-ai/DeepSeek-V3-0324", api_key=key)
Nebius(id="meta-llama/Llama-3.3-70B-Instruct", api_key=key)

Code Generation

Nebius(id="deepseek-ai/DeepSeek-V3-0324", api_key=key)
Nebius(id="Qwen/Qwen3-32B", api_key=key)

Large Context (long documents)

Nebius(id="meta-llama/Meta-Llama-3.1-405B-Instruct", api_key=key)
Nebius(id="Qwen/Qwen3-235B-A22B", api_key=key)

Fast Responses (real-time applications)

Nebius(id="zai-org/GLM-4.5-Air", api_key=key)
Nebius(id="meta-llama/Meta-Llama-3.1-8B-Instruct", api_key=key)

By Cost-Performance

Cost-Effective

  • Qwen/Qwen3-30B-A3B
  • meta-llama/Meta-Llama-3.1-8B-Instruct
  • zai-org/GLM-4.5-Air

Balanced

  • meta-llama/Llama-3.3-70B-Instruct
  • Qwen/Qwen3-32B
  • meta-llama/Meta-Llama-3.1-70B-Instruct

Maximum Capability

  • meta-llama/Meta-Llama-3.1-405B-Instruct
  • Qwen/Qwen3-235B-A22B
  • deepseek-ai/DeepSeek-V3-0324

Configuration Best Practices

1. Use Environment Variables

import os
from dotenv import load_dotenv

load_dotenv()

model = Nebius(
    id="meta-llama/Llama-3.3-70B-Instruct",
    api_key=os.getenv("NEBIUS_API_KEY")  # Never hardcode
)

2. Configure Temperature Appropriately

# For factual tasks - low temperature
llm = ChatNebius(
    model="Qwen/Qwen3-32B",
    temperature=0.1,  # More deterministic
    api_key=os.getenv("NEBIUS_API_KEY")
)

# For creative tasks - higher temperature
llm = ChatNebius(
    model="meta-llama/Llama-3.3-70B-Instruct",
    temperature=0.7,  # More creative
    api_key=os.getenv("NEBIUS_API_KEY")
)

3. Handle API Errors

try:
    model = Nebius(
        id="meta-llama/Llama-3.3-70B-Instruct",
        api_key=os.getenv("NEBIUS_API_KEY")
    )
    response = agent.run("Your query")  # errors surface at request time, not at construction
except Exception as e:
    print(f"Model error: {e}")
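Transient failures (rate limits, timeouts) are often worth retrying rather than surfacing immediately. A minimal exponential-backoff sketch — with_retries is a hypothetical wrapper, not part of Agno or the Nebius SDK:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage: response = with_retries(lambda: agent.run("Your query"))
```

In production you would typically narrow the except clause to the provider's retryable error types instead of catching Exception.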

4. Test with Multiple Models

MODELS = [
    "Qwen/Qwen3-30B-A3B",
    "meta-llama/Llama-3.3-70B-Instruct",
    "deepseek-ai/DeepSeek-V3-0324"
]

for model_id in MODELS:
    model = Nebius(id=model_id, api_key=os.getenv("NEBIUS_API_KEY"))
    # Test with your use case

Environment Setup

# .env file
NEBIUS_API_KEY=your_nebius_api_key
NEBIUS_BASE_URL=https://api.tokenfactory.nebius.com/v1/
NEBIUS_MODEL=meta-llama/Llama-3.3-70B-Instruct

# Optional
OPENAI_API_KEY=your_openai_api_key
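These variables can be read once at startup and validated in one place. load_nebius_config is a hypothetical helper applying the defaults shown above:

```python
import os

def load_nebius_config() -> dict:
    """Read Nebius settings from the environment, applying documented defaults."""
    api_key = os.getenv("NEBIUS_API_KEY")
    if not api_key:
        raise RuntimeError("NEBIUS_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": os.getenv("NEBIUS_BASE_URL",
                              "https://api.tokenfactory.nebius.com/v1/"),
        "model": os.getenv("NEBIUS_MODEL",
                           "meta-llama/Llama-3.3-70B-Instruct"),
    }
```

Failing fast on a missing key at startup gives a clearer error than a 401 deep inside an agent run.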
