The Models API provides information about available LLM models from all configured providers, including capabilities, pricing, and context limits.

List models

Retrieve all available models from configured providers.

Endpoint

GET /api/models

Parameters

provider (string)
Filter models by provider: openai, anthropic, gemini, bedrock
capability (string)
Filter by model capability: chat, embedding, image_generation

Response

models (array)
Array of model objects

Example

cURL
curl http://localhost:9090/api/models
Python
import requests

response = requests.get('http://localhost:9090/api/models')
response.raise_for_status()
models = response.json()

# List all chat models
chat_models = [m for m in models['models'] if 'chat' in m['capabilities']]
for model in chat_models:
    print(f"{model['id']} - {model['provider']}")
    print(f"  Context: {model['context_window']} tokens")
    print(f"  Price: ${model['pricing']['input']}/1M input, ${model['pricing']['output']}/1M output")

Filter by provider

Get models from a specific provider:
curl "http://localhost:9090/api/models?provider=anthropic"
Returns only Anthropic Claude models.

Filter by capability

Get models with specific capabilities:
# Chat models only
curl "http://localhost:9090/api/models?capability=chat"

# Embedding models
curl "http://localhost:9090/api/models?capability=embedding"

# Image generation models
curl "http://localhost:9090/api/models?capability=image_generation"
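The same filters can be combined from Python by passing them as query parameters. A minimal sketch (the helper name is ours; the parameters are the ones documented above):

```python
def build_model_query(provider=None, capability=None):
    """Assemble the query-parameter dict for GET /api/models.
    Both filters are optional and can be combined."""
    params = {}
    if provider:
        params['provider'] = provider
    if capability:
        params['capability'] = capability
    return params

# Usage against a running vLLora instance:
#   import requests
#   r = requests.get('http://localhost:9090/api/models',
#                    params=build_model_query(provider='anthropic', capability='chat'))
#   models = r.json()['models']
print(build_model_query(provider='anthropic', capability='chat'))
# → {'provider': 'anthropic', 'capability': 'chat'}
```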

Using models in requests

Use the model ID from the API in your requests:
curl http://localhost:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
vLLora automatically routes to the correct provider based on the model ID.
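In code, that means any model ID returned by /api/models can be dropped straight into a completion request. A sketch of selecting a routable chat model from the listing (the helper and sample data are ours; the field names follow the Python example above):

```python
def pick_chat_model(models, preferred_provider=None):
    """Return the id of the first chat-capable model,
    optionally restricted to a single provider."""
    for m in models:
        if 'chat' not in m.get('capabilities', []):
            continue
        if preferred_provider and m.get('provider') != preferred_provider:
            continue
        return m['id']
    return None

# The chosen id goes into a standard OpenAI-style body:
#   {"model": model_id, "messages": [{"role": "user", "content": "Hello"}]}
models = [
    {'id': 'text-embedding-3-small', 'provider': 'openai',
     'capabilities': ['embedding']},
    {'id': 'claude-3-5-haiku-20241022', 'provider': 'anthropic',
     'capabilities': ['chat']},
]
print(pick_chat_model(models, preferred_provider='anthropic'))
# → claude-3-5-haiku-20241022
```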

Model capabilities

Chat models

Models with chat capabilities support:
  • Conversational interfaces
  • Multi-turn dialogues
  • System prompts
  • Tool/function calling (if supported)
Example chat models:
  • gpt-4o, gpt-4o-mini (OpenAI)
  • claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022 (Anthropic)
  • gemini-2.0-flash-exp, gemini-1.5-pro (Google)
  • anthropic.claude-3-5-sonnet-20241022-v2:0 (AWS Bedrock)

Embedding models

Models that generate vector embeddings:
  • text-embedding-3-small, text-embedding-3-large (OpenAI)
  • text-embedding-004 (Google)

Image generation models

Models that generate images from text:
  • dall-e-3, dall-e-2 (OpenAI)
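The three capability groups above can be reconstructed client-side from a single listing. A small sketch, assuming the `capabilities` field shown in the earlier Python example:

```python
from collections import defaultdict

def group_by_capability(models):
    """Index model ids under each capability they advertise."""
    groups = defaultdict(list)
    for m in models:
        for cap in m.get('capabilities', []):
            groups[cap].append(m['id'])
    return dict(groups)

models = [
    {'id': 'gpt-4o', 'capabilities': ['chat']},
    {'id': 'dall-e-3', 'capabilities': ['image_generation']},
    {'id': 'text-embedding-3-small', 'capabilities': ['embedding']},
]
print(group_by_capability(models))
# → {'chat': ['gpt-4o'], 'image_generation': ['dall-e-3'], 'embedding': ['text-embedding-3-small']}
```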

Pricing information

The pricing field shows costs per 1 million tokens:
{
  "id": "gpt-4o",
  "pricing": {
    "input": 2.50,   // $2.50 per 1M input tokens
    "output": 10.00  // $10.00 per 1M output tokens
  }
}
Use this to estimate costs before making requests.
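For example, a cost estimate is a straight per-million-token calculation over the `pricing` fields (the helper name is ours):

```python
def estimate_cost(model, input_tokens, output_tokens):
    """Estimate request cost in USD from per-1M-token pricing."""
    p = model['pricing']
    return (input_tokens * p['input'] + output_tokens * p['output']) / 1_000_000

gpt4o = {'id': 'gpt-4o', 'pricing': {'input': 2.50, 'output': 10.00}}
print(estimate_cost(gpt4o, input_tokens=1000, output_tokens=500))
# → 0.0075
```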

Context windows

The context_window field indicates the maximum total tokens (input + output):
  • gpt-4o: 128,000 tokens
  • claude-3-5-sonnet-20241022: 200,000 tokens
  • gemini-2.0-flash-exp: 1,000,000 tokens
Ensure your requests fit within the model’s context window.
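A rough pre-flight check might look like the sketch below. The ~4-characters-per-token ratio is a crude heuristic of ours, not something the API reports; use a real tokenizer when accuracy matters:

```python
def fits_context(prompt, max_output_tokens, context_window, chars_per_token=4):
    """Rough check that estimated prompt tokens plus reserved output
    tokens fit inside the model's context window."""
    est_prompt_tokens = len(prompt) // chars_per_token + 1
    return est_prompt_tokens + max_output_tokens <= context_window

print(fits_context('Hello' * 100, max_output_tokens=1000, context_window=128_000))
# → True
```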

CLI command

List models from the command line:
vllora list
This displays all available models in a formatted table.

Sync models

Update the model database from provider APIs:
# Sync all models
vllora sync --models

# Sync a specific provider, e.g. OpenAI
vllora sync --models --providers openai
Model information is embedded in vLLora at build time for fast startup. Use sync to update with the latest models from provider APIs.

Best practices

  • Use smaller/faster models (gpt-4o-mini, claude-3-5-haiku) for simple tasks
  • Use larger models (gpt-4o, claude-3-5-sonnet) for complex reasoning
  • Check context window if you have long conversations
Check model pricing before deploying. Cost differences can be significant:
  • gpt-4o-mini: $0.15 input / $0.60 output per 1M tokens
  • gpt-4o: $2.50 input / $10.00 output per 1M tokens
Not all models support all features. Check the features array:
  • Tool calling support varies by model
  • Vision capabilities are model-specific
  • JSON mode isn’t universal
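A defensive feature check is one line; the sketch below assumes a `features` array of string flags on each model object (the flag names here are illustrative, so match them to your deployment's actual schema):

```python
def supports(model, feature):
    """Check a model's advertised features before relying on one."""
    return feature in model.get('features', [])

model = {'id': 'gpt-4o', 'features': ['tool_calling', 'vision', 'json_mode']}
print(supports(model, 'tool_calling'))
# → True
```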
Run vllora sync --models periodically to get new models and updated pricing.

Next steps

Chat Completions

Use models in chat completion requests

Embeddings

Generate embeddings with embedding models

Image Generation

Create images with DALL-E models

Providers

Learn about provider support
