List models
Retrieve all available models from configured providers.
Endpoint
Parameters
Filter models by provider (openai, anthropic, gemini, bedrock)
Filter by model capability (chat, embedding, image_generation)
Response
Array of model objects
Example
cURL
Python
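A minimal Python sketch of parsing the response and applying the provider/capability filters described above. The top-level `data` array and the `provider`/`capabilities` field names are assumptions about the response shape, and the sample values are hypothetical:

```python
import json

# Hypothetical response body; the top-level "data" array and the
# "provider"/"capabilities" fields are assumptions about the shape.
sample_response = json.loads("""
{
  "data": [
    {"id": "gpt-4o", "provider": "openai", "capabilities": ["chat"]},
    {"id": "text-embedding-3-small", "provider": "openai", "capabilities": ["embedding"]},
    {"id": "claude-3-5-haiku-20241022", "provider": "anthropic", "capabilities": ["chat"]}
  ]
}
""")

models = sample_response["data"]

# Client-side equivalents of the provider and capability query parameters.
openai_models = [m for m in models if m["provider"] == "openai"]
chat_models = [m for m in models if "chat" in m["capabilities"]]

print([m["id"] for m in openai_models])  # ['gpt-4o', 'text-embedding-3-small']
print([m["id"] for m in chat_models])    # ['gpt-4o', 'claude-3-5-haiku-20241022']
```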
Filter by provider
Get models from a specific provider:
Filter by capability
Get models with specific capabilities:
Using models in requests
Use the model ID from the API in your requests:
Model capabilities
Chat models
Models with chat capabilities support:
- Conversational interfaces
- Multi-turn dialogues
- System prompts
- Tool/function calling (if supported)
- gpt-4o, gpt-4o-mini (OpenAI)
- claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022 (Anthropic)
- gemini-2.0-flash-exp, gemini-1.5-pro (Google)
- anthropic.claude-3-5-sonnet-20241022-v2:0 (AWS Bedrock)
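A request using one of the chat model ids above might look like the following sketch. The body follows the OpenAI-style chat schema; the endpoint path itself is not shown here:

```python
# Chat completion request body using a model id from the list endpoint.
# The "model"/"messages" schema is the OpenAI-style chat shape.
payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain embeddings in one sentence."},
    ],
}
print(payload["model"])  # gpt-4o-mini
```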
Embedding models
Models that generate vector embeddings:
- text-embedding-3-small, text-embedding-3-large (OpenAI)
- text-embedding-004 (Google)
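Embedding vectors are typically compared with cosine similarity. A small illustration using hypothetical three-dimensional vectors (real text-embedding-3-small vectors have 1536 dimensions):

```python
import math

# Two hypothetical embedding vectors, shortened to 3 dimensions
# for readability.
a = [0.1, 0.3, 0.6]
b = [0.2, 0.3, 0.5]

def cosine(u, v):
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

print(round(cosine(a, b), 3))
```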
Image generation models
Models that generate images from text:
- dall-e-3, dall-e-2 (OpenAI)
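A request body for the image models above might look like this sketch (OpenAI-style images schema; the endpoint itself is not shown):

```python
# Image generation request body; "size" and "n" follow the OpenAI
# images API (dall-e-3 accepts n=1).
payload = {
    "model": "dall-e-3",
    "prompt": "a watercolor fox in the snow",
    "size": "1024x1024",
    "n": 1,
}
print(payload["model"])  # dall-e-3
```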
Pricing information
The pricing field shows costs per 1 million tokens:
Context windows
The context_window field indicates the maximum total tokens (input + output):
- gpt-4o: 128,000 tokens
- claude-3-5-sonnet-20241022: 200,000 tokens
- gemini-2.0-flash-exp: 1,000,000 tokens
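The context-window figures above translate into a simple pre-flight check. The numbers below mirror the list; the helper function is a sketch, not part of the API:

```python
# Context windows (total input + output tokens) from the list above.
context_windows = {
    "gpt-4o": 128_000,
    "claude-3-5-sonnet-20241022": 200_000,
    "gemini-2.0-flash-exp": 1_000_000,
}

def fits(model, prompt_tokens, max_output_tokens):
    """Check that a request will not exceed the model's context window."""
    return prompt_tokens + max_output_tokens <= context_windows[model]

print(fits("gpt-4o", 120_000, 10_000))                      # False: 130k > 128k
print(fits("claude-3-5-sonnet-20241022", 120_000, 10_000))  # True:  130k <= 200k
```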
CLI command
List models from the command line:
Sync models
Update the model database from provider APIs:
Model information is embedded in vLLora at build time for fast startup. Use sync to update with the latest models from provider APIs.
Best practices
Choose the right model for your use case
- Use smaller/faster models (gpt-4o-mini, claude-3-5-haiku) for simple tasks
- Use larger models (gpt-4o, claude-3-5-sonnet) for complex reasoning
- Check context window if you have long conversations
Monitor costs
Check model pricing before deploying. Cost differences can be significant:
- gpt-4o-mini: $0.60 per 1M tokens
- gpt-4o: $10.00 per 1M tokens
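A quick back-of-the-envelope comparison at the prices above, for a hypothetical 50M-token monthly volume:

```python
# Monthly cost at the per-1M-token prices listed above.
tokens_per_month = 50_000_000  # hypothetical volume

def monthly_cost(price_per_1m_tokens):
    return price_per_1m_tokens * tokens_per_month / 1_000_000

print(monthly_cost(0.60))   # gpt-4o-mini
print(monthly_cost(10.00))  # gpt-4o
```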
Verify capabilities
Not all models support all features. Check the features array:
- Tool calling support varies by model
- Vision capabilities are model-specific
- JSON mode isn’t universal
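A defensive check before relying on a feature. The specific feature names shown ("tools", "vision", "json_mode") are assumptions about what the features array contains:

```python
# Hypothetical model object; the "features" array is described above,
# but the individual feature names here are assumptions.
model = {"id": "gpt-4o", "features": ["tools", "vision", "json_mode"]}

def supports(model, feature):
    """Return True if the model advertises the given feature."""
    return feature in model.get("features", [])

if supports(model, "tools"):
    print("ok to include tool definitions")
```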
Keep model data updated
Run vllora sync --models periodically to get new models and updated pricing.
Next steps
Chat Completions
Use models in chat completion requests
Embeddings
Generate embeddings with embedding models
Image Generation
Create images with DALL-E models
Providers
Learn about provider support