The `openai-generic` provider supports any API that uses OpenAI's request and response formats, including Ollama, OpenRouter, Groq, Hugging Face, Together AI, and many others.
Quick Start
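A minimal `openai-generic` client sketch. The base URL, model name, and environment variable are placeholders; substitute your provider's values:

```baml
client<llm> MyClient {
  provider openai-generic
  options {
    base_url "https://api.example.com/v1"
    api_key env.MY_API_KEY
    model "model-name"
  }
}
```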
Supported Providers
A non-exhaustive list of providers compatible with `openai-generic`:
| Provider | Base URL | Documentation |
|---|---|---|
| Ollama | http://localhost:11434/v1 | Ollama |
| OpenRouter | https://openrouter.ai/api/v1 | OpenRouter |
| Groq | https://api.groq.com/openai/v1 | Groq |
| Together AI | https://api.together.xyz/v1 | Together |
| Cerebras | https://api.cerebras.ai/v1 | Cerebras |
| Hugging Face | https://api-inference.huggingface.co/models/<model> | HuggingFace |
| LM Studio | http://localhost:1234/v1 | LM Studio |
| vLLM | Custom | vLLM |
Configuration Options
BAML-Specific Options
- `base_url`: The base URL for the API endpoint.
- `api_key`: Used to build the `Authorization` header: `Authorization: Bearer $api_key`. If not set, or set to an empty string, the `Authorization` header is not sent. This is useful for local providers like Ollama that don't require authentication.
- `headers`: Additional headers to send with requests.
- `model`: The model name in the provider's expected format. The exact syntax depends on your provider's documentation. Examples:
  - OpenAI: `"gpt-4o"`
  - Ollama: `"llama3"`
  - OpenRouter: `"openai/gpt-4o-mini"`
Model Parameters
These parameters are passed directly to the provider API. Common parameters (support varies by provider):

- `temperature`: Controls randomness
- `max_tokens`: Maximum tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Reduces repetition
- `presence_penalty`: Encourages new topics
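For example, a client that passes common sampling parameters through to the provider. The values, base URL, and names are illustrative only:

```baml
client<llm> TunedClient {
  provider openai-generic
  options {
    base_url "https://api.example.com/v1"
    api_key env.MY_API_KEY
    model "model-name"
    temperature 0.2
    max_tokens 1024
    top_p 0.9
  }
}
```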
Provider Examples
Ollama
Ollama provides local LLM inference with OpenAI-compatible endpoints.

| Model | Description |
|---|---|
| llama4 | Latest Meta Llama with enhanced reasoning |
| llama3.3 | Enhanced Llama 3 with improved performance |
| qwen2 | Alibaba's large language model series |
| phi3 | Microsoft's lightweight 3B/14B models |
| mistral | Mistral AI's 7B model |
| gemma | Google DeepMind's lightweight models |
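A sketch of an Ollama client using one of the models above; it assumes Ollama is running locally and the model has already been pulled:

```baml
client<llm> Ollama {
  provider openai-generic
  options {
    base_url "http://localhost:11434/v1"
    model "llama3.3"
  }
}
```

No `api_key` is set, so no `Authorization` header is sent.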
OpenRouter
OpenRouter provides unified access to 300+ models from multiple providers. Models use the `provider/model-name` format:

- `openai/gpt-4o-mini`
- `anthropic/claude-3.5-sonnet`
- `google/gemini-2.0-flash-001`
- `meta-llama/llama-3.1-70b-instruct`
OpenRouter also supports model variant suffixes, such as `:nitro` for high-throughput routing.
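A sketch of an OpenRouter client using a model from the list above; the environment variable name is an assumption:

```baml
client<llm> OpenRouter {
  provider openai-generic
  options {
    base_url "https://openrouter.ai/api/v1"
    api_key env.OPENROUTER_API_KEY
    model "openai/gpt-4o-mini"
  }
}
```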
Groq
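A sketch of a Groq client; the model name is illustrative, so check Groq's current model list:

```baml
client<llm> Groq {
  provider openai-generic
  options {
    base_url "https://api.groq.com/openai/v1"
    api_key env.GROQ_API_KEY
    model "llama-3.3-70b-versatile"
  }
}
```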
Groq provides fast inference for open-source models.

Together AI
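A sketch of a Together AI client; the model name and environment variable are assumptions, so check Together's model catalog:

```baml
client<llm> TogetherAI {
  provider openai-generic
  options {
    base_url "https://api.together.xyz/v1"
    api_key env.TOGETHER_API_KEY
    model "meta-llama/Llama-3.3-70B-Instruct-Turbo"
  }
}
```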
Together AI provides access to open-source models with fast inference.

LM Studio
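A sketch of an LM Studio client; it assumes LM Studio's local server is running on its default port, and the model identifier depends on which model you have loaded:

```baml
client<llm> LMStudio {
  provider openai-generic
  options {
    base_url "http://localhost:1234/v1"
    model "local-model"
  }
}
```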
LM Studio provides local model inference with a desktop application.

Features
- Streaming: Supported (depends on provider)
- Multimodal: Support varies by provider and model
- Local Inference: Works with Ollama, LM Studio, and vLLM
- Cloud Providers: Works with OpenRouter, Groq, Together AI, and more
Do Not Set
- `messages`: BAML automatically constructs this from your prompt.
- `stream`: BAML automatically sets this based on how you call the client in your code.