Clients configure how BAML calls LLM providers, including authentication, model selection, retry policies, and provider-specific options.
## Syntax

```baml
client<llm> ClientName {
  provider "provider-name"
  retry_policy PolicyName // Optional
  options {
    // Provider-specific configuration
  }
}
```
## Basic Examples

### OpenAI

```baml
client<llm> GPT4 {
  provider openai
  options {
    model gpt-4o
    api_key env.OPENAI_API_KEY
  }
}
```

### Anthropic

```baml
client<llm> Claude {
  provider anthropic
  options {
    model claude-sonnet-4-5-20250929
    api_key env.ANTHROPIC_API_KEY
    max_tokens 2048
  }
}
```

### Google AI

```baml
client<llm> Gemini {
  provider google-ai
  options {
    model gemini-2.5-flash
    api_key env.GOOGLE_API_KEY
  }
}
```
## Client Components

`client<llm>` — declares an LLM client configuration.

`ClientName` — unique name for this client. Use PascalCase convention.

```baml
client<llm> GPT4Turbo { }
client<llm> ClaudeHaiku { }
```

`provider` — the LLM provider name. See Providers for supported values.

```baml
provider openai
provider anthropic
provider "vertex-ai"
```

`retry_policy` — optional reference to a retry policy configuration.

```baml
retry_policy ExponentialBackoff
```

`options` — provider-specific configuration block. Contents vary by provider.
## Providers

BAML supports multiple LLM providers:

| Provider | Value | Documentation |
|---|---|---|
| OpenAI | `openai` | OpenAI → |
| Anthropic | `anthropic` | Anthropic → |
| Google AI | `google-ai` | Google AI → |
| Vertex AI | `vertex-ai` | Vertex AI → |
| AWS Bedrock | `aws-bedrock` | AWS Bedrock → |
| Azure OpenAI | `azure-openai` | Azure OpenAI → |
| OpenRouter | `openrouter` | OpenRouter → |
| Ollama | `ollama` | Ollama → |
| OpenAI Generic | `openai-generic` | For OpenAI-compatible APIs |
| Fallback | `baml-fallback` | Fallback Strategy → |
| Round Robin | `baml-round-robin` | Round Robin → |
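The `openai-generic` provider targets any endpoint that speaks the OpenAI chat-completions format (self-hosted gateways, vLLM, and similar). A minimal sketch; the URL, model name, and environment variable below are placeholders, not real values:

```baml
// Hypothetical OpenAI-compatible gateway; substitute your own endpoint and model
client<llm> MyCompatibleClient {
  provider openai-generic
  options {
    base_url "https://llm-gateway.example.com/v1"
    model "my-hosted-model"
    api_key env.MY_GATEWAY_API_KEY
  }
}
```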
## Common Options

While options vary by provider, these are commonly supported:

`model` — the model identifier.

```baml
model gpt-4o
model "claude-3-opus-20240229"
```

`api_key` — API authentication key. Typically references an environment variable.

```baml
api_key env.OPENAI_API_KEY
api_key env.ANTHROPIC_API_KEY
```

`base_url` — override the default API endpoint.

```baml
base_url "https://api.openai.com/v1"
base_url env.CUSTOM_ENDPOINT
```

`max_tokens` — maximum tokens in the response.

```baml
max_tokens 2048
max_tokens 4096
```

Some models (like o1 and o3) don't support `max_tokens`. Use `max_completion_tokens` instead, or set `max_tokens` to `null`.

`temperature` — sampling temperature (0.0 to 2.0).

```baml
temperature 0.7
temperature 0.0 // Deterministic
```

`headers` — custom HTTP headers.

```baml
headers {
  "X-Custom-Header" "value"
  "Authorization" env.CUSTOM_AUTH
}
```
## Shorthand Syntax

For simple cases, use the inline provider/model syntax:

```baml
function MyFunction(input: string) -> string {
  client "openai/gpt-4o"
  prompt #"..."#
}
```

This is equivalent to:

```baml
client<llm> AutoGeneratedClient {
  provider openai
  options {
    model gpt-4o
    api_key env.OPENAI_API_KEY // Uses default
  }
}

function MyFunction(input: string) -> string {
  client AutoGeneratedClient
  prompt #"..."#
}
```
## Retry Policies

Define retry behavior for failed requests:

```baml
retry_policy ExponentialBackoff {
  max_retries 3
  strategy {
    type exponential_backoff
  }
}

retry_policy ConstantDelay {
  max_retries 5
  strategy {
    type constant_delay
    delay_ms 1000
  }
}

client<llm> ResilientClient {
  provider openai
  retry_policy ExponentialBackoff
  options {
    model gpt-4o
    api_key env.OPENAI_API_KEY
  }
}
```

See Retry Policies for details.
## Advanced Examples

### Azure OpenAI

```baml
client<llm> AzureGPT {
  provider azure-openai
  options {
    resource_name "my-resource"
    deployment_id "gpt-4o-deployment"
    api_version "2024-02-01"
    api_key env.AZURE_OPENAI_API_KEY
  }
}
```

### AWS Bedrock

```baml
client<llm> BedrockClaude {
  provider aws-bedrock
  options {
    model "anthropic.claude-3-5-sonnet-20240620-v1:0"
    region "us-east-1"
    inference_configuration {
      max_tokens 2048
    }
  }
}
```

### Vertex AI

```baml
client<llm> VertexGemini {
  provider vertex-ai
  options {
    model gemini-2.5-flash
    location us-central1
    credentials env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT
  }
}
```

### OpenAI with Custom Settings

```baml
client<llm> CustomGPT {
  provider openai
  options {
    model gpt-4o
    api_key env.OPENAI_API_KEY
    temperature 0.7
    max_tokens 2048
    top_p 0.9
    frequency_penalty 0.5
    presence_penalty 0.5
  }
}
```

### Anthropic with Prompt Caching

```baml
client<llm> ClaudeWithCaching {
  provider anthropic
  options {
    model claude-3-haiku-20240307
    api_key env.ANTHROPIC_API_KEY
    max_tokens 1000
    allowed_role_metadata ["cache_control"]
    headers {
      "anthropic-beta" "prompt-caching-2024-07-31"
    }
  }
}
```

### OpenRouter

```baml
client<llm> OpenRouterClient {
  provider openrouter
  options {
    model "anthropic/claude-3-haiku"
    api_key env.OPENROUTER_API_KEY
    headers {
      "X-Title" "My App"
      "HTTP-Referer" "https://myapp.com"
    }
  }
}
```

### Ollama (Local)

```baml
client<llm> LocalLlama {
  provider ollama
  options {
    model llama3.1
    base_url "http://localhost:11434"
  }
}
```
## Strategy Clients

### Fallback Strategy

Try clients in sequence until one succeeds:

```baml
client<llm> FallbackClient {
  provider baml-fallback
  options {
    strategy [
      GPT4Turbo
      GPT35
      Claude
    ]
  }
}
```

### Round Robin Strategy

Distribute requests across clients:

```baml
client<llm> LoadBalanced {
  provider baml-round-robin
  options {
    start 0
    strategy [
      GPT4
      Claude
      Gemini
    ]
  }
}
```
## Media Handling

Configure how media (images, audio, etc.) are sent to models:

```baml
client<llm> CustomMediaHandling {
  provider openai
  options {
    model gpt-4o
    api_key env.OPENAI_API_KEY
    media_url_handler {
      image "send_base64" // Convert URLs to base64
      audio "send_url"    // Send URL directly
      pdf "send_base64"   // Convert to base64
      video "send_url"    // Send URL directly
    }
  }
}
```

Options:

- `send_url`: Pass the URL directly to the model
- `send_base64`: Download the media and convert it to base64
- `send_url_add_mime_type`: Send the URL with a MIME type header
## Reasoning Models

### OpenAI o-series

```baml
// o1/o3/o4 models don't support max_tokens
client<llm> OpenAIO1 {
  provider openai
  options {
    model "o4-mini"
    api_key env.OPENAI_API_KEY
    max_tokens null // Must be null or omitted
    max_completion_tokens 2048 // Use this instead
  }
}
```

### Anthropic with Extended Thinking

```baml
client<llm> ClaudeThinking {
  provider anthropic
  options {
    model "claude-3-7-sonnet-20250219"
    api_key env.ANTHROPIC_API_KEY
    max_tokens 2048
    thinking {
      type "enabled"
      budget_tokens 1024
    }
  }
}
```

### Gemini with Thinking

```baml
client<llm> GeminiThinking {
  provider google-ai
  options {
    model "gemini-2.5-pro"
    api_key env.GOOGLE_API_KEY
    generationConfig {
      thinkingConfig {
        thinkingBudget 1024
        includeThoughts true
      }
    }
  }
}
```
## Runtime Client Selection

Override the client at runtime:

```python
from baml_client import b

result = await b.MyFunction(
    input="data",
    baml_options={"client": "GPT4Turbo"}
)
```

Useful for:

- A/B testing
- Gradual rollouts
- User-specific routing
- Cost optimization

See Client Registry for details.
## Environment Variables

Reference environment variables with the `env.` prefix:

```baml
options {
  api_key env.OPENAI_API_KEY
  base_url env.CUSTOM_BASE_URL
  organization env.OPENAI_ORG_ID
}
```
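BAML resolves `env.*` references from the process environment at call time, so the variables must be set before your application runs. For local development they are typically exported in the shell; the values below are placeholders only:

```shell
# Placeholder keys for local development;
# production keys should come from a secrets manager, not hard-coded strings
export OPENAI_API_KEY="sk-placeholder"
export ANTHROPIC_API_KEY="sk-ant-placeholder"

# Verify the variable is visible to child processes
printenv OPENAI_API_KEY
```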
Default environment variable names by provider:

| Provider | Default Variable |
|---|---|
| OpenAI | OPENAI_API_KEY |
| Anthropic | ANTHROPIC_API_KEY |
| Google AI | GOOGLE_API_KEY |
| AWS Bedrock | AWS credentials chain |
| Azure OpenAI | AZURE_OPENAI_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
## Best Practices

- Naming: Use descriptive PascalCase names (e.g., GPT4Turbo, not client1)
- Credentials: Always use environment variables for API keys
- Retry Policies: Add retry policies for production clients
- Fallbacks: Use fallback strategies for critical operations
- Testing: Create separate clients for development and testing
- Documentation: Add comments explaining each client's purpose
- Defaults: Leverage provider defaults when appropriate
- Monitoring: Use different clients to track usage by model and provider
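Several of these practices combined in one sketch; the retry policy name `ProductionRetry` and the client's purpose are illustrative assumptions, not fixed conventions:

```baml
// Production extraction client: deterministic gpt-4o with retries.
// ProductionRetry is assumed to be a retry_policy defined elsewhere.
client<llm> ProductionExtractor {
  provider openai
  retry_policy ProductionRetry
  options {
    model gpt-4o
    api_key env.OPENAI_API_KEY // Credentials stay out of source control
    temperature 0.0            // Deterministic output for extraction tasks
  }
}
```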