
Switching Between LLM Models

BAML supports structured output from all major LLM providers and from OpenAI-API-compatible open-source models. This guide shows how to switch between models and providers.
BAML can get structured output from any open-source model, with better performance than other techniques, even when the model has no official Tool-Use API and hasn't been fine-tuned for structured output. Read more about Schema-Aligned Parsing.

Quick Switching with Inline Syntax

The fastest way to switch models is using the inline client syntax:
function MakeHaiku(topic: string) -> string {
  client "openai/gpt-5-mini"
  prompt #"
    Write a haiku about {{ topic }}.
  "#
}
This syntax assumes you have the appropriate API key set in your environment:
  • openai/* requires OPENAI_API_KEY
  • anthropic/* requires ANTHROPIC_API_KEY
  • google-ai/* requires GOOGLE_API_KEY
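As a sketch of the mapping above, a small helper (hypothetical, not part of baml_py) can check that the key a given inline client string needs is actually set before you run:

```python
import os

# Which env var each inline provider prefix expects (from the list above).
REQUIRED_ENV = {
    "openai": "OPENAI_API_KEY",
    "openai-responses": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google-ai": "GOOGLE_API_KEY",
}

def required_env_var(client: str) -> str:
    """Return the env var an inline client like 'openai/gpt-5-mini' needs."""
    provider, _, _model = client.partition("/")
    return REQUIRED_ENV[provider]

def check_client(client: str) -> bool:
    """True if the required API key is present in the environment."""
    return bool(os.environ.get(required_env_var(client)))
```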

Supported Inline Providers

// OpenAI
client "openai/gpt-5-mini"
client "openai/gpt-4o"

// OpenAI Responses API (beta)
client "openai-responses/gpt-5-mini"

// Anthropic
client "anthropic/claude-sonnet-4-20250514"
client "anthropic/claude-opus-4-1-20250805"

// Google AI
client "google-ai/gemini-2.0-flash-exp"

Named Clients for Customization

For more control over model parameters, define named clients:
client<llm> MyClient {
  provider "openai"
  options {
    model "gpt-5-mini"
    api_key env.OPENAI_API_KEY
    temperature 0.0
    // Optional: point at any OpenAI-compatible endpoint
    base_url "https://my-custom-endpoint.com/v1"
  }
}

function MakeHaiku(topic: string) -> string {
  client MyClient
  prompt #"
    Write a haiku about {{ topic }}.
  "#
}

Common Client Options

client<llm> GPT4 {
  provider "openai"
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    temperature 0.7
    max_tokens 1000
    top_p 0.9
  }
}

Runtime Client Selection

Switch models at runtime using the Client Registry:
from baml_client import b
from baml_py import ClientRegistry
import os

async def run():
    cr = ClientRegistry()
    
    # Add a new client at runtime
    cr.add_llm_client(
        name='GPT4',
        provider='openai',
        options={
            "model": "gpt-4",
            "temperature": 0.7,
            "api_key": os.environ.get('OPENAI_API_KEY')
        }
    )
    
    # Use the runtime client
    cr.set_primary('GPT4')
    result = await b.MakeHaiku("mountains", baml_options={"client_registry": cr})
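
The registry pattern extends naturally to sweeping several models over one prompt. The sketch below stubs out the actual BAML call (call_model is a hypothetical stand-in for building a per-model ClientRegistry and calling b.MakeHaiku with it), so only the control flow is shown:

```python
import asyncio

MODELS = ["gpt-4", "claude-sonnet-4-20250514"]

async def call_model(model: str, topic: str) -> str:
    # Stub: in real code, build a ClientRegistry for `model`, set it primary,
    # and call b.MakeHaiku(topic, baml_options={"client_registry": cr}).
    return f"[{model}] haiku about {topic}"

async def sweep(models: list[str], topic: str) -> dict[str, str]:
    # Run all model calls concurrently and pair each result with its model name.
    results = await asyncio.gather(*(call_model(m, topic) for m in models))
    return dict(zip(models, results))

print(asyncio.run(sweep(MODELS, "mountains")))
```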

Simple Client Override

For quick overrides without creating a full ClientRegistry:
# Python
result = await b.MakeHaiku("mountains", baml_options={"client": "GPT4"})

// TypeScript
const result = await b.MakeHaiku("mountains", { client: "GPT4" })

Comparing Model Performance

Test multiple models with the same prompt:
function ExtractData(input: string) -> DataSchema {
  client "openai/gpt-5-mini"  // Default
  prompt #"
    Extract information from: {{ input }}
    {{ ctx.output_format }}
  "#
}

test CompareGPT4 {
  functions [ExtractData]
  args {
    input "Sample text to extract..."
  }
}
Then change the client and re-run the test:
function ExtractData(input: string) -> DataSchema {
  client "anthropic/claude-sonnet-4-20250514"  // Now testing Claude
  prompt #"
    Extract information from: {{ input }}
    {{ ctx.output_format }}
  "#
}
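
When comparing runs like these, it helps to score each model's structured output against a hand-labeled expected result. This field_accuracy helper is a hypothetical sketch (not part of BAML's test runner); it treats each top-level field as one point:

```python
def field_accuracy(expected: dict, actual: dict) -> float:
    """Fraction of expected top-level fields the model got exactly right."""
    if not expected:
        return 1.0
    hits = sum(1 for key, value in expected.items() if actual.get(key) == value)
    return hits / len(expected)

expected = {"name": "Ada", "year": 1842}
gpt_output = {"name": "Ada", "year": 1842}
claude_output = {"name": "Ada", "year": 1843}
print(field_accuracy(expected, gpt_output))     # 1.0
print(field_accuracy(expected, claude_output))  # 0.5
```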

Model Selection Best Practices

1. Start with a Fast Model: Begin development with a fast, inexpensive model like gpt-5-mini to iterate quickly on your prompts.
2. Test Multiple Models: Once your prompt works, test with different models to find the best balance of:
   • Accuracy: Does it extract the right data?
   • Speed: How fast does it respond?
   • Cost: What’s the token cost per request?
3. Use Fallbacks for Production: Combine multiple models for reliability (see retries and fallbacks).
4. Monitor Performance: Track model performance in production using BAML Studio or your own observability tools.
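
BAML provides first-class fallback and retry-policy clients for this. As a mental model, fallback behavior boils down to trying clients in priority order until one succeeds, roughly like this hypothetical Python sketch:

```python
def call_with_fallback(client_names, call):
    """Try each client in order; return the first successful result.

    `call` is any function that invokes a model by client name and raises on failure.
    """
    errors = []
    for name in client_names:
        try:
            return call(name)
        except Exception as exc:  # e.g. rate limit, timeout, provider outage
            errors.append((name, exc))
    raise RuntimeError(f"all clients failed: {errors}")

def flaky(name):
    # Simulate the primary provider being down.
    if name == "primary":
        raise TimeoutError("primary is down")
    return f"answer from {name}"

print(call_with_fallback(["primary", "backup"], flaky))  # answer from backup
```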

Provider-Specific Features

OpenAI Responses API

Use the Responses API for improved structured outputs:
function ExtractData(input: string) -> DataSchema {
  client "openai-responses/gpt-5-mini"
  prompt #"
    Extract: {{ input }}
    {{ ctx.output_format }}
  "#
}

Anthropic Prompt Caching

Enable prompt caching for cost savings:
client<llm> CachedClaude {
  provider "anthropic"
  options {
    model "claude-sonnet-4-20250514"
    api_key env.ANTHROPIC_API_KEY
    headers {
      "anthropic-beta" "prompt-caching-2024-07-31"
    }
  }
}

Custom Headers

Add provider-specific headers:
client<llm> CustomClient {
  provider "openai"
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    headers {
      "X-Custom-Header" "value"
      "OpenAI-Organization" env.OPENAI_ORG_ID
    }
  }
}
