
The Problem with Current Approaches

Building reliable AI applications today feels like writing raw HTML in Python strings:
# The old way - error-prone and hard to maintain
def home():
    return "<button onclick=\"() => alert(\\\"hello!\\\")\">Click</button>"
Currently, developers craft LLM prompts with:
  • F-strings and string concatenation: Hard to read, easy to break
  • No type safety: Runtime errors from schema mismatches
  • Slow iteration: Must execute code to test prompt changes
  • Manual JSON schemas: Tedious to maintain and sync with code
The situation is even worse when you need structured outputs. With Python and Pydantic, you must:
  1. Set up a complete Python environment
  2. Execute your entire application
  3. Wait for the LLM response
  4. Fix the prompt
  5. Repeat
This process can take 2+ minutes per iteration. If you can only test 10 ideas in 20 minutes, you’re severely limited.

The BAML Solution

BAML turns prompt engineering into schema engineering - where you focus on the structure of your data, not string manipulation.

1. Test 10x Faster

BAML’s VSCode playground lets you test prompts directly in your editor:
  • See the full rendered prompt with multimodal assets
  • View the exact API request being sent
  • Run tests in parallel for even faster iteration
  • Test in 5 seconds instead of 2 minutes
With BAML, you can test 240 ideas in 20 minutes instead of just 10.
Speed matters: Faster iteration = more experiments = better prompts = better AI applications

2. Fully Type-Safe

BAML generates native types for your language:
from baml_client import b
from baml_client.types import Resume, Education

# Autocomplete works!
resume: Resume = b.ExtractResume(text)
school: str = resume.education[0].school
year: int = resume.education[0].year
Even streaming is type-safe:
stream = b.stream.ExtractResume(text)
for partial in stream:  # partial: PartialResume
    if partial.name:  # Type-safe optional fields
        print(f"Name: {partial.name}")
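Under the hood, you can think of each streamed partial as the same schema with every field made optional until the model finishes emitting it. A minimal Python sketch of that idea (the names here are illustrative, not BAML's actual generated code):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for a generated partial type: every field is
# optional because the stream may not have produced it yet.
@dataclass
class PartialResume:
    name: Optional[str] = None
    school: Optional[str] = None

partial = PartialResume(name="Ada")
if partial.name:  # fields may still be None mid-stream
    print(f"Name: {partial.name}")  # prints "Name: Ada"
```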

3. Works with Any Model

Switch models in seconds, not hours:
function ExtractResume(text: string) -> Resume {
- client "openai/gpt-4o"
+ client "anthropic/claude-3-5-sonnet"
  prompt #"..."#
}
BAML supports:
  • OpenAI (GPT-4, GPT-4o, O1, O3)
  • Anthropic (Claude 3, Claude 3.5)
  • Google (Gemini, Vertex AI)
  • AWS (Bedrock)
  • Azure OpenAI
  • Any OpenAI-compatible API (Ollama, OpenRouter, VLLM, LMStudio, TogetherAI, Deepseek, etc.)

4. Reliable Structured Outputs

BAML’s SAP (Schema-Aligned Parsing) algorithm works even when models don’t support native tool-calling:
  • Handles markdown within JSON
  • Supports chain-of-thought before answers
  • Works on Day 1 of new model releases
  • No need to check if a model supports parallel tool calls, recursive schemas, anyOf, oneOf, etc.
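To make the idea concrete, here is a toy Python sketch of the simplest case schema-aligned parsing must handle — recovering JSON that a model wrapped in markdown fences and chain-of-thought prose. (The real SAP algorithm is far more tolerant, coping with unquoted keys, trailing commas, and partial output; this is only an illustration.)

```python
import json
import re

def extract_json_block(raw: str) -> dict:
    """Pull a JSON object out of a reply that wraps it in markdown
    or reasoning prose. Toy illustration, not BAML's actual parser."""
    # Prefer a fenced ```json block if the model emitted one.
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if fenced:
        candidate = fenced.group(1)
    else:
        # Fall back to the outermost brace pair in the raw text.
        candidate = raw[raw.index("{"): raw.rindex("}") + 1]
    return json.loads(candidate)

reply = 'Let me think step by step...\n```json\n{"name": "Ada", "year": 1840}\n```'
print(extract_json_block(reply))  # {'name': 'Ada', 'year': 1840}
```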
BAML’s structured outputs even outperform OpenAI’s own structured-output mode on OpenAI models.

5. Built-in Reliability

Add retries, fallbacks, and load balancing with simple configuration:
client<llm> MyClient {
  provider openai
  options {
    model gpt-4o
  }
  
  // Automatic retries
  retry_policy {
    max_retries 3
    strategy exponential_backoff
  }
}

// Fallback to another model
function Extract(text: string) -> Data {
  client MyClient | ClaudeBackup
  prompt #"..."#
}

// Round-robin across models
function Extract(text: string) -> Data {
  client RoundRobin<MyClient, Claude, Gemini>
  prompt #"..."#
}
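The retry policy above behaves roughly like this Python sketch of exponential backoff — retry on failure, doubling the wait between attempts. (An illustration of the semantics only, not BAML's implementation.)

```python
import random
import time

def call_with_retries(fn, max_retries: int = 3, base_delay: float = 1.0):
    """Retry fn() on failure, waiting base_delay, 2x, 4x, ... in between."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Exponential backoff with a little jitter so concurrent
            # callers don't all retry at the same instant.
            time.sleep(base_delay * 2 ** attempt + random.random() * base_delay)
```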

6. Maintainable Code

Compare the readability:
# Hard to read, easy to break
prompt = f"""
Extract the following data from this resume:
{{
  \"name\": \"string\",
  \"education\": [{{
    \"school\": \"string\",
    \"degree\": \"string\",
    \"year\": \"number\"
  }}]
}}

Resume:
{resume_text}

Respond with valid JSON only.
"""

import json
from openai import OpenAI

client = OpenAI()
result = json.loads(client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
).choices[0].message.content)  # raises if the model adds any extra prose

BAML vs Other Frameworks

vs Pydantic AI / Instructor

| Feature | BAML | Pydantic/Instructor |
| --- | --- | --- |
| Language | Any (Python, TS, Ruby, Go) | Python only |
| Testing | Built-in playground | Run full app |
| Streaming types | Fully type-safe | Manual handling |
| Prompt previews | Live in editor | None |
| Model support | All providers | Limited |
| SAP parsing | Yes | No |

vs LangChain / LlamaIndex

| Feature | BAML | LangChain/LlamaIndex |
| --- | --- | --- |
| Focus | Structured outputs | Chains & RAG |
| Type safety | Full | Partial |
| Testing speed | 5 seconds | 2+ minutes |
| Learning curve | Minimal | Steep |
| Vendor lock-in | None | Framework-specific |
BAML works great alongside LangChain or LlamaIndex - use BAML for structured outputs and other frameworks for orchestration.

vs Raw API Calls

| Feature | BAML | Raw APIs |
| --- | --- | --- |
| Type safety | Generated | Manual |
| Schema sync | Automatic | Manual |
| Testing | Built-in playground | Custom setup |
| Retry logic | Built-in | Manual |
| Streaming UI | Type-safe | Manual |
| Model switching | Change 1 line | Rewrite code |

Why a New Language?

Just like we moved from string concatenation in backend code:
def home():
    return "<button onclick=\"() => alert(\\\"hello!\\\")\">Click</button>"
To JSX/TSX for better abstraction:
function Home() {
  const [count, setCount] = useState(0);
  return <button onClick={() => setCount(prev => prev + 1)}>
           {count} clicks!
         </button>;
}
BAML provides the right abstraction for prompts:
  • Structured - Not just strings
  • Type-safe - Catch errors before runtime
  • Testable - Fast iteration cycle
  • Maintainable - Easy to read and modify
  • Language-agnostic - Works with your stack
New syntax can be incredible at expressing new ideas. The goal of BAML is to give you the expressiveness of English, but the structure of code.

Design Philosophy

  1. Avoid invention when possible
    • Prompts need versioning → use Git
    • Prompts need storage → use filesystems
  2. Any file editor and terminal should work
    • No special tools required
    • Works with your existing workflow
  3. Be fast
    • Built in Rust for maximum performance
    • So fast, you can’t even tell it’s there
  4. Easy to understand
    • A first-year university student should be able to read BAML
    • Simple syntax, powerful results

100% Open Source & Private

  • Apache 2.0 License - Use it freely in commercial projects
  • No telemetry - Zero network requests beyond your explicit model calls
  • Not used for training - Your prompts and data stay private
  • Local-first - Works completely offline (except for model calls)
  • Git-friendly - BAML files diff beautifully in version control

Production Ready

BAML is used by many companies in production:
  • Weekly updates and improvements
  • Stable API with semantic versioning
  • Active community on Discord
  • Comprehensive documentation
  • Example projects and templates

Common Questions

Do I need to write my whole app in BAML?

No! Only write your prompts in BAML. BAML generates client code for Python, TypeScript, Ruby, Go, and REST APIs that integrates seamlessly with your existing application.

Is BAML production-ready?

Yes! Many companies use BAML in production. We ship updates weekly and maintain backward compatibility.

Can I use BAML with my existing framework?

Yes! BAML works great alongside LangChain, LlamaIndex, or any other framework. Use BAML for structured outputs and your preferred framework for orchestration.

What if I need to switch models at runtime?

BAML has you covered! Check out the Client Registry to dynamically select models based on runtime conditions.

Get Started

Quick Start

Install BAML and create your first function in 5 minutes

Try Online

Test BAML in your browser without installing anything

Examples

See BAML in action with interactive examples

Join Discord

Get help from the community and BAML team
