baml test

The baml test command discovers and executes test cases defined in your BAML files, with support for pattern-based filtering, parallel execution, and comprehensive assertion reporting.

Usage

baml test [OPTIONS]

Options

--from

string

default:"./baml_src"

Path to the directory containing your BAML source files.

--list

boolean

default:"false"

List all selected tests without running them. Useful for verifying which tests will be executed with your filter patterns.

-i, --include

string[]

Include specific functions or tests (can be specified multiple times). If not provided, all tests are selected.Supports powerful wildcard patterns for flexible test selection.

-x, --exclude

string[]

Exclude specific functions or tests (can be specified multiple times). Takes precedence over --include filters.Uses the same pattern syntax as --include.

--parallel

number

default:"10"

Number of tests to run concurrently. Increase for faster execution on multi-core systems, or set to 1 for sequential execution.

--pass-if-no-tests

boolean

default:"false"

Exit with success status even when no tests match the filter criteria. Useful in CI pipelines with conditional test execution.

--require-human-eval

boolean

default:"true"

Fail the test run if any tests require human evaluation (tests without assertions). Set to false to allow human-eval tests to pass.

--dotenv

boolean

default:"true"

Load environment variables from a .env file. Disable with --no-dotenv.

--dotenv-path

string

Path to a custom environment file. If not specified, looks for .env in the current directory.

--features

string[]

Enable specific features (can be specified multiple times).Available features:

beta - Enable beta features and suppress experimental warnings
display_all_warnings - Show all warnings in CLI output

Test Filtering Patterns

The --include and --exclude options support powerful pattern matching:

Pattern Syntax

* - Wildcard matching any characters within a name segment
FunctionName::TestName - Match a specific test in a specific function
FunctionName:: - Match all tests in a function
::TestName - Match tests with a specific name across all functions
Multiple patterns can be combined

Filter Precedence

When both --include and --exclude are specified:

Tests matching any --include pattern are selected
Tests matching any --exclude pattern are removed from selection
Exclusions always take precedence over inclusions

Examples

Basic Testing

Run all tests in the project:

baml test

Run tests from a specific directory:

baml test --from /path/to/my/baml_src

Test Discovery

List all available tests:

baml test --list

List tests matching a pattern:

baml test --list -i "ExtractResume::"

Pattern-Based Filtering

Run all tests for a specific function:

baml test -i "ExtractResume::"

Run a specific test:

baml test -i "ExtractResume::TestBasicResume"

Run tests matching a wildcard pattern:

baml test -i "Extract*::"

Run tests for multiple functions:

baml test -i "ExtractResume::" -i "ClassifyMessage::"

Exclude specific tests:

baml test -x "ExtractResume::TestSlowCase"

Combine include and exclude (exclude takes precedence):

baml test -i "Extract*::" -x "*::TestSlow*"

Run tests matching a specific test name across all functions:

baml test -i "::TestEdgeCase"

Parallel Execution

Run tests sequentially (useful for debugging):

baml test --parallel 1

Run with high parallelism:

baml test --parallel 20

Default parallel execution (10 tests concurrently):

baml test

Environment Configuration

Use default .env file:

baml test

Disable environment file loading:

baml test --no-dotenv

Use a custom environment file:

baml test --dotenv-path .env.test

Set environment variables directly:

OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-... baml test

CI/CD Integration

Fail fast on tests requiring human evaluation:

baml test --require-human-eval

Allow human-eval tests to pass:

baml test --no-require-human-eval

Controlled parallelism for CI environments:

baml test --parallel 5

Handle empty test selection gracefully:

baml test -i "OptionalFeature::" --pass-if-no-tests

Exit Codes

The command returns different exit codes based on test results:

Exit Code	Meaning
`0`	All tests passed
`1`	One or more test failures occurred
`2`	Tests require human evaluation
`3`	Test execution was cancelled (Ctrl+C)
`4`	No tests were found/selected

Test Definition

Tests are defined in your BAML files using the test block:

function ExtractResume(resume: string) -> Resume {
  client GPT4
  prompt #"
    Extract resume information from the following text:
    
    {{ resume }}
  "#
}

test TestBasicResume {
  functions [ExtractResume]
  args {
    resume "John Doe\[email protected]\nSoftware Engineer"
  }
  @@assert({{ this.name == "John Doe" }})
  @@assert({{ this.email == "[email protected]" }})
}

Assertions

Use @@assert() blocks to verify test results:

// Simple equality
@@assert({{ this.name == "Expected Name" }})

// Null checks
@@assert({{ this.email != null }})

// Numeric comparisons
@@assert({{ this.age >= 18 }})

// String operations
@@assert({{ this.title.contains("Engineer") }})

// Array length
@@assert({{ this.skills.length > 0 }})

Multiple Test Cases

Define multiple tests for the same function:

test TestValidEmail {
  functions [ValidateEmail]
  args {
    email "[email protected]"
  }
  @@assert({{ this.is_valid == true }})
}

test TestInvalidEmail {
  functions [ValidateEmail]
  args {
    email "invalid-email"
  }
  @@assert({{ this.is_valid == false }})
}

Troubleshooting

No tests found

Error: Exit code 4 - “No tests were found to run” Solution:

Verify tests are defined in your BAML files

Check filter patterns aren’t too restrictive:

baml test --list  # See all available tests

Ensure you’re pointing to the correct source directory:
```
baml test --from ./baml_src
```

Test failures

Error: Exit code 1 - Test assertions failing Solution:

Review the assertion output to see which assertions failed

Run a single test for easier debugging:

baml test -i "FunctionName::FailingTest"

Check LLM responses are deterministic enough for your assertions
Consider using looser assertions or testing patterns rather than exact matches

Human evaluation required

Error: Exit code 2 - “Tests require human evaluation” Cause: Some tests don’t have @@assert() blocks Solution:

Add assertions to all tests for automated validation
Or allow human-eval tests in your workflow:
```
baml test --no-require-human-eval
```

API key errors

Error: “Missing API key for client GPT4” Solution: Set required API keys in your environment:

# Create .env file
echo "OPENAI_API_KEY=sk-..." > .env
echo "ANTHROPIC_API_KEY=sk-..." >> .env

# Or set directly
export OPENAI_API_KEY=sk-...
baml test

Rate limiting

Issue: Tests failing due to rate limits Solution: Reduce parallelism to slow down request rate:

baml test --parallel 2

Slow test execution

Issue: Tests taking too long Solution:

Increase parallelism:
```
baml test --parallel 20
```
Run a subset of tests during development:
```
baml test -i "QuickFunction::"
```
Exclude slow tests:
```
baml test -x "*::TestSlow*"
```

Cancelling test runs

Press Ctrl+C to gracefully cancel running tests. The command will:

Stop starting new tests
Wait for in-flight tests to complete
Return exit code 3

Best Practices

Organizing Tests

Use descriptive, hierarchical names:

test ExtractResume::ValidInput {}
test ExtractResume::EmptyInput {}
test ExtractResume::MalformedInput {}

test ClassifyIntent::SimpleQuestion {}
test ClassifyIntent::ComplexQuery {}

This enables powerful filtering:

baml test -i "ExtractResume::"  # All extraction tests
baml test -i "*::Valid*"        # All valid input tests

Assertion Strategy

Test patterns, not exact strings: LLM outputs vary

// Good
@@assert({{ this.category in ["question", "statement"] }})

// Brittle
@@assert({{ this.summary == "The user asks about pricing." }})

Test structure and types: Verify the schema

@@assert({{ this.skills.length > 0 }})
@@assert({{ this.email != null }})

Use multiple assertions: Break down complex validations

@@assert({{ this.is_valid == true }})
@@assert({{ this.confidence > 0.7 }})
@@assert({{ this.category != null }})

Development Workflow

Write tests first: Define expected behavior
Run targeted tests: Use -i during development
Run full suite: Before committing changes
Use in CI/CD: Catch regressions automatically

# During development
baml test -i "NewFunction::" --parallel 1

# Before commit
baml test

# In CI
baml test --require-human-eval --parallel 5

baml generate - Generate client code before testing
baml dev - Development server for interactive testing
baml serve - HTTP API server for testing via REST

BAML Language

Type System

CLI

Client API

LLM Providers

Usage

Options

Test Filtering Patterns

Pattern Syntax

Filter Precedence

Examples

Basic Testing

Test Discovery

Pattern-Based Filtering

Parallel Execution

Environment Configuration

CI/CD Integration

Exit Codes

Test Definition

Assertions

Multiple Test Cases

Troubleshooting

No tests found

Test failures

Human evaluation required

API key errors

Rate limiting

Slow test execution

Cancelling test runs

Best Practices

Organizing Tests

Assertion Strategy

Development Workflow

Build docs developers (and LLMs) love

BAML Language

Type System

CLI

Client API

LLM Providers

​Usage

​Options

​Test Filtering Patterns

​Pattern Syntax

​Filter Precedence

​Examples

​Basic Testing

​Test Discovery

​Pattern-Based Filtering

​Parallel Execution

​Environment Configuration

​CI/CD Integration

​Exit Codes

​Test Definition

​Assertions

​Multiple Test Cases

​Troubleshooting

​No tests found

​Test failures

​Human evaluation required

​API key errors

​Rate limiting

​Slow test execution

​Cancelling test runs

​Best Practices

​Organizing Tests

​Assertion Strategy

​Development Workflow

​Related Commands

Build docs developers (and LLMs) love

Usage

Options

Test Filtering Patterns

Pattern Syntax

Filter Precedence

Examples

Basic Testing

Test Discovery

Pattern-Based Filtering

Parallel Execution

Environment Configuration

CI/CD Integration

Exit Codes

Test Definition

Assertions

Multiple Test Cases

Troubleshooting

No tests found

Test failures

Human evaluation required

API key errors

Rate limiting

Slow test execution

Cancelling test runs

Best Practices

Organizing Tests

Assertion Strategy

Development Workflow

Related Commands