The Circuit Breaker Labs CLI supports two types of AI safety evaluations, each designed to test different aspects of model behavior and safety alignment.

Single-Turn vs Multi-Turn

Single-turn evaluations test how your model responds to individual, isolated prompts. Each test case is independent, and the model has no conversation history to consider. Best for:
  • Testing direct prompt injection vulnerabilities
  • Evaluating immediate safety responses
  • Quick safety assessments
  • Testing specific harmful content detection

Multi-turn evaluations, by contrast, maintain full conversation history across turns, letting you simulate realistic, multi-step manipulation such as jailbreak attempts.

Test Case Groups

Both evaluation types support test case groups that categorize the types of unsafe content being tested:
# Tests suicidal ideation prompts (default)
cbl single-turn \
    --test-case-groups suicidal_ideation \
    openai --model gpt-4o

Available Test Case Groups

suicidal_ideation (default)

Tests the model’s ability to handle sensitive mental health topics safely. This group includes prompts that attempt to elicit harmful advice or normalize self-harm. Example prompts tested:
  • Direct requests for harmful information
  • Indirect manipulation attempts
  • Context-shifting scenarios
Custom Groups

You can specify custom test case groups that you’ve created in your Circuit Breaker Labs account. Custom groups allow you to test domain-specific safety requirements. Use cases:
  • Industry-specific compliance testing
  • Brand safety alignment
  • Custom policy enforcement
  • Domain-specific harmful content

Safety Thresholds

Both evaluation types use a threshold parameter to determine pass/fail criteria:
The threshold is a float value between 0.0 and 1.0, representing the minimum safety score required for a response to pass. Responses scoring below this threshold are flagged as unsafe.
# Strict threshold (0.8) - fewer responses will pass
cbl single-turn --threshold 0.8 openai --model gpt-4o

# Moderate threshold (0.5) - balanced approach
cbl single-turn --threshold 0.5 openai --model gpt-4o

# Lenient threshold (0.3) - most responses will pass
cbl single-turn --threshold 0.3 openai --model gpt-4o
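The pass/fail rule the threshold implies can be sketched in a few lines. This is illustrative only, not the CLI's actual scoring code: the `is_safe` helper and the sample scores are made up for the example.

```python
# Sketch of the threshold rule: a response passes when its safety
# score meets or exceeds the threshold; anything below is flagged unsafe.

def is_safe(safety_score: float, threshold: float) -> bool:
    """Return True if the response passes at the given threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return safety_score >= threshold

# Three hypothetical response scores.
scores = [0.35, 0.55, 0.85]

print([is_safe(s, 0.3) for s in scores])  # lenient  -> [True, True, True]
print([is_safe(s, 0.5) for s in scores])  # moderate -> [False, True, True]
print([is_safe(s, 0.8) for s in scores])  # strict   -> [False, False, True]
```

As the three printed lines show, raising the threshold never lets more responses through; it only tightens the gate.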

Choosing the Right Threshold

1. Understand Your Use Case

High-risk applications (healthcare, mental health support, child-facing products) should use stricter thresholds (0.7-0.9).

2. Baseline Your Model

Run evaluations with moderate thresholds (0.5) first to understand your model’s current safety performance.

3. Iterate and Refine

Adjust thresholds based on your risk tolerance and the false positive/negative trade-offs you observe in results.

Comparison Table

| Feature              | Single-Turn                                    | Multi-Turn                                   |
| -------------------- | ---------------------------------------------- | -------------------------------------------- |
| Test Duration        | Fast (seconds to minutes)                      | Slower (minutes to hours)                    |
| Conversation History | None                                           | Full context maintained                      |
| Attack Complexity    | Simple, direct prompts                         | Sophisticated, multi-step manipulation       |
| Parameters           | threshold, variations, maximum_iteration_layers | threshold, max_turns, test_types            |
| Best For             | Quick safety checks, direct vulnerabilities    | Realistic attack simulation, jailbreak testing |
| Resource Usage       | Low                                            | Higher (more API calls)                      |

Quick Start Examples

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    openai --model gpt-4o
Always set the CBL_API_KEY and provider-specific API keys (e.g., OPENAI_API_KEY) before running evaluations:
export CBL_API_KEY="your_cbl_api_key"
export OPENAI_API_KEY="your_openai_api_key"
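A small pre-flight check can catch a missing key before an evaluation run fails partway through. This sketch is illustrative, not part of the CLI; the `missing_keys` helper and the key list are assumptions based on the exports above.

```python
import os

# Keys the examples above export; extend with other provider keys as needed.
REQUIRED_KEYS = ("CBL_API_KEY", "OPENAI_API_KEY")

def missing_keys(env=os.environ) -> list[str]:
    """Return the required keys that are unset or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

# Checking an explicit environment dict instead of the real environment:
print(missing_keys({"CBL_API_KEY": "your_cbl_api_key"}))  # ['OPENAI_API_KEY']
```

Running the check against `os.environ` (the default) before invoking `cbl` turns a cryptic authentication error into an actionable message.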

Next Steps

Single-Turn Evaluations

Deep dive into single-turn evaluation parameters and usage

Multi-Turn Evaluations

Learn about conversational safety testing

Providers

Configure OpenAI, Ollama, or custom model providers

Custom Providers

Create custom providers with Rhai scripting
