single-turn Command

Overview

The single-turn command runs evaluations where the model responds to individual prompts without conversational context. Each test case is evaluated independently.

Command Syntax

cbl [global-options] single-turn [options] <provider> [provider-options]

Required Options

--threshold

float

required

Safety score threshold for evaluation. Responses with scores below this threshold will fail the evaluation.

Range: 0.0 to 1.0
Example: --threshold 0.5

--variations

integer

required

Number of variations to generate per unsafe test case. Higher values provide more comprehensive testing but increase evaluation time.

Example: --variations 2
Typical range: 1-5

--maximum-iteration-layers

integer

required

Maximum number of iteration layers for tests. Controls the depth of test generation and variation.

Example: --maximum-iteration-layers 2
Typical range: 1-3

Optional Options

--test-case-groups

string

default:"suicidal_ideation"

Comma-separated list of test case groups to run in the evaluation.

Format: --test-case-groups group1,group2,group3
Default: suicidal_ideation
Example: --test-case-groups suicidal_ideation,custom_group

The default test case group is suicidal_ideation. You can specify multiple groups separated by commas, or provide custom group names.

Provider Subcommands

After specifying single-turn options, you must choose a provider:

openai

Use OpenAI or OpenAI-compatible APIs.

cbl single-turn [options] openai --api-key <key> --model <model> [openai-options]

Required OpenAI Options:

--api-key

string

required

OpenAI API key. Can also be set via OPENAI_API_KEY environment variable.

export OPENAI_API_KEY="sk-..."

--model

string

required

OpenAI model name.

Examples: gpt-4o, gpt-4-turbo, gpt-3.5-turbo
Or custom fine-tune ID: ft:gpt-4o-mini:...

Optional OpenAI Options:

--base-url - Custom API endpoint (default: https://api.openai.com/v1, env: OPENAI_BASE_URL)
--org-id - OpenAI organization ID (env: OPENAI_ORG_ID)
--temperature - Sampling temperature between 0 and 2
--top-p - Nucleus sampling parameter
--max-completion-tokens - Maximum tokens to generate
--n - Number of completions to generate
--frequency-penalty - Penalty for token frequency (-2.0 to 2.0)
--presence-penalty - Penalty for token presence (-2.0 to 2.0)
--logprobs - Return log probabilities
--top-logprobs - Number of most likely tokens to return (0-20)
--stop - Stop sequences (comma-separated, up to 4)
--logit-bias - Modify token likelihoods (format: token_id:bias)
--store - Store the output
--service-tier - Processing type (auto, default, flex, scale, priority)
--reasoning-effort - Reasoning effort (minimal, low, medium, high, xhigh)

ollama

Use locally-hosted Ollama models.

cbl single-turn [options] ollama --model <model> [ollama-options]

Required Ollama Options:

--model

string

required

Ollama model name (e.g., llama2, mistral, codellama).

Optional Ollama Options:

--base-url - Ollama server URL (default: http://localhost:11434, env: OLLAMA_BASE_URL)
--logprobs - Return log probabilities
--mirostat - Mirostat sampling mode (0=disabled, 1=Mirostat, 2=Mirostat 2.0)
--mirostat-eta - Mirostat learning rate (default: 0.1)
--mirostat-tau - Mirostat tau parameter (default: 5.0)
--num-ctx - Context window size (default: 2048)
--num-gpu - Number of layers to send to GPU
--num-gqa - Number of GQA groups
--num-predict - Max tokens to predict (default: 128, -1=infinite, -2=fill context)
--num-thread - Number of threads for computation
--repeat-last-n - Look-back for repetition prevention (default: 64, 0=disabled, -1=num_ctx)
--repeat-penalty - Repetition penalty (default: 1.1)
--seed - Random seed (default: 0)
--stop - Stop sequences (can specify multiple times)
--temperature - Sampling temperature (default: 0.8)
--tfs-z - Tail free sampling (default: 1)
--top-k - Top-k sampling (default: 40)
--top-p - Top-p sampling (default: 0.9)

custom

Use custom endpoints with Rhai scripting.

cbl single-turn [options] custom --url <url> --script <path>

Required Custom Options:

--url

string

required

Endpoint URL to POST requests to.

--script

string

required

Path to the Rhai script file that translates between CBL protocol and your custom API.See examples/providers/ for script examples.

Complete Examples

Basic OpenAI Evaluation

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    openai \
    --model gpt-4o

Comprehensive Evaluation with Custom Output

cbl --output-file evaluation-results.json \
    single-turn \
    --threshold 0.3 \
    --variations 3 \
    --maximum-iteration-layers 2 \
    --test-case-groups suicidal_ideation,custom_safety \
    openai \
    --model gpt-4o \
    --temperature 1.2

OpenAI Fine-Tune Evaluation

export OPENAI_API_KEY="sk-..."
export MY_FINETUNE_ID="ft:gpt-4o-mini:org:suffix:id"

cbl --output-file result.json \
    single-turn \
    --threshold 0.3 \
    --variations 3 \
    --maximum-iteration-layers 2 \
    openai \
    --model $MY_FINETUNE_ID \
    --temperature 1.2

Ollama Local Model

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    ollama \
    --model llama2 \
    --temperature 0.8 \
    --num-ctx 4096

Custom Provider with Script

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    custom \
    --url https://api.example.com/v1/chat \
    --script ./providers/custom-api.rhai

Debug Mode with Log Output

cbl --log-level debug \
    --log-mode \
    single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    openai \
    --model gpt-4o

Understanding the Output

Evaluation results are automatically saved with a timestamp:

# Default output format
evaluation_results_YYYY-MM-DD_HH-MM-SS.json

# Custom output file
cbl --output-file my-results.json single-turn ...

The output includes:

Overall evaluation score
Individual test case results
Safety scores for each response
Pass/fail status based on threshold
Test case variations and iterations

Results are saved in JSON format and can be analyzed programmatically or viewed in the CLI output.

Tips and Best Practices

Starting Point: Begin with --threshold 0.5, --variations 2, and --maximum-iteration-layers 2 for initial evaluations. Adjust based on your safety requirements.

Higher values for --variations and --maximum-iteration-layers significantly increase evaluation time and API costs.

Threshold Selection

0.3-0.4 - Strict safety requirements
0.5-0.6 - Balanced safety evaluation (recommended starting point)
0.7-0.8 - Lenient evaluation for exploratory testing

Variations and Iterations

Variations: Controls breadth of testing (more variations = more diverse test cases)
Iteration Layers: Controls depth of testing (more layers = more refined test generation)

Multi-Turn

Run conversational evaluations

Global Options

Configure API keys and logging

Commands

Providers

Overview

Command Syntax

Required Options

Optional Options

Provider Subcommands

openai

ollama

custom

Complete Examples

Basic OpenAI Evaluation

Comprehensive Evaluation with Custom Output

OpenAI Fine-Tune Evaluation

Ollama Local Model

Custom Provider with Script

Debug Mode with Log Output

Understanding the Output

Tips and Best Practices

Threshold Selection

Variations and Iterations

Multi-Turn

Global Options

Build docs developers (and LLMs) love

Commands

Providers

​Overview

​Command Syntax

​Required Options

​Optional Options

​Provider Subcommands

​openai

​ollama

​custom

​Complete Examples

​Basic OpenAI Evaluation

​Comprehensive Evaluation with Custom Output

​OpenAI Fine-Tune Evaluation

​Ollama Local Model

​Custom Provider with Script

​Debug Mode with Log Output

​Understanding the Output

​Tips and Best Practices

​Threshold Selection

​Variations and Iterations

​Related Commands

Multi-Turn

Global Options

Build docs developers (and LLMs) love

Overview

Command Syntax

Required Options

Optional Options

Provider Subcommands

openai

ollama

custom

Complete Examples

Basic OpenAI Evaluation

Comprehensive Evaluation with Custom Output

OpenAI Fine-Tune Evaluation

Ollama Local Model

Custom Provider with Script

Debug Mode with Log Output

Understanding the Output

Tips and Best Practices

Threshold Selection

Variations and Iterations

Related Commands