The Ollama provider allows you to run AI safety evaluations using models hosted locally with Ollama, providing privacy and cost-effectiveness for your testing workflow.

Prerequisites

Install Ollama

Ollama must be running locally before using this provider.
1. Download and Install

Download Ollama from ollama.ai and follow the installation instructions for your operating system:
  • macOS: Download and run the installer
  • Linux: Run curl -fsSL https://ollama.ai/install.sh | sh
  • Windows: Download the Windows installer
2. Start Ollama Service

After installation, start the Ollama service:
ollama serve
By default, Ollama runs on http://localhost:11434
3. Pull a Model

Download a model to use for evaluations:
ollama pull llama3.2
View available models at ollama.ai/library
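
If you script your setup, a small pre-flight check can report whether the model is already available locally before you run an evaluation. This is a sketch: it assumes the ollama CLI is on your PATH, and the grep match on the list output is illustrative.

```shell
MODEL=llama3.2
# Report whether the model is already pulled; suggest the pull command otherwise.
if command -v ollama >/dev/null 2>&1 && ollama list 2>/dev/null | grep -q "$MODEL"; then
  echo "$MODEL is ready"
else
  echo "run: ollama pull $MODEL"
fi
```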

Basic Usage

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    --maximum-iteration-layers 2 \
    ollama --model llama3.2

Configuration Options

Required Options

--model
string
required
Ollama model name to use for evaluations. Examples: llama3.2, mistral, codellama, gemma
The model must already be pulled via ollama pull <model-name>

Optional Options

--base-url
string
default:"http://localhost:11434"
Ollama server base URL. Change this if Ollama is running on a different host or port. Environment variable: OLLAMA_BASE_URL. Example: --base-url http://192.168.1.100:11434
--logprobs
boolean
Return log probabilities for each token in the response.

Model Options

Ollama supports extensive model configuration through the following parameters:
--temperature
float
default:"0.8"
Model temperature - higher values make answers more creative. Range: 0.0 to 2.0
--top-k
integer
default:"40"
Reduces probability of generating nonsense. Higher values give more diverse answers.
--top-p
float
default:"0.9"
Works with top-k. Higher values lead to more diverse text. Range: 0.0 to 1.0
--num-predict
integer
default:"128"
Maximum number of tokens to predict. Special values:
  • -1: Infinite generation
  • -2: Fill context window
--num-ctx
integer
default:"2048"
Size of the context window (number of tokens).
--repeat-penalty
float
default:"1.1"
How strongly to penalize repetitions. Higher values reduce repetition.
--repeat-last-n
integer
default:"64"
How far back to look to prevent repetition. Special values:
  • 0: Disabled
  • -1: Use num_ctx value
--seed
integer
default:"0"
Random number seed for generation. Use the same seed for reproducible outputs.
--stop
string[]
Stop sequences - generation stops when these strings are encountered. Example: --stop END --stop STOP
--tfs-z
float
default:"1"
Tail free sampling - reduces impact of less probable tokens.
--mirostat
integer
default:"0"
Enable Mirostat sampling for controlling perplexity. Options:
  • 0: Disabled
  • 1: Mirostat 1.0
  • 2: Mirostat 2.0
--mirostat-tau
float
default:"5.0"
Mirostat tau - controls balance between coherence and diversity.
--mirostat-eta
float
default:"0.1"
Mirostat learning rate.

Hardware Options

--num-gpu
integer
Number of layers to send to GPU(s). Use to control GPU memory usage.
--num-thread
integer
Number of threads to use during computation. Adjust based on your CPU cores.
--num-gqa
integer
Number of GQA (Grouped Query Attention) groups in the transformer layer. Model-specific setting required by some models.
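
As a rough starting point for --num-thread, you can query the core count at run time. This is a sketch assuming a Linux host; on macOS, substitute sysctl -n hw.ncpu for nproc.

```shell
# Detect available CPU cores to use as a starting value for --num-thread
THREADS=$(nproc)
echo "using $THREADS threads"
```

Pass the value to the provider with --num-thread "$THREADS", then adjust up or down based on observed throughput.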

Examples

Basic Single-Turn Evaluation

cbl single-turn \
    --threshold 0.5 \
    --variations 2 \
    ollama --model llama3.2

Multi-Turn with Custom Temperature

cbl multi-turn \
    --threshold 0.5 \
    --max-turns 8 \
    --test-types user_persona,semantic_chunks \
    ollama \
    --model mistral \
    --temperature 0.7

Remote Ollama Instance

cbl single-turn \
    --threshold 0.5 \
    ollama \
    --model codellama \
    --base-url http://192.168.1.100:11434

Reproducible Results with Seed

cbl single-turn \
    --threshold 0.5 \
    ollama \
    --model llama3.2 \
    --temperature 0.3 \
    --seed 42

Large Context Window Configuration

cbl multi-turn \
    --threshold 0.4 \
    --max-turns 10 \
    ollama \
    --model llama3.2 \
    --num-ctx 8192 \
    --num-predict 1024

GPU Optimization

cbl single-turn \
    --threshold 0.5 \
    ollama \
    --model llama3.2 \
    --num-gpu 35 \
    --num-thread 8

Advanced Sampling Configuration

cbl multi-turn \
    --threshold 0.5 \
    --max-turns 8 \
    ollama \
    --model mistral \
    --temperature 0.8 \
    --top-k 50 \
    --top-p 0.95 \
    --repeat-penalty 1.2 \
    --mirostat 2 \
    --mirostat-tau 5.0

Popular Models

Here are some popular models available through Ollama:
Model      Size     Description                                 Pull Command
llama3.2   3B       Latest Llama model, efficient and capable   ollama pull llama3.2
llama3.1   8B-70B   Previous Llama generation, multiple sizes   ollama pull llama3.1
mistral    7B       High-performance open model                 ollama pull mistral
mixtral    8x7B     Mixture of experts model                    ollama pull mixtral
codellama  7B-34B   Code-specialized Llama variant              ollama pull codellama
gemma      2B-7B    Google’s efficient open model               ollama pull gemma
phi        2.7B     Microsoft’s compact model                   ollama pull phi
For a complete list of available models, visit the Ollama Library.

Environment Variables

Variable          Description        Required
OLLAMA_BASE_URL   Ollama server URL  No (defaults to http://localhost:11434)
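
The default-with-override behavior can be mirrored in a wrapper script using standard POSIX parameter expansion; this is a sketch, not part of the provider itself.

```shell
# Resolve the Ollama base URL: use OLLAMA_BASE_URL if set, else the default
BASE_URL="${OLLAMA_BASE_URL:-http://localhost:11434}"
echo "$BASE_URL"
```

A wrapper can then pass --base-url "$BASE_URL" explicitly, which makes the effective endpoint visible in logs.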

Tips

Model Selection: Larger models (70B+) provide better quality but require more resources. Start with 7B-13B models for development, then scale up if needed.
Context Window: If you encounter truncation issues, increase --num-ctx. Be aware this increases memory usage.
GPU Memory: Monitor GPU memory usage when running large models. Use --num-gpu to control how many layers are offloaded to the GPU.
Reproducibility: For consistent results across runs, set a fixed --seed and use --temperature 0 to minimize randomness.

Troubleshooting

Connection Issues

If you see connection errors:
  1. Verify Ollama is running: ollama list
  2. Check the service is accessible: curl http://localhost:11434
  3. Ensure the model is pulled: ollama pull <model-name>
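
The checks above can be strung together into one diagnostic script. This is a sketch: it assumes curl is installed and probes the server root, which a healthy Ollama instance answers.

```shell
BASE_URL="${OLLAMA_BASE_URL:-http://localhost:11434}"
# Probe the Ollama HTTP endpoint; list models if the server responds.
if curl -sf --max-time 3 "$BASE_URL" >/dev/null 2>&1; then
  echo "Ollama reachable at $BASE_URL"
  ollama list
else
  echo "Ollama not reachable at $BASE_URL - is 'ollama serve' running?"
fi
```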

Performance Issues

  • Use --num-thread to match your CPU cores
  • Adjust --num-gpu to optimize GPU usage
  • Consider using smaller models for faster evaluations
