Prerequisites
Install Ollama
Ollama must be running locally before using this provider.

Download and Install
Download Ollama from ollama.ai and follow the installation instructions for your operating system:
- macOS: Download and run the installer
- Linux: Run `curl -fsSL https://ollama.ai/install.sh | sh`
- Windows: Download the Windows installer
Start Ollama Service
After installation, start the Ollama service with `ollama serve`. By default, Ollama runs on `http://localhost:11434`.

Pull a Model

Before running an evaluation, pull the model you want to use, e.g. `ollama pull llama3.2`.
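Once the service is up, you can verify that it is reachable before running evaluations. A minimal sketch in Python using only the standard library; the endpoint and default port come from the defaults above, and `ollama_is_running` is a hypothetical helper, not part of the provider:

```python
import urllib.request
import urllib.error

def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at base_url.

    The Ollama root endpoint replies with "Ollama is running" when up.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

This is also a quick sanity check when pointing the provider at a remote host.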
Basic Usage
Configuration Options
Required Options
Ollama model name to use for evaluations. Examples: `llama3.2`, `mistral`, `codellama`, `gemma`. The model must already be pulled via `ollama pull <model-name>`.

Optional Options
Ollama server base URL. Change this if Ollama is running on a different host or port. Environment variable: `OLLAMA_BASE_URL`. Example: `--base-url http://192.168.1.100:11434`

Return log probabilities for each token in the response.
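The precedence between the `--base-url` flag, the `OLLAMA_BASE_URL` environment variable, and the built-in default can be sketched as follows; `resolve_base_url` is a hypothetical helper for illustration, not part of the provider:

```python
import os

def resolve_base_url(cli_value=None):
    """Pick the Ollama base URL: explicit flag value first,
    then the OLLAMA_BASE_URL environment variable, then the default."""
    return cli_value or os.environ.get("OLLAMA_BASE_URL") or "http://localhost:11434"
```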
Model Options
Ollama supports extensive model configuration through the following parameters:

Model temperature - higher values make answers more creative. Range: 0.0 to 2.0
Reduces probability of generating nonsense. Higher values give more diverse answers.
Works with top-k. Higher values lead to more diverse text. Range: 0.0 to 1.0
Maximum number of tokens to predict. Special values:
- -1: Infinite generation
- -2: Fill context window
Size of the context window (number of tokens).
How strongly to penalize repetitions. Higher values reduce repetition.
How far back to look to prevent repetition. Special values:
- 0: Disabled
- -1: Use the `num_ctx` value
Random number seed for generation. Use the same seed for reproducible outputs.
Stop sequences - generation stops when these strings are encountered. Example: `--stop END --stop STOP`

Tail free sampling - reduces impact of less probable tokens.
Enable Mirostat sampling for controlling perplexity. Options:
- 0: Disabled
- 1: Mirostat 1.0
- 2: Mirostat 2.0
Mirostat tau - controls balance between coherence and diversity.
Mirostat learning rate.
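Under the hood, these settings map onto the `options` object of Ollama's REST API. A sketch of that mapping, assuming the standard option names and defaults from Ollama's Modelfile/API documentation (`num_predict`, `repeat_last_n`, `tfs_z`, etc.); `build_model_options` itself is a hypothetical helper:

```python
def build_model_options(temperature=0.8, top_k=40, top_p=0.9,
                        num_predict=128, num_ctx=2048,
                        repeat_penalty=1.1, repeat_last_n=64,
                        seed=None, stop=None, tfs_z=1.0,
                        mirostat=0, mirostat_tau=5.0, mirostat_eta=0.1):
    """Collect sampling parameters into an Ollama API `options` dict."""
    options = {
        "temperature": temperature,      # 0.0-2.0, higher = more creative
        "top_k": top_k,                  # higher = more diverse answers
        "top_p": top_p,                  # 0.0-1.0, works with top_k
        "num_predict": num_predict,      # -1 = infinite, -2 = fill context
        "num_ctx": num_ctx,              # context window size in tokens
        "repeat_penalty": repeat_penalty,
        "repeat_last_n": repeat_last_n,  # 0 = disabled, -1 = num_ctx
        "tfs_z": tfs_z,                  # tail free sampling
        "mirostat": mirostat,            # 0, 1, or 2
        "mirostat_tau": mirostat_tau,    # coherence vs. diversity balance
        "mirostat_eta": mirostat_eta,    # Mirostat learning rate
    }
    if seed is not None:
        options["seed"] = seed           # fixed seed => reproducible output
    if stop:
        options["stop"] = list(stop)     # e.g. ["END", "STOP"]
    return options
```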
Hardware Options
Number of layers to send to GPU(s). Use to control GPU memory usage.
Number of threads to use during computation. Adjust based on your CPU cores.
Number of GQA (Grouped Query Attention) groups in transformer layer. Model-specific setting.
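The hardware parameters ride along in the same `options` object. A small sketch, again with assumed option names (`num_gpu`, `num_thread`) and an illustrative thread-count heuristic:

```python
import os

def build_hardware_options(num_gpu=None, num_thread=None):
    """Hardware-related entries for the Ollama `options` dict."""
    options = {}
    if num_gpu is not None:
        options["num_gpu"] = num_gpu   # layers offloaded to GPU(s)
    # A common starting point: one thread per available CPU core.
    options["num_thread"] = num_thread or os.cpu_count() or 1
    return options
```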
Examples
Basic Single-Turn Evaluation
Multi-Turn with Custom Temperature
Remote Ollama Instance
Reproducible Results with Seed
Large Context Window Configuration
GPU Optimization
Advanced Sampling Configuration
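The exact CLI invocations for the scenarios above depend on the evaluation tool, but the same settings can be exercised directly against Ollama's `/api/generate` endpoint. A hedged sketch (the request and response shape follow the Ollama API documentation; the model name and option values are illustrative):

```python
import json
import urllib.request
import urllib.error

def generate(prompt, model="llama3.2", base_url="http://localhost:11434",
             options=None, timeout=60.0):
    """Send a non-streaming generate request to Ollama.

    Returns the response text, or None if the server is unreachable.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a stream of chunks
        # A fixed seed with low temperature gives reproducible output.
        "options": options or {"temperature": 0.2, "seed": 42},
    }
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())["response"]
    except (urllib.error.URLError, OSError):
        return None
```

Pointing `base_url` at a remote host covers the remote-instance scenario; passing a larger `num_ctx` or `num_gpu` in `options` covers the context-window and GPU scenarios.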
Popular Models
Here are some popular models available through Ollama:

| Model | Size | Description | Pull Command |
|---|---|---|---|
| llama3.2 | 3B | Latest Llama model, efficient and capable | ollama pull llama3.2 |
| llama3.1 | 8B-70B | Previous Llama generation, multiple sizes | ollama pull llama3.1 |
| mistral | 7B | High-performance open model | ollama pull mistral |
| mixtral | 8x7B | Mixture of experts model | ollama pull mixtral |
| codellama | 7B-34B | Code-specialized Llama variant | ollama pull codellama |
| gemma | 2B-7B | Google’s efficient open model | ollama pull gemma |
| phi | 2.7B | Microsoft’s compact model | ollama pull phi |
For a complete list of available models, visit the Ollama Library.
Environment Variables
| Variable | Description | Required |
|---|---|---|
| OLLAMA_BASE_URL | Ollama server URL | No (defaults to http://localhost:11434) |
Tips
Troubleshooting
Connection Issues
If you see connection errors:
- Verify Ollama is running: `ollama list`
- Check the service is accessible: `curl http://localhost:11434`
- Ensure the model is pulled: `ollama pull <model-name>`
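The last check can also be done programmatically by listing local models via `/api/tags` (endpoint and response shape per the Ollama API documentation). The parsing is split into its own hypothetical helper so it can be tested offline:

```python
import json
import urllib.request
import urllib.error

def parse_tags(body):
    """Extract model names from an /api/tags JSON response body."""
    return [m["name"] for m in json.loads(body).get("models", [])]

def local_model_names(base_url="http://localhost:11434", timeout=5.0):
    """Return the names of locally pulled models, or [] if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_tags(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        return []
```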
Performance Issues
- Use `--num-thread` to match your CPU cores
- Adjust `--num-gpu` to optimize GPU usage
- Consider using smaller models for faster evaluations