Prerequisites
You must have Ollama installed and running before using this provider.
Install Ollama
Download and install from ollama.ai
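On Linux, the project publishes a one-line install script (macOS and Windows use the downloadable installers from the same site); running `ollama --version` afterwards confirms the binary is on your PATH:

```shell
# Install Ollama via the official install script (Linux only),
# then verify the CLI is available.
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
```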
Quick Start
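As a minimal sketch (the model name `qwen2.5-coder:7b` is just an example; any installed model works), start the server, download a model, and send it a prompt:

```shell
ollama serve &                  # start the server if it is not already running
ollama pull qwen2.5-coder:7b    # download the model weights (several GB)
ollama run qwen2.5-coder:7b "Write a hello-world program in Go."
```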
Configuration
Basic Configuration
Model Selection
Ollama supports many models; coding-focused models such as `qwen2.5-coder` (used in the examples below) are a common choice.
Environment Setup
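The main environment variable to know is the endpoint. `OLLAMA_HOST` is the variable the Ollama CLI itself honors; the variable your provider reads may differ, so check its configuration:

```shell
# Point the Ollama CLI (and, in many tools, the provider) at the local server.
export OLLAMA_HOST="http://localhost:11434"
```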
Enabling the Provider
By default, the Ollama provider is disabled. You must provide an `is_env_set` implementation:
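The exact shape of `is_env_set` depends on the host tool, so treat this as a sketch only: since Ollama itself requires no API key, one common pattern is to export a placeholder variable and have `is_env_set` report whether it is present. `OLLAMA_API_KEY` below is a hypothetical name, not something Ollama reads:

```shell
# Hypothetical: set a dummy variable for the provider's is_env_set check
# to find. Ollama does not validate this value.
export OLLAMA_API_KEY="ollama"
```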
Custom Endpoint
For remote Ollama instances:
Model Parameters
Ollama uses the `options` object for model parameters:
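These parameters map directly onto the `options` field of Ollama's REST API. A sketch using `curl` against the default endpoint (the model name is an assumption):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Explain mutexes in one sentence.",
  "stream": false,
  "options": {
    "temperature": 0.75,
    "num_ctx": 2048,
    "top_p": 0.9,
    "top_k": 40,
    "repeat_penalty": 1.1
  },
  "keep_alive": "5m"
}'
```

Note that in the REST API, `keep_alive` is a top-level request field rather than an entry in `options`.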
Parameter Details
| Parameter | Type | Default | Description |
|---|---|---|---|
| `temperature` | number | 0.75 | Controls randomness (0.0-1.0) |
| `num_ctx` | number | 2048 | Context window size in tokens |
| `top_p` | number | 0.9 | Nucleus sampling threshold |
| `top_k` | number | 40 | Top-k sampling parameter |
| `repeat_penalty` | number | 1.1 | Penalty for repetition |
| `keep_alive` | string | "5m" | How long to keep the model loaded |
List Available Models
List all models installed in Ollama:
Model Management
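Each of these management tasks is a single `ollama` CLI command; for example, listing the installed models:

```shell
ollama list
```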
Pull Models
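For example (the model name is only an illustration):

```shell
ollama pull qwen2.5-coder:7b
```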
Remove Models
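Removing a model frees its disk space:

```shell
ollama rm qwen2.5-coder:7b
```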
Check Model Info
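`ollama show` prints a model's parameters, template, and license:

```shell
ollama show qwen2.5-coder:7b
```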
ReAct Prompting
Ollama uses ReAct-style prompting by default for tool use:
Advanced Configuration
Keep-Alive Settings
Control how long models stay in memory:
Context Window Optimization
Adjust based on your hardware:
Authentication
For secured Ollama instances:
Troubleshooting
Ollama Not Running
If you see connection errors:
- Check if Ollama is running:
- Start Ollama:
- Verify that the endpoint in your config matches Ollama’s address
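The first two checks as commands (`/api/version` is a lightweight endpoint that only answers when the server is up):

```shell
# Check: prints a JSON version string if the server is running,
# fails with a connection error otherwise.
curl http://localhost:11434/api/version

# Start the server in the foreground (or use your OS service manager).
ollama serve
```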
Model Not Found
Error: “model ‘model-name’ not found”
- List installed models:
- Pull the model:
Out of Memory
If Ollama crashes or runs out of memory:
- Use a smaller model (e.g., `qwen2.5-coder:7b` instead of `:14b`)
- Reduce `num_ctx`
- Close other applications
- Consider upgrading RAM/VRAM
Slow Responses
If responses are too slow:
- Use GPU acceleration (should be automatic)
- Try a smaller model
- Reduce `num_ctx`
- Ensure no other heavy processes are running
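To confirm GPU acceleration is actually in use, `ollama ps` reports how much of each loaded model sits on the GPU; `100% GPU` in the PROCESSOR column means full acceleration:

```shell
ollama ps
```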
Performance Tips
Model Size
- Larger ≠ always better
- 7B models: Fast, good for simple tasks
- 14B models: Balanced performance
- 34B+ models: Best quality, slower
Context Window
- Larger context uses more memory
- Start with 8192-16384
- Increase only if needed
- Monitor memory usage
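The context size can also be changed per session from inside an interactive `ollama run` session, using the REPL's `/set` command:

```shell
ollama run qwen2.5-coder:7b
# then, at the interactive prompt:
#   /set parameter num_ctx 8192
```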
Hardware
- GPU: Much faster than CPU
- RAM: 16GB+ recommended
- VRAM: 8GB+ for larger models
- SSD: Faster model loading
Keep-Alive
- Longer = faster responses
- Shorter = less memory usage
- Balance based on usage pattern
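Besides the per-request `keep_alive` value, the server honors the `OLLAMA_KEEP_ALIVE` environment variable as a global default:

```shell
# Keep models loaded for 30 minutes after last use; "-1" keeps them
# loaded indefinitely, "0" unloads immediately after each request.
export OLLAMA_KEEP_ALIVE="30m"
ollama serve
```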