Ollama enables running large language models locally on your machine.
## Overview

Ollama provides:

- Local model execution (no API costs)
- Privacy (data stays on your machine)
- Offline operation
- Fast inference on local hardware

Supported models:

- Llama 2/3
- Mistral
- Mixtral
- Phi
- Gemma
- And more
## Prerequisites

### Install Ollama

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
### Start the Ollama Server

Run `ollama serve` if the server is not already running. Default endpoint: http://localhost:11434
### Pull a Model

```bash
# Llama 3 8B (recommended for most use cases)
ollama pull llama3

# Smaller models (faster, less capable)
ollama pull phi3
ollama pull gemma:2b

# Larger models (more capable, slower)
ollama pull llama3:70b
ollama pull mixtral:8x7b
```

List installed models with `ollama list`.
## Configuration

### Config File

```toml
[agent]
provider = "ollama"
model = "llama3" # Model name from 'ollama list'

[providers.ollama]
base_url = "http://localhost:11434" # Ollama server URL
```
### CLI Usage

```bash
zeroclaw agent --provider ollama --model llama3
```
## Features

### Tool Calling

Ollama supports tool calling for compatible models:

```toml
[providers.ollama]
tool_calling = "native" # Use the model's native function calling
```

Models with tool support:

- llama3:70b
- mixtral:8x7b
- mistral

Smaller models may have limited tool-calling capabilities.
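Ollama's REST API exposes tool calling through an OpenAI-style `tools` array on `/api/chat`. The sketch below builds such a request body; the `get_weather` tool and its schema are invented for illustration.

```python
import json

def chat_with_tools(model: str, prompt: str, tools: list) -> dict:
    """Build an Ollama /api/chat request body with a tools array."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "tools": tools,
    }

# Hypothetical tool definition (OpenAI-style function schema).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = chat_with_tools("llama3:70b", "What's the weather in Paris?", [weather_tool])
print(json.dumps(payload, indent=2))
```

Compatible models return any tool invocations in the response's `message.tool_calls` field rather than as plain text.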
### Streaming

Real-time response streaming:

```toml
[providers.ollama]
stream = true
```
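With streaming enabled, Ollama's API emits newline-delimited JSON chunks, each carrying a fragment of the reply in `message.content` plus a `done` flag. A minimal sketch of reassembling the full reply — the sample chunks are invented, but follow the shape Ollama emits:

```python
import json

def assemble_stream(lines):
    """Concatenate content fragments from Ollama NDJSON stream chunks."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):  # final chunk of the stream
            break
    return "".join(parts)

# Example chunks (contents invented for illustration).
sample = [
    '{"model":"llama3","message":{"role":"assistant","content":"Hel"},"done":false}',
    '{"model":"llama3","message":{"role":"assistant","content":"lo!"},"done":true}',
]
print(assemble_stream(sample))  # -> Hello!
```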
### Custom Parameters

```toml
[providers.ollama]
temperature = 0.7
top_p = 0.9
top_k = 40
repeat_penalty = 1.1
```
## Model Selection Guide

### For General Use

```bash
# Best balance (8GB RAM minimum)
ollama pull llama3

# Faster, less capable (4GB RAM)
ollama pull phi3
```

### For Coding

```bash
ollama pull codellama
ollama pull deepseek-coder
```

### For Maximum Quality

```bash
# Requires 40GB+ RAM
ollama pull llama3:70b
ollama pull mixtral:8x7b
```
## GPU Acceleration

Ollama automatically uses a GPU if one is available (CUDA, Metal, ROCm).

Check GPU usage with `ollama ps`, which shows whether each loaded model is running on CPU or GPU.
## Context Window

Adjust the context size:

```toml
[providers.ollama]
num_ctx = 4096 # Default: 2048
```

Larger contexts use more memory but allow longer conversations.
## Batch Size

```toml
[providers.ollama]
num_batch = 512 # Default: 512
```
## API Format

Ollama uses a simple JSON format:

```json
{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9
  }
}
```
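A minimal client for this format using only the Python standard library. The request body mirrors the JSON above; `chat` assumes a running Ollama server at the given base URL, so the network call is left commented out:

```python
import json
import urllib.request

def build_request(model: str, prompt: str, **options) -> dict:
    """Build an Ollama /api/chat request body like the JSON above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": options,
    }

def chat(base_url: str, body: dict) -> str:
    """POST the body to /api/chat and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

body = build_request("llama3", "Hello!", temperature=0.7, top_p=0.9)
# chat("http://localhost:11434", body)  # requires a running Ollama server
```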
## Troubleshooting

### "Connection refused" error

Solution: start the Ollama server, then verify it is running:

```bash
curl http://localhost:11434/api/tags
```

### Model not found

Solution: pull the model first, then confirm it is installed:

```bash
ollama pull llama3
ollama list
```
### Slow performance

Solutions:

- Use a smaller model (e.g. `phi3`)
- Reduce the context window:

  ```toml
  [providers.ollama]
  num_ctx = 2048
  ```

- Enable GPU acceleration (requires compatible hardware)
### Out of memory

Solutions:

- Use a smaller model
- Reduce the context window
- Close other applications
- Reduce `num_batch`
## Example: Complete Setup

```bash
# Install Ollama
brew install ollama

# Start server (in a separate terminal)
ollama serve &

# Pull model
ollama pull llama3

# Configure ZeroClaw
zeroclaw config set agent.provider ollama
zeroclaw config set agent.model llama3

# Test
zeroclaw agent -m "Hello!"
```
## Remote Ollama

Connect to an Ollama server running on another machine:

```toml
[providers.ollama]
base_url = "http://192.168.1.100:11434"
```
## Docker Deployment

Run Ollama in Docker:

```bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3
```