Ollama enables running large language models locally on your machine.

Overview

Ollama provides:
  • Local model execution (no API costs)
  • Privacy (data stays on your machine)
  • Offline operation
  • Fast inference on local hardware
Supported models:
  • Llama 2/3
  • Mistral
  • Mixtral
  • Phi
  • Gemma
  • And more

Prerequisites

Install Ollama

brew install ollama

Start Ollama Server

ollama serve
Default endpoint: http://localhost:11434
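To confirm the server is reachable before pointing clients at it, you can query the /api/tags endpoint, which returns the installed model list as JSON. A minimal Python sketch using only the standard library (the function name is illustrative, not part of Ollama or ZeroClaw):

```python
import json
import urllib.error
import urllib.request

def ollama_is_up(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers on base_url."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            json.loads(resp.read())  # the model list should parse as JSON
        return True
    except (urllib.error.URLError, ValueError, OSError):
        return False
```

If this returns False, start the server with `ollama serve` and retry.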

Pull a Model

# Llama 3 8B (recommended for most use cases)
ollama pull llama3

# Smaller models (faster, less capable)
ollama pull phi3
ollama pull gemma:2b

# Larger models (more capable, slower)
ollama pull llama3:70b
ollama pull mixtral:8x7b
List installed models:
ollama list

Configuration

Config File

[agent]
provider = "ollama"
model = "llama3"  # Model name from 'ollama list'

[providers.ollama]
base_url = "http://localhost:11434"  # Ollama server URL

CLI Usage

zeroclaw agent --provider ollama --model llama3

Features

Tool Calling

Ollama supports tool calling for compatible models:
[providers.ollama]
tool_calling = "native"  # Use model's native function calling
Models with tool support:
  • llama3:70b
  • mixtral:8x7b
  • mistral
Smaller models may have limited tool calling capabilities.
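When tool calling is enabled, tool definitions are sent to Ollama's /api/chat endpoint as an OpenAI-style `tools` array alongside the messages. A minimal Python sketch of such a payload; the `get_weather` tool is a hypothetical example for illustration, not part of Ollama or ZeroClaw:

```python
def build_tool_request(model: str, prompt: str) -> dict:
    """Sketch of an Ollama /api/chat payload carrying one tool definition."""
    weather_tool = {  # hypothetical example tool schema
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
        "stream": False,
    }
```

A tool-capable model may then reply with a tool call (the arguments to pass to `get_weather`) instead of plain text.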

Streaming

Real-time response streaming:
[providers.ollama]
stream = true

Custom Parameters

[providers.ollama]
temperature = 0.7
top_p = 0.9
top_k = 40
repeat_penalty = 1.1

Model Selection Guide

For General Use

# Best balance (8GB RAM minimum)
ollama pull llama3

# Faster, less capable (4GB RAM)
ollama pull phi3

For Coding

ollama pull codellama
ollama pull deepseek-coder

For Maximum Quality

# Requires 40GB+ RAM
ollama pull llama3:70b
ollama pull mixtral:8x7b

Performance Tuning

GPU Acceleration

Ollama automatically uses GPU if available (CUDA, Metal, ROCm). Check GPU usage:
ollama ps

Context Window

Adjust context size:
[providers.ollama]
num_ctx = 4096  # Default: 2048
Larger contexts use more memory but allow longer conversations.

Batch Size

[providers.ollama]
num_batch = 512  # Default: 512

Request Format

Ollama uses a simple JSON format:
{
  "model": "llama3",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.9
  }
}
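This format maps onto Ollama's /api/chat endpoint. A minimal Python sketch of a non-streaming client, assuming a local server on the default port; the function names here are illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local endpoint

def build_chat_request(model: str, prompt: str, temperature: float = 0.7,
                       top_p: float = 0.9) -> dict:
    """Build a payload matching Ollama's /api/chat request format."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"temperature": temperature, "top_p": top_p},
    }

def chat(model: str, prompt: str) -> str:
    """Send a non-streaming chat request and return the reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running and llama3 pulled, `chat("llama3", "Hello!")` returns the model's reply as a string.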

Troubleshooting

Server Not Running

Solution: Start the Ollama server:
ollama serve
Verify it's running:
curl http://localhost:11434/api/tags

Model Not Found

Solution: Pull the model first:
ollama pull llama3
ollama list

Out of Memory

Solutions:
  1. Use a smaller model:
ollama pull phi3
  2. Reduce the context window:
[providers.ollama]
num_ctx = 2048
  3. Enable GPU acceleration (requires compatible hardware)

Slow Responses

Solutions:
  1. Use a smaller model
  2. Reduce the context window
  3. Close other applications
  4. Reduce num_batch

Example: Complete Setup

# Install Ollama
brew install ollama

# Start server in the background (or run `ollama serve` in a separate terminal)
ollama serve &

# Pull model
ollama pull llama3

# Configure ZeroClaw
zeroclaw config set agent.provider ollama
zeroclaw config set agent.model llama3

# Test
zeroclaw agent -m "Hello!"

Remote Ollama

Connect to Ollama running on another machine:
[providers.ollama]
base_url = "http://192.168.1.100:11434"
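The same HTTP API works over the network, so you can check what a remote host has installed via its /api/tags endpoint. A small Python sketch (function names are illustrative; the parsing is split out so it can be exercised without a live server):

```python
import json
import urllib.request

def parse_model_names(tags: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags.get("models", [])]

def list_remote_models(base_url: str) -> list:
    """List model names available on a (possibly remote) Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.loads(resp.read()))
```

For example, `list_remote_models("http://192.168.1.100:11434")` would return the names shown by `ollama list` on that machine.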

Docker Deployment

Run Ollama in Docker:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3
