Overview
The Ollama provider connects to locally-running Ollama instances to use open-source models like Qwen, Llama, and others. It supports the full OpenAI-compatible API with tool calling.
Source: crates/goose/src/providers/ollama.rs
Configuration
Environment Variables
OLLAMA_HOST (string, default: "localhost") - Ollama server host; port 11434 is added automatically for localhost
OLLAMA_TIMEOUT (string, default: "600") - Request timeout in seconds
GOOSE_INPUT_LIMIT (integer, optional) - Override context window size (sets the num_ctx option)
Setup
# Install Ollama first
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull qwen3
# Start Ollama (usually runs automatically)
ollama serve
# Configure Goose
export OLLAMA_HOST="localhost" # default
# or for remote Ollama:
export OLLAMA_HOST="http://192.168.1.100:11434"
Supported Models
Recommended Models
qwen3 (default) - Qwen 3 base model
qwen3-vl - Qwen 3 with vision capabilities
qwen3-coder:30b - 30B parameter coding model
qwen3-coder:480b-cloud - Large cloud-based coding model
Other Popular Models
llama3.3 - Meta’s Llama 3.3
codellama - Code-specialized Llama
mistral - Mistral models
gemma - Google’s Gemma
phi - Microsoft’s Phi models
Model Library: https://ollama.com/library
Usage
Basic Usage
use goose::providers::create;
use goose::model::ModelConfig;
// Create with default model (qwen3)
let model_config = ModelConfig::new("qwen3")?;
let provider = create("ollama", model_config, vec![]).await?;
// Stream a response
let messages = vec![Message::user().with_text("Hello!")];
let stream = provider.stream(
&provider.get_model_config(),
"session-123",
"You are a helpful assistant.",
&messages,
&[],
).await?;
Custom Configuration
let model_config = ModelConfig::new("llama3.3")?
.with_temperature(0.8)
.with_max_tokens(4096)
.with_context_limit(Some(8192));
let provider = create("ollama", model_config, vec![]).await?;
Setting Context Window
Ollama allows configuring the context window size:
# Set via environment variable (applies to all requests)
export GOOSE_INPUT_LIMIT=16384
Or via model config:
let model_config = ModelConfig::new("qwen3")?
.with_context_limit(Some(16384));
This sets the num_ctx parameter in Ollama options.
Advanced Features
Ollama supports native tool calling for compatible models:
use rmcp::model::Tool;
let tools = vec![
Tool {
name: "calculator".into(),
description: Some("Perform calculations".into()),
input_schema: serde_json::json!({
"type": "object",
"properties": {
"expression": {"type": "string"},
},
"required": ["expression"],
}),
},
];
let stream = provider.stream(
&model_config,
"session-123",
"You are a helpful assistant.",
&messages,
&tools,
).await?;
For models without native tool support, the provider falls back to parsing XML-based tool calls emitted in the model's text:
<tool_call>
<name>calculator</name>
<arguments>{"expression": "2+2"}</arguments>
</tool_call>
The provider parses these and converts them to proper tool calls.
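The extraction step can be sketched as a small standalone function. This is an illustrative sketch, not the provider's actual code (the real parser works incrementally on the stream); the function name and return shape are assumptions:

```rust
// Hypothetical sketch: pull the tool name and raw JSON arguments out of a
// complete <tool_call> block. Returns None if the block is malformed.
fn parse_tool_call(block: &str) -> Option<(String, String)> {
    // Return the trimmed text between a pair of tags, e.g. <name>...</name>.
    fn between<'a>(s: &'a str, open: &str, close: &str) -> Option<&'a str> {
        let start = s.find(open)? + open.len();
        let end = s[start..].find(close)? + start;
        Some(s[start..end].trim())
    }
    let inner = between(block, "<tool_call>", "</tool_call>")?;
    let name = between(inner, "<name>", "</name>")?;
    let args = between(inner, "<arguments>", "</arguments>")?;
    Some((name.to_string(), args.to_string()))
}
```

The real implementation must additionally handle tags split across stream chunks, which is why the streaming parser buffers text once it sees an opening tag.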
Chat Mode
In chat-only mode, tools are disabled: they are filtered out of the request before it is sent to Ollama.
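The filtering behavior amounts to something like the following minimal sketch. `Tool` here is a stub and `effective_tools` is an illustrative name, not the provider's actual API:

```rust
// Stub standing in for the real tool type.
struct Tool {
    name: String,
}

// In chat-only mode, send an empty tool list regardless of what the caller passed.
fn effective_tools(tools: Vec<Tool>, chat_mode: bool) -> Vec<Tool> {
    if chat_mode {
        Vec::new()
    } else {
        tools
    }
}
```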
Vision Models
Use vision-capable models for image understanding:
let model_config = ModelConfig::new("qwen3-vl")?;
let provider = create("ollama", model_config, vec![]).await?;
let messages = vec![
Message::user()
.with_image("https://example.com/image.jpg", "image/jpeg")
.with_text("What's in this image?"),
];
Remote Ollama
Connect to Ollama running on another machine:
# Explicit HTTP URL
export OLLAMA_HOST="http://192.168.1.100:11434"
# HTTPS with custom port
export OLLAMA_HOST="https://ollama.example.com:443"
# Domain without protocol (assumes HTTP)
export OLLAMA_HOST="ollama.local"
Port Handling
- localhost defaults to port 11434
- Remote hosts without an explicit port: no default port is added
- Explicit ports in the URL are respected, e.g. http://host:8080
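The rules above can be sketched as a standalone normalization function. This is an assumption-laden illustration (function name, exact scheme handling, and the localhost check are guesses), not the provider's actual code:

```rust
// Sketch of the host-normalization rules: add a scheme when one is missing,
// and default to port 11434 only for localhost.
fn normalize_host(host: &str) -> String {
    let with_scheme = if host.starts_with("http://") || host.starts_with("https://") {
        host.to_string()
    } else {
        // Bare hosts assume HTTP.
        format!("http://{}", host)
    };
    // Strip the scheme to check whether a port is already present.
    let rest = with_scheme.splitn(2, "://").nth(1).unwrap_or("");
    let has_port = rest
        .rsplit(':')
        .next()
        .map_or(false, |p| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit()));
    if !has_port && (rest == "localhost" || rest == "127.0.0.1") {
        format!("{}:11434", with_scheme)
    } else {
        with_scheme
    }
}
```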
Implementation Details
impl ProviderDef for OllamaProvider {
fn metadata() -> ProviderMetadata {
ProviderMetadata::new(
"ollama",
"Ollama",
"Local open source models",
"qwen3",
OLLAMA_KNOWN_MODELS.to_vec(),
"https://ollama.com/library",
vec![
ConfigKey::new("OLLAMA_HOST", true, false, Some("localhost"), true),
ConfigKey::new("OLLAMA_TIMEOUT", false, false, Some("600"), false),
],
)
}
}
Uses OpenAI-compatible chat completions format:
{
"model": "qwen3",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Hello!"
}
],
"stream": true,
"options": {
"num_ctx": 8192
}
}
Endpoint: POST /v1/chat/completions
Context Window Configuration
fn resolve_ollama_num_ctx(model_config: &ModelConfig) -> Option<usize> {
let config = Config::global();
// Priority: GOOSE_INPUT_LIMIT > model_config.context_limit
config.get_param::<usize>("GOOSE_INPUT_LIMIT")
.ok()
.or(model_config.context_limit)
}
fn apply_ollama_options(payload: &mut Value, model_config: &ModelConfig) {
if let Some(limit) = resolve_ollama_num_ctx(model_config) {
payload["options"]["num_ctx"] = json!(limit);
}
}
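The priority logic boils down to a simple "environment variable wins" rule, which can be shown without the Config machinery. This is an illustrative reduction of the snippet above, with the environment value passed in directly:

```rust
// GOOSE_INPUT_LIMIT (if set and parsable) takes priority over the model
// config's context_limit; unparsable values are ignored.
fn resolve_num_ctx(env_limit: Option<&str>, config_limit: Option<usize>) -> Option<usize> {
    env_limit
        .and_then(|v| v.parse::<usize>().ok())
        .or(config_limit)
}
```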
No Authentication
Ollama doesn’t require authentication:
let api_client = ApiClient::with_timeout(
base_url,
AuthMethod::NoAuth,
timeout,
)?;
Streaming
Ollama supports standard SSE streaming with a custom parser that handles XML tool calls:
fn stream_ollama(response: Response, log: RequestLog) -> Result<MessageStream> {
// Abbreviated: wraps the framed SSE stream in a custom parser that
// 1. buffers text when XML tool tags are detected,
// 2. parses complete <tool_call> blocks,
// 3. converts them to proper tool call messages, and
// 4. prevents duplicate content emission.
let message_stream = response_to_streaming_message_ollama(framed);
}
Fetching Available Models
// Get all locally pulled models
let models = provider.fetch_supported_models().await?;
// Queries: GET /api/tags
Example response:
{
"models": [
{
"name": "qwen3:latest",
"modified_at": "2024-03-04T12:00:00Z",
"size": 4661211648
},
{
"name": "llama3.3:latest",
"modified_at": "2024-03-03T10:00:00Z",
"size": 8661211648
}
]
}
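For illustration only, the model names can be pulled out of that response without a JSON library (the provider itself deserializes with serde); this dependency-free string scan just demonstrates the response shape and is not the real implementation:

```rust
// Naive extraction of every "name" field from an /api/tags response body.
// Do not use this in place of a real JSON parser.
fn model_names(tags_json: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut rest = tags_json;
    while let Some(i) = rest.find("\"name\":") {
        rest = &rest[i + "\"name\":".len()..];
        // Skip to the opening quote of the value, then take up to the closing quote.
        if let Some(start) = rest.find('"') {
            rest = &rest[start + 1..];
            if let Some(end) = rest.find('"') {
                names.push(rest[..end].to_string());
                rest = &rest[end + 1..];
            }
        }
    }
    names
}
```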
Error Handling
match provider.stream(...).await {
Ok(stream) => { /* handle stream */ },
Err(ProviderError::RequestFailed(msg)) => {
if msg.contains("connection refused") {
eprintln!("Ollama not running. Start with: ollama serve");
} else {
eprintln!("Request failed: {}", msg);
}
},
Err(e) => eprintln!("Error: {}", e),
}
Custom Provider Configuration
Create from declarative config:
use goose::config::declarative_providers::DeclarativeProviderConfig;
use goose::providers::ollama::OllamaProvider;
let config = DeclarativeProviderConfig {
name: "my-ollama".to_string(),
engine: ProviderEngine::Ollama,
base_url: "http://192.168.1.100:11434".to_string(),
supports_streaming: Some(true),
// ... other fields
};
let provider = OllamaProvider::from_custom_config(model_config, config)?;
Performance Tips
1. Adjust Context Window
Smaller context = faster inference:
export GOOSE_INPUT_LIMIT=4096 # Faster
# vs default 8192 or higher
2. Use Quantized Models
Smaller quantized models are faster:
ollama pull qwen3:q4_0 # 4-bit quantization
ollama pull qwen3:q8_0 # 8-bit (more accurate, slower)
3. GPU Acceleration
Ollama automatically uses GPU when available. Check with:
ollama ps
# Shows currently loaded models and whether they run on GPU or CPU
4. Keep Models Loaded
Models stay in memory after first use. For faster subsequent requests:
ollama run qwen3
# Keep this running in background
Troubleshooting
Ollama Not Running
# Check if running
curl http://localhost:11434/api/tags
# Start Ollama
ollama serve
Model Not Found
# List available models
ollama list
# Pull missing model
ollama pull qwen3
Out of Memory
Reduce context window or use smaller model:
export GOOSE_INPUT_LIMIT=2048
# or
ollama pull qwen3:7b # instead of qwen3:30b
See Also