Overview
ChatOllama provides integration with locally running Ollama models, enabling completely private and offline browser automation without sending data to external APIs.
Basic Usage
Prerequisites
- Install Ollama: Download from ollama.com
- Pull a model: `ollama pull llama3.2`
- Start Ollama: It runs automatically after installation
Configuration
Required Parameters
Ollama model name. Popular options:
- `llama3.2`: Fast and capable
- `llama3.2:70b`: More powerful
- `qwen2.5-coder:32b`: Great for web tasks
- `mistral`: Alternative option
- `codellama`: Coding focused
Client Parameters
Ollama server URL. Defaults to `http://localhost:11434`.

Request timeout in seconds.

Additional parameters for the Ollama client.

Ollama-specific options for model behavior. Common options:
- `temperature`: Sampling temperature
- `num_predict`: Max tokens to generate
- `top_k`: Top-K sampling
- `top_p`: Top-P sampling
- `repeat_penalty`: Repetition penalty
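As a concrete illustration, these options can be collected into a plain dictionary (the keys are standard Ollama option names; how the dict is passed to `ChatOllama` depends on your installed version):

```python
# Standard Ollama generation options as a plain dict.
options = {
    "temperature": 0.7,     # sampling temperature
    "num_predict": 256,     # max tokens to generate
    "top_k": 40,            # top-K sampling
    "top_p": 0.9,           # top-P (nucleus) sampling
    "repeat_penalty": 1.1,  # penalize repeated tokens (>1.0 discourages repeats)
}
```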
Advanced Usage
Custom Ollama Host
With Ollama Options
Structured Output
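A sketch of the structured-output flow: define a Pydantic model and pass it as `output_format` to `ainvoke()` (per the Methods section below). `PageSummary` is a hypothetical example schema:

```python
# Sketch: a Pydantic model used as the structured output schema.
from pydantic import BaseModel

class PageSummary(BaseModel):  # hypothetical example schema
    title: str
    links: list[str]

# result = await llm.ainvoke(messages, output_format=PageSummary)
# result.completion would then be a PageSummary instance
```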
Custom Timeout for Large Models
Using Dictionary Options
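Because options are plain dictionaries, shared defaults can be reused and overridden per use case (plain Python, no library assumptions):

```python
# Build option dicts from shared defaults.
defaults = {"temperature": 0.7, "num_predict": 256}

fast = {**defaults, "num_predict": 128}      # shorter answers, quicker responses
precise = {**defaults, "temperature": 0.0}   # deterministic output
```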
Setup Guide
macOS
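On macOS, install either the desktop app from ollama.com or the Homebrew formula:

```shell
# Install via Homebrew (alternative: download the app from ollama.com)
brew install ollama

# Start the server if the desktop app is not running
ollama serve
```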
Linux
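On Linux, the official install script from ollama.com sets up Ollama as a service:

```shell
# Official install script
curl -fsSL https://ollama.com/install.sh | sh
```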
Windows
- Download installer from ollama.com
- Run installer
- Open terminal and run: `ollama pull llama3.2`
Docker
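Ollama also ships an official Docker image; this CPU-only invocation follows the image's documented usage:

```shell
# Run the official image; models persist in the `ollama` volume
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull a model inside the container
docker exec -it ollama ollama pull llama3.2
```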
Error Handling
Properties
provider
Returns the provider name: `"ollama"`
name
Returns the model name.
Methods
get_client()
Returns an `OllamaAsyncClient` instance.
ainvoke()
Asynchronously invoke the model with messages.
Parameters
- `messages` (`list[BaseMessage]`): List of messages
- `output_format` (`type[T] | None`): Optional Pydantic model for structured output
Returns
`ChatInvokeCompletion[T] | ChatInvokeCompletion[str]` with:
- `completion`: Response content (string or structured output)
- `usage`: Currently `None` for Ollama (not tracked)
Ollama does not currently provide token usage information in responses.
Recommended Models
For Speed
- llama3.2 (3B): Fast, good quality
- qwen2.5-coder (7B): Great for web tasks
- mistral (7B): Balanced performance
For Quality
- llama3.2:70b: Best quality, slower
- qwen2.5-coder:32b: Excellent for browser automation
- mixtral:8x7b: High quality mixture of experts
For Resource-Constrained
- llama3.2:3b: Very fast on CPU
- phi3: Microsoft’s efficient model
- tinyllama: Minimal resource usage
Performance Tips
- GPU Acceleration: Ollama automatically uses GPU if available
- Model Size: Smaller models are faster but less capable
- num_predict: Limit output tokens for faster responses
- Preload Models: Models load faster after first use
Troubleshooting
Ollama Not Running
Start the server with `ollama serve` (the desktop app starts it automatically).
Model Not Found
Pull the model first with `ollama pull llama3.2`; `ollama list` shows what is installed.
Connection Refused
Verify the server is reachable at the configured host, e.g. `curl http://localhost:11434/api/tags`.
Slow Performance
Try a smaller model (e.g. `llama3.2:3b`), limit `num_predict`, and confirm the GPU is being used.
Benefits of Ollama
- Privacy: All data stays on your machine
- No API Costs: Free to use
- Offline Capable: Works without internet
- Fast: Low latency on local hardware
- Customizable: Full control over models and parameters
Limitations
- No Usage Tracking: Token counts not available
- Hardware Dependent: Performance varies by hardware
- Model Quality: May not match GPT-4 or Claude for complex tasks
- Setup Required: Need to install and manage Ollama
Related
- ChatBrowserUse - Recommended for production
- ChatOpenAI
- Ollama Documentation