## Installation

First, install Ollama from [ollama.com](https://ollama.com) and pull a model (for example, `ollama pull llama3.2`). Then install the integration package with `pip install -U langchain-ollama`.

## Usage
## Streaming
## API Reference

### ChatOllama
| Parameter | Description |
|---|---|
| `model` | Name of the Ollama model to use (e.g., `llama3.2`, `mistral`, `phi3`). |
| `temperature` | Sampling temperature between 0.0 and 1.0. Higher values make output more random. |
| `num_predict` | Maximum number of tokens to generate. |
| `reasoning` | Controls reasoning/thinking mode for supported models. `True`: enables reasoning mode; reasoning is captured in `additional_kwargs.reasoning_content`. `False`: disables reasoning mode. `None`: uses the model's default behavior. |
| `base_url` | Base URL where Ollama is running. |
| `top_k` | Reduces the probability of generating nonsense. Higher values give more diversity. |
| `top_p` | Works together with `top_k`. Higher values give more diversity. |
| `num_ctx` | Sets the size of the context window used to generate the next token. |
| `repeat_penalty` | Sets how strongly to penalize repetitions. Higher values make repetitions less likely. |
| `validate_model_on_init` | Whether to validate that the model exists when initializing. |
| `stop` | Stop sequences to end generation. |
## Supported Models

Ollama supports hundreds of models. Popular options include:

- Llama 3.2: Fast, efficient model from Meta
- Mistral: High-quality open model
- Phi-3: Microsoft’s small language model
- Gemma: Google’s open model
- DeepSeek: Reasoning-capable models
- Qwen: Alibaba’s multilingual models
## Features
- Run models locally without API keys
- Full privacy - no data sent to external servers
- Tool calling (select models)
- Vision capabilities (multimodal models)
- Streaming
- Async support
- Custom model parameters
- Reasoning mode for supported models
Ollama runs models locally on your machine. Performance depends on your hardware. GPU acceleration is recommended for larger models.