## Installation

First, install Ollama from [ollama.com](https://ollama.com) and pull a model (for example, `ollama pull llama3.2`). Then install the integration package with `pip install -U langchain-ollama`.

## Usage
## Streaming
## API Reference

### ChatOllama
| Parameter | Description |
|---|---|
| `model` | Name of the Ollama model to use (e.g., `llama3.2`, `mistral`, `phi3`). |
| `temperature` | Sampling temperature between 0.0 and 1.0. Higher values make output more random. |
| `num_predict` | Maximum number of tokens to generate. |
| `reasoning` | Controls reasoning/thinking mode for supported models. `True`: enables reasoning mode; reasoning is captured in `additional_kwargs.reasoning_content`. `False`: disables reasoning mode. `None`: uses the model's default behavior. |
| `base_url` | Base URL where Ollama is running. |
| `top_k` | Reduces the probability of generating nonsense. Higher values give more diversity. |
| `top_p` | Works together with `top_k`. Higher values give more diversity. |
| `num_ctx` | Sets the size of the context window used to generate the next token. |
| `repeat_penalty` | Sets how strongly to penalize repetitions. Higher values make repetitions less likely. |
| `validate_model_on_init` | Whether to validate that the model exists when initializing. |
| `stop` | Stop sequences to end generation. |
## Supported Models

Ollama supports hundreds of models. Popular options include:

- Llama 3.2: Fast, efficient model from Meta
- Mistral: High-quality open model
- Phi-3: Microsoft’s small language model
- Gemma: Google’s open model
- DeepSeek: Reasoning-capable models
- Qwen: Alibaba’s multilingual models
## Features
- Run models locally without API keys
- Full privacy - no data sent to external servers
- Tool calling (select models)
- Vision capabilities (multimodal models)
- Streaming
- Async support
- Custom model parameters
- Reasoning mode for supported models
Ollama runs models locally on your machine. Performance depends on your hardware. GPU acceleration is recommended for larger models.