
Overview

The Ollama provider enables running open-source LLMs locally with LlamaIndex.TS. Ollama supports models like Llama, Mistral, CodeLlama, and more.

Installation

npm install @llamaindex/ollama

Prerequisites

  1. Install Ollama: https://ollama.ai
  2. Pull a model:
ollama pull llama3.2

Basic Usage

import { Ollama } from "@llamaindex/ollama";

const llm = new Ollama({
  model: "llama3.2"
});

const response = await llm.chat({
  messages: [
    { role: "user", content: "What is LlamaIndex?" }
  ]
});

console.log(response.message.content);

Constructor Options

model (string, required)
  Model name (e.g., "llama3.2", "mistral", "codellama")

config (Partial<Config>, optional)
  Ollama client configuration (e.g., host)

options (Partial<Options>, optional)
  Model options passed through to Ollama (e.g., temperature, num_ctx)

Popular Models

Llama 3.2

const llm = new Ollama({ model: "llama3.2" });

Mistral

const llm = new Ollama({ model: "mistral" });

CodeLlama

const llm = new Ollama({ model: "codellama" });

Phi-3

const llm = new Ollama({ model: "phi3" });

Streaming

const stream = await llm.chat({
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);
}

Function Calling

import { tool } from "@llamaindex/core/tools";
import { z } from "zod";

const calculatorTool = tool({
  name: "calculator",
  description: "Perform calculations",
  parameters: z.object({
    expression: z.string()
  }),
  execute: async ({ expression }) => {
    // Demo only: eval runs arbitrary code; use a proper expression
    // parser in production.
    return eval(expression).toString();
  }
});

const llm = new Ollama({ model: "llama3.2" });

const response = await llm.chat({
  messages: [{ role: "user", content: "What is 42 * 17?" }],
  tools: [calculatorTool]
});
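When the model requests a tool call, your application executes the tool and sends the result back in a follow-up message. The execution step can be sketched independently of the chat API; the `registry` and `dispatch` names below are illustrative, not part of @llamaindex/ollama, and the exact field that carries tool calls on the response varies by LlamaIndex.TS version:

```typescript
// A minimal tool-dispatch sketch: map tool names to handlers, then
// route an incoming tool call to the matching handler.
type ToolHandler = (input: Record<string, unknown>) => Promise<string>;

const registry: Record<string, ToolHandler> = {
  // Mirrors the calculator tool above; eval is demo-only.
  calculator: async ({ expression }) => String(eval(expression as string)),
};

async function dispatch(
  name: string,
  input: Record<string, unknown>
): Promise<string> {
  const handler = registry[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(input);
}
```

LlamaIndex.TS also ships agent abstractions that run this call-execute-respond loop for you, so hand-wired dispatch like this is mainly useful for understanding or customizing the flow.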

Structured Output

import { z } from "zod";

const schema = z.object({
  name: z.string(),
  age: z.number(),
  skills: z.array(z.string())
});

const result = await llm.exec({
  messages: [{ role: "user", content: "Extract: John, 30, Python, TypeScript" }],
  responseFormat: schema
});

console.log(result.object);

Completion API

const response = await llm.complete({
  prompt: "Once upon a time",
  stream: false
});

console.log(response.text);

Custom Ollama Server

const llm = new Ollama({
  model: "llama3.2",
  config: {
    host: "http://custom-server:11434"
  }
});

Model Options

const llm = new Ollama({
  model: "llama3.2",
  options: {
    temperature: 0.8,
    top_p: 0.95,
    num_ctx: 8192,
    num_predict: 512,
    repeat_penalty: 1.1
  }
});

Embeddings

import { OllamaEmbedding } from "@llamaindex/ollama";

const embedModel = new OllamaEmbedding({
  model: "llama3.2"
});

const embedding = await embedModel.getTextEmbedding(
  "LlamaIndex is a data framework"
);
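Downstream retrieval compares embeddings by cosine similarity. LlamaIndex computes this internally, but a standalone helper (illustrative, not part of the package) makes the idea concrete:

```typescript
// Cosine similarity between two embedding vectors: the dot product
// divided by the product of the vectors' magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```

Identical vectors score 1.0, orthogonal vectors 0.0; higher scores mean more semantically similar text.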

With LlamaIndex

import { Settings, VectorStoreIndex, Document } from "llamaindex";
import { Ollama, OllamaEmbedding } from "@llamaindex/ollama";

Settings.llm = new Ollama({ model: "llama3.2" });
Settings.embedModel = new OllamaEmbedding({ model: "llama3.2" });

const documents = [
  new Document({ text: "Document content..." })
];

const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
  query: "What is the document about?"
});

Available Models

Pull models with ollama pull <model>:
  • llama3.2: Llama 3.2 (1B, 3B)
  • llama3.1: Llama 3.1 (8B, 70B, 405B)
  • llama2: Llama 2 (7B, 13B, 70B)
  • mistral: Mistral 7B
  • mixtral: Mixtral 8x7B
  • codellama: Code-specialized Llama
  • phi3: Microsoft Phi-3
  • gemma: Google Gemma
  • qwen: Alibaba Qwen
See full list: https://ollama.ai/library

Model Variants

Models come in different sizes:
# Default tag (size varies by model; llama3.2 defaults to 3B)
ollama pull llama3.2

# Specific size
ollama pull llama3.1:70b
ollama pull mistral:7b-instruct

# Quantized versions (smaller, faster; see each model's tag list)
ollama pull llama3.2:3b-instruct-q4_K_M

Performance Tips

  1. Choose appropriate model size: Smaller models (7B) for faster inference, larger (70B+) for quality
  2. Adjust context window: Reduce num_ctx for faster responses
  3. Use quantized models: Q4_0, Q5_0 variants for reduced memory usage
  4. GPU acceleration: Ollama automatically uses GPU if available
  5. Keep models updated: re-run ollama pull <model> to fetch the latest version
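Tips 1–3 translate into the model tag and options you pass to the constructor. A sketch with illustrative values (starting points to experiment with, not benchmarks):

```typescript
import { Ollama } from "@llamaindex/ollama";

// Tuned for faster local inference on modest hardware.
const fastLlm = new Ollama({
  model: "llama3.2",    // pick a small and/or quantized tag for speed
  options: {
    num_ctx: 2048,      // smaller context window: faster, less memory
    num_predict: 256,   // cap the number of generated tokens
  },
});
```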

Troubleshooting

Ollama Not Running

# Start Ollama
ollama serve

Model Not Found

# Pull the model first
ollama pull llama3.2

Connection Error

// Check Ollama is running on correct port
const llm = new Ollama({
  model: "llama3.2",
  config: {
    host: "http://localhost:11434"  // Default
  }
});

Best Practices

  1. Run locally for privacy: All processing happens on your machine
  2. Choose model wisely: Balance quality vs speed/memory
  3. Monitor resource usage: Larger models need more RAM/VRAM
  4. Use streaming: Better UX for long responses
  5. Reuse loaded models: Ollama keeps recently used models in memory for a keep-alive period, so back-to-back requests are faster
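Practice 4 (streaming) usually pairs with accumulating the full text while rendering deltas. A minimal sketch using the chunk shape from the Streaming section:

```typescript
// Render streamed deltas as they arrive while also keeping the
// complete response text for logging or further processing.
async function collect(
  stream: AsyncIterable<{ delta: string }>
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    process.stdout.write(chunk.delta); // incremental UX
    full += chunk.delta;               // full text for later use
  }
  return full;
}
```

Usage: `const text = await collect(await llm.chat({ messages, stream: true }));`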
