## Overview

The Ollama provider enables running open-source LLMs locally with LlamaIndex.TS. Ollama supports models such as Llama, Mistral, CodeLlama, and more.
## Installation

```bash
npm install @llamaindex/ollama
```
## Prerequisites

1. Install Ollama: https://ollama.ai
2. Pull a model:

```bash
ollama pull llama3.2
```
## Basic Usage

```typescript
import { Ollama } from "@llamaindex/ollama";

const llm = new Ollama({
  model: "llama3.2",
});

const response = await llm.chat({
  messages: [{ role: "user", content: "What is LlamaIndex?" }],
});

console.log(response.message.content);
```
## Constructor Options

| Option | Type | Default | Description |
| ------ | ---- | ------- | ----------- |
| `model` | `string` | (required) | Model name (e.g., `"llama3.2"`, `"mistral"`, `"codellama"`) |
| `config` | `object` | `{ host: "http://localhost:11434" }` | Ollama client configuration; `host` sets the Ollama server URL |
| `options` | `object` | (none) | Model options, such as the maximum number of tokens to generate |
## Popular Models

### Llama 3.2

```typescript
const llm = new Ollama({ model: "llama3.2" });
```

### Mistral

```typescript
const llm = new Ollama({ model: "mistral" });
```

### CodeLlama

```typescript
const llm = new Ollama({ model: "codellama" });
```

### Phi-3

```typescript
const llm = new Ollama({ model: "phi3" });
```
## Streaming

```typescript
const stream = await llm.chat({
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.delta);
}
```
## Function Calling

```typescript
import { tool } from "@llamaindex/core/tools";
import { z } from "zod";

const calculatorTool = tool({
  name: "calculator",
  description: "Perform calculations",
  parameters: z.object({
    expression: z.string(),
  }),
  execute: async ({ expression }) => {
    // Demo only: eval() runs arbitrary JavaScript and is unsafe with
    // untrusted model output. Use a real expression parser in production.
    return eval(expression).toString();
  },
});

const llm = new Ollama({ model: "llama3.2" });

const response = await llm.chat({
  messages: [{ role: "user", content: "What is 42 * 17?" }],
  tools: [calculatorTool],
});
```
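The `execute` handler above uses `eval()`, which will execute whatever string the model produces. A safer alternative is a small purpose-built parser; the `safeEval` helper below is a hypothetical sketch (not part of LlamaIndex or Ollama) that accepts only numbers, `+ - * /`, and parentheses:

```typescript
// A tiny recursive-descent evaluator for basic arithmetic.
// Unlike eval(), it rejects anything that is not a simple expression.
function safeEval(expression: string): number {
  const tokens = expression.match(/\d+(?:\.\d+)?|[+\-*/()]/g);
  // Ensure the tokens account for the entire input (ignoring whitespace),
  // so strings like "process.exit(1)" are rejected outright.
  if (!tokens || tokens.join("") !== expression.replace(/\s+/g, "")) {
    throw new Error(`Unsupported expression: ${expression}`);
  }
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function parseExpr(): number {
    // Handles + and - (lowest precedence)
    let value = parseTerm();
    while (peek() === "+" || peek() === "-") {
      value = next() === "+" ? value + parseTerm() : value - parseTerm();
    }
    return value;
  }
  function parseTerm(): number {
    // Handles * and / (higher precedence)
    let value = parseFactor();
    while (peek() === "*" || peek() === "/") {
      value = next() === "*" ? value * parseFactor() : value / parseFactor();
    }
    return value;
  }
  function parseFactor(): number {
    // Numbers, unary minus, and parenthesized sub-expressions
    if (peek() === "-") { next(); return -parseFactor(); }
    if (peek() === "(") {
      next();
      const value = parseExpr();
      if (next() !== ")") throw new Error("Missing closing parenthesis");
      return value;
    }
    return Number(next());
  }

  const result = parseExpr();
  if (pos !== tokens.length) throw new Error("Trailing tokens in expression");
  return result;
}
```

Swapping `eval(expression)` for `safeEval(expression)` inside the tool keeps the calculator working without exposing a code-execution surface.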
## Structured Output

```typescript
import { z } from "zod";

const schema = z.object({
  name: z.string(),
  age: z.number(),
  skills: z.array(z.string()),
});

const result = await llm.exec({
  messages: [{ role: "user", content: "Extract: John, 30, Python, TypeScript" }],
  responseFormat: schema,
});

console.log(result.object);
```
## Completion API

```typescript
const response = await llm.complete({
  prompt: "Once upon a time",
  stream: false,
});

console.log(response.text);
```
## Custom Ollama Server

```typescript
const llm = new Ollama({
  model: "llama3.2",
  config: {
    host: "http://custom-server:11434",
  },
});
```
## Model Options

```typescript
const llm = new Ollama({
  model: "llama3.2",
  options: {
    temperature: 0.8,    // sampling temperature (higher = more random)
    top_p: 0.95,         // nucleus sampling threshold
    num_ctx: 8192,       // context window size in tokens
    num_predict: 512,    // maximum number of tokens to generate
    repeat_penalty: 1.1, // penalty for repeating tokens
  },
});
```
## Embeddings

```typescript
import { OllamaEmbedding } from "@llamaindex/ollama";

const embedModel = new OllamaEmbedding({
  model: "llama3.2",
});

const embedding = await embedModel.getTextEmbedding(
  "LlamaIndex is a data framework"
);
```
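Embeddings are typically compared with cosine similarity when ranking documents against a query. A minimal helper, assuming both embeddings are plain `number[]` arrays of equal length:

```typescript
// Cosine similarity between two embedding vectors: the dot product
// divided by the product of their magnitudes. Ranges from -1 to 1,
// where 1 means the vectors point in the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

This is what vector stores compute internally when retrieving the most relevant chunks; you rarely need to call it yourself unless you are building custom retrieval logic.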
## With LlamaIndex

```typescript
import { Settings, VectorStoreIndex, Document } from "llamaindex";
import { Ollama, OllamaEmbedding } from "@llamaindex/ollama";

Settings.llm = new Ollama({ model: "llama3.2" });
Settings.embedModel = new OllamaEmbedding({ model: "llama3.2" });

const documents = [new Document({ text: "Document content..." })];

const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({
  query: "What is the document about?",
});
```
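Under the hood, `VectorStoreIndex.fromDocuments` splits each document into chunks before embedding them. LlamaIndex.TS ships its own splitters for this, so the fixed-size chunker below is only an illustrative sketch of that step; the chunk size and overlap values are arbitrary assumptions:

```typescript
// Split text into fixed-size chunks with overlap between neighbors,
// so that sentences cut at a boundary still appear whole in one chunk.
function chunkText(text: string, chunkSize = 512, overlap = 64): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Smaller chunks give more precise retrieval but less context per chunk; the overlap prevents information from being lost at chunk boundaries.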
## Available Models

Pull models with `ollama pull <model>`:

- `llama3.2`: Latest Llama 3.2
- `llama3.1`: Llama 3.1 (8B, 70B, 405B)
- `llama2`: Llama 2 (7B, 13B, 70B)
- `mistral`: Mistral 7B
- `mixtral`: Mixtral 8x7B
- `codellama`: Code-specialized Llama
- `phi3`: Microsoft Phi-3
- `gemma`: Google Gemma
- `qwen`: Alibaba Qwen

See the full list: https://ollama.ai/library
## Model Variants

Models come in different sizes:

```bash
# Default (typically 7B-8B)
ollama pull llama3.2

# Specific size
ollama pull llama3.1:70b
ollama pull mistral:7b-instruct

# Quantized versions (smaller, faster)
ollama pull llama3.2:7b-q4_0
```
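A rough way to reason about which variant fits your hardware: weight memory scales with parameter count times bytes per weight (about 2 bytes for fp16, about 0.5 for 4-bit quantization like Q4_0). The helper below is a back-of-the-envelope estimate, not an official Ollama figure; the 20% overhead factor for context and runtime buffers is an assumption:

```typescript
// Back-of-the-envelope model memory estimate. The 1.2 overhead factor
// (KV cache + runtime buffers) is an assumption, not an Ollama spec.
function estimateModelMemoryGB(
  paramsBillions: number,
  bytesPerWeight: number // ~2 for fp16, ~0.5 for 4-bit quantization
): number {
  const weightBytes = paramsBillions * 1e9 * bytesPerWeight;
  return (weightBytes * 1.2) / 1024 ** 3;
}
```

By this estimate, a 7B model at 4-bit lands near 4 GB, while the same model at fp16 needs roughly 16 GB, which is why quantized variants are the practical default on consumer hardware.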
## Performance Tips

- **Choose an appropriate model size**: smaller models (7B) for faster inference, larger models (70B+) for quality
- **Adjust the context window**: reduce `num_ctx` for faster responses
- **Use quantized models**: Q4_0 and Q5_0 variants reduce memory usage
- **GPU acceleration**: Ollama automatically uses a GPU if available
- **Keep models updated**: run `ollama pull <model>` to pull the latest version
## Troubleshooting

### Ollama Not Running

```bash
# Start the Ollama server
ollama serve
```

### Model Not Found

```bash
# Pull the model first
ollama pull llama3.2
```

### Connection Error

```typescript
// Check that Ollama is running on the correct port
const llm = new Ollama({
  model: "llama3.2",
  config: {
    host: "http://localhost:11434", // default
  },
});
```
## Best Practices

- **Run locally for privacy**: all processing happens on your machine
- **Choose your model wisely**: balance quality against speed and memory
- **Monitor resource usage**: larger models need more RAM/VRAM
- **Use streaming**: better UX for long responses
- **Cache models**: models stay in memory for faster subsequent runs
## See Also