Ollama allows you to run large language models locally on your machine, providing privacy, offline capability, and no API costs. Avante.nvim has full support for Ollama.

Prerequisites

You must have Ollama installed and running before using this provider.
1. Install Ollama

Download and install from ollama.ai
# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from the website
2. Start Ollama service

ollama serve
By default, Ollama runs on http://127.0.0.1:11434
3. Pull a model

ollama pull qwen2.5-coder:14b
# or
ollama pull deepseek-coder-v2
# or
ollama pull codellama

Quick Start

{
  "yetone/avante.nvim",
  opts = {
    provider = "ollama",
    providers = {
      ollama = {
        model = "qwen2.5-coder:14b",
        -- Enable the provider by checking endpoint
        is_env_set = require("avante.providers.ollama").check_endpoint_alive,
      },
    },
  },
}

Configuration

Basic Configuration

providers = {
  ollama = {
    endpoint = "http://127.0.0.1:11434",
    model = "qwen2.5-coder:14b",
    timeout = 30000,
    extra_request_body = {
      options = {
        temperature = 0.75,
        num_ctx = 20480,
        keep_alive = "5m",
      },
    },
  },
}

Model Selection

Ollama supports many models; for coding, qwen2.5-coder, deepseek-coder-v2, and codellama are good choices. For example:
providers = {
  ollama = {
    model = "qwen2.5-coder:14b",
    extra_request_body = {
      options = {
        num_ctx = 32768,
      },
    },
  },
}

Environment Setup

Enabling the Provider

By default, the Ollama provider is disabled. You must provide an is_env_set implementation to enable it:
providers = {
  ollama = {
    -- Check if endpoint is alive
    is_env_set = require("avante.providers.ollama").check_endpoint_alive,
  },
}
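Conceptually, check_endpoint_alive is just a quick HTTP probe of the Ollama server. The shell sketch below illustrates the idea (an illustration only, not avante's actual implementation):

```shell
# Probe the Ollama endpoint; curl -sf exits non-zero on connection
# failure or an HTTP error status, so the branch tells us liveness.
if curl -sf --max-time 2 "http://127.0.0.1:11434/api/tags" >/dev/null 2>&1; then
  echo "alive"
else
  echo "not reachable"
fi
```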
Or use a custom check:
providers = {
  ollama = {
    is_env_set = function()
      -- Your custom logic here
      return true
    end,
  },
}
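For instance, a custom check could gate the provider on the ollama binary being installed. This is a hypothetical policy, sketched with Neovim's vim.fn.executable():

```lua
providers = {
  ollama = {
    -- Enable only when the `ollama` binary is on PATH
    is_env_set = function()
      return vim.fn.executable("ollama") == 1
    end,
  },
}
```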

Custom Endpoint

For remote Ollama instances:
providers = {
  ollama = {
    endpoint = "http://192.168.1.100:11434",
  },
}
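If you switch between machines, one option is to derive the endpoint from the OLLAMA_HOST environment variable (which the Ollama CLI itself honors), falling back to localhost. This sketch assumes the variable, when set, holds a full URL:

```lua
providers = {
  ollama = {
    -- Fall back to the local default when OLLAMA_HOST is unset.
    -- Assumes OLLAMA_HOST, if set, contains a full URL such as
    -- "http://192.168.1.100:11434".
    endpoint = os.getenv("OLLAMA_HOST") or "http://127.0.0.1:11434",
  },
}
```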

Model Parameters

Ollama uses the options object for model parameters:
extra_request_body = {
  options = {
    temperature = 0.75,     -- Randomness (0.0-1.0)
    num_ctx = 20480,        -- Context window size
    top_p = 0.9,            -- Nucleus sampling
    top_k = 40,             -- Top-k sampling
    repeat_penalty = 1.1,   -- Repetition penalty
    keep_alive = "5m",      -- Model keep-alive duration
  },
}

Parameter Details

| Parameter      | Type   | Default | Description                      |
| -------------- | ------ | ------- | -------------------------------- |
| temperature    | number | 0.75    | Controls randomness (0.0-1.0)    |
| num_ctx        | number | 2048    | Context window size in tokens    |
| top_p          | number | 0.9     | Nucleus sampling threshold       |
| top_k          | number | 40      | Top-k sampling parameter         |
| repeat_penalty | number | 1.1     | Penalty for repetition           |
| keep_alive     | string | "5m"    | How long to keep model loaded    |

List Available Models

List all models installed in Ollama:
local models = require('avante.providers').ollama:list_models()
for _, model in ipairs(models) do
  print(model.display_name)
end
Or via command line:
ollama list

Model Management

Pull Models

# Pull a specific model
ollama pull qwen2.5-coder:14b

# Pull latest version
ollama pull qwen2.5-coder:latest

# Pull a specific size variant
ollama pull llama3:70b

Remove Models

ollama rm model-name

Check Model Info

ollama show model-name

ReAct Prompting

Ollama uses ReAct-style prompting by default for tool use:
providers = {
  ollama = {
    use_ReAct_prompt = true, -- Default
  },
}
This enables better tool calling through XML-based prompting.

Advanced Configuration

Keep-Alive Settings

Control how long models stay in memory:
extra_request_body = {
  options = {
    keep_alive = "10m", -- Keep loaded for 10 minutes
    -- keep_alive = "0",   -- Unload immediately
    -- keep_alive = "-1",  -- Keep loaded indefinitely
  },
}

Context Window Optimization

Adjust based on your hardware:
extra_request_body = {
  options = {
    -- For 8GB+ VRAM
    num_ctx = 32768,
    
    -- For 4-8GB VRAM
    -- num_ctx = 16384,
    
    -- For <4GB VRAM
    -- num_ctx = 8192,
  },
}

Authentication

For secured Ollama instances:
providers = {
  ollama = {
    api_key_name = "OLLAMA_API_KEY", -- Optional
  },
}
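api_key_name names an environment variable; assuming avante reads the key from that variable, export it before launching Neovim:

```shell
# Make the key visible to Neovim; "your-key-here" is a placeholder.
export OLLAMA_API_KEY="your-key-here"
echo "$OLLAMA_API_KEY"
```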

Troubleshooting

If you see connection errors:
  1. Check if Ollama is running:
    curl http://127.0.0.1:11434/api/tags
    
  2. Start Ollama:
    ollama serve
    
  3. Verify endpoint in config matches Ollama’s address
If you see "model 'model-name' not found":
  1. List installed models:
    ollama list
    
  2. Pull the model:
    ollama pull model-name
    
If Ollama crashes or runs out of memory:
  1. Use a smaller model (e.g., qwen2.5-coder:7b instead of :14b)
  2. Reduce num_ctx
  3. Close other applications
  4. Consider upgrading RAM/VRAM
If responses are too slow:
  1. Use GPU acceleration (should be automatic)
  2. Try a smaller model
  3. Reduce num_ctx
  4. Ensure no other heavy processes are running

Performance Tips

Model Size

  • Larger ≠ always better
  • 7B models: Fast, good for simple tasks
  • 14B models: Balanced performance
  • 34B+ models: Best quality, slower

Context Window

  • Larger context uses more memory
  • Start with 8192-16384
  • Increase only if needed
  • Monitor memory usage

Hardware

  • GPU: Much faster than CPU
  • RAM: 16GB+ recommended
  • VRAM: 8GB+ for larger models
  • SSD: Faster model loading

Keep-Alive

  • Longer = faster responses
  • Shorter = less memory usage
  • Balance based on usage pattern

Best Practices

Model Selection

For different use cases:
-- Quick code completion
model = "qwen2.5-coder:7b"

-- Balanced coding assistance
model = "qwen2.5-coder:14b"

-- Complex refactoring
model = "deepseek-coder-v2:16b"

-- General purpose
model = "llama3:8b"

Resource Management

providers = {
  ollama = {
    timeout = 60000, -- Longer timeout for local models
    extra_request_body = {
      options = {
        keep_alive = "5m", -- Good default
        num_ctx = 16384,   -- Adjust to your RAM
      },
    },
  },
}

Example Configurations

{
  provider = "ollama",
  providers = {
    ollama = {
      endpoint = "http://127.0.0.1:11434",
      model = "qwen2.5-coder:14b",
      is_env_set = require("avante.providers.ollama").check_endpoint_alive,
      timeout = 30000,
      extra_request_body = {
        options = {
          temperature = 0.7,
          num_ctx = 20480,
          keep_alive = "5m",
        },
      },
    },
  },
}
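A variant for a remote, secured instance, combining the endpoint, authentication, and timeout options shown above (the address and key are placeholders):

```lua
{
  provider = "ollama",
  providers = {
    ollama = {
      endpoint = "http://192.168.1.100:11434", -- remote instance
      model = "qwen2.5-coder:14b",
      api_key_name = "OLLAMA_API_KEY",         -- env var holding the key
      is_env_set = require("avante.providers.ollama").check_endpoint_alive,
      timeout = 60000, -- allow extra time for network + local inference
      extra_request_body = {
        options = {
          temperature = 0.7,
          num_ctx = 16384,
          keep_alive = "5m",
        },
      },
    },
  },
}
```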
