Ollama allows you to run large language models locally on your machine, providing privacy, offline capability, and no API costs. Avante.nvim has full support for Ollama.

Prerequisites

You must have Ollama installed and running before using this provider.
1. Install Ollama

Download and install from ollama.ai
# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from the website
2. Start Ollama service

ollama serve
By default, Ollama runs on http://127.0.0.1:11434
3. Pull a model

ollama pull qwen2.5-coder:14b
# or
ollama pull deepseek-coder-v2
# or
ollama pull codellama

Quick Start

{
  "yetone/avante.nvim",
  opts = {
    provider = "ollama",
    providers = {
      ollama = {
        model = "qwen2.5-coder:14b",
        -- Enable the provider by checking endpoint
        is_env_set = require("avante.providers.ollama").check_endpoint_alive,
      },
    },
  },
}

Configuration

Basic Configuration

providers = {
  ollama = {
    endpoint = "http://127.0.0.1:11434",
    model = "qwen2.5-coder:14b",
    timeout = 30000,
    extra_request_body = {
      options = {
        temperature = 0.75,
        num_ctx = 20480,
        keep_alive = "5m",
      },
    },
  },
}

Model Selection

Ollama supports many models; for coding, qwen2.5-coder, deepseek-coder-v2, and codellama are good choices. For example:
providers = {
  ollama = {
    model = "qwen2.5-coder:14b",
    extra_request_body = {
      options = {
        num_ctx = 32768,
      },
    },
  },
}

Environment Setup

Enabling the Provider

By default, the Ollama provider is disabled. You must provide an is_env_set implementation to enable it:
providers = {
  ollama = {
    -- Check if endpoint is alive
    is_env_set = require("avante.providers.ollama").check_endpoint_alive,
  },
}
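Conceptually, check_endpoint_alive is just a quick HTTP probe of the Ollama server. The shell sketch below illustrates the idea (an illustration only, not avante's actual implementation):

```shell
# Probe the Ollama endpoint; curl -sf exits non-zero on connection
# failure or an HTTP error status, so the branch tells us liveness.
if curl -sf --max-time 2 "http://127.0.0.1:11434/api/tags" >/dev/null 2>&1; then
  echo "alive"
else
  echo "not reachable"
fi
```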
Or use a custom check:
providers = {
  ollama = {
    is_env_set = function()
      -- Your custom logic here
      return true
    end,
  },
}
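For instance, a custom check could gate the provider on the ollama binary being installed. This is a hypothetical policy, sketched with Neovim's vim.fn.executable():

```lua
providers = {
  ollama = {
    -- Enable only when the `ollama` binary is on PATH
    is_env_set = function()
      return vim.fn.executable("ollama") == 1
    end,
  },
}
```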

Custom Endpoint

For remote Ollama instances:
providers = {
  ollama = {
    endpoint = "http://192.168.1.100:11434",
  },
}
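If you switch between machines, one option is to derive the endpoint from the OLLAMA_HOST environment variable (which the Ollama CLI itself honors), falling back to localhost. This sketch assumes the variable, when set, holds a full URL:

```lua
providers = {
  ollama = {
    -- Fall back to the local default when OLLAMA_HOST is unset.
    -- Assumes OLLAMA_HOST, if set, contains a full URL such as
    -- "http://192.168.1.100:11434".
    endpoint = os.getenv("OLLAMA_HOST") or "http://127.0.0.1:11434",
  },
}
```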

Model Parameters

Ollama uses the options object for model parameters:
extra_request_body = {
  options = {
    temperature = 0.75,     -- Randomness (0.0-1.0)
    num_ctx = 20480,        -- Context window size
    top_p = 0.9,            -- Nucleus sampling
    top_k = 40,             -- Top-k sampling
    repeat_penalty = 1.1,   -- Repetition penalty
    keep_alive = "5m",      -- Model keep-alive duration
  },
}

Parameter Details

| Parameter      | Type   | Default | Description                      |
| -------------- | ------ | ------- | -------------------------------- |
| temperature    | number | 0.75    | Controls randomness (0.0-1.0)    |
| num_ctx        | number | 2048    | Context window size in tokens    |
| top_p          | number | 0.9     | Nucleus sampling threshold       |
| top_k          | number | 40      | Top-k sampling parameter         |
| repeat_penalty | number | 1.1     | Penalty for repetition           |
| keep_alive     | string | "5m"    | How long to keep model loaded    |

List Available Models

List all models installed in Ollama:
local models = require('avante.providers').ollama:list_models()
for _, model in ipairs(models) do
  print(model.display_name)
end
Or via command line:
ollama list

Model Management

Pull Models

# Pull a specific model
ollama pull qwen2.5-coder:14b

# Pull latest version
ollama pull qwen2.5-coder:latest

# Pull a specific size variant
ollama pull llama3:70b

Remove Models

ollama rm model-name

Check Model Info

ollama show model-name

ReAct Prompting

Ollama uses ReAct-style prompting by default for tool use:
providers = {
  ollama = {
    use_ReAct_prompt = true, -- Default
  },
}
This enables better tool calling through XML-based prompting.

Advanced Configuration

Keep-Alive Settings

Control how long models stay in memory:
extra_request_body = {
  options = {
    keep_alive = "10m", -- Keep loaded for 10 minutes
    -- keep_alive = "0",   -- Unload immediately
    -- keep_alive = "-1",  -- Keep loaded indefinitely
  },
}

Context Window Optimization

Adjust based on your hardware:
extra_request_body = {
  options = {
    -- For 8GB+ VRAM
    num_ctx = 32768,
    
    -- For 4-8GB VRAM
    -- num_ctx = 16384,
    
    -- For <4GB VRAM
    -- num_ctx = 8192,
  },
}

Authentication

For secured Ollama instances:
providers = {
  ollama = {
    api_key_name = "OLLAMA_API_KEY", -- Optional
  },
}
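api_key_name names an environment variable; assuming avante reads the key from that variable, export it before launching Neovim:

```shell
# Make the key visible to Neovim; "your-key-here" is a placeholder.
export OLLAMA_API_KEY="your-key-here"
echo "$OLLAMA_API_KEY"
```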

Troubleshooting

If you see connection errors:
  1. Check if Ollama is running:
    curl http://127.0.0.1:11434/api/tags
    
  2. Start Ollama:
    ollama serve
    
  3. Verify endpoint in config matches Ollama’s address
If you see "model 'model-name' not found":
  1. List installed models:
    ollama list
    
  2. Pull the model:
    ollama pull model-name
    
If Ollama crashes or runs out of memory:
  1. Use a smaller model (e.g., qwen2.5-coder:7b instead of :14b)
  2. Reduce num_ctx
  3. Close other applications
  4. Consider upgrading RAM/VRAM
If responses are too slow:
  1. Use GPU acceleration (should be automatic)
  2. Try a smaller model
  3. Reduce num_ctx
  4. Ensure no other heavy processes are running

Performance Tips

Model Size

  • Larger ≠ always better
  • 7B models: Fast, good for simple tasks
  • 14B models: Balanced performance
  • 34B+ models: Best quality, slower

Context Window

  • Larger context uses more memory
  • Start with 8192-16384
  • Increase only if needed
  • Monitor memory usage

Hardware

  • GPU: Much faster than CPU
  • RAM: 16GB+ recommended
  • VRAM: 8GB+ for larger models
  • SSD: Faster model loading

Keep-Alive

  • Longer = faster responses
  • Shorter = less memory usage
  • Balance based on usage pattern

Best Practices

Model Selection

For different use cases:
-- Quick code completion
model = "qwen2.5-coder:7b"

-- Balanced coding assistance
model = "qwen2.5-coder:14b"

-- Complex refactoring
model = "deepseek-coder-v2:16b"

-- General purpose
model = "llama3:8b"

Resource Management

providers = {
  ollama = {
    timeout = 60000, -- Longer timeout for local models
    extra_request_body = {
      options = {
        keep_alive = "5m", -- Good default
        num_ctx = 16384,   -- Adjust to your RAM
      },
    },
  },
}

Example Configurations

{
  provider = "ollama",
  providers = {
    ollama = {
      endpoint = "http://127.0.0.1:11434",
      model = "qwen2.5-coder:14b",
      is_env_set = require("avante.providers.ollama").check_endpoint_alive,
      timeout = 30000,
      extra_request_body = {
        options = {
          temperature = 0.7,
          num_ctx = 20480,
          keep_alive = "5m",
        },
      },
    },
  },
}
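A variant for a remote, secured instance, combining the endpoint, authentication, and timeout options shown above (the address and key are placeholders):

```lua
{
  provider = "ollama",
  providers = {
    ollama = {
      endpoint = "http://192.168.1.100:11434", -- remote instance
      model = "qwen2.5-coder:14b",
      api_key_name = "OLLAMA_API_KEY",         -- env var holding the key
      is_env_set = require("avante.providers.ollama").check_endpoint_alive,
      timeout = 60000, -- allow extra time for network + local inference
      extra_request_body = {
        options = {
          temperature = 0.7,
          num_ctx = 16384,
          keep_alive = "5m",
        },
      },
    },
  },
}
```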
