Why Use Ollama?
- Complete Privacy: Your code and commits stay on your local machine, never sent to external servers
- No API Costs: No usage fees or API quotas; unlimited commits once set up
- Offline Access: Works without an internet connection after the initial model download
- Full Control: Choose from dozens of open-source models and customize parameters
Prerequisites
Before using Ollama with GitWhisper, you need to:
- Install Ollama: Download from ollama.com
- Pull a model: Download at least one AI model
- Start the server: Ollama must be running locally
Installation
- macOS
- Linux
- Windows
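The per-platform installs above boil down to the following (official channels at the time of writing):

```shell
# macOS: via Homebrew, or download the app from https://ollama.com/download
brew install ollama

# Linux: official one-line install script
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download and run the installer from https://ollama.com/download

# Then start the server (the macOS and Windows apps start it automatically)
ollama serve
```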
Download Models
Pull the models you want to use with `ollama pull <model>`.
Available Models
Ollama supports hundreds of models. Popular choices for GitWhisper:
Recommended for Code
- llama3.2:latest ⭐ (default) - Great all-around choice
- codellama - Optimized for code understanding
- deepseek-coder - Excellent code generation and analysis
- qwen2.5-coder - Strong coding capabilities
- granite-code - IBM’s code-focused model
Fast & Lightweight
- phi3 - Microsoft’s compact model
- gemma2 - Google’s efficient model
- tinyllama - Ultra-fast but basic
Large & Powerful
- llama3:70b - More capable, requires more RAM
- mixtral - High quality, mixture of experts
- command-r - Cohere’s advanced model
Browse all available models at ollama.com/library
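Pulling any of the models above is a single command, for example:

```shell
# Download the default model used by GitWhisper
ollama pull llama3.2:latest

# Grab a code-focused alternative
ollama pull codellama

# List what is installed locally
ollama list
```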
Usage
Basic Usage
Make sure Ollama is running, then invoke GitWhisper as usual.
Specify Model
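Passing the provider and a model explicitly might look like this (the flag names here are illustrative, not confirmed — check `gitwhisper --help` for the exact syntax in your version):

```shell
# Hypothetical flags: --model selects the provider, --model-variant the Ollama model
gitwhisper --model ollama --model-variant codellama
```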
Use a specific Ollama model when the default isn’t the best fit for your task.
Set as Default
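Persisting the choice might look like this (the subcommand name is illustrative — consult `gitwhisper --help` for the real one):

```shell
# Hypothetical subcommand: save ollama + llama3.2 as the default for future runs
gitwhisper set-defaults --model ollama --model-variant llama3.2
```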
Configure Ollama as your default model so you don’t have to select it on every run.
Custom Base URL
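Ollama listens on `http://localhost:11434` by default; pointing GitWhisper elsewhere might look like this (the flag name is illustrative):

```shell
# Hypothetical flag: target an Ollama server on another machine
gitwhisper --model ollama --ollama-base-url http://192.168.1.50:11434
```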
If Ollama is running on a different host or port, point GitWhisper at that address.
No API Key Required
Ollama doesn’t require an API key: requests to a local server are unauthenticated by default, so there are no credentials to configure.
Hardware Requirements
Model performance depends on your hardware. The minimum tier (a basic laptop or desktop) is:
- 8GB RAM
- Any modern CPU
- No GPU required
- Suitable models: phi3, tinyllama, llama3.2:3b
More RAM and a GPU unlock the larger models; see the comparison table below for per-model requirements.
Model Comparison
Quality vs Speed
| Model | Quality | Speed | Size | RAM Needed |
|---|---|---|---|---|
| llama3:70b | ⭐⭐⭐⭐⭐ | ⭐ | 40GB | 32GB+ |
| codellama | ⭐⭐⭐⭐ | ⭐⭐⭐ | 7GB | 8GB |
| llama3.2 | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 2GB | 8GB |
| deepseek-coder | ⭐⭐⭐⭐ | ⭐⭐⭐ | 7GB | 8GB |
| phi3 | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 2GB | 4GB |
| tinyllama | ⭐⭐ | ⭐⭐⭐⭐⭐ | 700MB | 2GB |
Code Analysis
Use Ollama for local code analysis.
Analysis Benefits
- Private: Code never leaves your machine
- Fast: No network latency
- Detailed: Generate comprehensive reports
- Unlimited: No API quotas or rate limits
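A local analysis run might look like the following (the `analyze` subcommand and flag names are hypothetical — check `gitwhisper --help` for the actual command):

```shell
# Hypothetical subcommand: analyze staged changes entirely on-device
gitwhisper analyze --model ollama --model-variant deepseek-coder
```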
GPU Acceleration
Ollama automatically uses your GPU if available:
- NVIDIA GPUs (CUDA)
- Apple Silicon (M1/M2/M3, via Metal)
- AMD GPUs (ROCm)
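You can check which backend a loaded model is actually using with Ollama's own tooling:

```shell
# Load a model briefly, then inspect where it is running
ollama run llama3.2 "hello" >/dev/null

# The PROCESSOR column shows e.g. "100% GPU" or "100% CPU"
ollama ps
```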
Troubleshooting
Ollama Not Running
If GitWhisper can’t connect, start the server with `ollama serve` (or launch the Ollama app) and confirm it responds at `http://localhost:11434`.
Model Not Found
Pull the missing model first (`ollama pull <model>`) and verify it appears in `ollama list`.
Out of Memory
Switch to a smaller or quantized model; see the RAM column in the comparison table above.
Slow Performance
Solutions:
- Use a smaller model (phi3, gemma2)
- Enable GPU acceleration
- Close other applications
- Use quantized models (models with a `:q4` or `:q8` suffix)
Custom Port
If Ollama is on a different port, point GitWhisper at the matching base URL.
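On the server side, Ollama's bind address is controlled by the `OLLAMA_HOST` environment variable; the GitWhisper flag shown is illustrative:

```shell
# Run Ollama on port 11500 instead of the default 11434
OLLAMA_HOST=127.0.0.1:11500 ollama serve

# Point GitWhisper at it (hypothetical flag name)
gitwhisper --model ollama --ollama-base-url http://127.0.0.1:11500
```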
Best Practices
- Start with recommended models: Use llama3.2 or codellama first
- Monitor resource usage: Watch RAM/GPU with `htop` or Activity Monitor
- Keep models updated: Run `ollama pull <model>` periodically
- Use GPU when available: Much faster than CPU-only
- Match model to hardware: Don’t use 70B models on 8GB RAM
Comparison with Cloud Models
Advantages
- Complete privacy
- No API costs
- Offline capability
- Unlimited usage
- No rate limits
Trade-offs
- Requires local hardware
- Setup complexity
- May be slower
- Quality varies by model
- Uses system resources
Example Workflow
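Tying the pieces together, a first-time setup through a first commit might look like this (the GitWhisper invocation is illustrative; the Ollama commands are standard):

```shell
# One-time setup
ollama pull llama3.2:latest        # download the default model
ollama serve &                     # start the server if it isn't already running

# In a repository
git add .
gitwhisper --model ollama          # generate a commit message entirely locally
```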
Popular Model Links
Explore models on Ollama’s library at ollama.com/library.
Next Steps
- Custom Endpoints: Configure custom Ollama URLs
- Model Variants: View all available models
- Cloud Models: Compare with cloud-based options
- Configuration: Set up default preferences