Why Use Ollama?
- Complete Privacy: Your code and commits stay on your local machine, never sent to external servers
- No API Costs: No usage fees or API quotas; unlimited commits once set up
- Offline Access: Works without an internet connection after the initial model download
- Full Control: Choose from dozens of open-source models and customize parameters
Prerequisites
Before using Ollama with GitWhisper, you need to:
- Install Ollama: Download from ollama.com
- Pull a model: Download at least one AI model
- Start the server: Ollama must be running locally
Installation
- macOS
- Linux
- Windows
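The per-platform installs above boil down to the following (official channels at the time of writing):

```shell
# macOS: via Homebrew, or download the app from https://ollama.com/download
brew install ollama

# Linux: official one-line install script
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download and run the installer from https://ollama.com/download

# Then start the server (the macOS and Windows apps start it automatically)
ollama serve
```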
Download Models
Pull the models you want to use with `ollama pull <model>`.
Available Models
Ollama supports hundreds of models. Popular choices for GitWhisper:
Recommended for Code
- llama3.2:latest ⭐ (default) - Great all-around choice
- codellama - Optimized for code understanding
- deepseek-coder - Excellent code generation and analysis
- qwen2.5-coder - Strong coding capabilities
- granite-code - IBM’s code-focused model
Fast & Lightweight
- phi3 - Microsoft’s compact model
- gemma2 - Google’s efficient model
- tinyllama - Ultra-fast but basic
Large & Powerful
- llama3:70b - More capable, requires more RAM
- mixtral - High quality, mixture of experts
- command-r - Cohere’s advanced model
Browse all available models at ollama.com/library
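Pulling any of the models above is a single command, for example:

```shell
# Download the default model used by GitWhisper
ollama pull llama3.2:latest

# Grab a code-focused alternative
ollama pull codellama

# List what is installed locally
ollama list
```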
Usage
Basic Usage
Make sure Ollama is running, then invoke GitWhisper as usual.
Specify Model
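Passing the provider and a model explicitly might look like this (the flag names here are illustrative, not confirmed — check `gitwhisper --help` for the exact syntax in your version):

```shell
# Hypothetical flags: --model selects the provider, --model-variant the Ollama model
gitwhisper --model ollama --model-variant codellama
```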
Use a specific Ollama model when the default isn’t the best fit for your task.
Set as Default
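Persisting the choice might look like this (the subcommand name is illustrative — consult `gitwhisper --help` for the real one):

```shell
# Hypothetical subcommand: save ollama + llama3.2 as the default for future runs
gitwhisper set-defaults --model ollama --model-variant llama3.2
```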
Configure Ollama as your default model so you don’t have to select it on every run.
Custom Base URL
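Ollama listens on `http://localhost:11434` by default; pointing GitWhisper elsewhere might look like this (the flag name is illustrative):

```shell
# Hypothetical flag: target an Ollama server on another machine
gitwhisper --model ollama --ollama-base-url http://192.168.1.50:11434
```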
If Ollama is running on a different host or port, point GitWhisper at that address.
No API Key Required
Ollama doesn’t require an API key: requests to a local server are unauthenticated by default, so there are no credentials to configure.
Hardware Requirements
Model performance depends on your hardware. The minimum tier (a basic laptop or desktop) is:
- 8GB RAM
- Any modern CPU
- No GPU required
- Suitable models: phi3, tinyllama, llama3.2:3b
More RAM and a GPU unlock the larger models; see the comparison table below for per-model requirements.
Model Comparison
Quality vs Speed
| Model | Quality | Speed | Size | RAM Needed |
|---|---|---|---|---|
| llama3:70b | ⭐⭐⭐⭐⭐ | ⭐ | 40GB | 32GB+ |
| codellama | ⭐⭐⭐⭐ | ⭐⭐⭐ | 7GB | 8GB |
| llama3.2 | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 2GB | 8GB |
| deepseek-coder | ⭐⭐⭐⭐ | ⭐⭐⭐ | 7GB | 8GB |
| phi3 | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 2GB | 4GB |
| tinyllama | ⭐⭐ | ⭐⭐⭐⭐⭐ | 700MB | 2GB |
Code Analysis
Use Ollama for local code analysis.
Analysis Benefits
- Private: Code never leaves your machine
- Fast: No network latency
- Detailed: Generate comprehensive reports
- Unlimited: No API quotas or rate limits
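A local analysis run might look like the following (the `analyze` subcommand and flag names are hypothetical — check `gitwhisper --help` for the actual command):

```shell
# Hypothetical subcommand: analyze staged changes entirely on-device
gitwhisper analyze --model ollama --model-variant deepseek-coder
```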
GPU Acceleration
Ollama automatically uses your GPU if available:
- NVIDIA GPUs (CUDA)
- Apple Silicon (M1/M2/M3, via Metal)
- AMD GPUs (ROCm)
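You can check which backend a loaded model is actually using with Ollama's own tooling:

```shell
# Load a model briefly, then inspect where it is running
ollama run llama3.2 "hello" >/dev/null

# The PROCESSOR column shows e.g. "100% GPU" or "100% CPU"
ollama ps
```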
Troubleshooting
Ollama Not Running
If GitWhisper can’t connect, start the server with `ollama serve` (or launch the Ollama app) and confirm it responds at `http://localhost:11434`.
Model Not Found
Pull the missing model first (`ollama pull <model>`) and verify it appears in `ollama list`.
Out of Memory
Switch to a smaller or quantized model; see the RAM column in the comparison table above.
Slow Performance
Solutions:
- Use a smaller model (phi3, gemma2)
- Enable GPU acceleration
- Close other applications
- Use quantized models (models with a `:q4` or `:q8` suffix)
Custom Port
If Ollama is on a different port, point GitWhisper at the matching base URL.
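On the server side, Ollama's bind address is controlled by the `OLLAMA_HOST` environment variable; the GitWhisper flag shown is illustrative:

```shell
# Run Ollama on port 11500 instead of the default 11434
OLLAMA_HOST=127.0.0.1:11500 ollama serve

# Point GitWhisper at it (hypothetical flag name)
gitwhisper --model ollama --ollama-base-url http://127.0.0.1:11500
```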
Best Practices
- Start with recommended models: Use llama3.2 or codellama first
- Monitor resource usage: Watch RAM/GPU with `htop` or Activity Monitor
- Keep models updated: Run `ollama pull <model>` periodically
- Use GPU when available: Much faster than CPU-only
- Match model to hardware: Don’t use 70B models on 8GB RAM
Comparison with Cloud Models
Advantages
- Complete privacy
- No API costs
- Offline capability
- Unlimited usage
- No rate limits
Trade-offs
- Requires local hardware
- Setup complexity
- May be slower
- Quality varies by model
- Uses system resources
Example Workflow
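Tying the pieces together, a first-time setup through a first commit might look like this (the GitWhisper invocation is illustrative; the Ollama commands are standard):

```shell
# One-time setup
ollama pull llama3.2:latest        # download the default model
ollama serve &                     # start the server if it isn't already running

# In a repository
git add .
gitwhisper --model ollama          # generate a commit message entirely locally
```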
Popular Model Links
Explore models on Ollama’s library at ollama.com/library.
Next Steps
- Custom Endpoints: Configure custom Ollama URLs
- Model Variants: View all available models
- Cloud Models: Compare with cloud-based options
- Configuration: Set up default preferences