Why Ollama?
- Completely free - No API costs, unlimited usage
- Private - Your code never leaves your machine
- Offline - Works without internet connection
- No rate limits - Analyze as much code as you want
- Multiple models - Choose from various open-source models
Prerequisites
- RAM: At least 8GB (16GB recommended for larger models)
- Storage: 5-10GB for model files
- OS: macOS, Linux, or Windows
Install Ollama
Download Ollama
Visit ollama.ai and download the installer for your operating system.
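On Linux, Ollama also publishes an install script as an alternative to the downloadable installer (macOS and Windows users should use the installers from the site):

```shell
# Official Linux install script from Ollama's website
curl -fsSL https://ollama.com/install.sh | sh
```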
Pull a model
Download a model for code analysis. Vibrant uses qwen2.5:7b by default; it provides excellent code-analysis quality with reasonable resource usage.
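For example, to download the default model (any other model tag from the table below works the same way):

```shell
# Fetch the default model (~4.7GB download)
ollama pull qwen2.5:7b

# Confirm it is installed
ollama list
```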
Setup
Option 1: Default configuration (recommended)
Ollama runs on http://localhost:11434 by default. If you’re using the default settings, no configuration is needed.
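To verify the server is reachable at the default address, you can query its root endpoint, which replies with a short status message:

```shell
# The Ollama server answers "Ollama is running" on its root endpoint
curl http://localhost:11434
```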
Option 2: Custom host
If Ollama is running on a different host or port, set the OLLAMA_BASE_URL environment variable.
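For example (the address below is a placeholder for your own setup):

```shell
# Point Vibrant at a non-default Ollama address
export OLLAMA_BASE_URL=http://192.168.1.100:11434
```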
Option 3: .env file
Create a .env file in your project:
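The file might look like this; OLLAMA_BASE_URL and OLLAMA_MODEL are the variables this page mentions, and the values are examples:

```shell
# .env
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=qwen2.5:7b
```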
Option 4: Configuration file
Alternatively, configure the provider in vibrant.config.js:
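The exact schema of vibrant.config.js is not shown on this page, so the field names below are assumptions; adjust them to match the project's documented options:

```shell
# Write a sketch of vibrant.config.js (field names are assumptions)
cat > vibrant.config.js <<'EOF'
module.exports = {
  provider: 'ollama',
  ollama: {
    baseUrl: 'http://localhost:11434',
    model: 'qwen2.5:7b',
  },
};
EOF
```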
Usage
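A minimal run might look like this; the `vibrant` command name is taken from this page's tool name, and any flags it accepts are not documented here:

```shell
# With Ollama running locally, invoke the analyzer from your project root
vibrant
```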
Run Vibrant with Ollama; with the default settings no extra configuration is needed.
Available models
Vibrant supports any Ollama model, but these are recommended for code analysis:
qwen2.5:7b (default)
- Size: ~4.7GB
- RAM: 8GB minimum
- Speed: Fast
- Quality: Excellent for code
- Best for: General code analysis, daily use
llama3.1
- Size: ~4.7GB (8B model)
- RAM: 8GB minimum
- Speed: Fast
- Quality: Very good
- Best for: General purpose analysis
codellama:13b
- Size: ~7.4GB
- RAM: 16GB recommended
- Speed: Medium
- Quality: Excellent for code
- Best for: Specialized code analysis
mistral
- Size: ~4.1GB
- RAM: 8GB minimum
- Speed: Very fast
- Quality: Good
- Best for: Quick analysis, resource-constrained systems
Change the model
Set the OLLAMA_MODEL environment variable:
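For a one-off run, export it in your shell (mistral is used here as an example value):

```shell
# Use a different Ollama model for this session
export OLLAMA_MODEL=mistral
```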
Or set it in your .env file:
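For example, to make llama3.1 the persistent default:

```shell
# .env
OLLAMA_MODEL=llama3.1
```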
Example output
Ollama streams the response in real-time, so you’ll see the analysis being generated as it processes your code.
Troubleshooting
Error: Connection refused
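This usually means the Ollama server is not running. The desktop app starts it automatically; on a headless machine, start it manually:

```shell
# Start the Ollama server in the foreground
ollama serve
```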
Error: Model not found
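The requested model has not been downloaded yet; pull it first (substitute whichever model Vibrant is configured to use):

```shell
# Download the missing model locally
ollama pull qwen2.5:7b
```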
Error: Out of memory
If Ollama crashes or runs very slowly, try the following:
- Use a smaller model
- Close other applications to free up RAM
- Analyze fewer files
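Switching to the smallest recommended model is often enough; the model must still be downloaded first with `ollama pull mistral`:

```shell
# Tell Vibrant to use the lighter mistral model (~4.1GB)
export OLLAMA_MODEL=mistral
```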
Slow performance
Solutions:
- Use a smaller, faster model like mistral
- Close unnecessary applications to free resources
- Analyze specific directories instead of the entire codebase
- Consider using a cloud provider for large projects
Best practices
Use appropriate models
- Small projects: mistral (fastest)
- Medium projects: qwen2.5:7b (balanced)
- Large projects or specialized code: codellama:13b (best quality)
Performance tips
GPU acceleration (optional)
If you have an NVIDIA GPU, Ollama detects and uses it automatically; no extra configuration is needed.
Reduce context size
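For example, pointing the tool at a single directory (the `vibrant` invocation and its path argument are assumptions based on this page's tool name):

```shell
# Analyze only the src/ directory instead of the whole repository
vibrant src/
```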
Analyze specific directories to reduce memory usage.
When to use Ollama
Ollama is perfect for:
- Privacy-sensitive projects - Keep your code on your machine
- Unlimited analysis - No API costs or rate limits
- Offline development - Work without internet connection
- Learning and experimentation - Try different models freely
- CI/CD on self-hosted runners - No external dependencies
Model comparison
| Model | Size | RAM | Speed | Code Quality | Best For |
|---|---|---|---|---|---|
| mistral | 4.1GB | 8GB | Fast | Good | Quick checks |
| qwen2.5:7b | 4.7GB | 8GB | Fast | Excellent | Daily use |
| llama3.1 | 4.7GB | 8GB | Fast | Very good | General purpose |
| codellama:13b | 7.4GB | 16GB | Medium | Excellent | Code specialization |
Advanced configuration
Run Ollama on a different port
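Ollama's own OLLAMA_HOST variable controls where the server listens; point OLLAMA_BASE_URL at the same address (port 11500 is an arbitrary example):

```shell
# Start Ollama on a non-default port
OLLAMA_HOST=127.0.0.1:11500 ollama serve

# In another shell, point Vibrant at it
export OLLAMA_BASE_URL=http://127.0.0.1:11500
```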
Use remote Ollama instance
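A sketch of the setup; the server hostname below is a placeholder:

```shell
# On the server: listen on all interfaces so other machines can connect
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On your laptop: point Vibrant at the server
export OLLAMA_BASE_URL=http://my-gpu-server:11434
```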
This setup lets you run Ollama on a powerful server and connect to it from your laptop.
Next steps
AI providers overview
Compare all available providers
Browse models
Explore more models on Ollama’s library