Ollama allows you to run powerful AI models locally on your machine, providing completely free and private code analysis without any API keys or internet connection.

Why Ollama?

  • Completely free - No API costs, unlimited usage
  • Private - Your code never leaves your machine
  • Offline - Works without internet connection
  • No rate limits - Analyze as much code as you want
  • Multiple models - Choose from various open-source models

Prerequisites

  • RAM: At least 8GB (16GB recommended for larger models)
  • Storage: 5-10GB for model files
  • OS: macOS, Linux, or Windows

Install Ollama

1

Download Ollama

Visit ollama.ai and download the installer for your operating system. On macOS, you can also install via Homebrew:
brew install ollama
2

Install

Run the installer and follow the installation instructions.
3

Verify installation

Open a terminal and run:
ollama --version

Pull a model

Download a model for code analysis:
# Default model (recommended)
ollama pull qwen2.5:7b

# Alternative models
ollama pull llama3.1
ollama pull codellama:13b
ollama pull mistral
Vibrant uses qwen2.5:7b by default. It provides excellent code analysis quality with reasonable resource usage.
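After pulling, it's worth confirming the model actually downloaded. A small sketch that lists locally installed models and degrades gracefully when the `ollama` CLI isn't on your PATH yet:

```shell
# List the models installed locally; fall back to a hint if the CLI is missing
# or the server isn't responding.
if command -v ollama >/dev/null 2>&1; then
  ollama list || echo "ollama is installed but the server may not be running"
else
  echo "ollama CLI not found - install it from ollama.ai first"
fi
```

The model you plan to use (e.g. `qwen2.5:7b`) should appear in the list before you point Vibrant at it.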

Setup

Option 1: Default settings

Ollama runs on http://localhost:11434 by default. If you’re using the default settings, no configuration is needed:
# Start Ollama (usually starts automatically after installation)
ollama serve

# In another terminal, run Vibrant
vibrant . --ai --provider ollama

Option 2: Custom host

If Ollama is running on a different host or port, set OLLAMA_HOST to its URL (the value below is the default; replace it with your own):
export OLLAMA_HOST="http://localhost:11434"
vibrant . --ai --provider ollama
Or use OLLAMA_BASE_URL:
export OLLAMA_BASE_URL="http://localhost:11434"
vibrant . --ai --provider ollama

Option 3: .env file

Create a .env file in your project:
.env
OLLAMA_HOST=http://localhost:11434

Option 4: Configuration file

vibrant.config.js
module.exports = {
  provider: 'ollama',
};

Usage

Run Vibrant with Ollama:
# Make sure Ollama is running
ollama serve

# In another terminal
vibrant . --ai --provider ollama

Available models

Vibrant supports any Ollama model, but these are recommended for code analysis:

qwen2.5:7b (default)
  • Size: ~4.7GB
  • RAM: 8GB minimum
  • Speed: Fast
  • Quality: Excellent for code
  • Best for: General code analysis, daily use

llama3.1
  • Size: ~4.7GB (8B model)
  • RAM: 8GB minimum
  • Speed: Fast
  • Quality: Very good
  • Best for: General purpose analysis

codellama:13b
  • Size: ~7.4GB
  • RAM: 16GB recommended
  • Speed: Medium
  • Quality: Excellent for code
  • Best for: Specialized code analysis

mistral
  • Size: ~4.1GB
  • RAM: 8GB minimum
  • Speed: Very fast
  • Quality: Good
  • Best for: Quick analysis, resource-constrained systems

Change the model

Set the OLLAMA_MODEL environment variable:
export OLLAMA_MODEL="llama3.1"
vibrant . --ai --provider ollama
Or in your .env file:
.env
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=llama3.1
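You can also set the model for a single run only, e.g. `OLLAMA_MODEL="llama3.1" vibrant . --ai --provider ollama`, without exporting anything. A minimal demo of this per-invocation scoping:

```shell
# The variable prefixed to a command applies only to that one invocation;
# it does not leak into the rest of the shell session.
OLLAMA_MODEL="llama3.1" sh -c 'echo "model for this run: $OLLAMA_MODEL"'
echo "model afterwards: ${OLLAMA_MODEL:-unset}"
```

This is handy for comparing models back to back without editing your .env file.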

Example output

vibrant . --ai --provider ollama
🔮 Vibrant Analysis
─────────────────────

📡 AI Analysis (ollama:qwen2.5:7b)

The codebase contains several security vulnerabilities and code quality
issues. The most critical concern is hardcoded credentials in multiple
files. Error handling is inconsistent, with empty catch blocks that
silently swallow exceptions. Debug logging statements are present in
production code.

Key findings:
- API keys hardcoded in src/api.ts (line 24) and src/config.ts (line 15)
- 6 empty catch blocks across the codebase
- console.log statements in 12 files
- SQL queries vulnerable to injection in src/db.ts

Recommendations:
- Move all secrets to environment variables immediately
- Implement structured error logging
- Remove debug code before deployment
- Use parameterized queries for database operations

✔ Analysis complete

✕ 4 errors · ⚠ 12 warnings · 85 files · 15s
Ollama streams the response in real-time, so you’ll see the analysis being generated as it processes your code.

Troubleshooting

Error: Connection refused

Ollama API error: fetch failed
Solution: Make sure Ollama is running:
# Start Ollama
ollama serve

# Check if it's running
curl http://localhost:11434/api/version
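A scripted version of this check (default address assumed) that prints a clear status instead of a raw curl failure:

```shell
# Probe the Ollama HTTP API; report reachability either way.
if curl -fsS --max-time 2 http://localhost:11434/api/version >/dev/null 2>&1; then
  echo "ollama: reachable"
else
  echo "ollama: not reachable - start it with 'ollama serve'"
fi
```

This is also useful as a pre-flight step in CI scripts before invoking Vibrant.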

Error: Model not found

Ollama API error: model 'qwen2.5:7b' not found
Solution: Pull the model first:
ollama pull qwen2.5:7b

Error: Out of memory

If Ollama crashes or runs very slowly, try the following:
  1. Use a smaller model:
    ollama pull mistral
    export OLLAMA_MODEL="mistral"
    vibrant . --ai --provider ollama
    
  2. Close other applications to free up RAM
  3. Analyze fewer files:
    vibrant src/ --ai --provider ollama
    

Slow performance

Solutions:
  • Use a smaller, faster model like mistral
  • Close unnecessary applications to free resources
  • Analyze specific directories instead of entire codebase
  • Consider using a cloud provider for large projects

Best practices

1

Keep Ollama running

Run ollama serve in the background or configure it to start on boot.
2

Use appropriate models

  • Small projects: mistral (fastest)
  • Medium projects: qwen2.5:7b (balanced)
  • Large projects or specialized code: codellama:13b (best quality)
3

Monitor resources

Keep an eye on RAM usage. If your system struggles, switch to a smaller model.
4

Perfect for sensitive code

Use Ollama when working with proprietary or sensitive code that cannot be sent to external APIs.

Performance tips

GPU acceleration (optional)

If you have an NVIDIA GPU:
# Ollama automatically uses GPU if available
# Verify GPU usage:
nvidia-smi
GPU acceleration significantly improves performance.
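A quick check of what hardware is available, falling back to a CPU note when no NVIDIA tooling is present:

```shell
# Report the GPU Ollama could use; note the CPU fallback when nvidia-smi
# is absent or no GPU responds.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader \
    || echo "nvidia-smi present but no GPU responded"
else
  echo "no NVIDIA GPU detected - Ollama will run on the CPU"
fi
```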

Reduce context size

Analyze specific directories to reduce memory usage:
# Instead of analyzing everything
vibrant . --ai --provider ollama

# Analyze specific paths
vibrant src/ --ai --provider ollama
vibrant "src/**/*.ts" --ai --provider ollama
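For larger repositories, you can script this splitting yourself. A hedged sketch (the directory names are hypothetical; substitute your project's layout):

```shell
# Analyze one top-level directory at a time to keep each Ollama context small.
for dir in src lib scripts; do
  if [ -d "$dir" ]; then
    vibrant "$dir" --ai --provider ollama
  else
    echo "skipping $dir (no such directory)"
  fi
done
```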

When to use Ollama

Ollama is perfect for:
  • Privacy-sensitive projects - Keep your code on your machine
  • Unlimited analysis - No API costs or rate limits
  • Offline development - Work without internet connection
  • Learning and experimentation - Try different models freely
  • CI/CD on self-hosted runners - No external dependencies

Model comparison

Model          Size    RAM    Speed   Code Quality   Best For
mistral        4.1GB   8GB    Fast    Good           Quick checks
qwen2.5:7b     4.7GB   8GB    Fast    Excellent      Daily use
llama3.1       4.7GB   8GB    Fast    Very good      General purpose
codellama:13b  7.4GB   16GB   Medium  Excellent      Code specialization
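The table above can be turned into a tiny helper that suggests a model from available RAM. The threshold is taken from the RAM column; the script itself is an illustrative sketch, not part of Vibrant:

```shell
# Suggest a model based on RAM; 16GB unlocks the larger codellama:13b.
ram_gb=8   # substitute your machine's RAM in GB
if [ "$ram_gb" -ge 16 ]; then
  model="codellama:13b"   # best quality, needs 16GB
else
  model="qwen2.5:7b"      # balanced default for 8GB machines
fi
echo "suggested model: $model"
# → suggested model: qwen2.5:7b
```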

Advanced configuration

Run Ollama on a different port

OLLAMA_HOST=0.0.0.0:8080 ollama serve
Then configure Vibrant:
export OLLAMA_HOST="http://localhost:8080"
vibrant . --ai --provider ollama

Use remote Ollama instance

Run Ollama on a powerful server and connect from your laptop:
# On the server
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On your laptop
export OLLAMA_HOST="http://server-ip:11434"
vibrant . --ai --provider ollama
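Before running Vibrant against a remote instance, it's worth verifying the server is reachable from your laptop. A sketch that honors OLLAMA_HOST when set and falls back to the local default:

```shell
# Probe whichever Ollama address Vibrant would use and report the result.
host="${OLLAMA_HOST:-http://localhost:11434}"
if curl -fsS --max-time 3 "$host/api/version" >/dev/null 2>&1; then
  echo "reachable: $host"
else
  echo "cannot reach $host - check OLLAMA_HOST and the server's firewall"
fi
```

If the probe fails, confirm the server started with `OLLAMA_HOST=0.0.0.0:11434` so it listens on external interfaces, not just loopback.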

Next steps

AI providers overview

Compare all available providers

Browse models

Explore more models on Ollama’s library
