Ollama provides completely private AI processing on your local machine. Data never leaves your computer, there are no API costs, and you can work offline.

Why Choose Ollama?

100% Private

All processing happens locally - your data never leaves your computer

Zero Cost

No API fees or usage limits - completely free to use

Works Offline

No internet connection required after model download

Multiple Models

Support for Llama, Mistral, CodeLlama, Gemma, and more

System Requirements

Requirement  Minimum    Recommended
RAM          8GB        16GB+
Storage      5GB        10GB+
CPU          Dual-core  Quad-core+
GPU          Optional   NVIDIA/AMD for faster inference
Models vary in size. Smaller models (7B parameters) need ~4GB RAM, while larger models (13B+) may need 16GB or more.

Installation

1. Install Ollama

Download and install Ollama from ollama.ai
# Download from ollama.ai or use brew
brew install ollama
2. Start Ollama Service

Ollama runs as a background service:
ollama serve
On macOS/Windows, Ollama starts automatically. On Linux, you may need to run this command.
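Before moving on, you can confirm the service is reachable. A minimal sketch, assuming the default port 11434; a running Ollama server answers a plain GET on its root URL.

```shell
# Check that the Ollama server is reachable (assumes the default port 11434).
# A running server responds to a plain GET on its root URL.
if curl -sf http://localhost:11434/ >/dev/null 2>&1; then
  echo "Ollama is running"
else
  echo "Ollama is not reachable - try 'ollama serve'"
fi
```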
3. Pull a Model

Download an AI model to use with Cluely:
# Recommended: Llama 3.2 (balanced performance)
ollama pull llama3.2

# Alternative: Gemma (lightweight)
ollama pull gemma:latest

# For coding: CodeLlama
ollama pull codellama

# For speed: Mistral
ollama pull mistral
The first model download can take 5-30 minutes, depending on model size and internet speed.
4. Configure Cluely

Add Ollama settings to your .env file:
USE_OLLAMA=true
OLLAMA_MODEL=llama3.2
OLLAMA_URL=http://localhost:11434
5. Verify Setup

Start Cluely and check for:
[LLMHelper] Using Ollama with model: llama3.2
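You can also confirm from the command line that the configured model is installed before launching Cluely. A hedged sketch, assuming the default URL and the llama3.2 model name; the Ollama `/api/tags` endpoint lists locally installed models as JSON.

```shell
# List installed models via the Ollama API and check for the configured one.
# Assumptions: default URL http://localhost:11434, model name llama3.2.
if curl -s http://localhost:11434/api/tags | grep -q 'llama3.2'; then
  echo "llama3.2 is installed"
else
  echo "llama3.2 not found - run 'ollama pull llama3.2'"
fi
```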

Configuration

Environment Variables

Variable      Required  Description          Default
USE_OLLAMA    Yes       Enable Ollama mode   false
OLLAMA_MODEL  No        Model to use         gemma:latest
OLLAMA_URL    No        Ollama API endpoint  http://localhost:11434

Supported Models

Cluely works with any Ollama model; the recommended default is llama3.2.
Best for: general purpose, balanced performance
ollama pull llama3.2
  • Size: ~4.7GB
  • Parameters: 7B
  • Speed: Fast
  • Quality: Excellent

Model Auto-Detection

Cluely automatically detects and uses available models:
// Auto-detection logic (source/electron/LLMHelper.ts:331-335)
if (!availableModels.includes(this.ollamaModel)) {
  this.ollamaModel = availableModels[0]
  console.log(`[LLMHelper] Auto-selected first available model: ${this.ollamaModel}`)
}
If your configured model isn’t found, Cluely uses the first available model automatically.

API Configuration

Request Parameters

Cluely sends optimized parameters to Ollama:
// Ollama request configuration (source/electron/LLMHelper.ts:211-218)
{
  model: this.ollamaModel,
  prompt: prompt,
  stream: false,
  options: {
    temperature: 0.7,    // Balanced creativity
    top_p: 0.9          // High-quality responses
  }
}

Endpoint

Ollama API endpoint:
POST http://localhost:11434/api/generate
Change OLLAMA_URL if running Ollama on a different host or port.
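You can exercise the endpoint directly with curl. A minimal sketch, assuming llama3.2 is pulled and the server is on the default port; the request body mirrors the parameters Cluely sends.

```shell
# Send a non-streaming generate request to the Ollama API.
# Assumptions: llama3.2 is pulled and the server is on the default port.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Reply with one word: hello",
  "stream": false,
  "options": { "temperature": 0.7, "top_p": 0.9 }
}'
```

With `stream: false`, the reply is a single JSON object whose `response` field holds the model's text.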

Limitations with Ollama

Ollama has some limitations compared to cloud providers:

Vision/Image Analysis

Ollama cannot directly analyze images. For screenshot analysis:
  1. Images are skipped and generic guidance is provided instead
  2. Consider using Gemini for vision features alongside Ollama for text
(In the source, the OpenRouter implementation takes the same guidance-only approach for images.)

Audio Processing

Voice features always use Gemini, even when Ollama is enabled for text chat.
Cluely maintains a separate Gemini client for audio processing:
// Requires GEMINI_API_KEY for voice features
private geminiVoiceClient: GoogleGenAI | null = null

Managing Models

List Available Models

ollama list

Remove a Model

ollama rm llama3.2

Update a Model

ollama pull llama3.2

Check Model Info

ollama show llama3.2

Performance Optimization

Model Size

  • 2-7B parameters: Fast, lower memory (4-8GB RAM)
  • 13B parameters: Slower, higher quality (16GB+ RAM)
  • 70B+ parameters: Very slow, best quality (64GB+ RAM, GPU recommended)
For most users, llama3.2 (7B) offers the best balance.

GPU Acceleration

Ollama automatically uses a GPU if one is available:
  • NVIDIA: CUDA support built-in
  • AMD: ROCm support on Linux
  • Apple Silicon: Metal acceleration on macOS
Check GPU usage:
ollama ps

Memory Management

Ollama keeps models in memory for faster responses:
  • Models stay loaded for ~5 minutes after last use
  • Models are unloaded automatically when memory is needed
  • Preload models: send a test prompt on startup
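The preload step above can be done with a single API call. A sketch assuming the default port; per the Ollama API, a generate request with no prompt loads the model into memory, and `keep_alive` controls how long it stays resident.

```shell
# Load llama3.2 into memory without generating any text.
# An empty generate request loads the model; keep_alive overrides the ~5 minute default.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "keep_alive": "10m"
}'
```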

Switching to Ollama

At Startup

Set environment variables in .env:
USE_OLLAMA=true
OLLAMA_MODEL=llama3.2

At Runtime

Switch from another provider to Ollama:
// Switch to Ollama with specific model
await llmHelper.switchToOllama('llama3.2', 'http://localhost:11434')

// Switch to Ollama with auto-detected model
await llmHelper.switchToOllama()

Troubleshooting

Error: Failed to connect to Ollama: Make sure Ollama is running on http://localhost:11434
Solutions:
  1. Check Ollama is running: ollama list
  2. Start Ollama: ollama serve
  3. Verify port 11434 is not blocked by firewall
  4. Check OLLAMA_URL matches your setup
Error: [LLMHelper] No Ollama models found
Solutions:
  1. Pull a model: ollama pull llama3.2
  2. Verify with: ollama list
  3. Check Ollama data directory has sufficient space
Issue: Model takes too long to respond
Solutions:
  1. Use a smaller model (e.g., gemma instead of codellama)
  2. Ensure sufficient RAM is available
  3. Enable GPU acceleration if available
  4. Close other memory-intensive applications
Issue: Configured model not being used
Solutions:
  1. Verify model is pulled: ollama list
  2. Check exact model name matches (case-sensitive)
  3. Let Cluely auto-detect: Remove OLLAMA_MODEL from .env
  4. Restart Cluely after changing models

Best Practices

  1. Start with llama3.2: Best balance of speed, quality, and resource usage
  2. Preload on startup: Send a test prompt when app starts to keep model in memory
  3. Monitor memory: Use Activity Monitor/Task Manager to ensure sufficient RAM
  4. Keep models updated: Periodically ollama pull to get model improvements
  5. Use SSD storage: Store Ollama models on SSD for faster loading

Next Steps

Gemini Setup

Add Gemini for vision and audio features

Provider Comparison

Compare all available AI providers
