Ollama provides completely private AI processing on your local machine. Data never leaves your computer, there are no API costs, and you can work offline.

Why Choose Ollama?

100% Private

All processing happens locally - your data never leaves your computer

Zero Cost

No API fees or usage limits - completely free to use

Works Offline

No internet connection required after model download

Multiple Models

Support for Llama, Mistral, CodeLlama, Gemma, and more

System Requirements

Requirement  Minimum    Recommended
RAM          8GB        16GB+
Storage      5GB        10GB+
CPU          Dual-core  Quad-core+
GPU          Optional   NVIDIA/AMD for faster inference
Models vary in size. Smaller models (7B parameters) need ~4GB RAM, while larger models (13B+) may need 16GB or more.

Installation

1. Install Ollama

Download and install Ollama from ollama.ai
# Download from ollama.ai or use brew
brew install ollama
2. Start Ollama Service

Ollama runs as a background service:
ollama serve
On macOS/Windows, Ollama starts automatically. On Linux, you may need to run this command.
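Before moving on, you can confirm the service is reachable. A minimal sketch, assuming the default port 11434; a running Ollama server answers a plain GET on its root URL.

```shell
# Check that the Ollama server is reachable (assumes the default port 11434).
# A running server responds to a plain GET on its root URL.
if curl -sf http://localhost:11434/ >/dev/null 2>&1; then
  echo "Ollama is running"
else
  echo "Ollama is not reachable - try 'ollama serve'"
fi
```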
3. Pull a Model

Download an AI model to use with Cluely:
# Recommended: Llama 3.2 (balanced performance)
ollama pull llama3.2

# Alternative: Gemma (lightweight)
ollama pull gemma:latest

# For coding: CodeLlama
ollama pull codellama

# For speed: Mistral
ollama pull mistral
The first model download can take 5-30 minutes, depending on model size and internet speed.
4. Configure Cluely

Add Ollama settings to your .env file:
USE_OLLAMA=true
OLLAMA_MODEL=llama3.2
OLLAMA_URL=http://localhost:11434
5. Verify Setup

Start Cluely and check for:
[LLMHelper] Using Ollama with model: llama3.2
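You can also confirm from the command line that the configured model is installed before launching Cluely. A hedged sketch, assuming the default URL and the llama3.2 model name; the Ollama `/api/tags` endpoint lists locally installed models as JSON.

```shell
# List installed models via the Ollama API and check for the configured one.
# Assumptions: default URL http://localhost:11434, model name llama3.2.
if curl -s http://localhost:11434/api/tags | grep -q 'llama3.2'; then
  echo "llama3.2 is installed"
else
  echo "llama3.2 not found - run 'ollama pull llama3.2'"
fi
```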

Configuration

Environment Variables

Variable      Required  Description          Default
USE_OLLAMA    Yes       Enable Ollama mode   false
OLLAMA_MODEL  No        Model to use         gemma:latest
OLLAMA_URL    No        Ollama API endpoint  http://localhost:11434

Supported Models

Cluely works with any Ollama model; the recommended default is llama3.2.
Best for: general purpose, balanced performance
ollama pull llama3.2
  • Size: ~4.7GB
  • Parameters: 7B
  • Speed: Fast
  • Quality: Excellent

Model Auto-Detection

Cluely automatically detects and uses available models:
// Auto-detection logic (source/electron/LLMHelper.ts:331-335)
if (!availableModels.includes(this.ollamaModel)) {
  this.ollamaModel = availableModels[0]
  console.log(`[LLMHelper] Auto-selected first available model: ${this.ollamaModel}`)
}
If your configured model isn’t found, Cluely uses the first available model automatically.

API Configuration

Request Parameters

Cluely sends optimized parameters to Ollama:
// Ollama request configuration (source/electron/LLMHelper.ts:211-218)
{
  model: this.ollamaModel,
  prompt: prompt,
  stream: false,
  options: {
    temperature: 0.7,    // Balanced creativity
    top_p: 0.9          // High-quality responses
  }
}

Endpoint

Ollama API endpoint:
POST http://localhost:11434/api/generate
Change OLLAMA_URL if running Ollama on a different host or port.
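You can exercise the endpoint directly with curl. A minimal sketch, assuming llama3.2 is pulled and the server is on the default port; the request body mirrors the parameters Cluely sends.

```shell
# Send a non-streaming generate request to the Ollama API.
# Assumptions: llama3.2 is pulled and the server is on the default port.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Reply with one word: hello",
  "stream": false,
  "options": { "temperature": 0.7, "top_p": 0.9 }
}'
```

With `stream: false`, the reply is a single JSON object whose `response` field holds the model's text.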

Limitations with Ollama

Ollama has some limitations compared to cloud providers:

Vision/Image Analysis

Ollama cannot directly analyze images. For screenshot analysis:
  1. Images are skipped and generic guidance is provided instead
  2. Consider using Gemini for vision features alongside Ollama for text
(In the source, the OpenRouter implementation takes the same guidance-only approach for images.)

Audio Processing

Voice features always use Gemini, even when Ollama is enabled for text chat.
Cluely maintains a separate Gemini client for audio processing:
// Requires GEMINI_API_KEY for voice features
private geminiVoiceClient: GoogleGenAI | null = null

Managing Models

List Available Models

ollama list

Remove a Model

ollama rm llama3.2

Update a Model

ollama pull llama3.2

Check Model Info

ollama show llama3.2

Performance Optimization

Model Size

  • 2-7B parameters: Fast, lower memory (4-8GB RAM)
  • 13B parameters: Slower, higher quality (16GB+ RAM)
  • 70B+ parameters: Very slow, best quality (64GB+ RAM, GPU recommended)
For most users, llama3.2 (7B) offers the best balance.

GPU Acceleration

Ollama automatically uses a GPU if one is available:
  • NVIDIA: CUDA support built-in
  • AMD: ROCm support on Linux
  • Apple Silicon: Metal acceleration on macOS
Check GPU usage:
ollama ps

Memory Management

Ollama keeps models in memory for faster responses:
  • Models stay loaded for ~5 minutes after last use
  • Models are unloaded automatically when memory is needed
  • Preload models: send a test prompt on startup
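The preload step above can be done with a single API call. A sketch assuming the default port; per the Ollama API, a generate request with no prompt loads the model into memory, and `keep_alive` controls how long it stays resident.

```shell
# Load llama3.2 into memory without generating any text.
# An empty generate request loads the model; keep_alive overrides the ~5 minute default.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "keep_alive": "10m"
}'
```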

Switching to Ollama

At Startup

Set environment variables in .env:
USE_OLLAMA=true
OLLAMA_MODEL=llama3.2

At Runtime

Switch from another provider to Ollama:
// Switch to Ollama with specific model
await llmHelper.switchToOllama('llama3.2', 'http://localhost:11434')

// Switch to Ollama with auto-detected model
await llmHelper.switchToOllama()

Troubleshooting

Error: Failed to connect to Ollama: Make sure Ollama is running on http://localhost:11434
Solutions:
  1. Check Ollama is running: ollama list
  2. Start Ollama: ollama serve
  3. Verify port 11434 is not blocked by firewall
  4. Check OLLAMA_URL matches your setup
Error: [LLMHelper] No Ollama models found
Solutions:
  1. Pull a model: ollama pull llama3.2
  2. Verify with: ollama list
  3. Check Ollama data directory has sufficient space
Issue: Model takes too long to respond
Solutions:
  1. Use a smaller model (e.g., gemma instead of codellama)
  2. Ensure sufficient RAM is available
  3. Enable GPU acceleration if available
  4. Close other memory-intensive applications
Issue: Configured model not being used
Solutions:
  1. Verify model is pulled: ollama list
  2. Check exact model name matches (case-sensitive)
  3. Let Cluely auto-detect: Remove OLLAMA_MODEL from .env
  4. Restart Cluely after changing models

Best Practices

  1. Start with llama3.2: Best balance of speed, quality, and resource usage
  2. Preload on startup: Send a test prompt when app starts to keep model in memory
  3. Monitor memory: Use Activity Monitor/Task Manager to ensure sufficient RAM
  4. Keep models updated: Periodically ollama pull to get model improvements
  5. Use SSD storage: Store Ollama models on SSD for faster loading

Next Steps

Gemini Setup

Add Gemini for vision and audio features

Provider Comparison

Compare all available AI providers
