Why Choose Ollama?

- 100% Private: All processing happens locally; your data never leaves your computer
- Zero Cost: No API fees or usage limits; completely free to use
- Works Offline: No internet connection required after model download
- Multiple Models: Support for Llama, Mistral, CodeLlama, Gemma, and more
System Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| RAM | 8GB | 16GB+ |
| Storage | 5GB | 10GB+ |
| CPU | Dual-core | Quad-core+ |
| GPU | Optional | NVIDIA/AMD for faster inference |
Models vary in size. Smaller models (7B parameters) need ~4GB RAM, while larger models (13B+) may need 16GB or more.
Installation
Install Ollama
Download and install Ollama from ollama.ai
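On Linux, Ollama's official one-line install script can be used instead of the downloadable installer:

```shell
# Downloads and runs the official Ollama install script for Linux
curl -fsSL https://ollama.com/install.sh | sh
```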
Start Ollama Service
Ollama runs as a background service:
On macOS/Windows, Ollama starts automatically. On Linux, you may need to run this command.
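The Linux command is the Ollama CLI's serve subcommand, which starts the API server in the foreground:

```shell
# Starts the Ollama server, listening on http://localhost:11434 by default
ollama serve
```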
Configuration
Environment Variables
| Variable | Required | Description | Default |
|---|---|---|---|
| USE_OLLAMA | Yes | Enable Ollama mode | false |
| OLLAMA_MODEL | No | Model to use | gemma:latest |
| OLLAMA_URL | No | Ollama API endpoint | http://localhost:11434 |
Supported Models
Cluely works with any Ollama model, but these are recommended:

- Llama 3.2
- Gemma
- CodeLlama
- Mistral
- Best for: General purpose, balanced performance
- Size: ~4.7GB
- Parameters: 7B
- Speed: Fast
- Quality: Excellent
Model Auto-Detection
Cluely automatically detects and uses available models.

API Configuration
Request Parameters
Cluely sends optimized parameters to Ollama.

Endpoint
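As an illustration, assuming Cluely uses Ollama's standard /api/generate endpoint, a request with typical parameters might look like the following (the exact parameters Cluely sends are not documented here; the option names below come from Ollama's API):

```shell
# Example generate request against a local Ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain recursion in one sentence.",
  "stream": false,
  "options": { "temperature": 0.7, "num_predict": 256 }
}'
```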
The Ollama API endpoint defaults to http://localhost:11434. Change OLLAMA_URL if running Ollama on a different host or port.

Limitations with Ollama
Vision/Image Analysis
Ollama cannot directly analyze images. For screenshot analysis:

- Images are skipped, and generic guidance is provided
- Consider using Gemini for vision features alongside Ollama for text
Audio Processing
Voice features always use Gemini, even when Ollama is enabled for text chat.
Managing Models
List Available Models
Remove a Model
Update a Model
Check Model Info
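These tasks map to standard Ollama CLI commands (model names illustrative):

```shell
ollama list           # list available models
ollama rm codellama   # remove a model
ollama pull llama3.2  # update (re-download) a model
ollama show llama3.2  # check model info
```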
Performance Optimization
Choose the Right Model Size
- 2-7B parameters: Fast, lower memory (4-8GB RAM)
- 13B parameters: Slower, higher quality (16GB+ RAM)
- 70B+ parameters: Very slow, best quality (64GB+ RAM, GPU recommended)
GPU Acceleration
Ollama automatically uses GPU if available:
- NVIDIA: CUDA support built-in
- AMD: ROCm support on Linux
- Apple Silicon: Metal acceleration on macOS
Memory Management
Ollama keeps models in memory for faster responses:
- Models stay loaded for ~5 minutes after last use
- Automatically unloaded when memory is needed
- Preload models: Send a test prompt on startup
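One way to preload (assuming the default endpoint) is Ollama's documented behavior of loading a model into memory when a generate request arrives with no prompt:

```shell
# Loads the model into memory without generating any text
curl http://localhost:11434/api/generate -d '{"model": "llama3.2"}'
```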
Switching to Ollama
At Startup
Set environment variables in .env:
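A minimal .env might look like this (values illustrative; variable names from the Configuration table above):

```shell
USE_OLLAMA=true
OLLAMA_MODEL=llama3.2
OLLAMA_URL=http://localhost:11434
```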
At Runtime
Switch from another provider to Ollama.

Troubleshooting
Ollama Not Available
Error: Failed to connect to Ollama: Make sure Ollama is running on http://localhost:11434

Solutions:
- Check Ollama is running: ollama list
- Start Ollama: ollama serve
- Verify port 11434 is not blocked by firewall
- Check OLLAMA_URL matches your setup
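To verify the endpoint is reachable (assuming the default URL), Ollama's root endpoint responds with a plain status message when the server is up:

```shell
curl http://localhost:11434
```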
No Models Found
Error: [LLMHelper] No Ollama models found

Solutions:
- Pull a model: ollama pull llama3.2
- Verify with: ollama list
- Check Ollama data directory has sufficient space
Slow Responses
Issue: Model takes too long to respond

Solutions:
- Use a smaller model (e.g., gemma instead of codellama)
- Ensure sufficient RAM is available
- Enable GPU acceleration if available
- Close other memory-intensive applications
Model Selection Not Working
Issue: Configured model is not being used

Solutions:
- Verify model is pulled: ollama list
- Check exact model name matches (case-sensitive)
- Let Cluely auto-detect: remove OLLAMA_MODEL from .env
- Restart Cluely after changing models
Best Practices
- Start with llama3.2: Best balance of speed, quality, and resource usage
- Preload on startup: Send a test prompt when app starts to keep model in memory
- Monitor memory: Use Activity Monitor/Task Manager to ensure sufficient RAM
- Keep models updated: Periodically run ollama pull to get model improvements
- Use SSD storage: Store Ollama models on an SSD for faster loading
Next Steps
Gemini Setup
Add Gemini for vision and audio features
Provider Comparison
Compare all available AI providers