What is Ollama?
Ollama is a lightweight, extensible framework for running large language models on your local machine. It provides:

- Local AI Inference: Run models like Llama, Mistral, and Gemma without internet
- Simple API: Compatible with OpenAI API format
- Model Management: Easy model pulling, updating, and version control
- Cross-Platform: Works on macOS, Linux, and Windows
- GPU Acceleration: Automatic CUDA and Metal support
SlasshyWispr communicates with Ollama via HTTP API calls, keeping your conversations completely private and offline.
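As a sketch of what that HTTP traffic looks like, here is a minimal request to Ollama's documented `/api/chat` endpoint (the model name is illustrative; any installed model works):

```shell
# Minimal chat request against the local Ollama HTTP API
# (assumes a model such as llama3.2 has already been pulled)
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.2",
  "stream": false,
  "messages": [{"role": "user", "content": "Hello"}]
}'
```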
Installation and Setup
Install Ollama
Download and install Ollama from ollama.ai.

macOS / Linux:
Install using the one-line script from the Ollama website.

Windows:
Download the installer from the Ollama website
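For macOS and Linux the install is a one-liner; the script URL below is the one Ollama publishes (verify it against the current download page before piping to a shell):

```shell
# macOS / Linux: download and run the official install script
curl -fsSL https://ollama.com/install.sh | sh
```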
Start Ollama Service
Ollama runs as a background service after installation.

macOS / Linux:
Start the service manually if it is not already running.

Windows:
Ollama starts automatically after installation
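If the service is not running, `ollama serve` (the same command referenced under Troubleshooting below) starts it:

```shell
# Start the Ollama service in the foreground (Ctrl+C stops it)
ollama serve

# In a second terminal, confirm it is responding on the default port
curl http://127.0.0.1:11434/api/tags
```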
The Ollama service must be running for SlasshyWispr to communicate with it.
Ollama Base URL Configuration
SlasshyWispr connects to Ollama via its HTTP API endpoint.

Default Configuration

The default Ollama base URL is `http://127.0.0.1:11434`.

Custom Base URL
You can configure a custom base URL if:

- Ollama is running on a different port
- Ollama is running on a remote machine
- You’re using a reverse proxy
Update Base URL
Find the Ollama Base URL field and enter your custom URL.

Examples:

- Different port: http://127.0.0.1:8080
- Remote server: http://192.168.1.100:11434
- HTTPS endpoint: https://ollama.example.com
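A custom base URL in SlasshyWispr must match where Ollama actually listens. On the Ollama side, the bind address is controlled by the documented `OLLAMA_HOST` environment variable (the address below is an example):

```shell
# Example: have Ollama listen on all interfaces, port 8080
# (restart `ollama serve` after setting this)
export OLLAMA_HOST=0.0.0.0:8080
```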
Model Pulling
Before using a model with SlasshyWispr, you need to pull it from the Ollama library.

Using Ollama CLI
Browse Available Models
Visit ollama.ai/library to see available models
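Pulling uses the `ollama pull` subcommand; the model name below is just an example from the library:

```shell
# Download a model from the Ollama library
ollama pull llama3.2

# Verify it is installed locally
ollama list
```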
Wait for Download
Models range from 1GB to 40GB+ depending on size. Download time varies by internet speed.
Using SlasshyWispr UI
Models pulled via either method are stored in Ollama’s model directory and accessible to both Ollama CLI and SlasshyWispr.
Ollama Status Checking
SlasshyWispr continuously monitors your Ollama installation status.

Status Response Fields

The Ollama status check returns:

Status Indicators
In SlasshyWispr Settings > Offline:

- Green dot: Ollama installed and running
- Yellow dot: Ollama installed but not running
- Red dot: Ollama not detected or error
Common Status Issues
Compatible Models
SlasshyWispr works with any Ollama-compatible model. Here are recommended models by use case:

Conversational AI
Llama 3.2
Fast, efficient, excellent for general conversation
Mistral
Balanced performance and quality
Gemma 2
Google’s efficient language model
Qwen 2.5
Strong multilingual capabilities
Model Sizes
Most models come in multiple sizes (parameter counts):

| Size | RAM Required | Performance |
|---|---|---|
| 1B-3B | 4-8 GB | Fast, basic tasks |
| 7B-8B | 8-16 GB | Balanced, good quality |
| 13B-14B | 16-32 GB | High quality |
| 30B+ | 32+ GB | Best quality, slower |
Pull specific sizes by appending the parameter count:
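For example (available tags vary per model; check its page on ollama.ai/library):

```shell
# Pull an explicit 3B variant rather than the default tag
ollama pull llama3.2:3b
```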
Specialized Models
- Code Generation: `codellama`, `deepseek-coder`
- Chat Optimized: models with `:chat` or `:instruct` tags
- Uncensored: models with the `:uncensored` tag for unrestricted responses
Testing Your Setup
Verify Ollama Status
In SlasshyWispr Settings > Offline, confirm Ollama shows as installed and running
Test Dictation
Use your push-to-talk hotkey and ask a question.

Example: “What is the capital of France?”
Performance Optimization
GPU Acceleration
Ollama automatically uses the GPU when available:

- NVIDIA GPUs: Requires the CUDA toolkit
- Apple Silicon: Uses Metal acceleration
- AMD GPUs: ROCm support (Linux)
Check GPU usage during inference:
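Two quick ways to confirm this (assuming the relevant tools are installed):

```shell
# Show loaded models and whether they are running on GPU or CPU
ollama ps

# NVIDIA only: live GPU utilization and memory usage
nvidia-smi
```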
Context Window
Adjust the context size for longer conversations (per request via Ollama's `num_ctx` option, or persistently in a Modelfile).

Concurrent Requests

Ollama can handle multiple requests in parallel. Configure this in Ollama's settings (for example, via the `OLLAMA_NUM_PARALLEL` environment variable).

Troubleshooting
Model Pull Fails
Issue: Cannot download model

Solutions:

- Check internet connection
- Verify sufficient disk space (models are large)
- Try pulling via Ollama CLI directly
- Check Ollama logs for errors
Slow Inference
Issue: AI responses take too long

Solutions:

- Use a smaller model (3B instead of 13B)
- Enable GPU acceleration if available
- Close other resource-intensive applications
- Check hardware requirements
Connection Errors
Issue: SlasshyWispr cannot connect to Ollama

Solutions:

- Verify Ollama is running: `ollama serve`
- Check that the base URL matches Ollama's listening address
- Test manually: `curl http://127.0.0.1:11434/api/tags`
- Check firewall settings for localhost connections
Out of Memory
Issue: Model fails to load or crashes

Solutions:

- Use a smaller model variant
- Close other applications to free RAM
- Increase system swap space
- See hardware requirements for RAM recommendations