What is Ollama?
Ollama is a lightweight, extensible framework for running large language models on your local machine. It provides:

- Local AI Inference: Run models like Llama, Mistral, and Gemma without internet
- Simple API: Compatible with OpenAI API format
- Model Management: Easy model pulling, updating, and version control
- Cross-Platform: Works on macOS, Linux, and Windows
- GPU Acceleration: Automatic CUDA and Metal support
SlasshyWispr communicates with Ollama via HTTP API calls, keeping your conversations completely private and offline.
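As a sketch of what that HTTP traffic looks like, here is a minimal request to Ollama's documented `/api/chat` endpoint (the model name is illustrative; any installed model works):

```shell
# Minimal chat request against the local Ollama HTTP API
# (assumes a model such as llama3.2 has already been pulled)
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.2",
  "stream": false,
  "messages": [{"role": "user", "content": "Hello"}]
}'
```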
Installation and Setup
Install Ollama
Download and install Ollama from ollama.ai.

macOS / Linux:
Install using the one-line script from the Ollama website.

Windows:
Download the installer from the Ollama website
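For macOS and Linux the install is a one-liner; the script URL below is the one Ollama publishes (verify it against the current download page before piping to a shell):

```shell
# macOS / Linux: download and run the official install script
curl -fsSL https://ollama.com/install.sh | sh
```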
Start Ollama Service
Ollama runs as a background service after installation.

macOS / Linux:
Start the service manually if it is not already running.

Windows:
Ollama starts automatically after installation
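If the service is not running, `ollama serve` (the same command referenced under Troubleshooting below) starts it:

```shell
# Start the Ollama service in the foreground (Ctrl+C stops it)
ollama serve

# In a second terminal, confirm it is responding on the default port
curl http://127.0.0.1:11434/api/tags
```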
The Ollama service must be running for SlasshyWispr to communicate with it.
Ollama Base URL Configuration
SlasshyWispr connects to Ollama via its HTTP API endpoint.

Default Configuration

The default Ollama base URL is `http://127.0.0.1:11434`.

Custom Base URL
You can configure a custom base URL if:

- Ollama is running on a different port
- Ollama is running on a remote machine
- You’re using a reverse proxy
Update Base URL
Find the Ollama Base URL field and enter your custom URL.

Examples:

- Different port: http://127.0.0.1:8080
- Remote server: http://192.168.1.100:11434
- HTTPS endpoint: https://ollama.example.com
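A custom base URL in SlasshyWispr must match where Ollama actually listens. On the Ollama side, the bind address is controlled by the documented `OLLAMA_HOST` environment variable (the address below is an example):

```shell
# Example: have Ollama listen on all interfaces, port 8080
# (restart `ollama serve` after setting this)
export OLLAMA_HOST=0.0.0.0:8080
```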
Model Pulling
Before using a model with SlasshyWispr, you need to pull it from the Ollama library.

Using Ollama CLI
Browse Available Models
Visit ollama.ai/library to see available models
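Pulling uses the `ollama pull` subcommand; the model name below is just an example from the library:

```shell
# Download a model from the Ollama library
ollama pull llama3.2

# Verify it is installed locally
ollama list
```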
Wait for Download
Models range from 1GB to 40GB+ depending on size. Download time varies by internet speed.
Using SlasshyWispr UI
Models pulled via either method are stored in Ollama’s model directory and accessible to both Ollama CLI and SlasshyWispr.
Ollama Status Checking
SlasshyWispr continuously monitors your Ollama installation status.

Status Response Fields

The Ollama status check returns:

Status Indicators
In SlasshyWispr Settings > Offline:

- Green dot: Ollama installed and running
- Yellow dot: Ollama installed but not running
- Red dot: Ollama not detected or error
Common Status Issues
Compatible Models
SlasshyWispr works with any Ollama-compatible model. Here are recommended models by use case:

Conversational AI
Llama 3.2
Fast, efficient, excellent for general conversation
Mistral
Balanced performance and quality
Gemma 2
Google’s efficient language model
Qwen 2.5
Strong multilingual capabilities
Model Sizes
Most models come in multiple sizes (parameter counts):

| Size | RAM Required | Performance |
|---|---|---|
| 1B-3B | 4-8 GB | Fast, basic tasks |
| 7B-8B | 8-16 GB | Balanced, good quality |
| 13B-14B | 16-32 GB | High quality |
| 30B+ | 32+ GB | Best quality, slower |
Pull specific sizes by appending the parameter count:
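For example (available tags vary per model; check its page on ollama.ai/library):

```shell
# Pull an explicit 3B variant rather than the default tag
ollama pull llama3.2:3b
```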
Specialized Models
- Code Generation: `codellama`, `deepseek-coder`
- Chat Optimized: models with `:chat` or `:instruct` tags
- Uncensored: models with the `:uncensored` tag for unrestricted responses
Testing Your Setup
Verify Ollama Status
In SlasshyWispr Settings > Offline, confirm Ollama shows as installed and running
Test Dictation
Use your push-to-talk hotkey and ask a question.

Example: “What is the capital of France?”
Performance Optimization
GPU Acceleration
Ollama automatically uses the GPU when available:

- NVIDIA GPUs: Requires the CUDA toolkit
- Apple Silicon: Uses Metal acceleration
- AMD GPUs: ROCm support (Linux)
Check GPU usage during inference:
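Two quick ways to confirm this (assuming the relevant tools are installed):

```shell
# Show loaded models and whether they are running on GPU or CPU
ollama ps

# NVIDIA only: live GPU utilization and memory usage
nvidia-smi
```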
Context Window
Adjust the context size for longer conversations (per request via Ollama's `num_ctx` option, or persistently in a Modelfile).

Concurrent Requests

Ollama can handle multiple requests in parallel. Configure this in Ollama's settings (for example, via the `OLLAMA_NUM_PARALLEL` environment variable).

Troubleshooting
Model Pull Fails
Issue: Cannot download model

Solutions:

- Check internet connection
- Verify sufficient disk space (models are large)
- Try pulling via Ollama CLI directly
- Check Ollama logs for errors
Slow Inference
Issue: AI responses take too long

Solutions:

- Use a smaller model (3B instead of 13B)
- Enable GPU acceleration if available
- Close other resource-intensive applications
- Check hardware requirements
Connection Errors
Issue: SlasshyWispr cannot connect to Ollama

Solutions:

- Verify Ollama is running: `ollama serve`
- Check that the base URL matches Ollama's listening address
- Test manually: `curl http://127.0.0.1:11434/api/tags`
- Check firewall settings for localhost connections
Out of Memory
Issue: Model fails to load or crashes

Solutions:

- Use a smaller model variant
- Close other applications to free RAM
- Increase system swap space
- See hardware requirements for RAM recommendations