Run open-source AI models locally on your machine using Ollama. No API keys, no cloud dependencies, complete privacy.

Prerequisites

  • Ollama installed: ollama.ai
  • Sufficient disk space for models (2-10GB per model)
  • Adequate RAM (8GB minimum, 16GB+ recommended)

Setup

1. Install Ollama

Download and install Ollama from ollama.ai.
brew install ollama
ollama serve
2. Pull a Model

Download a model from Ollama’s library:
# Fast, capable model
ollama pull llama3.2

# Larger, more capable
ollama pull llama3.1:70b

# Code-focused
ollama pull codellama

# Lightweight
ollama pull phi3
View all models at ollama.ai/library.
3. Verify Ollama is Running

Check that Ollama is running on localhost:11434:
curl http://localhost:11434/v1/models
You should see a JSON response with your installed models.
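The endpoint returns an OpenAI-style list object. A minimal sketch of pulling model ids out of that response, with an illustrative sample payload (your installed models will differ):

```python
import json

def list_model_ids(models_json: str) -> list[str]:
    """Extract model ids from an OpenAI-compatible /v1/models response."""
    payload = json.loads(models_json)
    return [entry["id"] for entry in payload.get("data", [])]

# Illustrative response shape, not live output from your machine.
sample = '{"object": "list", "data": [{"id": "llama3.2", "object": "model"}, {"id": "phi3", "object": "model"}]}'
print(list_model_ids(sample))  # → ['llama3.2', 'phi3']
```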
4. Open Glyph AI Settings

Go to Settings → AI and select the Ollama profile.
5. Configure Base URL

The default base URL is http://localhost:11434/v1. If Ollama runs on a different host or port, update the base URL. Allow Private Hosts is enabled by default for Ollama.
6. Select Model

Click the Model dropdown. Glyph fetches models from Ollama’s local API. Select your downloaded model (e.g., llama3.2, llama3.1:70b).
7. Test Connection

Open the AI panel and send a test message. You should receive a response from your local model.
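Glyph sends the request for you, but it can help to see the shape of what goes over the wire. A sketch of the raw request using only Python's standard library; the endpoint path and payload follow the OpenAI chat completions convention that Ollama implements:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, message: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against Ollama's local API."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": message}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},  # no API key for the local API
        method="POST",
    )

req = build_chat_request("http://localhost:11434/v1", "llama3.2", "Hello!")
# urllib.request.urlopen(req) would send it if Ollama is running.
print(req.full_url)  # → http://localhost:11434/v1/chat/completions
```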

Configuration

Provider Settings

  • Service: ollama
  • Base URL: http://localhost:11434/v1 (default)
  • Authentication: None (local API)
  • Allow Private Hosts: Enabled (required for localhost)

Custom Port

If Ollama runs on a different port:
Base URL: http://localhost:8080/v1

Remote Ollama Server

To connect to Ollama on another machine:
Base URL: http://192.168.1.100:11434/v1
Ensure Allow Private Hosts is enabled.
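The Allow Private Hosts setting gates requests to localhost and private-range addresses. As an illustration of the kind of check involved (not Glyph's actual implementation), here is how Python's stdlib classifies such hosts:

```python
import ipaddress
from urllib.parse import urlparse

def is_private_host(base_url: str) -> bool:
    """Return True if the base URL targets localhost or a private-range IP."""
    host = urlparse(base_url).hostname
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False  # a public DNS name, or an unparsable host

print(is_private_host("http://192.168.1.100:11434/v1"))  # → True
print(is_private_host("http://localhost:11434/v1"))      # → True
```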

Model Selection

Glyph uses Ollama’s OpenAI-compatible /v1/models endpoint to list models.
Model          Size   Use Case               RAM Required
llama3.2       3B     Fast, everyday tasks   8GB
llama3.1       8B     General purpose        8GB
llama3.1:70b   70B    Most capable           32GB+
mistral        7B     Balanced performance   8GB
codellama      7B     Code generation        8GB
phi3           3.8B   Lightweight            4GB
gemma2         9B     Google’s open model    8GB
Explore all models at ollama.ai/library.
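The RAM column above can be turned into a quick picker. A hypothetical helper (not part of Glyph or Ollama) that filters the table by available memory:

```python
# RAM requirements (GB) taken from the table above; illustrative subset only.
MODEL_RAM_GB = {"llama3.2": 8, "llama3.1": 8, "llama3.1:70b": 32, "phi3": 4}

def models_that_fit(available_gb: int) -> list[str]:
    """Return models whose minimum RAM requirement fits the given budget."""
    return sorted(m for m, need in MODEL_RAM_GB.items() if need <= available_gb)

print(models_that_fit(8))  # → ['llama3.1', 'llama3.2', 'phi3']
```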

Model Tags

Ollama models use tags for variants:
  • llama3.1:latest - Latest stable version
  • llama3.1:70b - 70 billion parameter variant
  • llama3.1:8b-q4_0 - 4-bit quantized (smaller, faster)
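A model reference is simply name:tag, with the tag defaulting to latest when omitted. A minimal sketch of splitting a reference:

```python
def parse_model_ref(ref: str) -> tuple[str, str]:
    """Split an Ollama model reference into (name, tag); tag defaults to 'latest'."""
    name, sep, tag = ref.partition(":")
    return name, (tag if sep else "latest")

print(parse_model_ref("llama3.1:70b"))  # → ('llama3.1', '70b')
print(parse_model_ref("llama3.2"))      # → ('llama3.2', 'latest')
```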

Features

Chat Mode

Conversational interaction:
  • Back-and-forth dialogue
  • No file system access
  • Fast local inference
  • Best for brainstorming and Q&A

Create Mode

Local AI with workspace tools:
  • read_file - Read files from your space
  • search_notes - Search note content
  • list_dir - List directory contents
  • Tool usage tracked in timeline view
  • Best for research and knowledge retrieval
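Under the hood, create mode uses OpenAI-style function calling: the model names a tool and arguments, and the client routes the call to a handler. A toy dispatcher with an illustrative list_dir handler; Glyph's real implementations live inside the app and differ from this sketch:

```python
from pathlib import Path

def list_dir(path: str) -> list[str]:
    """Illustrative handler: list entry names in a directory."""
    return sorted(p.name for p in Path(path).iterdir())

TOOLS = {"list_dir": list_dir}

def dispatch(name: str, **args):
    """Route a model's tool call to its handler, OpenAI function-calling style."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)
```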

Context Attachment

Attach notes for grounded responses:
  • Attach files or folders via context menu
  • Mention with @filename syntax
  • Configure character budget (up to 250K chars)
  • Context sent locally, never leaves your machine
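The character budget caps how much attached content is sent with a request. A sketch of one plausible budgeting strategy (concatenate in order, truncate at the cap); Glyph's actual strategy may differ:

```python
def fit_context(chunks: list[str], budget_chars: int = 250_000) -> str:
    """Concatenate attached note contents, stopping at the character budget."""
    out, used = [], 0
    for chunk in chunks:
        remaining = budget_chars - used
        if remaining <= 0:
            break
        out.append(chunk[:remaining])  # truncate the chunk that crosses the cap
        used += len(out[-1])
    return "".join(out)

print(len(fit_context(["a" * 200_000, "b" * 100_000])))  # → 250000
```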

Performance

Inference Speed

Local inference speed depends on:
  • Model size: Smaller models (3B-8B) are faster
  • Hardware: GPU acceleration significantly improves speed
  • Context length: Longer contexts increase latency

GPU Acceleration

Ollama automatically uses GPU if available:
  • NVIDIA: CUDA support
  • AMD: ROCm support
  • Apple Silicon: Metal acceleration
Check GPU usage:
ollama ps

Context Window

Ollama models have varying context windows:
  • llama3.1: 128K tokens
  • mistral: 32K tokens
  • codellama: 16K tokens
Larger contexts increase memory usage and latency.
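A common rule of thumb is roughly 4 characters per token for English text, which gives a quick (and very approximate) way to check whether attached context will fit a model's window:

```python
# Context windows from the list above; ~4 chars/token is a rough heuristic,
# not an exact tokenizer count.
CONTEXT_WINDOWS = {"llama3.1": 128_000, "mistral": 32_000, "codellama": 16_000}

def fits_in_context(model: str, text: str, chars_per_token: float = 4.0) -> bool:
    """Estimate whether text fits the model's context window."""
    est_tokens = len(text) / chars_per_token
    return est_tokens <= CONTEXT_WINDOWS.get(model, 8_000)

print(fits_in_context("codellama", "x" * 100_000))  # ~25K est. tokens > 16K → False
```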

Privacy and Security

Ollama runs entirely on your machine:
  • ✅ No data sent to external servers
  • ✅ No API keys required
  • ✅ Complete privacy for sensitive notes
  • ✅ Works offline
  • ✅ No usage limits or billing
Ollama is ideal for private notes, confidential documents, or offline environments.

Troubleshooting

“model list failed”

Cause: Glyph can’t connect to Ollama. Solution:
  1. Verify Ollama is running: ollama ps
  2. Check the base URL in settings
  3. Ensure Allow Private Hosts is enabled
  4. Test connection: curl http://localhost:11434/v1/models
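The curl check above can also be scripted. A small diagnostic sketch using Python's stdlib; the opener parameter is injectable purely so the function can be exercised without a live server:

```python
import urllib.error
import urllib.request

def check_ollama(base_url: str = "http://localhost:11434/v1",
                 opener=urllib.request.urlopen) -> str:
    """Probe the /v1/models endpoint and return a human-readable diagnostic."""
    try:
        with opener(f"{base_url}/models", timeout=3) as resp:
            return f"ok: HTTP {resp.status}"
    except urllib.error.URLError as exc:
        return f"unreachable ({exc.reason}); is `ollama serve` running?"
```

Calling check_ollama() with no arguments probes the default local endpoint.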

Model not in dropdown

Solution: Type the model name manually (e.g., llama3.2, codellama).

“connection refused”

Cause: Ollama is not running. Solution: Start Ollama:
ollama serve

Responses are very slow

Possible causes:
  • Large model (70B+) without sufficient RAM
  • No GPU acceleration
  • Long context
Solutions:
  • Use a smaller model (llama3.2, phi3)
  • Enable GPU acceleration (automatic if hardware supports it)
  • Reduce context size
  • Close other memory-intensive applications

“out of memory”

Cause: Model is too large for available RAM. Solution:
  • Use a smaller model
  • Use quantized variants (e.g., llama3.1:8b-q4_0)
  • Close other applications
  • Increase system swap space
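Quantization helps because weight memory scales with bits per weight: roughly parameters × bits ÷ 8 bytes, ignoring KV cache and runtime overhead. A back-of-the-envelope sketch:

```python
def approx_weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: params × bits/8 (excludes KV cache/overhead)."""
    return params_billion * bits_per_weight / 8

print(approx_weights_gb(8, 16))  # fp16 8B model → 16.0 GB
print(approx_weights_gb(8, 4))   # 4-bit quantized (q4_0) 8B model → 4.0 GB
```

This is why a q4_0 variant of the same model can fit in a quarter of the RAM.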

Tool calls fail in create mode

Cause: Some Ollama models don’t support function calling well. Solution: Use chat mode instead, or try a different model. llama3.1 has good tool support.

Advanced Configuration

Custom Ollama Endpoint

If you run Ollama with custom settings:
OLLAMA_HOST=0.0.0.0:8080 ollama serve
Update base URL in Glyph:
Base URL: http://localhost:8080/v1

Model Parameters

To adjust model parameters (temperature, top_p, etc.), you’ll need to modify Glyph’s source code or use a different provider (OpenAI-compatible supports more options).

Multiple Ollama Instances

Run multiple Ollama instances on different ports and create separate profiles in Glyph for each.

Next Steps

Chat Modes

Learn about chat vs create modes

Context Management

Attach notes to local AI conversations

OpenAI-Compatible

Use other OpenAI-compatible endpoints

Profiles

Manage multiple AI profiles
