Ollama is the local LLM runtime that powers Quest’s AI responses. This guide will walk you through installing, configuring, and troubleshooting Ollama.

Installation

Using Homebrew

The easiest way to install Ollama on macOS is using Homebrew:
brew install ollama

Manual Installation

Alternatively, download the installer from the official website:
  1. Visit ollama.ai
  2. Download the macOS installer
  3. Open the downloaded .dmg file
  4. Drag Ollama to your Applications folder

Starting Ollama

After installation, start the Ollama service:
Step 1: Start the Ollama service

ollama serve
This starts Ollama on http://localhost:11434 (the default port).
The Ollama service must be running for Quest to work. Keep this terminal window open.
Step 2: Verify the service is running

In a new terminal, check if Ollama is accessible:
curl http://localhost:11434/api/version
You should see a JSON response with version information.
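The same check can be scripted from Python using only the standard library (a sketch; `get_ollama_version` and `parse_version` are illustrative helpers, not part of Quest):

```python
import json
import urllib.request

def parse_version(body: str) -> str:
    """Extract the 'version' field from the /api/version JSON response."""
    return json.loads(body)["version"]

def get_ollama_version(base_url: str = "http://localhost:11434") -> str:
    """Fetch and parse the Ollama version over its REST API."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return parse_version(resp.read().decode())

if __name__ == "__main__":
    try:
        print("Ollama version:", get_ollama_version())
    except OSError:
        print("Ollama is not reachable on localhost:11434")
```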

Pulling Required Models

Quest uses two models depending on the mode:
Step 1: Pull the general model

For general queries, Quest uses qwen2.5-coder:1.5b:
ollama pull qwen2.5-coder:1.5b
This model is configured in rag_engine3.py:23:
model_name: str = "qwen2.5-coder:1.5b",  # Default model
Step 2: Pull the reasoning model

For reasoning mode, Quest uses deepseek-r1:7b (or the 1.5b variant):
ollama pull deepseek-r1:7b
Or for faster performance on lower-spec machines:
ollama pull deepseek-r1:1.5b
This model is configured in rag_engine3.py:24:
reasoning_model: str = "deepseek-r1:7b",  # Reasoning model
The 1.5b variant requires less memory (~2 GB), while the 7b variant needs ~6 GB but provides better reasoning quality.
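If you are unsure which variant to pull, a small helper like the following (hypothetical, not part of Quest) captures the trade-off:

```python
# Hypothetical helper (not part of Quest): pick the deepseek-r1 variant
# that fits comfortably in the RAM you have free. The ~2 GB and ~6 GB
# figures are the approximate model footprints mentioned above.
def choose_reasoning_model(free_ram_gb: float) -> str:
    """Return a reasoning model tag based on available memory."""
    # Leave headroom beyond the raw weights for the KV cache and OS.
    return "deepseek-r1:7b" if free_ram_gb >= 8 else "deepseek-r1:1.5b"
```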
Step 3: Verify models are installed

List all downloaded models:
ollama list
You should see both models in the output:
NAME                     ID              SIZE      MODIFIED
qwen2.5-coder:1.5b      abc123def456    900 MB    2 minutes ago
deepseek-r1:7b          def789ghi012    6.0 GB    1 minute ago
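To verify both models from a script, you can parse the output above (a sketch; `installed_models` and `missing_models` are illustrative helper names):

```python
def installed_models(ollama_list_output: str) -> set:
    """Parse the NAME column out of `ollama list` output."""
    lines = ollama_list_output.strip().splitlines()
    # Skip the header row; the first whitespace-separated field is the tag.
    return {line.split()[0] for line in lines[1:] if line.strip()}

def missing_models(ollama_list_output: str, required: set) -> set:
    """Return which of the required model tags are not yet pulled."""
    return set(required) - installed_models(ollama_list_output)
```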

Configuring Ollama for Quest

Quest connects to Ollama via its REST API. The default configuration in rag_engine3.py is:
rag_engine3.py
class RAGEngine:
    def __init__(
        self,
        retriever: LeetCodeRetriever,
        ollama_url: str = "http://localhost:11434/api/generate",
        model_name: str = "qwen2.5-coder:1.5b",
        reasoning_model: str = "deepseek-r1:7b",
        temperature: float = 0.4,
        top_p: float = 0.9,
        repeat_penalty: float = 1.1,
        num_thread: int = 8
    ):

Customizing Model Parameters

You can adjust these parameters when initializing the RAG engine:
rag_engine = RAGEngine(
    retriever,
    temperature=0.2,  # More deterministic responses
    top_p=0.85
)
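Under the hood, these parameters are forwarded to Ollama inside the request body's "options" field. A sketch of how that mapping might look (the helper name is illustrative; rag_engine3.py's actual implementation may differ):

```python
def build_generate_payload(model: str, prompt: str, *,
                           temperature: float = 0.4,
                           top_p: float = 0.9,
                           repeat_penalty: float = 1.1,
                           num_thread: int = 8) -> dict:
    """Assemble a JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Sampling parameters travel in the "options" object.
        "options": {
            "temperature": temperature,
            "top_p": top_p,
            "repeat_penalty": repeat_penalty,
            "num_thread": num_thread,
        },
    }
```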

Verifying Installation

Step 1: Test the Ollama API directly

Test the API with a simple prompt:
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:1.5b",
  "prompt": "Explain binary search in one sentence.",
  "stream": false
}'
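The same test can be driven from Python with only the standard library (a sketch; `generate` and `build_body` are illustrative helpers, not Quest's API):

```python
import json
import urllib.request

def build_body(prompt: str, model: str) -> dict:
    """Request body matching the curl example above."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "qwen2.5-coder:1.5b",
             url: str = "http://localhost:11434/api/generate") -> str:
    data = json.dumps(build_body(prompt, model)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        # With "stream": false, the full answer arrives as one JSON object.
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Explain binary search in one sentence."))
```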
Step 2: Test with Quest

Start the Flask application:
python app.py
Visit http://localhost:5000 and try a test query like “Two Sum problem”.

Troubleshooting

Ollama Service Not Running

Error: Connection refused or Failed to connect to Ollama API
Solution: Ensure Ollama is running:
ollama serve

Port Already in Use

Error: Error: listen tcp 127.0.0.1:11434: bind: address already in use
Solution: Kill the existing process:
# Find the process
lsof -i :11434

# Kill it (replace PID with actual process ID)
kill -9 <PID>
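You can also check whether the port is taken from Python before starting the service (a small sketch using the standard library):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    status = "taken" if port_in_use(11434) else "free"
    print(f"Port 11434 is {status}.")
```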

Model Not Found

Error: Error: model 'qwen2.5-coder:1.5b' not found
Solution: Pull the model:
ollama pull qwen2.5-coder:1.5b

Out of Memory

Error: Model loading fails or the system becomes unresponsive
Solution: Use smaller models:
rag_engine = RAGEngine(
    retriever,
    model_name="qwen2.5-coder:1.5b",
    reasoning_model="deepseek-r1:1.5b"  # Use 1.5b instead of 7b
)

Slow Response Times

If responses are slow, try:
  1. Increase thread count:
    rag_engine = RAGEngine(retriever, num_thread=16)
    
  2. Use GPU acceleration (if available):
    # Ollama uses the GPU automatically (Metal on Apple Silicon,
    # CUDA on Linux). On NVIDIA systems, verify with:
    nvidia-smi
    
  3. Reduce context window:
    rag_engine = RAGEngine(retriever, max_history=1)  # Less history
    

Advanced Configuration

Using a Custom Ollama URL

If Ollama is running on a different host or port:
app.py
rag_engine = RAGEngine(
    retriever,
    ollama_url="http://192.168.1.100:11434/api/generate"
)
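For deployments where the endpoint varies between machines, one option is to read it from an environment variable (a sketch; OLLAMA_URL is an assumed variable name, not something Quest defines):

```python
import os

def ollama_url_from_env(
        default: str = "http://localhost:11434/api/generate") -> str:
    """Return the Ollama endpoint, allowing an environment override."""
    # OLLAMA_URL is an illustrative variable name, not a Quest convention.
    return os.environ.get("OLLAMA_URL", default)
```

You could then pass the result to the engine, e.g. `RAGEngine(retriever, ollama_url=ollama_url_from_env())`.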

Running Ollama as a Service

Create a systemd service file:
sudo nano /etc/systemd/system/ollama.service
Add:
[Unit]
Description=Ollama Service
After=network.target

[Service]
ExecStart=/usr/local/bin/ollama serve
Restart=always
User=your-username

[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl enable ollama
sudo systemctl start ollama

Next Steps

Using the Web Interface

Learn how to interact with Quest through the Flask web interface

Query Optimization

Write effective queries for better results
