
Setting Up Ollama for Local AI

Ollama provides local AI inference without cloud dependencies. Asta uses Ollama for:
  • RAG embeddings with nomic-embed-text
  • Local chat with models like llama3, mistral, or qwen
  • Privacy-focused workflows that keep data on your machine

1. Install Ollama

Download and install Ollama from ollama.com.

macOS/Linux:
curl -fsSL https://ollama.com/install.sh | sh

Windows: Download the installer from ollama.com/download.

Verify the installation:
ollama --version

2. Pull the RAG embedding model

Asta’s Learning skill requires nomic-embed-text for semantic search:
ollama pull nomic-embed-text
This model is lightweight (~274 MB) and optimized for embeddings.
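Once the server is running (next step), you can sanity-check the embedding model directly against Ollama's REST API; the prompt text here is arbitrary:

```shell
# Request an embedding for a test sentence from the local Ollama server.
# A successful response is a JSON object with an "embedding" array of floats.
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "semantic search test"}'
```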

3. Start Ollama server

Ollama runs as a background service. Start it with:
ollama serve
Or launch the Ollama app. The API runs at http://localhost:11434 by default.
On macOS, Ollama starts automatically when you pull a model or run ollama serve.

4. Pull a chat model (optional)

For local chat inference, pull a model:
# Lightweight (4GB)
ollama pull llama3.2:3b

# Balanced (7GB)
ollama pull mistral

# Advanced (40GB+)
ollama pull llama3.1:70b
Browse the full model catalog at ollama.com/library.
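A quick way to confirm chat inference works is a one-shot ollama run, or a direct call to the REST API with streaming disabled:

```shell
# One-shot prompt: prints the model's reply and exits
ollama run llama3.2:3b "Say hello in one short sentence."

# Equivalent REST call; "stream": false returns a single JSON response
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "Say hello.", "stream": false}'
```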

5. Configure Asta to use Ollama

In Asta’s web panel or desktop app:
  1. Go to Settings → AI Providers
  2. Enable Ollama and set it as your default provider
  3. Select a chat model (e.g., llama3.2:3b)
  4. Save settings
Asta will automatically use Ollama for RAG if nomic-embed-text is available.

Quick Setup Script

Asta includes a setup script that installs Ollama (Linux/macOS) and pulls the RAG model:
cd ~/workspace/source
./scripts/setup_ollama_rag.sh -i
What it does:
  • Installs Ollama via curl | sh (Linux/macOS only)
  • Pulls nomic-embed-text
  • Verifies the installation
Source: ~/workspace/source/scripts/setup_ollama_rag.sh
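If you prefer to run the steps by hand, or want to see roughly what the script automates, a minimal sketch looks like this (the real setup_ollama_rag.sh may use different flags and output):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Install Ollama if it is not already on PATH (Linux/macOS installer).
if ! command -v ollama >/dev/null 2>&1; then
  curl -fsSL https://ollama.com/install.sh | sh
fi

# Pull the embedding model required by Asta's Learning skill.
ollama pull nomic-embed-text

# Verify the model is available locally.
ollama list | grep -q nomic-embed-text && echo "nomic-embed-text ready"
```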

Configuration

Environment Variables

Set these in .env or Settings:
# Ollama API endpoint
OLLAMA_BASE_URL=http://localhost:11434

# Model for RAG embeddings
ASTAMISTRAL_OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# Default chat model
OLLAMA_MODEL=llama3.2:3b
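If you keep these variables in a .env file, a wrapper script can export them before starting the backend. `set -a` marks every variable assigned while the file is sourced for export; this is a generic shell pattern, not Asta-specific:

```shell
# Export all variables defined in .env into the environment,
# so they are visible to processes started afterwards.
if [ -f .env ]; then
  set -a        # auto-export every assignment that follows
  . ./.env
  set +a        # stop auto-exporting
fi

echo "$OLLAMA_BASE_URL"                      # e.g. http://localhost:11434
echo "$ASTAMISTRAL_OLLAMA_EMBEDDING_MODEL"   # e.g. nomic-embed-text
```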

Testing RAG

Once configured, test the Learning skill:
Learn about Python asyncio for 2 minutes
Asta will:
  1. Research the topic via web search
  2. Generate embeddings with nomic-embed-text
  3. Store knowledge in ChromaDB for retrieval

Troubleshooting

ollama: command not found
After installation, restart your terminal or add Ollama to your PATH:
export PATH="$PATH:/usr/local/bin"
On macOS, Ollama installs to /usr/local/bin by default.

Connection refused
Ensure Ollama is running:
ollama serve
Or launch the Ollama app from your Applications folder.

Model not found
Verify the model is pulled:
ollama list
If it is missing, pull it:
ollama pull nomic-embed-text

RAG is not working
Check these steps:
  1. nomic-embed-text is pulled: ollama list
  2. Ollama is running: curl http://localhost:11434/api/tags
  3. The environment variable is set: echo $ASTAMISTRAL_OLLAMA_EMBEDDING_MODEL
Restart the Asta backend after configuration changes.
macOS permission dialogs: When Asta first uses Ollama, macOS may prompt for network access. Allow the backend process (Python) to connect to localhost:11434.
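The checks above can be rolled into a single diagnostic script. This is a convenience sketch, not something shipped with Asta; it uses only the standard ollama CLI and curl:

```shell
#!/usr/bin/env bash
# Quick health check for Asta's Ollama-based RAG setup.

if ! command -v ollama >/dev/null 2>&1; then
  echo "FAIL: ollama not on PATH (restart your terminal or extend PATH)"
  exit 1
fi

if ! curl -s --max-time 2 http://localhost:11434/api/tags >/dev/null; then
  echo "FAIL: no server on :11434 (run 'ollama serve' or launch the app)"
  exit 1
fi

if ! ollama list | grep -q nomic-embed-text; then
  echo "FAIL: nomic-embed-text missing (run 'ollama pull nomic-embed-text')"
  exit 1
fi

echo "OK: Ollama is ready for Asta RAG"
```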

Model Recommendations

Model             Size     Use Case
nomic-embed-text  274 MB   RAG embeddings (required)
llama3.2:3b       4 GB     Fast chat, low memory
mistral           7 GB     Balanced quality/speed
qwen2.5:7b        7 GB     Excellent for coding
llama3.1:70b      40 GB    Advanced reasoning (requires 64 GB+ RAM)
