
Overview

Adist integrates with Ollama to provide AI-driven code analysis using locally run language models. This option is completely free and private, and doesn’t require an internet connection for inference.

Benefits

Free

No API costs - run unlimited queries

Private

Your code never leaves your machine

Offline

Works without internet (after initial setup)

Setup

Step 1: Install Ollama

Download and install Ollama from ollama.com/download. Installers are available for macOS, Windows, and Linux; on macOS, download the installer and run it.
Step 2: Pull a Model

Download a language model. Popular options include:
ollama pull llama3      # Balanced general-purpose model
ollama pull codellama   # Tuned for code-heavy projects
ollama pull phi3        # Smaller, faster model
Step 3: Start Ollama Service

Ensure Ollama is running:
ollama serve
On most systems, Ollama runs as a background service automatically after installation.
Step 4: Configure Adist

Run the LLM configuration command:
adist llm-config
Select:
  1. Ollama as your provider
  2. Your preferred model from the list of installed models
  3. Optionally customize the API URL (default: http://localhost:11434)
Step 5: Verify Setup

Test the integration:
adist query "What does this project do?"

Features

Local Model Support

The Ollama service can use any locally installed model:
# List available models
ollama list

# Pull additional models
ollama pull llama3:70b  # Larger, more capable version
ollama pull phi3         # Smaller, faster model

Context Caching

The Ollama service includes intelligent context caching:
  • Topic Identification: Automatically identifies query topics
  • Cache Duration: Contexts are cached for 30 minutes
  • Cache Cleanup: Old entries are automatically removed
Because local models have smaller context windows, context merging is simpler in Ollama than with cloud providers.
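The caching behavior described above can be sketched as a topic-keyed map with a 30-minute TTL. The class and method names here are illustrative, not adist's actual implementation:

```typescript
// Illustrative sketch of a 30-minute context cache keyed by query topic.
// ContextCache is a hypothetical name, not adist's real API.
type CacheEntry = { context: string; storedAt: number };

class ContextCache {
  private entries = new Map<string, CacheEntry>();
  constructor(private ttlMs: number = 30 * 60 * 1000) {}

  set(topic: string, context: string): void {
    this.entries.set(topic, { context, storedAt: Date.now() });
  }

  get(topic: string): string | undefined {
    const entry = this.entries.get(topic);
    if (!entry) return undefined;
    if (Date.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(topic); // expired: clean up on access
      return undefined;
    }
    return entry.context;
  }
}
```

Expired entries are dropped lazily on lookup here; a periodic sweep would achieve the same cleanup described above.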

Query Complexity Estimation

Queries are analyzed and categorized as:
  • Low Complexity: Simple questions (< 8 words, no technical terms)
  • Medium Complexity: Standard questions (8-15 words or basic technical terms)
  • High Complexity: Complex questions (> 15 words, code snippets, comparisons)
Context allocation is optimized based on complexity.
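The thresholds above can be expressed as a simple heuristic. This sketch is illustrative; the function name and the list of technical terms are assumptions, not adist's actual code:

```typescript
// Illustrative word-count heuristic matching the categories described above.
// TECHNICAL_TERMS is a hypothetical sample list.
const TECHNICAL_TERMS = ["async", "interface", "middleware", "refactor", "database"];

type Complexity = "low" | "medium" | "high";

function estimateComplexity(query: string): Complexity {
  const words = query.trim().split(/\s+/);
  const hasCode = query.includes("```") || /[{};]/.test(query);
  const technical = words.some(w => TECHNICAL_TERMS.includes(w.toLowerCase()));
  if (words.length > 15 || hasCode) return "high";
  if (words.length >= 8 || technical) return "medium";
  return "low";
}
```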

Streaming Support

Ollama supports real-time streaming responses:
adist query "Explain the authentication system" --stream
adist chat --stream
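Under the hood, Ollama streams newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag. A sketch of how a stream callback might consume such chunks (the `parseStreamChunk` helper is illustrative, not adist's code):

```typescript
// Sketch of consuming Ollama's streaming output. Each line of a chunk is a
// JSON object like {"response":"...","done":false}; parseStreamChunk forwards
// each text fragment to the callback and reports whether the stream finished.
function parseStreamChunk(chunk: string, onToken: (t: string) => void): boolean {
  let done = false;
  for (const line of chunk.split("\n")) {
    if (!line.trim()) continue;
    const obj = JSON.parse(line) as { response?: string; done?: boolean };
    if (obj.response) onToken(obj.response);
    if (obj.done) done = true;
  }
  return done;
}
```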

Code Reference

The Ollama service is implemented in src/utils/ollama.ts.

Key Methods

isAvailable

Checks if Ollama is running:
async isAvailable(): Promise<boolean>

listModels

Returns all locally installed models:
async listModels(): Promise<string[]>
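Both methods can be backed by Ollama's GET /api/tags endpoint, which lists installed models and fails when the service is down. A minimal sketch, assuming that endpoint's documented response shape (the helper names are hypothetical, not adist's code):

```typescript
// Ollama's GET /api/tags returns { models: [{ name: "llama3:latest", ... }] }.
// extractModelNames and this isAvailable sketch are illustrative helpers.
interface TagsResponse {
  models: { name: string }[];
}

function extractModelNames(body: TagsResponse): string[] {
  return body.models.map(m => m.name);
}

// Availability can simply mean "did GET /api/tags respond?":
async function isAvailable(baseUrl = "http://localhost:11434"): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/api/tags`);
    return res.ok;
  } catch {
    return false; // connection refused: Ollama is not running
  }
}
```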

summarizeFile

Generates summaries of individual files:
async summarizeFile(content: string, filePath: string): Promise<SummaryResult>

generateOverallSummary

Creates a project overview from file summaries:
async generateOverallSummary(fileSummaries: { path: string; summary: string }[]): Promise<SummaryResult>

queryProject

Answers questions about your project:
async queryProject(
  query: string,
  context: { content: string; path: string }[],
  projectId: string,
  streamCallback?: (chunk: string) => void
): Promise<SummaryResult>

chatWithProject

Enables conversational interactions:
async chatWithProject(
  messages: { role: 'user' | 'assistant'; content: string }[],
  context: { content: string; path: string }[],
  projectId: string,
  streamCallback?: (chunk: string) => void
): Promise<SummaryResult>
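Assuming the signature above, a call might assemble its arguments like this. The file contents, paths, and project id are placeholders, not real values:

```typescript
// Hypothetical usage of chatWithProject; all values are placeholders.
const messages: { role: "user" | "assistant"; content: string }[] = [
  { role: "user", content: "How is authentication handled?" },
  { role: "assistant", content: "Via middleware in the auth module." },
  { role: "user", content: "Where are sessions stored?" },
];

const context = [
  { content: "// session store implementation...", path: "src/auth/session.ts" },
];

// With an OllamaService instance in scope, the call would look like:
// await ollamaService.chatWithProject(messages, context, "my-project",
//   chunk => process.stdout.write(chunk));
```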

Configuration Options

Context Limits

  • Maximum Context Length: 30,000 characters (lower than cloud providers)
  • Cache Timeout: 30 minutes
  • Dynamic Adjustment: Context size varies based on query complexity
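Dynamic adjustment can be sketched as allocating a fraction of the 30,000-character ceiling per complexity level. The ratios here are hypothetical, not adist's actual values:

```typescript
// Illustrative complexity-based context allocation within the 30,000-character
// ceiling. The per-level ratios are assumptions for illustration.
const MAX_CONTEXT_CHARS = 30_000;

type Complexity = "low" | "medium" | "high";

function contextBudget(complexity: Complexity): number {
  const ratio = { low: 0.4, medium: 0.7, high: 1.0 }[complexity];
  return Math.floor(MAX_CONTEXT_CHARS * ratio);
}

function truncateContext(context: string, complexity: Complexity): string {
  return context.slice(0, contextBudget(complexity));
}
```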

Custom API URL

If you’re running Ollama on a different host or port:
# During llm-config, specify custom URL
API URL: http://your-server:11434
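Validating a custom URL before saving it avoids confusing connection errors later. A sketch using the standard URL class (`normalizeOllamaUrl` is illustrative, not part of adist):

```typescript
// Sketch of validating a custom Ollama API URL, falling back to the default.
const DEFAULT_OLLAMA_URL = "http://localhost:11434";

function normalizeOllamaUrl(input?: string): string {
  if (!input || !input.trim()) return DEFAULT_OLLAMA_URL;
  const url = new URL(input); // throws on malformed input
  return url.origin;          // strips trailing slashes and any path
}
```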

Model Selection

Different models have different characteristics:
  • phi3: Fast and lightweight; best for quick answers on limited hardware
  • llama3:8b: Balanced performance
  • mistral: General purpose

Performance Optimization

Hardware Requirements

  • RAM: 8GB minimum
  • GPU: Optional (CPU-only works)
  • Storage: 5GB for small models
These minimums are suitable for basic queries with small models; larger models need more RAM and storage.

GPU Acceleration

Ollama automatically uses GPU acceleration when available:
  • NVIDIA GPUs: CUDA support (recommended)
  • Apple Silicon: Metal support
  • AMD GPUs: ROCm support (Linux)
GPU acceleration can be 10-100x faster than CPU-only inference.

Cost Comparison

Ollama is completely free:
  • API Costs: $0 (no API calls)
  • Inference: Free unlimited usage
  • Storage: Only disk space for models
Example: 1000 queries
  • Ollama: $0
  • Anthropic (Claude Sonnet): ~$3-10
With no per-query cost, Ollama’s savings are immediate.
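The cloud figure above can be reproduced with back-of-the-envelope arithmetic. The token counts and per-million-token prices below are rough assumptions for illustration, not quoted rates:

```typescript
// Back-of-the-envelope cost estimate for cloud queries. Token counts and
// prices are assumptions for illustration; Ollama's cost is $0 regardless.
function cloudCostUSD(
  queries: number,
  inputTokensPerQuery: number,
  outputTokensPerQuery: number,
  inputPricePerMTok: number,
  outputPricePerMTok: number,
): number {
  const inputCost = (queries * inputTokensPerQuery / 1_000_000) * inputPricePerMTok;
  const outputCost = (queries * outputTokensPerQuery / 1_000_000) * outputPricePerMTok;
  return inputCost + outputCost;
}

// e.g. 1000 queries, ~1500 input + 300 output tokens each, at assumed
// $3 / $15 per million tokens, lands inside the ~$3-10 range:
const estimate = cloudCostUSD(1000, 1500, 300, 3, 15);
```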

Best Practices

  • Start with llama3 for balanced performance
  • Use codellama for code-heavy projects
  • Try smaller models first if hardware is limited
  • Experiment with different models for your use case

Troubleshooting

Ollama Not Running

If you see connection errors:
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama service
ollama serve

No Models Available

If no models appear during configuration:
# List installed models
ollama list

# Pull a model if none are installed
ollama pull llama3

Slow Responses

  • Use a smaller model (e.g., llama3:8b instead of llama3:70b)
  • Enable GPU acceleration
  • Reduce context complexity
  • Close other applications

Out of Memory

  • Switch to a smaller model
  • Reduce the number of concurrent queries
  • Increase system swap space
  • Use CPU instead of GPU if VRAM is limited

Poor Response Quality

  • Try a larger or specialized model
  • Ensure project is properly indexed
  • Use more specific queries
  • Generate file summaries for better context

Privacy and Security

While Ollama runs locally, ensure you:
  • Keep Ollama updated for security patches
  • Don’t expose the Ollama API to untrusted networks
  • Use firewall rules if running on a server
Privacy Benefits:
  • Code never sent to external APIs
  • No data collection or telemetry
  • Complete control over your data
  • Suitable for sensitive or proprietary code

Advanced Configuration

Custom Model Parameters

You can customize model behavior by creating a Modelfile:
# Create a custom model with specific parameters
cat > Modelfile <<EOF
FROM llama3
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM You are a helpful coding assistant specialized in code review.
EOF

# Create the custom model
ollama create my-code-assistant -f Modelfile
Then select my-code-assistant in adist llm-config.

Running on Remote Server

To use Ollama running on another machine:
  1. Configure Ollama to accept remote connections
  2. Update the API URL in adist llm-config
  3. Ensure proper network security (VPN, firewall, etc.)

Next Steps

Start Querying

Ask questions about your codebase

Start Chatting

Have conversations about your project
