Run translations completely locally using Ollama - no API keys, no costs, no rate limits. Perfect for privacy-sensitive documents, development, or unlimited translations.

Why Use Local Models?

Advantages:
  • Zero Cost - Unlimited translations without API fees
  • Privacy - Documents never leave your machine
  • No Rate Limits - Translate as much as you want
  • Offline Capable - Work without internet connection
  • Fast Iteration - No network latency for small documents
Tradeoffs:
  • Lower quality compared to GPT-5 or Claude 4
  • Slower on CPU-only machines
  • Cannot process PDFs (no vision capability)
  • Requires local compute resources
Local models work best for text files. For PDF translation, you’ll need cloud models with vision capabilities.

Setup

1. Install Ollama

Download and install Ollama from ollama.ai:
macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh
Windows
# Download from https://ollama.ai/download

2. Pull a Model

Download a translation-capable model:
# Recommended: Llama 3.1 (8B) - good balance
ollama pull llama3.1:8b

# Larger model for better quality
ollama pull llama3.1:70b

# Smaller model for speed
ollama pull llama3.1:3b

# Alternative: Mistral
ollama pull mistral-small

3. Start Ollama Server

ollama serve
Keep this running in a separate terminal.
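
To confirm the server is up before translating, you can hit its HTTP endpoint; Ollama listens on port 11434 by default and answers with a short status message:

```shell
# Sanity check: the root endpoint replies "Ollama is running" when the server is up.
curl -s http://localhost:11434/ || echo "Ollama server is not reachable"
```

If the connection is refused, curl prints nothing and the fallback message appears instead.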

4. Translate with Tinbox

tinbox translate --to es --model ollama:llama3.1:8b document.txt

Basic Usage

Simple Translation

# Start Ollama (in separate terminal)
ollama serve

# Translate with local model
tinbox translate --to de --model ollama:llama3.1:8b ./examples/story.txt

Specify Output File

tinbox translate --to fr \
  --model ollama:llama3.1:8b \
  --output document_fr.txt \
  document.txt

Choosing a Model

Model Comparison

Model         | Size       | Speed       | Quality   | Best For
llama3.1:3b   | 3B params  | Fast        | Basic     | Quick drafts, simple text
llama3.1:8b   | 8B params  | Medium      | Good      | General use, balanced
llama3.1:70b  | 70B params | Slow        | Best      | High quality, complex docs
mistral-small | 7B params  | Medium      | Good      | Alternative to Llama
qwen2.5:32b   | 32B params | Medium-Slow | Very Good | Technical documents
Start with llama3.1:8b for the best balance of speed and quality.

Model Selection Examples

# Best for: Quick translations, simple content
ollama pull llama3.1:3b
tinbox translate --to es --model ollama:llama3.1:3b document.txt

Performance Optimization

Hardware Considerations

GPU Acceleration (Recommended)
  • NVIDIA GPU: CUDA support built-in
  • Apple Silicon: Metal acceleration automatic
  • AMD GPU: ROCm support on Linux
CPU-Only
  • Works but slower
  • Smaller models (3b-8b) are more practical
  • Expect 2-10x slower than GPU
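
Once a model is loaded, `ollama ps` reports where inference ended up; its PROCESSOR column shows whether the model is running on the GPU, the CPU, or split between them:

```shell
# Check whether the loaded model is using the GPU or falling back to CPU.
# The PROCESSOR column reads e.g. "100% GPU", "100% CPU", or a split.
ollama ps 2>/dev/null || echo "Ollama is not running"
```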

Improving Speed

# Use smaller model for faster processing
tinbox translate --to es --model ollama:llama3.1:3b large_document.txt

# Reduce context size for faster chunks
tinbox translate --to es \
  --context-size 1000 \
  --model ollama:llama3.1:8b \
  large_document.txt
Larger models require significantly more RAM/VRAM. Approximate requirements:
  • 70B model needs ~40GB+ RAM
  • 8B model needs ~8GB RAM
  • 3B model needs ~4GB RAM
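
Those figures can be encoded in a small helper. This is an illustrative sketch, not part of Tinbox or Ollama; the thresholds are the rough numbers above:

```shell
# Pick the largest Llama 3.1 tag that fits the available memory (in GB).
pick_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 40 ]; then
    echo "llama3.1:70b"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "llama3.1:8b"
  else
    echo "llama3.1:3b"
  fi
}

pick_model 16   # prints llama3.1:8b
```

On Linux, `free -g` reports total RAM in gigabytes if you want to feed the value in automatically.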

Limitations

No PDF Support

Local models in Ollama don’t have vision capabilities and cannot process PDFs:
# This will FAIL
tinbox translate --to es --model ollama:llama3.1:8b document.pdf

# Error: PDF files require vision-capable models
Solution: Use cloud models for PDFs:
tinbox translate --to es --model openai:gpt-4o document.pdf

Quality Differences

Local models produce lower quality translations than GPT-5 or Claude 4:
  • Less nuanced understanding
  • More literal translations
  • Weaker handling of idioms and context
  • May miss subtle meanings
For critical documents, consider:
  1. Translate with local model (free/fast)
  2. Review and identify issues
  3. Re-translate specific sections with cloud models

Use Cases

Development & Testing

# Test translation pipelines without API costs
tinbox translate --to es --model ollama:llama3.1:8b test_document.txt

# Iterate on chunking strategies
tinbox translate --to fr \
  --context-size 1000 \
  --model ollama:llama3.1:8b \
  test.txt

tinbox translate --to fr \
  --context-size 2000 \
  --model ollama:llama3.1:8b \
  test.txt

Large-Scale Translations

# Translate many documents without cost concerns
for file in docs/*.txt; do
  tinbox translate --to es \
    --model ollama:llama3.1:8b \
    --output "translated/$(basename "$file")" \
    "$file"
done
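
For long-running batches, a variant that skips files whose output already exists makes the loop safe to re-run after an interruption. This is a sketch using the same directory layout as the example above:

```shell
# Re-runnable batch: skip any file that has already been translated.
translate_batch() {
  src_dir=$1 out_dir=$2
  mkdir -p "$out_dir"
  for file in "$src_dir"/*.txt; do
    [ -e "$file" ] || continue          # glob matched nothing
    out="$out_dir/$(basename "$file")"
    if [ -f "$out" ]; then
      echo "skip: $out"                 # already translated
      continue
    fi
    tinbox translate --to es \
      --model ollama:llama3.1:8b \
      --output "$out" "$file"
  done
}

translate_batch docs translated
```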

Privacy-Sensitive Documents

# Keep confidential documents local
tinbox translate --to de \
  --model ollama:llama3.1:8b \
  confidential_report.txt

Draft Translations

# Get quick drafts locally, refine with cloud models later
tinbox translate --to fr \
  --model ollama:llama3.1:8b \
  --output draft_fr.txt \
  document.txt

# Review draft, then use cloud model for final version
tinbox translate --to fr \
  --model openai:gpt-5-2025-08-07 \
  --output final_fr.txt \
  document.txt

Combining with Other Features

Local Models + Checkpoints

# Free unlimited translations with resume capability
tinbox translate --to es \
  --checkpoint-dir ./checkpoints \
  --model ollama:llama3.1:8b \
  huge_document.txt

Local Models + Glossaries

# Build glossary locally for free
tinbox translate --to de \
  --glossary \
  --save-glossary terms.json \
  --model ollama:llama3.1:8b \
  technical_doc.txt

Hybrid Workflow

# 1. Build glossary with local model (free)
tinbox translate --to es \
  --glossary \
  --save-glossary terms.json \
  --model ollama:llama3.1:8b \
  sample_doc.txt

# 2. Use glossary with cloud model for quality
tinbox translate --to es \
  --glossary-file terms.json \
  --model openai:gpt-5-2025-08-07 \
  full_document.txt

Troubleshooting

"Connection refused" Error

# Make sure Ollama is running
ollama serve

# In another terminal
tinbox translate --to es --model ollama:llama3.1:8b document.txt

"Model not found" Error

# Pull the model first
ollama pull llama3.1:8b

# Then translate
tinbox translate --to es --model ollama:llama3.1:8b document.txt

Slow Performance

# Use a smaller model
ollama pull llama3.1:3b
tinbox translate --to es --model ollama:llama3.1:3b document.txt

# Or reduce context size
tinbox translate --to es \
  --context-size 1000 \
  --model ollama:llama3.1:8b \
  document.txt

Out of Memory

# Switch to smaller model
ollama pull llama3.1:3b

# Or close other applications to free RAM

Poor Translation Quality

# Try a larger model
ollama pull llama3.1:70b
tinbox translate --to es --model ollama:llama3.1:70b document.txt

# Or use cloud model for better quality
tinbox translate --to es --model openai:gpt-5-2025-08-07 document.txt

Checking Ollama Setup

Use the doctor command to verify Ollama installation:
tinbox doctor
This checks:
  • Ollama installation
  • Ollama server status
  • Available models
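
The same three checks can be done by hand; `/api/tags` is Ollama's standard endpoint for listing installed models:

```shell
# Manual versions of the doctor checks (11434 is Ollama's default port).
check_ollama() {
  command -v ollama || echo "Ollama is not installed"                     # installation
  curl -s http://localhost:11434/api/tags || echo "server is not running" # server status
  ollama list 2>/dev/null || echo "no models listed"                      # available models
}
check_ollama
```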

Best Practices

Start Small

Begin with llama3.1:8b before trying larger models

Use for Development

Perfect for testing without API costs

Privacy First

Ideal for confidential documents

Hybrid Approach

Combine local and cloud models strategically
Scenario     | Recommended Approach
Text files   | Local models work great
PDF files    | Must use cloud models (vision required)
Development  | Local models for testing
Production   | Cloud models for quality
Confidential | Local models for privacy
Large scale  | Local models to control costs
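
As a rule of thumb, the table collapses to: cloud models whenever vision or top quality is required, local models otherwise. A hypothetical helper encoding that rule (`model_for` is not a Tinbox command, and the cloud model name is just one of the examples on this page):

```shell
# Map a scenario from the table above to a --model argument.
model_for() {
  case $1 in
    pdf|production) echo "openai:gpt-4o" ;;        # needs vision / top quality
    *)              echo "ollama:llama3.1:8b" ;;   # text, dev, privacy, scale
  esac
}

# e.g. tinbox translate --to es --model "$(model_for pdf)" document.pdf
```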

Next Steps

Large Documents

Translate large files with local models

Using Glossaries

Build glossaries for free with local models

Checkpoints & Resume

Enable resumable local translations

CLI Reference

Complete command-line reference
