Run translations completely locally using Ollama - no API keys, no costs, no rate limits. Perfect for privacy-sensitive documents, development, or unlimited translations.

Why Use Local Models?

Advantages:
  • Zero Cost - Unlimited translations without API fees
  • Privacy - Documents never leave your machine
  • No Rate Limits - Translate as much as you want
  • Offline Capable - Work without internet connection
  • Fast Iteration - No network latency for small documents
Tradeoffs:
  • Lower quality compared to GPT-5 or Claude 4
  • Slower on CPU-only machines
  • Cannot process PDFs (no vision capability)
  • Requires local compute resources
Local models work best for text files. For PDF translation, you’ll need cloud models with vision capabilities.

Setup

1. Install Ollama

Download and install Ollama from ollama.ai:
macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh
Windows
# Download from https://ollama.ai/download

2. Pull a Model

Download a translation-capable model:
# Recommended: Llama 3.1 (8B) - good balance
ollama pull llama3.1:8b

# Larger model for better quality
ollama pull llama3.1:70b

# Smaller model for speed
ollama pull llama3.1:3b

# Alternative: Mistral
ollama pull mistral-small

3. Start Ollama Server

ollama serve
Keep this running in a separate terminal.
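
To confirm the server is up before translating, you can hit its HTTP endpoint; Ollama listens on port 11434 by default and answers with a short status message:

```shell
# Sanity check: the root endpoint replies "Ollama is running" when the server is up.
curl -s http://localhost:11434/ || echo "Ollama server is not reachable"
```

If the connection is refused, curl prints nothing and the fallback message appears instead.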

4. Translate with Tinbox

tinbox translate --to es --model ollama:llama3.1:8b document.txt

Basic Usage

Simple Translation

# Start Ollama (in separate terminal)
ollama serve

# Translate with local model
tinbox translate --to de --model ollama:llama3.1:8b ./examples/story.txt

Specify Output File

tinbox translate --to fr \
  --model ollama:llama3.1:8b \
  --output document_fr.txt \
  document.txt

Choosing a Model

Model Comparison

Model         | Size       | Speed       | Quality   | Best For
llama3.1:3b   | 3B params  | Fast        | Basic     | Quick drafts, simple text
llama3.1:8b   | 8B params  | Medium      | Good      | General use, balanced
llama3.1:70b  | 70B params | Slow        | Best      | High quality, complex docs
mistral-small | 7B params  | Medium      | Good      | Alternative to Llama
qwen2.5:32b   | 32B params | Medium-Slow | Very Good | Technical documents
Start with llama3.1:8b for the best balance of speed and quality.

Model Selection Examples

# Best for: Quick translations, simple content
ollama pull llama3.1:3b
tinbox translate --to es --model ollama:llama3.1:3b document.txt

Performance Optimization

Hardware Considerations

GPU Acceleration (Recommended)
  • NVIDIA GPU: CUDA support built-in
  • Apple Silicon: Metal acceleration automatic
  • AMD GPU: ROCm support on Linux
CPU-Only
  • Works but slower
  • Smaller models (3b-8b) are more practical
  • Expect 2-10x slower than GPU
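
Once a model is loaded, `ollama ps` reports where inference ended up; its PROCESSOR column shows whether the model is running on the GPU, the CPU, or split between them:

```shell
# Check whether the loaded model is using the GPU or falling back to CPU.
# The PROCESSOR column reads e.g. "100% GPU", "100% CPU", or a split.
ollama ps 2>/dev/null || echo "Ollama is not running"
```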

Improving Speed

# Use smaller model for faster processing
tinbox translate --to es --model ollama:llama3.1:3b large_document.txt

# Reduce context size for faster chunks
tinbox translate --to es \
  --context-size 1000 \
  --model ollama:llama3.1:8b \
  large_document.txt
Larger models require significantly more RAM/VRAM. Approximate requirements:
  • 70B model needs ~40GB+ RAM
  • 8B model needs ~8GB RAM
  • 3B model needs ~4GB RAM
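
Those figures can be encoded in a small helper. This is an illustrative sketch, not part of Tinbox or Ollama; the thresholds are the rough numbers above:

```shell
# Pick the largest Llama 3.1 tag that fits the available memory (in GB).
pick_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 40 ]; then
    echo "llama3.1:70b"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "llama3.1:8b"
  else
    echo "llama3.1:3b"
  fi
}

pick_model 16   # prints llama3.1:8b
```

On Linux, `free -g` reports total RAM in gigabytes if you want to feed the value in automatically.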

Limitations

No PDF Support

Local models in Ollama don’t have vision capabilities and cannot process PDFs:
# This will FAIL
tinbox translate --to es --model ollama:llama3.1:8b document.pdf

# Error: PDF files require vision-capable models
Solution: Use cloud models for PDFs:
tinbox translate --to es --model openai:gpt-4o document.pdf

Quality Differences

Local models produce lower quality translations than GPT-5 or Claude 4:
  • Less nuanced understanding
  • More literal translations
  • Weaker handling of idioms and context
  • May miss subtle meanings
For critical documents, consider:
  1. Translate with local model (free/fast)
  2. Review and identify issues
  3. Re-translate specific sections with cloud models

Use Cases

Development & Testing

# Test translation pipelines without API costs
tinbox translate --to es --model ollama:llama3.1:8b test_document.txt

# Iterate on chunking strategies
tinbox translate --to fr \
  --context-size 1000 \
  --model ollama:llama3.1:8b \
  test.txt

tinbox translate --to fr \
  --context-size 2000 \
  --model ollama:llama3.1:8b \
  test.txt

Large-Scale Translations

# Translate many documents without cost concerns
for file in docs/*.txt; do
  tinbox translate --to es \
    --model ollama:llama3.1:8b \
    --output "translated/$(basename "$file")" \
    "$file"
done
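
For long-running batches, a variant that skips files whose output already exists makes the loop safe to re-run after an interruption. This is a sketch using the same directory layout as the example above:

```shell
# Re-runnable batch: skip any file that has already been translated.
translate_batch() {
  src_dir=$1 out_dir=$2
  mkdir -p "$out_dir"
  for file in "$src_dir"/*.txt; do
    [ -e "$file" ] || continue          # glob matched nothing
    out="$out_dir/$(basename "$file")"
    if [ -f "$out" ]; then
      echo "skip: $out"                 # already translated
      continue
    fi
    tinbox translate --to es \
      --model ollama:llama3.1:8b \
      --output "$out" "$file"
  done
}

translate_batch docs translated
```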

Privacy-Sensitive Documents

# Keep confidential documents local
tinbox translate --to de \
  --model ollama:llama3.1:8b \
  confidential_report.txt

Draft Translations

# Get quick drafts locally, refine with cloud models later
tinbox translate --to fr \
  --model ollama:llama3.1:8b \
  --output draft_fr.txt \
  document.txt

# Review draft, then use cloud model for final version
tinbox translate --to fr \
  --model openai:gpt-5-2025-08-07 \
  --output final_fr.txt \
  document.txt

Combining with Other Features

Local Models + Checkpoints

# Free unlimited translations with resume capability
tinbox translate --to es \
  --checkpoint-dir ./checkpoints \
  --model ollama:llama3.1:8b \
  huge_document.txt

Local Models + Glossaries

# Build glossary locally for free
tinbox translate --to de \
  --glossary \
  --save-glossary terms.json \
  --model ollama:llama3.1:8b \
  technical_doc.txt

Hybrid Workflow

# 1. Build glossary with local model (free)
tinbox translate --to es \
  --glossary \
  --save-glossary terms.json \
  --model ollama:llama3.1:8b \
  sample_doc.txt

# 2. Use glossary with cloud model for quality
tinbox translate --to es \
  --glossary-file terms.json \
  --model openai:gpt-5-2025-08-07 \
  full_document.txt

Troubleshooting

"Connection refused" Error

# Make sure Ollama is running
ollama serve

# In another terminal
tinbox translate --to es --model ollama:llama3.1:8b document.txt

"Model not found" Error

# Pull the model first
ollama pull llama3.1:8b

# Then translate
tinbox translate --to es --model ollama:llama3.1:8b document.txt

Slow Performance

# Use a smaller model
ollama pull llama3.1:3b
tinbox translate --to es --model ollama:llama3.1:3b document.txt

# Or reduce context size
tinbox translate --to es \
  --context-size 1000 \
  --model ollama:llama3.1:8b \
  document.txt

Out of Memory

# Switch to smaller model
ollama pull llama3.1:3b

# Or close other applications to free RAM

Poor Translation Quality

# Try a larger model
ollama pull llama3.1:70b
tinbox translate --to es --model ollama:llama3.1:70b document.txt

# Or use cloud model for better quality
tinbox translate --to es --model openai:gpt-5-2025-08-07 document.txt

Checking Ollama Setup

Use the doctor command to verify Ollama installation:
tinbox doctor
This checks:
  • Ollama installation
  • Ollama server status
  • Available models
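
The same three checks can be done by hand; `/api/tags` is Ollama's standard endpoint for listing installed models:

```shell
# Manual versions of the doctor checks (11434 is Ollama's default port).
check_ollama() {
  command -v ollama || echo "Ollama is not installed"                     # installation
  curl -s http://localhost:11434/api/tags || echo "server is not running" # server status
  ollama list 2>/dev/null || echo "no models listed"                      # available models
}
check_ollama
```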

Best Practices

Start Small

Begin with llama3.1:8b before trying larger models

Use for Development

Perfect for testing without API costs

Privacy First

Ideal for confidential documents

Hybrid Approach

Combine local and cloud models strategically
Scenario     | Recommended Approach
Text files   | Local models work great
PDF files    | Must use cloud models (vision required)
Development  | Local models for testing
Production   | Cloud models for quality
Confidential | Local models for privacy
Large scale  | Local models to control costs
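
As a rule of thumb, the table collapses to: cloud models whenever vision or top quality is required, local models otherwise. A hypothetical helper encoding that rule (`model_for` is not a Tinbox command, and the cloud model name is just one of the examples on this page):

```shell
# Map a scenario from the table above to a --model argument.
model_for() {
  case $1 in
    pdf|production) echo "openai:gpt-4o" ;;        # needs vision / top quality
    *)              echo "ollama:llama3.1:8b" ;;   # text, dev, privacy, scale
  esac
}

# e.g. tinbox translate --to es --model "$(model_for pdf)" document.pdf
```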

Next Steps

Large Documents

Translate large files with local models

Using Glossaries

Build glossaries for free with local models

Checkpoints & Resume

Enable resumable local translations

CLI Reference

Complete command-line reference
