Overview

The tinbox translate command translates documents using Large Language Models (LLMs). It supports multiple file formats (PDF, TXT, DOCX, MD), translation algorithms, and cloud or local models.

Basic Usage

tinbox translate <input-file> --model <provider:model> --to <target-lang>
tinbox translate document.pdf --model openai:gpt-4o --to es

Arguments

input-file
Path
required
The input file to translate. Must be an existing file in a supported format (PDF, TXT, DOCX, MD).
Example: ./examples/elara_story.txt

Options

Core Translation Options

--model
string
required
Model specification in the format provider:model-name.
Supported providers:
  • openai - OpenAI models (requires OPENAI_API_KEY)
  • anthropic - Anthropic Claude models (requires ANTHROPIC_API_KEY)
  • google - Google Gemini models (requires GOOGLE_API_KEY)
  • ollama - Local Ollama models (no API key required)
Examples:
  • openai:gpt-4o
  • anthropic:claude-3-sonnet
  • google:gemini-1.5-pro
  • ollama:mistral-small
Alias: -m
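A missing API key can be caught before invoking tinbox. The sketch below maps a provider:model spec to the environment variable listed above for that provider. The helper names required_key_for_model and check_model_key are hypothetical and not part of tinbox.

```shell
# Map a provider:model spec to the API key variable listed for its provider.
# Prints the variable name, or nothing for ollama (no key required).
# Hypothetical helper; not a tinbox command.
required_key_for_model() {
  provider="${1%%:*}"   # text before the first colon
  case "$provider" in
    openai)    echo "OPENAI_API_KEY" ;;
    anthropic) echo "ANTHROPIC_API_KEY" ;;
    google)    echo "GOOGLE_API_KEY" ;;
    ollama)    ;;   # local models need no key
    *)         echo "unknown provider: $provider" >&2; return 1 ;;
  esac
}

# Return 0 if the key needed by the model spec is set (or none is needed).
check_model_key() {
  key_name="$(required_key_for_model "$1")" || return 1
  [ -z "$key_name" ] && return 0
  # Indirect lookup of the variable whose name is stored in $key_name.
  eval "key_value=\${$key_name:-}"
  if [ -z "$key_value" ]; then
    echo "missing environment variable: $key_name" >&2
    return 1
  fi
}
```

For example, check_model_key openai:gpt-4o && tinbox translate document.pdf --model openai:gpt-4o --to es runs the translation only when OPENAI_API_KEY is set.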
--to
string
default:"en"
Target language code (ISO 639-1 format).
Examples: en, es, fr, de, zh, ja, ko
Default: en (English)
Alias: -t
--from
string
default:"auto"
Source language code (ISO 639-1 format). If not specified, the language is auto-detected.
Examples: en, es, fr, de, zh, ja
Default: Auto-detect
Alias: -f
--output
Path
The output file path. If not specified, prints the translation to stdout.
Example: --output translated.txt
Alias: -o
--format
enum
default:"text"
Output format for the translation result.
Options:
  • text - Plain text output (default)
  • json - JSON format with metadata and statistics
  • markdown - Markdown formatted output
Default: text
Alias: -F

Algorithm & Processing Options

--algorithm
string
default:"auto"
Translation algorithm to use. Auto-selects based on file type if not specified.
Options:
  • page - Process document page-by-page (required for PDFs)
  • sliding-window - Use overlapping windows for context
  • context-aware - Smart chunking based on content structure
Auto-selection:
  • PDF files → page algorithm
  • Text files → context-aware algorithm
Note: PDF files only support the page algorithm.
Alias: -a
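The auto-selection rule above can be mirrored in a script that wants to pass --algorithm explicitly. This is only an illustration of the documented mapping, not tinbox's own implementation. The function name pick_algorithm is hypothetical, and treating .md as a text file is an assumption.

```shell
# Mirror the documented auto-selection: PDF -> page, text files -> context-aware.
# Prints nothing for other extensions so the caller can omit --algorithm.
# Hypothetical helper; treating .md as a text file is an assumption.
pick_algorithm() {
  # Lowercase the extension so "REPORT.PDF" is treated like "report.pdf".
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    pdf)    echo "page" ;;
    txt|md) echo "context-aware" ;;
    *)      ;;   # leave the choice to tinbox
  esac
}
```

A caller could then run tinbox translate story.txt --model ollama:mistral-small --to de --algorithm "$(pick_algorithm story.txt)".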
--context-size
integer
default:"2000"
Target chunk size in characters for the context-aware algorithm.
Default: 2000
--split-token
string
Custom token to split text on when using the context-aware algorithm.
Example: --split-token "\n\n" (split on double newlines)
--pdf-dpi
integer
default:"200"
DPI (dots per inch) for PDF rasterization. Higher values produce better quality but consume more tokens and increase cost.
PDF files only.
Recommended values:
  • 150 - Low quality, faster, cheaper
  • 200 - Balanced (default)
  • 300 - High quality, slower, more expensive
Default: 200

Cost & Safety Options

--dry-run
boolean
default:"false"
Estimate cost and tokens without performing the actual translation.
Shows:
  • Estimated tokens
  • Estimated cost (USD)
  • Estimated time
  • Cost level (low/medium/high)
  • Warnings
Default: false
--max-cost
float
Maximum cost threshold in USD. The translation will abort if the estimated cost exceeds this value.
Example: --max-cost 5.00
--force
boolean
default:"false"
Skip warning confirmations and proceed with translation automatically.
Default: false
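The cost options combine naturally: estimate first with --dry-run, then run the real translation under a --max-cost ceiling. The wrapper below is a minimal sketch of that pattern. The function name translate_with_budget is hypothetical, and it assumes tinbox exits with a non-zero status when the dry run fails.

```shell
# Estimate first, then translate with a cost ceiling.
# Usage: translate_with_budget <max-cost-usd> <tinbox translate args...>
# Hypothetical wrapper; assumes a failed dry run exits non-zero.
translate_with_budget() {
  max_cost="$1"; shift
  # Step 1: estimate only; no translation is performed.
  tinbox translate "$@" --dry-run || return 1
  # Step 2: real run; aborts if the estimated cost exceeds --max-cost.
  tinbox translate "$@" --max-cost "$max_cost"
}
```

For example: translate_with_budget 5.00 document.pdf --model openai:gpt-4o --to fr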

Checkpoint & Resume Options

--checkpoint-dir
Path
Directory to store translation checkpoints. Enables resuming interrupted translations.
Example: --checkpoint-dir ./checkpoints
--checkpoint-frequency
integer
default:"1"
Save checkpoint every N pages or chunks.
Default: 1 (save after every page/chunk)
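Because a resume is just the same command re-run with the same --checkpoint-dir, a retry loop is one way to recover from interruptions unattended. This is a sketch only. The function name translate_with_resume is hypothetical, and it assumes tinbox exits with a non-zero status when a translation is interrupted or fails.

```shell
# Re-run the same command until it succeeds or the attempts run out.
# Usage: translate_with_resume <max-attempts> <tinbox translate args...>
# Hypothetical wrapper; assumes a failed or interrupted run exits non-zero.
translate_with_resume() {
  max_attempts="$1"; shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Same arguments each time, including --checkpoint-dir.
    if tinbox translate "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example: translate_with_resume 3 long-document.pdf --model anthropic:claude-3-sonnet --to zh --checkpoint-dir ./checkpoints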

Glossary Options

--glossary
boolean
default:"false"
Enable glossary for consistent term translations across the document.
Default: false
--glossary-file
Path
Path to an existing glossary file (JSON format) to load initial terms from.
Example: --glossary-file technical-terms.json
--save-glossary
Path
Path to save the updated glossary after translation.
Example: --save-glossary updated-terms.json

Model Configuration

--reasoning-effort
enum
default:"minimal"
Model reasoning effort level. Higher levels improve translation quality but significantly increase cost and processing time.
Options:
  • minimal - Fastest, lowest cost (default)
  • low - Slight quality improvement
  • medium - Balanced quality and cost
  • high - Best quality, highest cost and time
Default: minimal

Output & Logging Options

--verbose
boolean
default:"false"
Show detailed progress information during translation.
Default: false

Global Options

These options are available for all Tinbox commands and must be specified before the command name.
--log-level
string
default:"INFO"
Set the logging level.
Options: DEBUG, INFO, WARNING, ERROR, CRITICAL
Default: INFO
Alias: -l
Example: tinbox --log-level DEBUG translate document.pdf --model openai:gpt-4o --to es
--json
boolean
default:"false"
Output logs in JSON format.
Default: false
Alias: -j
Example: tinbox --json translate document.pdf --model openai:gpt-4o --to es
--version
boolean
Show version information and exit.
Alias: -v
Example: tinbox --version

Examples

Translate PDF to Spanish

tinbox translate document.pdf --model openai:gpt-4o --to es --output document_es.txt

Estimate Cost Before Translation

tinbox translate large-book.pdf --model anthropic:claude-3-sonnet --to fr --dry-run

Use Local Model (Ollama)

tinbox translate story.txt --model ollama:mistral-small --to de --output story_de.txt

Translation with Glossary

tinbox translate technical-manual.md \
  --model openai:gpt-4o \
  --to ja \
  --glossary \
  --glossary-file existing-terms.json \
  --save-glossary updated-terms.json

Resume Interrupted Translation

# First run (gets interrupted)
tinbox translate long-document.pdf \
  --model anthropic:claude-3-sonnet \
  --to zh \
  --checkpoint-dir ./checkpoints

# Resume from checkpoint
tinbox translate long-document.pdf \
  --model anthropic:claude-3-sonnet \
  --to zh \
  --checkpoint-dir ./checkpoints

High-Quality PDF Translation

tinbox translate scanned-document.pdf \
  --model openai:gpt-4o \
  --to es \
  --pdf-dpi 300 \
  --reasoning-effort medium

Set Maximum Cost Limit

tinbox translate document.pdf \
  --model openai:gpt-4o \
  --to fr \
  --max-cost 10.00

JSON Output with Metadata

tinbox translate report.txt \
  --model anthropic:claude-3-sonnet \
  --to de \
  --format json \
  --output report_de.json

Translation Workflow

  1. Language Validation: Source and target language codes are validated
  2. Cost Estimation: Token count and cost are estimated
  3. User Confirmation: Warnings are shown if applicable (skip with --force)
  4. Document Loading: Input file is loaded and processed
  5. Translation: Content is translated using the specified algorithm
  6. Progress Tracking: Real-time progress bar shows completion status
  7. Output Generation: Translated content is written to file or stdout
  8. Statistics Display: Final metrics (time, tokens, cost) are shown

Error Handling

  • Invalid Language Codes: Returns clear error messages for unsupported codes
  • Missing API Keys: Reports which environment variable is missing
  • File Type Errors: Validates file format compatibility
  • Algorithm Conflicts: Prevents using incompatible algorithms (e.g., non-page algorithms with PDFs)
  • Cost Limits: Aborts translation if estimated cost exceeds --max-cost
  • Failed Pages: Reports which pages failed with specific error messages
