Algorithm Overview
Page
Translates documents page by page independently. Fast and cost-effective.
Sliding Window
Uses overlapping windows for consistent terminology. Best for continuous text.
Context-Aware
Maintains context across chunks with smart splitting. Highest quality output.
Page-by-Page Algorithm
The page-by-page algorithm translates each page independently without maintaining context between pages. This is the fastest and most cost-effective approach.
How It Works
- Each page is translated as a separate, independent request
- No context from previous pages is shared
- Failed pages are tracked and marked in the output
- Supports checkpoint/resume for long documents
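The steps above can be sketched as follows. This is a minimal illustration, not the code from algorithms.py: the `translate_pages` function, the `translate` callable, and the exact placeholder string are assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Assumed placeholder text; the real marker format may differ.
FAILED_PLACEHOLDER = "[TRANSLATION_FAILED]"


def translate_pages(
    pages: List[str], translate: Callable[[str], str]
) -> Tuple[List[str], List[int]]:
    """Translate each page on its own and record which pages failed."""
    results: List[str] = []
    failed_pages: List[int] = []
    for number, page in enumerate(pages, start=1):
        try:
            # Each page is an independent request; no context is passed along.
            results.append(translate(page))
        except Exception:
            # Keep the page slot so the document structure is preserved.
            failed_pages.append(number)
            results.append(FAILED_PLACEHOLDER)
    return results, failed_pages
```

Because a failed page still occupies its slot in the result list, the output keeps the same page count as the input.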
Best For
- PDF documents with distinct pages
- Documents where pages are self-contained
- When speed and cost are priorities
- Large documents where context isn’t critical
Code Example
From algorithms.py:147-354:
CLI Usage
Failed pages are marked with [TRANSLATION_FAILED] placeholders in the output, making failures visible while preserving the document structure.
Sliding Window Algorithm
The sliding window algorithm combines all pages into a single text, then creates overlapping windows for translation. This ensures consistent terminology across window boundaries.
How It Works
- All pages are joined into a single continuous text
- Text is split into overlapping windows of configurable size
- Each window is translated with a specified overlap
- Translated windows are merged by detecting and removing duplicate overlap regions
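A rough sketch of the windowing and merging steps is shown below. The function names are hypothetical, and the merge uses a simple longest suffix/prefix match. That match is exact for identical overlap text; real translations of the same overlap region can differ, so an actual implementation may need a more tolerant comparison.

```python
from typing import List


def create_windows(text: str, window_size: int = 2000, overlap_size: int = 200) -> List[str]:
    """Split text into windows that share overlap_size characters."""
    if overlap_size >= window_size:
        raise ValueError("overlap_size must be smaller than window_size")
    if not text:
        return []
    step = window_size - overlap_size
    windows: List[str] = []
    start = 0
    while True:
        windows.append(text[start:start + window_size])
        if start + window_size >= len(text):
            break
        start += step
    return windows


def merge_windows(windows: List[str], max_overlap: int) -> str:
    """Join windows, dropping text duplicated at each boundary."""
    if not windows:
        return ""
    merged = windows[0]
    for window in windows[1:]:
        limit = min(max_overlap, len(merged), len(window))
        cut = 0
        # Look for the longest prefix of the next window that repeats
        # the end of the text merged so far.
        for size in range(limit, 0, -1):
            if merged.endswith(window[:size]):
                cut = size
                break
        merged += window[cut:]
    return merged
```

With an identity "translation", `merge_windows(create_windows(text, 10, 3), 3)` reproduces `text`, which is a quick way to check the window arithmetic.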
Configuration Options
- --window-size: Size of each window in characters (default: 2000)
- --overlap-size: Overlap between windows in characters (default: 200)
Best For
- Continuous text documents (novels, articles, essays)
- DOCX and TXT files without page breaks
- When consistent terminology is important
- Documents with flowing narrative
Code Example
From algorithms.py:520-611:
CLI Usage
Context-Aware Algorithm
The context-aware algorithm provides the highest quality translations by maintaining context from previous chunks and using smart text splitting at natural boundaries.
How It Works
- Text is split at natural boundaries (paragraphs, sentences, clauses)
- Each chunk is translated with context from the previous chunk
- Context includes both the original text and its translation
- The next chunk preview is provided for better flow
- Translated chunks are directly concatenated (no merging needed)
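The flow above might look roughly like the sketch below. The function names, the context labels, and the two-argument `translate` callable are hypothetical and are not taken from algorithms.py.

```python
from typing import Callable, List, Optional


def build_context(
    previous_chunk: Optional[str],
    previous_translation: Optional[str],
    next_chunk: Optional[str],
) -> str:
    """Assemble the context passed alongside the chunk being translated."""
    parts: List[str] = []
    if previous_chunk is not None and previous_translation is not None:
        # Context carries both the original text and its translation.
        parts.append(f"Previous chunk:\n{previous_chunk}")
        parts.append(f"Previous translation:\n{previous_translation}")
    if next_chunk is not None:
        parts.append(f"Next chunk preview:\n{next_chunk}")
    return "\n\n".join(parts)


def translate_with_context(
    chunks: List[str], translate: Callable[[str, str], str]
) -> str:
    """Translate chunks in order, feeding each one its neighbours as context."""
    translated: List[str] = []
    for i, chunk in enumerate(chunks):
        previous_chunk = chunks[i - 1] if i > 0 else None
        previous_translation = translated[i - 1] if i > 0 else None
        next_chunk = chunks[i + 1] if i + 1 < len(chunks) else None
        context = build_context(previous_chunk, previous_translation, next_chunk)
        translated.append(translate(chunk, context))
    # Chunks do not overlap, so plain concatenation is enough.
    return "".join(translated)
```

Since each chunk depends on the translation of the one before it, this loop is inherently sequential, which is consistent with the algorithm being the slowest of the three.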
Smart Text Splitting
The algorithm splits text at natural boundaries in priority order:
- Custom split token (if provided) - ignores target size
- Paragraph breaks (\n\n)
- Sentence endings (.!? followed by space)
- Line breaks (\n)
- Clause boundaries (;:, followed by space)
- Word boundaries (whitespace)
- Hard split at target size (fallback)
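The priority order can be illustrated with a short sketch. It is an approximation under stated assumptions, not the implementation in algorithms.py: the names are hypothetical, it picks the last matching boundary inside each target-size window, and it drops the custom token from the output.

```python
import re
from typing import List, Optional

# Boundary patterns, highest priority first.
BOUNDARY_PATTERNS = [
    re.compile(r"\n\n"),      # paragraph breaks
    re.compile(r"[.!?]\s"),   # sentence endings followed by whitespace
    re.compile(r"\n"),        # line breaks
    re.compile(r"[;:,]\s"),   # clause boundaries followed by whitespace
    re.compile(r"\s"),        # word boundaries
]


def find_split_point(window: str) -> int:
    """Return the offset just past the best boundary found in window."""
    for pattern in BOUNDARY_PATTERNS:
        matches = list(pattern.finditer(window))
        if matches:
            return matches[-1].end()
    return len(window)  # hard split (fallback)


def smart_split(
    text: str, target_size: int = 2000, custom_split_token: Optional[str] = None
) -> List[str]:
    """Split text into chunks of at most target_size characters."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    if custom_split_token:
        # A custom token overrides the size-based logic entirely.
        return [part for part in text.split(custom_split_token) if part]
    chunks: List[str] = []
    pos = 0
    while len(text) - pos > target_size:
        cut = find_split_point(text[pos:pos + target_size])
        chunks.append(text[pos:pos + cut])
        pos += cut
    if pos < len(text):
        chunks.append(text[pos:])
    return chunks
```

Each chunk keeps its trailing delimiter, so joining the chunks restores the original text when no custom token is used.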
From algorithms.py:614-717:
Context Information
From algorithms.py:720-759:
Configuration Options
- --context-size: Target chunk size in characters (default: 2000)
- --custom-split-token: Custom token to split on (ignores context-size)
Best For
- High-quality literary translations
- Technical documentation requiring consistent terminology
- Documents with complex narrative structure
- When translation quality is the top priority
CLI Usage
Algorithm Comparison
| Feature | Page-by-Page | Sliding Window | Context-Aware |
|---|---|---|---|
| Context | None | Overlap only | Full context |
| Speed | Fastest | Medium | Slowest |
| Cost | Lowest | Medium | Highest |
| Quality | Good | Better | Best |
| Text Splitting | By page | Fixed windows | Smart boundaries |
| PDF Support | ✅ Yes | ❌ No | ❌ No |
| Image Support | ✅ Yes | ❌ No | ❌ No |
| Best For | PDFs, speed | Continuous text | Quality, technical docs |
Checkpoint Support
All three algorithms support checkpointing for resuming interrupted translations. Checkpoints save translation state every N pages/chunks. If translation is interrupted, Tinbox automatically resumes from the last checkpoint.
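The checkpoint/resume idea can be sketched as below. This is only an illustration of the pattern: the JSON file layout, the function name, and the `frequency` parameter are assumptions, and Tinbox's own checkpoint format is likely different.

```python
import json
from pathlib import Path
from typing import Callable, List


def translate_with_checkpoints(
    chunks: List[str],
    translate: Callable[[str], str],
    checkpoint_path: Path,
    frequency: int = 5,
) -> List[str]:
    """Translate chunks, saving progress every `frequency` chunks."""
    translated: List[str] = []
    if checkpoint_path.exists():
        # Resume: reuse whatever an earlier, interrupted run completed.
        state = json.loads(checkpoint_path.read_text(encoding="utf-8"))
        translated = state["translated"]
    for index in range(len(translated), len(chunks)):
        translated.append(translate(chunks[index]))
        if len(translated) % frequency == 0:
            checkpoint_path.write_text(
                json.dumps({"translated": translated}), encoding="utf-8"
            )
    # The run finished, so the checkpoint is no longer needed (Python 3.8+).
    checkpoint_path.unlink(missing_ok=True)
    return translated
```

A smaller `frequency` loses less work on interruption at the cost of more frequent writes.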
Choosing the Right Algorithm
When to use Page-by-Page
- You’re translating PDF documents
- Speed and cost are your primary concerns
- Pages are relatively self-contained
- You don’t need perfect terminology consistency across pages
When to use Sliding Window
- You’re translating continuous text (TXT, DOCX)
- You need consistent terminology
- The document has a flowing narrative
- You want a balance of quality and cost
When to use Context-Aware
- Translation quality is critical
- You need consistent terminology and style
- The document has complex structure
- You’re translating technical or literary content
- Cost is less of a concern