Available Algorithms
Page-by-Page
Best for PDFs and documents with clear page boundaries
Context-Aware
Default for text files, maintains context across chunks
Sliding Window
Legacy algorithm, deprecated in favor of context-aware
Algorithm Comparison
| Feature | Page-by-Page | Context-Aware | Sliding Window |
|---|---|---|---|
| Best For | PDFs, image documents | Text files, long documents | Legacy use only |
| Context Preservation | None between pages | Full context between chunks | Limited overlap |
| Cost Efficiency | High | Medium (~4x input-token overhead) | Medium |
| Quality | Good for independent pages | Excellent for continuous text | Good |
| Supports PDF | ✅ Yes | ❌ No | ❌ No |
| Resumable | ✅ Yes | ✅ Yes | ✅ Yes |
| Glossary Support | ✅ Yes | ✅ Yes | ✅ Yes |
Page-by-Page Algorithm
Translates documents one page at a time, without passing context between pages.
How It Works
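As a minimal sketch of this flow, assume a hypothetical `translate` callable that stands in for one model request. Each page is sent on its own, in document order, and nothing from neighboring pages is included. This is an illustration under those assumptions, not tinbox's actual implementation.

```python
from typing import Callable, List


def translate_page_by_page(pages: List[str], translate: Callable[[str], str]) -> List[str]:
    """Translate each page in isolation, in document order (illustrative sketch)."""
    translations = []
    for page in pages:
        # Only the current page is sent: no text from other pages is included,
        # so there is no context overhead (and no shared context either).
        translations.append(translate(page))
    return translations


# str.upper stands in for a real model call.
print(translate_page_by_page(["Page one text", "Page two text"], str.upper))
# ['PAGE ONE TEXT', 'PAGE TWO TEXT']
```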
Use Cases
PDF Documents
Best choice for PDFs: each PDF page is processed as a separate image by vision-capable models (GPT-4o, Claude Sonnet, Gemini Pro).
Documents with Independent Pages
Documents where each page is self-contained (presentations, forms, reports with clear page breaks).
Cost-Sensitive Projects
When you want to minimize input tokens - no context overhead means lower costs.
This algorithm has no context overhead, making it the most cost-effective option.
Advantages
- Low cost: No context overhead between pages
- Fast processing: Pages can theoretically be processed in parallel
- Simple error handling: Failed pages don’t affect others
- Memory efficient: Only one page in memory at a time
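Because pages do not depend on each other, the parallelism and error-isolation points above can be sketched with the standard library's `ThreadPoolExecutor`. The `translate` callable is a hypothetical stand-in for a model request, and returning `None` for a failed page is an assumption of this sketch; the list above only says parallel processing is theoretically possible.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def translate_pages_in_parallel(
    pages: List[str],
    translate: Callable[[str], str],
    max_workers: int = 4,
) -> List[Optional[str]]:
    """Translate independent pages concurrently; a failed page yields None."""

    def translate_one(page: str) -> Optional[str]:
        try:
            return translate(page)
        except Exception:
            # The failure stays local to this page; other pages are unaffected.
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, whatever order they finish in.
        return list(pool.map(translate_one, pages))


# A stand-in "model" that fails on page "b" (KeyError).
fake_model = {"a": "A", "c": "C"}
print(translate_pages_in_parallel(["a", "b", "c"], lambda page: fake_model[page]))
# ['A', None, 'C']
```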
Limitations
- No context preservation: Terms and style may vary between pages
- Not suitable for continuous narratives: Stories or articles may lose coherence
- Text-only for non-PDFs: Treats each page independently
Context-Aware Algorithm
The default algorithm for text files, using smart text splitting at natural boundaries while maintaining context between chunks.
How It Works
Smart Text Splitting
The algorithm splits text at natural boundaries in this priority order:
- Custom split token (if provided); ignores the target size
- Paragraph breaks (`\n\n`)
- Sentence endings (`.`, `!`, or `?` followed by a space)
- Line breaks (`\n`)
- Clause boundaries (`;`, `:`, or `,` followed by a space)
- Word boundaries (whitespace)
- Character position (fallback)
Smart splitting ensures chunks break at natural points, preventing mid-sentence or mid-word breaks that could harm translation quality.
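The priority order can be sketched as follows: look inside a window of at most `target_size` characters, cut after the last match of the highest-priority boundary found there, and fall back to a hard cut only when no boundary exists. Measuring size in characters and the names `split_text` and `find_split_point` are assumptions for illustration; the real implementation may count tokens and differ in detail.

```python
import re
from typing import List, Optional

# Boundary patterns in the priority order listed above.
BOUNDARIES = [
    r"\n\n",     # paragraph breaks
    r"[.!?] ",   # sentence endings followed by a space
    r"\n",       # line breaks
    r"[;:,] ",   # clause boundaries followed by a space
    r"\s",       # word boundaries
]


def find_split_point(text: str, target_size: int) -> int:
    """Index just after the last highest-priority boundary within target_size."""
    window = text[:target_size]
    for pattern in BOUNDARIES:
        ends = [match.end() for match in re.finditer(pattern, window)]
        if ends:
            return ends[-1]
    return target_size  # character-position fallback


def split_text(text: str, target_size: int, split_token: Optional[str] = None) -> List[str]:
    if split_token is not None:
        # A custom token takes precedence and the target size is ignored.
        return [part for part in text.split(split_token) if part]
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    chunks = []
    while len(text) > target_size:
        cut = find_split_point(text, target_size)
        chunks.append(text[:cut])
        text = text[cut:]
    if text:
        chunks.append(text)
    return chunks


print(split_text("Para one.\n\nPara two is here.", 20))
# ['Para one.\n\n', 'Para two is here.']
```

In the usage line the paragraph break wins even though a word boundary sits later in the window, which is the point of ordering boundaries by priority rather than by position.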
Context Information
Each chunk is translated together with context from its neighboring chunks (both the previous and the next chunk).
Use Cases
Long Text Documents
Default for .txt files - Maintains narrative flow and terminology consistency.
Continuous Narratives
Stories, articles, and books, where context between sections is crucial.
Structured Documents with Custom Delimiters
Documents with clear section markers that should be used as split points.
Advantages
- Excellent coherence: Context ensures consistent terminology and style
- Smart splitting: Breaks at natural boundaries, not mid-sentence
- Bidirectional context: Uses both previous and next chunks
- Glossary friendly: Works well with the glossary feature
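Bidirectional context can be pictured as pairing every chunk with the chunk before and after it. The section labels below are hypothetical placeholders, not the exact format tinbox sends to the model; the sketch only shows which neighbors accompany each chunk.

```python
from typing import List


def build_context_prompts(chunks: List[str]) -> List[str]:
    """Pair each chunk with its neighbors (hypothetical layout, not tinbox's real format)."""
    prompts = []
    for i, chunk in enumerate(chunks):
        # The first and last chunks have no neighbor on one side.
        previous_chunk = chunks[i - 1] if i > 0 else ""
        next_chunk = chunks[i + 1] if i + 1 < len(chunks) else ""
        prompts.append(
            f"[PREVIOUS CONTEXT]\n{previous_chunk}\n"
            f"[TRANSLATE THIS]\n{chunk}\n"
            f"[NEXT CONTEXT]\n{next_chunk}"
        )
    return prompts


for prompt in build_context_prompts(["First.", "Second.", "Third."]):
    print(prompt, end="\n---\n")
```

Every chunk of source text shows up in up to three requests, which is why context-aware translation sends noticeably more input tokens than page-by-page.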
Limitations
- Higher cost: ~4x input token overhead due to context (see src/tinbox/core/cost.py:125-142)
- Text only: Not supported for PDF/image content
- Sequential processing: Must process chunks in order
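For a feel of what the ~4x overhead means, the sketch below treats it as a flat multiplier on a document's token count. This is a simplification: the actual estimate is computed in `src/tinbox/core/cost.py` and may differ, and `estimate_input_tokens` is a hypothetical helper.

```python
# Simplified multipliers: 1x for page-by-page (no context) and ~4x for
# context-aware, per the overhead figure quoted above. Illustrative only.
INPUT_TOKEN_MULTIPLIER = {"page": 1.0, "context-aware": 4.0}


def estimate_input_tokens(document_tokens: int, algorithm: str) -> int:
    """Rough input-token estimate for translating a document once."""
    return round(document_tokens * INPUT_TOKEN_MULTIPLIER[algorithm])


for algorithm in ("page", "context-aware"):
    print(algorithm, estimate_input_tokens(50_000, algorithm))
# page 50000
# context-aware 200000
```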
Configuration
Sliding Window Algorithm
Translates text by creating overlapping windows with a fixed size and overlap.
How It Works
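A minimal sketch of fixed-size overlapping windows, measuring size in characters (an assumption; the legacy implementation may use another unit). Consecutive windows share `overlap` characters, and the cut points ignore sentence and word boundaries entirely.

```python
from typing import List


def sliding_windows(text: str, window_size: int, overlap: int) -> List[str]:
    """Cut text into fixed-size windows; consecutive windows share `overlap` characters."""
    if window_size <= 0 or not 0 <= overlap < window_size:
        raise ValueError("need window_size > 0 and 0 <= overlap < window_size")
    step = window_size - overlap
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + window_size])
        if start + window_size >= len(text):
            break  # this window already reaches the end of the text
    return windows


print(sliding_windows("abcdefghij", 4, 1))
# ['abcd', 'defg', 'ghij']
```

The shared characters (`d` and `g` here) are translated twice, and the two translations have to be merged afterwards, which is the source of the overlap complexity described below.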
Why It’s Deprecated
- Fixed window size: Doesn’t respect natural boundaries
- No context: Each window is translated independently
- Overlap complexity: Merging overlapping translations is unreliable
- Inferior to context-aware: The context-aware algorithm provides better quality with smart splitting
If your use case requires a sliding window, consider using context-aware with `--context-size` instead.
Choosing the Right Algorithm
Check Your Document Type
- PDF? → Use `--algorithm page`
- Text file? → Use the default (context-aware) or specify `--algorithm context-aware`
Consider Your Requirements
- Need context preservation? → Context-aware
- Cost sensitive? → Page-by-page
- Continuous narrative? → Context-aware
- Independent pages? → Page-by-page
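The checklist above can be condensed into a small helper that returns a value for `--algorithm`. The function and its flags are hypothetical. When requirements conflict (for example, a cost-sensitive continuous narrative), this sketch picks page-by-page; that tie-break is a choice of the sketch, not a documented rule.

```python
from pathlib import Path


def choose_algorithm(path: str, independent_pages: bool = False, cost_sensitive: bool = False) -> str:
    """Return a value for --algorithm following the checklist (hypothetical helper)."""
    if Path(path).suffix.lower() == ".pdf":
        # Only page-by-page supports PDF input.
        return "page"
    if independent_pages or cost_sensitive:
        # No context overhead; each page stands alone.
        return "page"
    # Default for text files: keeps context across chunks.
    return "context-aware"


print(choose_algorithm("report.PDF"))  # page
print(choose_algorithm("novel.txt"))   # context-aware
```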
Algorithm Performance Tips
Related Topics
- Cost Optimization: Learn how to minimize translation costs
- Troubleshooting: Common issues and solutions