Tinbox provides three translation algorithms, each optimized for different scenarios. Understanding when to use each algorithm will help you achieve the best translation quality and cost efficiency.

Available Algorithms

Page-by-Page

Best for PDFs and documents with clear page boundaries

Context-Aware

Default for text files, maintains context across chunks

Sliding Window

Legacy algorithm, deprecated in favor of context-aware

Algorithm Comparison

| Feature | Page-by-Page | Context-Aware | Sliding Window |
| --- | --- | --- | --- |
| Best for | PDFs, image documents | Text files, long documents | Legacy use only |
| Context preservation | None between pages | Full context between chunks | Limited overlap |
| Cost efficiency | High | Medium (~4x context overhead) | Medium |
| Quality | Good for independent pages | Excellent for continuous text | Good |
| Supports PDF | ✅ Yes | ❌ No | ❌ No |
| Resumable | ✅ Yes | ✅ Yes | ✅ Yes |
| Glossary support | ✅ Yes | ✅ Yes | ✅ Yes |

Page-by-Page Algorithm

Translates documents one page at a time without context between pages.

How It Works

# From algorithms.py:147-354
async def translate_page_by_page(
    content: DocumentContent,
    config: TranslationConfig,
    translator: ModelInterface,
    # ...
):
    # Translates each page independently
    for page_num, page in enumerate(content.pages, start=1):
        request = TranslationRequest(
            content=page,
            context=None,  # No context between pages
            # ...
        )
        response = await translator.translate(request)

Use Cases

Best choice for PDFs - Each PDF page is processed as a separate image by vision-capable models (GPT-4o, Claude Sonnet, Gemini Pro).
tinbox translate --to es --algorithm page --model openai:gpt-4o document.pdf
Documents where each page is self-contained (presentations, forms, reports with clear page breaks).
tinbox translate --to de --algorithm page --model anthropic:claude-3-sonnet report.pdf
When you want to minimize input tokens - since no context is sent between pages, this is the most cost-effective algorithm.

Advantages

  • Low cost: No context overhead between pages
  • Fast processing: Pages can theoretically be processed in parallel
  • Simple error handling: Failed pages don’t affect others
  • Memory efficient: Only one page in memory at a time
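Because pages share no context, each request is independent. A minimal sketch of concurrent dispatch with `asyncio.gather` (here `translate_page` is a hypothetical stand-in for a real model call, not Tinbox's actual API):

```python
import asyncio

async def translate_page(page_num: int, text: str) -> str:
    # Hypothetical stand-in for a real model call.
    await asyncio.sleep(0)
    return f"[translated p{page_num}] {text}"

async def translate_all(pages: list[str]) -> list[str]:
    # With no shared context, every page request is independent
    # and can be dispatched concurrently.
    tasks = [translate_page(i, p) for i, p in enumerate(pages, start=1)]
    return await asyncio.gather(*tasks)

results = asyncio.run(translate_all(["Hello.", "World."]))
```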

Limitations

  • No context preservation: Terms and style may vary between pages
  • Not suitable for continuous narratives: Stories or articles may lose coherence
  • Pages stay independent for non-PDFs too: text inputs are still translated section by section without surrounding context

Context-Aware Algorithm

The default algorithm for text files, using smart text splitting at natural boundaries while maintaining context between chunks.

How It Works

# From algorithms.py:762-938
async def translate_context_aware(
    content: DocumentContent,
    config: TranslationConfig,
    # ...
):
    # Smart splitting at natural boundaries
    chunks = smart_text_split(text, context_size, config.custom_split_token)
    
    # Translate with context from previous chunk
    for i, current_chunk in enumerate(chunks):
        context_info = build_translation_context_info(
            previous_chunk=previous_chunk,
            previous_translation=previous_translation,
            next_chunk=next_chunk,
        )
        
        request = TranslationRequest(
            content=current_chunk,
            context=context_info,  # Includes previous and next chunks
            # ...
        )

Smart Text Splitting

The algorithm splits text at natural boundaries in this priority order:
  1. Custom split token (if provided) - ignores target size
  2. Paragraph breaks (\n\n)
  3. Sentence endings (.!? followed by space)
  4. Line breaks (\n)
  5. Clause boundaries (;:, followed by space)
  6. Word boundaries (whitespace)
  7. Character position (fallback)
Smart splitting ensures chunks break at natural points, preventing mid-sentence or mid-word breaks that could harm translation quality.

Context Information

Each chunk receives context in this format:
[PREVIOUS_CHUNK]
... previous chunk text ...
[/PREVIOUS_CHUNK]

[PREVIOUS_CHUNK_TRANSLATION]
... previous translation ...
[/PREVIOUS_CHUNK_TRANSLATION]

[NEXT_CHUNK]
... next chunk text ...
[/NEXT_CHUNK]

Use this context to maintain consistency in terminology and style.

Use Cases

Default for .txt files - Maintains narrative flow and terminology consistency.
tinbox translate --to fr --context-size 2000 --model openai:gpt-4o novel.txt
Stories, articles, books where context between sections is crucial.
tinbox translate --to ja --context-size 1500 --model anthropic:claude-3-sonnet story.txt
Documents with clear section markers that should be used as split points.
# Split on "---" markers
tinbox translate --to es --split-token "---" --model openai:gpt-4o chapters.txt

Advantages

  • Excellent coherence: Context ensures consistent terminology and style
  • Smart splitting: Breaks at natural boundaries, not mid-sentence
  • Bidirectional context: Uses both previous and next chunks
  • Glossary friendly: Works excellently with glossary feature

Limitations

  • Higher cost: ~4x input token overhead due to context (see src/tinbox/core/cost.py:125-142)
  • Text only: Not supported for PDF/image content
  • Sequential processing: Must process chunks in order
Context-aware algorithm increases input tokens by ~4x due to context overhead. Use --dry-run to preview costs before translating large documents.
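The ~4x figure follows from each request carrying the current chunk plus the previous chunk, its translation, and the next chunk. A back-of-envelope estimate (assuming the previous translation is roughly the same token length as its source; this is not Tinbox's actual cost model in cost.py):

```python
def estimate_input_tokens(chunk_sizes: list[int], context_aware: bool) -> int:
    # chunk_sizes are per-chunk token counts.
    if not context_aware:
        return sum(chunk_sizes)
    total = 0
    for i, size in enumerate(chunk_sizes):
        prev = chunk_sizes[i - 1] if i > 0 else 0
        nxt = chunk_sizes[i + 1] if i < len(chunk_sizes) - 1 else 0
        # current + previous chunk + previous translation + next chunk
        total += size + prev + prev + nxt
    return total

plain = estimate_input_tokens([1000] * 10, context_aware=False)  # 10,000 tokens
aware = estimate_input_tokens([1000] * 10, context_aware=True)   # 37,000 tokens
```

Interior chunks cost 4x their own size; only the first and last chunks save a little, so the overall multiplier converges toward 4x as documents grow.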

Configuration

# Adjust chunk size (default: 2000 characters)
tinbox translate --to es --context-size 1500 --model openai:gpt-4o document.txt

# Use custom split token
tinbox translate --to fr --split-token "###" --model openai:gpt-4o sections.txt

# Preview costs before translating
tinbox translate --to de --dry-run --model openai:gpt-4o large_doc.txt

Sliding Window Algorithm

This algorithm is deprecated and kept only for backwards compatibility. Use context-aware instead for better results.
Translates text by creating overlapping windows with fixed size and overlap.

How It Works

# From algorithms.py:357-517
async def translate_sliding_window(
    content: DocumentContent,
    config: TranslationConfig,
    # ...
):
    # Create fixed-size overlapping windows
    windows = create_windows(text, window_size, overlap_size)
    
    # Translate each window independently
    for window in windows:
        request = TranslationRequest(
            content=window,
            context=None,  # No context
            # ...
        )
    
    # Merge windows by removing overlap
    final_text = merge_chunks(translated_windows, overlap_size)
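The windowing and merge steps above can be sketched in a few lines (an illustrative simplification assuming `window > overlap`, not the actual `create_windows`/`merge_chunks` implementation):

```python
def create_windows(text: str, window: int, overlap: int) -> list[str]:
    # Fixed-size windows stepping by (window - overlap); assumes window > overlap.
    step = window - overlap
    return [text[i:i + window] for i in range(0, max(len(text) - overlap, 1), step)]

def merge_chunks(translated: list[str], overlap: int) -> str:
    # Naively drops the first `overlap` characters of each later window.
    # Translations rarely preserve character counts, which is exactly
    # why this merge step is unreliable in practice.
    return translated[0] + "".join(w[overlap:] for w in translated[1:])
```

The merge only round-trips cleanly when each window's translation has the same length as its source, an assumption real translations violate.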

Why It’s Deprecated

  • Fixed window size: Doesn’t respect natural boundaries
  • No context: Each window translated independently
  • Overlap complexity: Merging overlapping translations is unreliable
  • Inferior to context-aware: The context-aware algorithm provides better quality with smart splitting
If you have a use case requiring sliding window, consider using context-aware with --context-size instead.

Choosing the Right Algorithm

1. Check Your Document Type

  • PDF? → Use --algorithm page
  • Text file? → Use default (context-aware) or specify --algorithm context-aware
2. Consider Your Requirements

  • Need context preservation? → Context-aware
  • Cost sensitive? → Page-by-page
  • Continuous narrative? → Context-aware
  • Independent pages? → Page-by-page
3. Test with Dry Run

tinbox translate --to es --dry-run --model openai:gpt-4o document.pdf
Preview costs and token estimates before committing.

Algorithm Performance Tips

Page-by-Page
  • Use with --checkpoint-frequency 5 to save progress every 5 pages
  • Enable --glossary to maintain terminology consistency across pages
  • Set --max-cost to prevent runaway costs on large PDFs
Context-Aware
  • Adjust --context-size based on document structure (smaller for more breaks, larger for continuous text)
  • Use --split-token for documents with clear section markers
  • Enable --glossary for automatic term extraction and consistency
  • Monitor costs with --max-cost due to context overhead

Cost Optimization

Learn how to minimize translation costs

Troubleshooting

Common issues and solutions
