Overview
The Convert Service provides document format conversion using LibreOffice bindings. It processes conversion requests from SQS queues and supports bidirectional conversion between multiple document formats.
Key Features:
- Multi-format document conversion
- DOCX to PDF conversion
- PDF to DOCX conversion
- Queue-based async processing
- S3 integration for file storage
- Internal-only API (requires internal auth)
- SQS queue consumer for async jobs
- LibreOffice bindings for conversion
- Process isolation via fork for stability
Convert Document
Convert a document from one format to another. This is an internal-only endpoint requiring internal API authentication.
Source S3 bucket containing the input file
S3 key path to the source file. File extension determines source format
Destination S3 bucket for converted file
S3 key path for output file. File extension determines target format
Whether conversion completed successfully
Supported Formats
The Convert Service supports conversion between these formats:
Document Formats
| Format | Extension | Input | Output | Notes |
|---|---|---|---|---|
| Microsoft Word | .docx | ✓ | ✓ | Modern Word format |
| PDF | .pdf | ✓ | ✓ | Portable Document Format |
| Rich Text | .rtf | ✓ | ✓ | Cross-platform text |
| OpenDocument | .odt | ✓ | ✓ | Open standard format |
| Plain Text | .txt | ✓ | ✓ | Unformatted text |
| HTML | .html | ✓ | ✓ | Web format |
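Because the file extension of each S3 key determines the source and target format, a caller can reject unsupported pairs before submitting a job. The following is a minimal sketch under that assumption; the helper names are hypothetical, and treating extensions as case-insensitive is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath

# Extensions from the Supported Formats table; each is listed as input and output.
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".rtf", ".odt", ".txt", ".html"}


def extension_of(key: str) -> str:
    """Return the lowercased extension of an S3 key, e.g. '.docx' ('' if none)."""
    return PurePosixPath(key).suffix.lower()


def is_supported_conversion(from_key: str, to_key: str) -> bool:
    """True when both keys carry an extension listed in the table."""
    return (
        extension_of(from_key) in SUPPORTED_EXTENSIONS
        and extension_of(to_key) in SUPPORTED_EXTENSIONS
    )
```

A check like this only mirrors the table; the service itself remains the authority on which conversions succeed.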
Common Conversions
DOCX to PDF
High-fidelity conversion preserving formatting, fonts, and layout.
PDF to DOCX
Converts PDF to editable Word document. Quality depends on PDF structure.
DOCX to HTML
Exports Word document as web-ready HTML.
Queue-Based Processing
The Convert Service primarily operates as an SQS queue consumer for asynchronous conversions.
Queue Message Format
Processing Flow
- Receive: Consumer polls SQS queue for conversion jobs
- Download: Fetch source file from S3 bucket
- Convert: Process file using LibreOffice bindings in isolated fork
- Upload: Write converted file to destination S3 bucket
- Notify: For DOCX conversions, invoke WebSocket lambda with result
- Cleanup: Delete temporary files and acknowledge SQS message
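The steps above can be sketched as a single function that takes each side effect as a callable. This is an illustrative outline, not the service's implementation: the `Job` field names and every callable are hypothetical, "DOCX conversion" is assumed to mean a `.docx` source, and acknowledging only on success is an assumption drawn from the retry behavior described under Retry Strategy.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Job:
    # Hypothetical field names; the real message schema is not shown here.
    from_bucket: str
    from_key: str
    to_bucket: str
    to_key: str


def process_job(
    job: Job,
    download: Callable[[str, str, Path], None],
    convert: Callable[[Path, Path], bool],
    upload: Callable[[Path, str, str], None],
    notify: Callable[[Job, bool], None],
    ack: Callable[[], None],
) -> bool:
    """Run one job through download, convert, upload, notify, and cleanup."""
    # The temporary directory is deleted when the block exits (cleanup step).
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / ("input" + Path(job.from_key).suffix)
        dst = Path(workdir) / ("output" + Path(job.to_key).suffix)
        download(job.from_bucket, job.from_key, src)
        ok = convert(src, dst)
        if ok:
            upload(dst, job.to_bucket, job.to_key)
        # Assumption: notifications apply when the source file is DOCX.
        if job.from_key.lower().endswith(".docx"):
            notify(job, ok)
    if ok:
        ack()  # assumption: failed jobs are left unacknowledged so SQS retries them
    return ok
```

Passing the S3, LibreOffice, and SQS operations in as callables keeps the control flow testable without any AWS access.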
DOCX Upload Notifications
When converting DOCX files, the service sends completion notifications.
Conversion Process
Process Isolation
Conversions run in forked child processes for stability:
- Fork: Parent spawns isolated child process
- Convert: Child loads LibreOffice and performs conversion
- Exit: Child exits with status code (0 = success)
- Monitor: Parent waits for completion with 30-second timeout
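The fork, exit, and monitor steps can be illustrated with a POSIX-only sketch. Here `task` is a hypothetical stand-in for the LibreOffice conversion call, and the polling interval is an arbitrary choice; the service's own implementation may differ in language and detail.

```python
import os
import signal
import time
from typing import Callable


def run_isolated(task: Callable[[], int], timeout_s: float = 30.0) -> int:
    """Run task in a forked child and return its exit code.

    Raises TimeoutError if the child is still running after timeout_s,
    in which case the child is killed first. POSIX only.
    """
    pid = os.fork()
    if pid == 0:
        # Child: run the task, then exit immediately with its status code.
        try:
            code = task()
        except BaseException:
            code = 1
        os._exit(code)

    # Parent: poll for the child's exit until the deadline passes.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            return os.waitstatus_to_exitcode(status)
        time.sleep(0.05)

    os.kill(pid, signal.SIGKILL)  # timed out: kill the child
    os.waitpid(pid, 0)            # reap it so no zombie process remains
    raise TimeoutError(f"conversion exceeded {timeout_s} seconds")
```

A crash inside the child only affects that child, so the parent can report the failure and move on to the next job.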
Timeout Handling
Conversions have a 30-second timeout:
- If conversion exceeds timeout, child process is killed
- Temporary files are cleaned up
- SQS message is released back to queue for retry
Temporary Files
Each conversion creates a temporary workspace.
Error Handling
Common Errors
| Error | Cause | Resolution |
|---|---|---|
| unable to get file type | Invalid file extension | Ensure file has valid extension |
| unable to get lok filter | Unsupported conversion | Check supported formats |
| unable to get object from S3 | Source file not found | Verify S3 bucket and key |
| conversion failed | LibreOffice error | Check file format validity |
| fork failed | System resource error | Retry or contact support |
Retry Strategy
Failed conversions follow SQS retry policy:
- Message returns to queue after visibility timeout
- Up to 3 retry attempts
- Dead letter queue after max retries
- Manual inspection for persistent failures
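In SQS, this kind of policy is typically expressed as a redrive policy on the source queue. The sketch below builds that attribute; the dead letter queue ARN in the usage note is a placeholder, and mapping "up to 3 retry attempts" to a `maxReceiveCount` of 3 is an assumption (the actual setting could count the first delivery separately).

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build SQS queue attributes that route repeated failures to a DLQ."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        # Assumption: 3 receives before a message moves to the dead letter queue.
        "maxReceiveCount": max_receive_count,
    }
    # SQS expects RedrivePolicy as a JSON-encoded string.
    return {"RedrivePolicy": json.dumps(policy)}
```

The returned dictionary has the shape accepted as `Attributes` by boto3's `set_queue_attributes`, for example `redrive_attributes("arn:aws:sqs:us-east-1:123456789012:convert-dlq")`.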
Performance Considerations
Conversion Time
Typical conversion times:
- Small files (< 1 MB): 1-3 seconds
- Medium files (1-10 MB): 3-10 seconds
- Large files (10-50 MB): 10-30 seconds
- Files > 50 MB: May time out (30-second limit)
Concurrency
- Multiple workers process queue in parallel
- Each worker handles one conversion at a time
- Fork isolation prevents interference between jobs
Resource Usage
- CPU-intensive during conversion
- Temporary disk space: ~3x source file size
- Memory: ~500MB per LibreOffice instance
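The figures above allow a rough capacity estimate for a host running several workers. This sketch assumes every worker converts a file of the same size at the same moment and treats the ~3x and ~500MB values as fixed constants, which is a simplification.

```python
MB = 1_000_000  # decimal megabyte, used for a rough estimate only


def estimate_resources(source_bytes: int, workers: int = 1) -> dict:
    """Estimate peak temporary disk and memory for concurrent conversions."""
    return {
        # ~3x the source file size in temporary disk space per conversion
        "disk_bytes": 3 * source_bytes * workers,
        # ~500 MB per LibreOffice instance, one instance per worker
        "memory_bytes": 500 * MB * workers,
    }
```

For example, four workers each converting a 20 MB file would need roughly 240 MB of temporary disk and 2 GB of memory under these assumptions.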
Backfill Operations
Backfill DOCX
Bulk conversion endpoint for batch processing (internal only).
Array of conversion requests to process
Integration Example
Triggering Conversion from Another Service
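A calling service would typically enqueue a conversion job as a JSON message. The message schema is not reproduced in this section, so the sketch below uses placeholder field names: `job_id`, `from_key`, and `to_key` are borrowed from the Logging section, while `from_bucket`, `to_bucket`, and the idea that the caller generates `job_id` are assumptions.

```python
import json
import uuid


def build_conversion_message(
    from_bucket: str, from_key: str, to_bucket: str, to_key: str
) -> str:
    """Build a JSON message body describing one conversion job.

    All field names are illustrative, not the service's confirmed schema.
    """
    body = {
        "job_id": str(uuid.uuid4()),  # assumption: caller supplies the identifier
        "from_bucket": from_bucket,
        "from_key": from_key,
        "to_bucket": to_bucket,
        "to_key": to_key,
    }
    return json.dumps(body)


# Sending is left to the caller, e.g. with boto3:
#   sqs.send_message(QueueUrl=queue_url, MessageBody=build_conversion_message(...))
# where queue_url is the conversion queue's URL (not given in this document).
```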
Monitoring
Metrics
Key metrics to monitor:
- Queue depth: Number of pending conversions
- Processing time: Average conversion duration
- Error rate: Failed conversion percentage
- Timeout rate: Conversions exceeding 30s
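Three of these metrics can be derived from per-job records, as the following sketch shows. The `ConversionRecord` type and its fields are hypothetical; queue depth is not computed here because it comes from the queue itself (for SQS, the `ApproximateNumberOfMessages` queue attribute).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class ConversionRecord:
    # Hypothetical per-job record used only for this illustration.
    duration_s: float
    succeeded: bool
    timed_out: bool = False


def summarize(records: Sequence[ConversionRecord]) -> dict:
    """Compute average duration, error rate, and timeout rate for a batch."""
    total = len(records)
    if total == 0:
        return {"avg_duration_s": 0.0, "error_rate": 0.0, "timeout_rate": 0.0}
    return {
        "avg_duration_s": mean(r.duration_s for r in records),
        "error_rate": sum(not r.succeeded for r in records) / total,
        "timeout_rate": sum(r.timed_out for r in records) / total,
    }
```

In this sketch a timed-out job is also counted as a failed job, so the timeout rate is a subset of the error rate.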
Logging
Conversion logs include:
- job_id: Unique conversion identifier
- from_key: Source file path
- to_key: Destination file path
- status_code: Child process exit code
- Processing timestamps and durations
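A log entry carrying these fields might be emitted as one JSON line per conversion. The sketch below assumes JSON formatting and uses hypothetical names for the timestamp fields (`started_at`, `finished_at`, `duration_s`); only `job_id`, `from_key`, `to_key`, and `status_code` come from the list above.

```python
import json
import logging

logger = logging.getLogger("convert")


def log_conversion(
    job_id: str,
    from_key: str,
    to_key: str,
    status_code: int,
    started_at: float,
    finished_at: float,
) -> str:
    """Emit one structured log line for a conversion and return it."""
    record = {
        "job_id": job_id,
        "from_key": from_key,
        "to_key": to_key,
        "status_code": status_code,  # child process exit code, 0 = success
        "started_at": started_at,    # epoch seconds (assumed representation)
        "finished_at": finished_at,
        "duration_s": round(finished_at - started_at, 3),
    }
    line = json.dumps(record)
    logger.info(line)
    return line
```

Keeping each entry on a single line makes it straightforward to filter logs by `job_id` when tracing a failed conversion.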