General Questions
What file formats are supported?
SwissKnife supports a wide range of file formats across different categories:
Documents (solution.py:36)
Input/Output formats:
- PDF, DOCX, DOC, TXT, MD (Markdown), EPUB
- PPTX (PowerPoint), XLSX (Excel)
- HTML, TEX (LaTeX), XML, BIB
- JSON, RST (reStructuredText), RTF, ODT
- ORG (Org-mode), IPYNB (Jupyter Notebook)
- FB2, ICML, OPML, TEXI, TEXTILE
- TYP, MUSE, HS, ADOC (AsciiDoc)
- And more formats supported by Pandoc
Images (solution.py:38)
Supported formats:
- JPG/JPEG, PNG, WEBP, GIF
- BMP, TIFF, PDF
Features:
- Convert between any image formats
- Convert images to PDF
- Automatic color mode conversion (RGB, RGBA, L)
- Quality optimization for JPEG (85% quality)
Audio (solution.py:40)
Supported formats:
- MP3, WAV, FLAC
- AAC, OGG, M4A
Quality settings:
- MP3: 192k bitrate with libmp3lame codec
- AAC/M4A: 192k bitrate with aac codec
- WAV: PCM 16-bit (lossless)
- OGG: 192k bitrate with libvorbis codec
- FLAC: Lossless compression
Video (solution.py:42)
Supported formats:
- MP4, AVI, MKV, MOV
- WMV, FLV, WEBM
- GIF (animated)
Features:
- H.264/H.265 encoding for high compatibility
- Optimized GIF creation with palette generation
- Audio extraction to separate audio files
- Preset configurations for fast encoding
Archives (solution.py:44)
Supported formats:
- ZIP, TAR, GZ (tar.gz)
- BZ2 (tar.bz2), 7Z, RAR
Features:
- Extract from any archive format
- Create new archives in any format
- Password-protected archive support
- Format conversion (e.g., ZIP to 7Z)
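The categories above amount to a lookup from file extension to conversion family. The following is a minimal sketch of such a lookup, not the code in solution.py: the table name, the function name, and the subset of extensions are all illustrative. PDF and GIF are left out because they appear in more than one category and would need the target format to disambiguate.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup built from a subset of the formats listed above;
# the real tables in solution.py may be organized differently.
CATEGORY_BY_EXTENSION = {
    "document": {".docx", ".doc", ".txt", ".md", ".epub", ".html", ".tex", ".rst", ".odt"},
    "image": {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tiff"},
    "audio": {".mp3", ".wav", ".flac", ".aac", ".ogg", ".m4a"},
    "video": {".mp4", ".avi", ".mkv", ".mov", ".wmv", ".flv", ".webm"},
    "archive": {".zip", ".tar", ".gz", ".bz2", ".7z", ".rar"},
}


def detect_category(filename: str) -> Optional[str]:
    """Return the conversion category for a file name, or None if unknown."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the last suffix
    for category, extensions in CATEGORY_BY_EXTENSION.items():
        if suffix in extensions:
            return category
    return None
```

Only the final suffix is inspected, so a name like backup.tar.gz is classified through its .gz suffix.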
Important Limitations
- PDF conversion limitation: While you can convert documents TO PDF, converting FROM PDF to other document formats is not supported
- Archive tools: Some formats (.7z, .rar) require external command-line tools
- Document conversions: Require Pandoc and LaTeX for full functionality
How do I install dependencies?
SwissKnife requires both Python packages and system-level tools.
Python Dependencies
Three installation routes are available: the recommended method using UV, manual installation, or adding dependencies individually.
System Dependencies
LaTeX (for PDF conversions): platform-specific steps apply on Linux/Ubuntu and macOS.
Archive tools (for .7z and .rar): platform-specific steps apply on Linux/Ubuntu and macOS. On Windows:
- Download 7-Zip from 7-zip.org
- Download RAR from rarlab.com
- Add installation directories to system PATH
Verify Installation
Troubleshooting
If you encounter “Module not found” errors, the tool will display helpful messages (solution.py:20-25). Simply follow the instructions provided in the error message.
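One way to verify the system dependencies is to check that their executables are on PATH. The sketch below assumes conventional binary names (pandoc, pdflatex or xelatex, 7z or 7za, rar or unrar). The source names the tools but not the binaries, so these names and the helper itself are assumptions to adjust for your installation.

```python
import shutil

# Executable names are assumptions: the document names Pandoc, LaTeX,
# and external tools for .7z/.rar, but not the exact binaries.
REQUIRED_TOOLS = {
    "Pandoc": ["pandoc"],
    "LaTeX": ["pdflatex", "xelatex"],
    "7-Zip": ["7z", "7za"],
    "RAR": ["rar", "unrar"],
}


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool label to the first matching executable on PATH, or None."""
    found = {}
    for label, candidates in tools.items():
        found[label] = next(
            (path for path in map(shutil.which, candidates) if path), None
        )
    return found


if __name__ == "__main__":
    for label, path in find_tools().items():
        print(f"{label}: {path or 'NOT FOUND'}")
```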
Can I convert password-protected files?
Yes, SwissKnife supports password-protected archives.
How it works:
Supported Operations (solution.py:138-143)
Extract and convert password-protected archives:
- The tool extracts the password-protected archive to a temporary directory (solution.py:137-140)
- Converts the extracted contents to the target archive format
- Can re-apply password protection to the output (solution.py:143)
Password Protection Behavior
Creating password-protected archives (solution.py:143):
- If you provide a password, it will be applied to the output archive
- Exception: TAR-based formats (.tar, .tar.gz, .tar.bz2, .tar.xz) do not support passwords
- Supported for output: ZIP, 7Z, RAR
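The output rule above (passwords apply to ZIP, 7Z, and RAR, never to TAR-based formats) can be expressed as a small check. This is a sketch under that reading of the list. The function and constant names are hypothetical and do not come from solution.py.

```python
# Suffixes taken from the list above; the names below are hypothetical.
TAR_SUFFIXES = (".tar", ".tar.gz", ".tar.bz2", ".tar.xz")
PASSWORD_CAPABLE = (".zip", ".7z", ".rar")


def output_supports_password(output_name: str) -> bool:
    """Return True if the output archive format can carry a password."""
    name = output_name.lower()
    if name.endswith(TAR_SUFFIXES):
        return False  # TAR-based formats do not support passwords
    return name.endswith(PASSWORD_CAPABLE)
```

A caller could use this to warn the user before conversion that a supplied password will not be applied to the output.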
Password Handling Details
The password parameter is passed to patoolib (solution.py:140).
Important Notes
- Security: Passwords are passed as command-line arguments and may be visible in process lists
- Format limitations: Not all archive formats support password protection
- Documents: Password-protected PDFs or Office documents are NOT supported (only archives)
- Debugging: The tool prints “Password: <password>” when processing (solution.py:138)
Limitations
- Encrypted document files (PDF, DOCX with encryption) are not supported
- Only archive formats (.zip, .7z, .rar) support password protection
- Password must be provided via command line (no interactive prompt)
What's the file size limit for AI summaries?
The AI summarization feature has a 100MB file size limit (solution.py:165-166).
Size Validation
When you run the summarize command, the tool:
- Checks if the file exists and is readable (solution.py:163)
- Calculates file size in megabytes (solution.py:165)
- Rejects files over 100MB (solution.py:166)
Workarounds for Large Files
1. Convert to plain text first
2. Split the document
3. Compress the PDF:
- Use external tools to compress the PDF
- Remove embedded images or reduce their quality
- Convert high-resolution scans to optimized versions
Why the Limit?
The limit exists because:
- Files are uploaded to Google’s Gemini API servers (solution.py:171)
- Large files take longer to upload and process
- Processing timeout is 5 minutes (300 seconds) (solution.py:181)
- API quotas and rate limits may apply
File Size Display
The tool displays the file size during processing (solution.py:167). This helps you understand if you’re approaching the limit.
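The validation steps (existence check, size in megabytes, rejection over 100MB) and the size display can be sketched as follows. The function name, the message wording, and the 1 MB = 1024 * 1024 bytes convention are assumptions. The tool itself may compute and report the size differently.

```python
import os

MAX_SIZE_MB = 100  # limit stated in this section


def check_summary_input(path: str, max_mb: float = MAX_SIZE_MB) -> float:
    """Return the file size in MB; raise if the file is missing, unreadable, or too large."""
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        raise FileNotFoundError(f"File not found or not readable: {path}")
    # Assumes 1 MB = 1024 * 1024 bytes; the tool may use a different convention.
    size_mb = os.path.getsize(path) / (1024 * 1024)
    print(f"File size: {size_mb:.2f} MB")
    if size_mb > max_mb:
        raise ValueError(f"File is {size_mb:.2f} MB, over the {max_mb} MB limit")
    return size_mb
```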
How do I customize the AI summary prompt?
You can customize the AI behavior by editing the summarize_prompt.txt file.
Prompt Template Location
The prompt is loaded from ./summarize_prompt.txt in the current directory (solution.py:175).
Template Variables (solution.py:175)
The prompt file supports two placeholders that are replaced at runtime:
1. {{SUMMARY_REQUIREMENTS}} - Describes the desired summary length and format
- Short: “a brief summary about the essence of the document in 1 paragraph”
- Medium: “a concise summary about the essence of the document in 2-3 paragraphs”
- Long: “a detailed summary about the essence of the document in 3-4 paragraphs”
2. {{FILE_DETAILS}} - JSON representation of the uploaded file metadata
- Contains file information from the Google AI service
- Includes file name, size, mime type, etc.
Summary Length Configuration (solution.py:173-174)
Each length option has specific parameters.
Customization Examples
Example 1: Focus on specific aspects
Example 2: Different output format
Using Custom Prompts
Simply edit the summarize_prompt.txt file before running the summarize command.
Model Configuration (solution.py:183)
The tool uses Google’s Gemini 2.5 Flash model. Its generation parameters are controlled by the --length flag and cannot be modified without editing the source code.
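The placeholder substitution described under Template Variables can be illustrated with plain string replacement. This is a sketch only: the helper name and the use of json.dumps for the file details are assumptions, and the sample template is invented for the example.

```python
import json

# Length descriptions quoted from the Template Variables list.
SUMMARY_REQUIREMENTS = {
    "short": "a brief summary about the essence of the document in 1 paragraph",
    "medium": "a concise summary about the essence of the document in 2-3 paragraphs",
    "long": "a detailed summary about the essence of the document in 3-4 paragraphs",
}


def render_prompt(template: str, length: str, file_details: dict) -> str:
    """Fill both placeholders in a prompt template (hypothetical helper)."""
    return (
        template
        .replace("{{SUMMARY_REQUIREMENTS}}", SUMMARY_REQUIREMENTS[length])
        .replace("{{FILE_DETAILS}}", json.dumps(file_details, indent=2))
    )


if __name__ == "__main__":
    # Invented sample template; a real summarize_prompt.txt will differ.
    sample = "Write {{SUMMARY_REQUIREMENTS}}.\nFile: {{FILE_DETAILS}}"
    print(render_prompt(sample, "short", {"name": "report.pdf"}))
```

When editing summarize_prompt.txt, keeping both placeholders in the text lets the --length choice and the file metadata continue to reach the model.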
Can I preserve the original file during conversion?
Yes, use the --preserve-original flag to keep the input file unchanged.
How It Works (solution.py:98-111)
Without the flag (default behavior):
- The tool works directly on the input file
- Original file may be modified during conversion
- Input file is validated but not backed up
With the --preserve-original flag:
- Creates a temporary copy of the input file (solution.py:56-57)
- Works on the temporary copy instead (solution.py:110-111)
- Original file remains untouched
- Temporary files are cleaned up automatically (solution.py:156-157)
Usage Examples
Implementation Details
- Temporary file creation (solution.py:56-57)
- Working path selection (solution.py:110-111)
- Cleanup (solution.py:156-157)
When to Use
Recommended scenarios:
- Converting important documents you need to keep
- Testing conversion settings before committing
- Creating multiple output formats from one source
- Working with read-only or archived files
Not necessary when:
- Converting temporary files
- You have backups elsewhere
- Using batch conversion (automatic preservation)
Batch Conversion Behavior
In batch conversion, preserve_original is always enabled (solution.py:93). This ensures the entire input directory remains unchanged during batch operations.
Cleanup Guarantee
Temporary files are cleaned up even if conversion fails (solution.py:152-157).
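The copy, convert, and clean-up pattern behind --preserve-original can be sketched with a try/finally block. This is an assumption about the general shape rather than the code in solution.py. The function name is hypothetical and `convert` is a placeholder for whatever performs the actual conversion.

```python
import shutil
import tempfile
from pathlib import Path


def convert_preserving_original(input_path, convert):
    """Run convert() on a temporary copy so the input file is never touched.

    `convert` is a placeholder callable that receives the working path.
    """
    source = Path(input_path)
    temp_dir = Path(tempfile.mkdtemp(prefix="swissknife_"))
    try:
        working_path = temp_dir / source.name
        shutil.copy2(source, working_path)  # copy contents and metadata
        return convert(working_path)
    finally:
        # Runs on success and on failure, so no temporary files are left behind.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

Placing the cleanup in `finally` is what provides the guarantee that temporary files disappear even when the conversion raises.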
What happens if output file exists?
SwissKnife prompts for confirmation before overwriting existing files.
Overwrite Confirmation (solution.py:63-65)
When the output file already exists, you’ll see a confirmation prompt. Your options:
- Type y and press Enter → File will be overwritten
- Type n or just press Enter → Operation cancelled, no changes made
Implementation
The default is N (no), so pressing Enter without typing anything will cancel the operation.
File Preparation (solution.py:66)
If you confirm the overwrite:
- Parent directories are created if they don’t exist: output_abs.parent.mkdir(parents=True, exist_ok=True)
- Existing file is removed: Path(output_abs).unlink(missing_ok=True)
- Conversion proceeds to create new file
Avoiding the Prompt
1. Use a different output name
2. Remove the file beforehand
3. Use a fresh output directory for batch operations
Batch Conversion Behavior
During batch conversion, the prompt appears for each existing file (solution.py:93-94). This can be tedious if many files exist. Consider:
- Using a fresh output directory
- Clearing the output directory first
- Moving existing files to a backup location
Summary Operations
For summaries (solution.py:190-193), the output file is automatically named. If document_summary.txt exists, you’ll be prompted to overwrite it.
Safety Features
The confirmation system prevents:
- Accidental data loss
- Overwriting important documents
- Losing work from previous conversions
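The confirm-then-prepare flow described in this section could look roughly like the sketch below. The two pathlib calls are the ones quoted under File Preparation. The function name, the prompt wording, and the injectable `ask` parameter are assumptions added for illustration.

```python
from pathlib import Path


def prepare_output(output_path, ask=input) -> bool:
    """Return True if conversion may proceed, asking before overwriting.

    `ask` is injectable so the prompt can be scripted or tested.
    """
    output_abs = Path(output_path).resolve()
    if output_abs.exists():
        # Prompt text is illustrative; the tool's wording may differ.
        answer = ask(f"{output_abs} already exists. Overwrite? [y/N]: ")
        if answer.strip().lower() != "y":
            return False  # default is "no": empty input cancels
    output_abs.parent.mkdir(parents=True, exist_ok=True)
    output_abs.unlink(missing_ok=True)  # requires Python 3.8+
    return True
```

Anything other than y, including an empty answer, cancels. This matches the documented default of N.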
How does batch conversion work?
Batch conversion processes multiple files from one directory to another in a single command.
Basic Usage
How It Works (solution.py:83-95)
1. Directory validation (solution.py:84-86)
2. Extension normalization (solution.py:87-88)
3. Output directory creation (solution.py:89)
4. File discovery (solution.py:89-90)
5. Batch processing (solution.py:91-95)
Examples
- Convert all Word documents to PDF
- Convert all images to JPEG
- Convert all audio files to MP3
- Extensions with or without dots
Key Features
Automatic preservation:
- Original files are always preserved (solution.py:93)
- The --preserve-original flag is not needed
Error handling:
- If one file fails, others continue processing (solution.py:94-95)
- Failed conversions are counted and reported
- Error messages show which file failed
Limitations
1. Non-recursive:
- Only processes files in the specified directory
- Does not search subdirectories
- Use separate commands for nested folders
2. Extension matching:
- Must specify exact extension
- Case-sensitive on Linux/macOS
- Only processes files matching the input extension
3. Overwrite prompts:
- Each existing output file triggers a prompt (solution.py:63-65)
- Can be tedious with many files
- Use a fresh output directory to avoid this
Best Practices
1. Use fresh output directories
2. Test with a small subset first
3. Check the summary:
- Review successful and failed counts
- Investigate failed conversions individually
- Re-run failed files with verbose error messages
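The five steps under How It Works can be sketched as one function. This is an illustrative outline, not the code in solution.py: the function name and signature are hypothetical, and `convert_one` stands in for the single-file converter.

```python
from pathlib import Path


def batch_convert(input_dir, output_dir, in_ext, out_ext, convert_one):
    """Convert every matching file in input_dir; return (succeeded, failed).

    `convert_one(src, dst)` is a placeholder for the single-file converter.
    """
    src_dir, dst_dir = Path(input_dir), Path(output_dir)
    # 1. Directory validation
    if not src_dir.is_dir():
        raise NotADirectoryError(f"Input directory not found: {src_dir}")
    # 2. Extension normalization: accept "pdf" and ".pdf"
    in_ext = "." + in_ext.lstrip(".")
    out_ext = "." + out_ext.lstrip(".")
    # 3. Output directory creation
    dst_dir.mkdir(parents=True, exist_ok=True)
    # 4. File discovery (non-recursive)
    files = sorted(p for p in src_dir.glob(f"*{in_ext}") if p.is_file())
    # 5. Batch processing: one failure does not stop the rest
    succeeded = failed = 0
    for src in files:
        try:
            convert_one(src, dst_dir / (src.stem + out_ext))
            succeeded += 1
        except Exception as exc:
            failed += 1
            print(f"Failed: {src.name}: {exc}")
    print(f"Done: {succeeded} succeeded, {failed} failed")
    return succeeded, failed
```

Catching exceptions per file is what lets the remaining files continue, and the two counters feed the final summary.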
What AI models are used for summarization?
SwissKnife uses Google’s Gemini 2.5 Flash model for document summarization.
Model Configuration (solution.py:183)
Model Parameters by Length (solution.py:173-174)
Short summaries:
- Max tokens: 1500
- Temperature: 0.5 (more focused/deterministic)
- Output: 1 paragraph
Medium summaries:
- Max tokens: 2500
- Temperature: 0.7 (balanced)
- Output: 2-3 paragraphs
Long summaries:
- Max tokens: 4000
- Temperature: 0.8 (more creative/varied)
- Output: 3-4 paragraphs
All summaries:
- top_p: 0.9 for nucleus sampling
- Custom prompt from summarize_prompt.txt
API Requirements
- Google API Key (solution.py:164): Get your API key from Google AI Studio.
- Python package (solution.py:168)
Why Gemini 2.5 Flash?
Gemini 2.5 Flash is optimized for:
- Fast processing speeds
- Large context windows (can handle long documents)
- Cost-effective API usage
- Multimodal capabilities (text, images, documents)
- Good balance between quality and performance
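The per-length parameters listed under Model Parameters by Length can be captured in a small lookup table. The values come from that list. The dictionary layout, key names, helper function, and the "gemini-2.5-flash" identifier string are assumptions for illustration.

```python
# Values copied from the parameter lists in this section; the layout and key
# names are illustrative, not the structure used in solution.py.
MODEL_NAME = "gemini-2.5-flash"  # assumed API identifier for Gemini 2.5 Flash
TOP_P = 0.9  # nucleus sampling, shared by all lengths

LENGTH_PARAMS = {
    "short": {"max_tokens": 1500, "temperature": 0.5},
    "medium": {"max_tokens": 2500, "temperature": 0.7},
    "long": {"max_tokens": 4000, "temperature": 0.8},
}


def generation_config(length: str) -> dict:
    """Return the sampling settings for a --length value."""
    if length not in LENGTH_PARAMS:
        raise ValueError(f"Unknown length: {length!r}")
    return {**LENGTH_PARAMS[length], "top_p": TOP_P}
```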
How long do conversions take?
Conversion time varies based on file type, size, and complexity.
Timing Information
Every conversion displays its duration (solution.py:153-155). This is calculated from start to finish.
Typical Duration Ranges
Document conversions:
- Small documents (< 10 pages): 1-3 seconds
- Medium documents (10-50 pages): 3-10 seconds
- Large documents (> 50 pages): 10-30 seconds
- Depends on: Page count, images, formatting complexity, LaTeX compilation
Image conversions:
- Usually very fast: < 1 second
- Large images or PDF output: 1-3 seconds
- Batch processing: Depends on count and size
Audio conversions:
- Short clips (< 5 min): 2-5 seconds
- Songs (3-5 min): 5-15 seconds
- Long recordings: Proportional to duration
- High-quality formats (FLAC): Slightly longer
Video conversions:
- Short clips (< 30 sec): 5-20 seconds
- Standard videos (1-5 min): 20-60 seconds
- Long videos: Can take several minutes
- GIF creation: Additional time for palette generation (solution.py:71-74)
Archive operations:
- Small archives: 1-5 seconds
- Large archives: 10-60+ seconds
- Depends on: File count, compression ratio, archive format
AI summarization:
- Upload time: Depends on file size and internet speed
- Processing: 10-120 seconds typically
- Maximum timeout: 5 minutes (300 seconds) (solution.py:181)
Factors Affecting Speed
- File size - Larger files take longer
- Complexity - Rich formatting, many images, high resolution
- Output format - Some formats require more processing
- System resources - CPU, RAM, disk speed
- Dependencies - LaTeX compilation can be slow
- Network - AI summarization requires upload/download
Optimization Tips
Use --preserve-original judiciously:
- Adds time for creating temporary copy
- Only use when necessary
Batch conversion:
- Files are processed sequentially, not in parallel
- Total time = sum of individual conversions
- Progress shown for each file (solution.py:93)
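Start-to-finish timing, and the point that a sequential batch takes the sum of its individual conversions, can be illustrated with a small helper. This is a sketch: the function names are hypothetical, and the use of time.perf_counter is an assumption because the source does not say which clock the tool uses.

```python
import time


def timed(convert, *args):
    """Run convert(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = convert(*args)
    return result, time.perf_counter() - start


def timed_batch(convert, items):
    """Process items one after another; total time is the sum of the parts."""
    durations = [timed(convert, item)[1] for item in items]
    return durations, sum(durations)
```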
Can I use SwissKnife in scripts or automation?
Yes, SwissKnife is designed to work well in scripts and automated workflows.
Exit Codes
SwissKnife follows standard Unix exit code conventions:
- Success (exit code 0)
- Errors (exit code 1) (solution.py:266-270)
Script Examples
- Bash script for batch processing
- Python script for automated conversion
Handling the Overwrite Prompt
The overwrite prompt (solution.py:63-65) can interrupt automation. Workarounds:
- Remove output files beforehand
- Use unique output names
- Pipe ‘y’ to the command
Capturing Output
SwissKnife prints informative messages during operation.
Environment Variables
For AI summarization in scripts, set GOOGLE_API_KEY in the environment, either from the shell or in Python.
Batch Processing Safety
Batch conversion continues even if individual files fail (solution.py:94-95). Check the final summary.
Cron Jobs
Cron jobs can run conversions on a schedule, for example daily.
Best Practices
- Always check exit codes in scripts
- Use absolute paths when possible
- Handle the overwrite prompt appropriately
- Set GOOGLE_API_KEY in environment for AI features
- Log output for debugging
- Test scripts with small datasets first
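Several of these practices (checking exit codes, answering the overwrite prompt, logging output) can be combined in one wrapper around subprocess.run. The sketch below is generic and makes no assumption about SwissKnife's command-line syntax: the function name is hypothetical, and the demo command is a stand-in to be replaced with the real invocation.

```python
import subprocess
import sys


def run_noninteractive(command, auto_confirm=True, timeout=600):
    """Run a CLI command, answering a y/N prompt on stdin; return (ok, output)."""
    result = subprocess.run(
        command,
        input="y\n" if auto_confirm else "\n",  # an empty line accepts the default "no"
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Exit code 0 means success; anything else is treated as an error.
    return result.returncode == 0, result.stdout + result.stderr


if __name__ == "__main__":
    # Stand-in command; replace it with the actual SwissKnife invocation.
    ok, output = run_noninteractive([sys.executable, "-c", "print(input())"])
    print(ok, output.strip())
```

The returned output can be written to a log file, and the boolean lets a calling script decide whether to continue or stop.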